Machine learning bias and its potential impact on LSST data products (Shamir)

Type: Talk
Session: Machine Learning and Artificial Intelligence
Author: Lior Shamir

Abstract: The size and complexity of the data collected by the Vera Rubin Observatory will require automatic analysis of its photometry and image data to provide usable data products. One common approach to analyzing data collected by digital sky surveys is machine learning, specifically deep neural networks. Such systems have already been applied to data collected by several existing sky surveys. These studies, however, do not always take into account the complex biases introduced by machine learning, so the use of these methods can easily lead to biased data products and, consequently, to false or biased conclusions. These biases are often very difficult to notice, and even highly experienced researchers and machine learning experts can easily be deceived by the nature of these algorithms. Numerous previous experiments, including experiments that are foundational in the field of machine learning, have been shown to be heavily biased by the machine learning algorithms, yet these biases escaped the attention of the experimentalists. Experiments also show that such biases are dominant when deep neural networks are applied to photometric or image data collected by digital sky surveys, and they are therefore likely to exist in Rubin data as well. Careful examination is therefore needed both when using machine learning to generate data products and when using such data products to make discoveries.

Career Stage: Senior Researcher/Faculty
