LINCC Frameworks - Scalable analysis across large datasets (Caplar)

Type: Talk
Session: Transient and Variable Star Science II
Author: Neven Caplar

Abstract: Rubin's Year 1 dataset will be O(100 TB) in size. Fully exploring a catalog of this size is a challenge, yet a significant use case for these data is whole-dataset science (statistics, searching, mapping). LINCC Frameworks is an ambitious program that develops analysis techniques able to handle the scale and complexity of the data produced by the Vera C. Rubin Observatory. This presentation introduces LSDB, a software package for large-scale analysis across multiple datasets. LSDB shards large catalogs and indexes their sources in HEALPix space, using the catalog density in each HEALPix pixel to set the partitioning. LSDB integrates with the TAPE project for time-domain data analysis, enabling users to parallelize calculations with the Dask framework. I will also present several use cases of the software, searching for rare events and anomalies in existing datasets, and show how it can be extended to Rubin-size datasets. LINCC Frameworks also sponsors incubators, three-month programs to scale up existing software projects. I will discuss how we can help you with your project and describe the opportunities to engage with the LSDB and TAPE projects.
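The abstract describes sharding a catalog in HEALPix space according to source density. Below is a minimal plain-Python sketch of that general idea, assuming every source already carries a NESTED-scheme HEALPix index at a fixed maximum order (computing those indices from RA/Dec would need a library such as healpy). The function name `adaptive_partition` and the `threshold` parameter are hypothetical; this is an illustration of density-adaptive partitioning, not LSDB's API or its exact partitioning rule.

```python
from collections import Counter


def adaptive_partition(fine_pixels, max_order, threshold):
    """Split the sky into HEALPix pixels holding at most `threshold` sources.

    Hypothetical helper for illustration only (not part of LSDB).
    fine_pixels: NESTED HEALPix indices of the sources at `max_order`
                 (assumed to be precomputed).
    Returns {(order, pixel): source_count} for every non-empty partition.
    """
    fine_counts = Counter(fine_pixels)

    # Histogram at every order 0..max_order. In the NESTED scheme a pixel at
    # max_order maps to its ancestor at `order` by dropping 2 * (max_order - order) bits.
    counts = []
    for order in range(max_order + 1):
        shift = 2 * (max_order - order)
        agg = Counter()
        for pix, n in fine_counts.items():
            agg[pix >> shift] += n
        counts.append(agg)

    partitions = {}
    stack = [(0, pix) for pix in range(12)]  # the 12 base pixels at order 0
    while stack:
        order, pix = stack.pop()
        n = counts[order][pix]  # Counter returns 0 for pixels with no sources
        if n == 0:
            continue  # empty region: no partition needed
        if n > threshold and order < max_order:
            # Too dense: descend into the 4 NESTED children at the next order.
            stack.extend((order + 1, 4 * pix + i) for i in range(4))
        else:
            partitions[(order, pix)] = n
    return partitions
```

For example, `adaptive_partition([0] * 5 + [1] + [20] * 2 + [100], max_order=2, threshold=3)` returns a dict equal to `{(0, 1): 2, (0, 6): 1, (2, 0): 5, (2, 1): 1}`: the sparse regions stay as large order-0 pixels, while only the dense clump is refined down to order 2. Because refinement stops at `max_order`, a pixel at that order can still exceed `threshold`, so the threshold acts as a soft target rather than a hard cap.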

Career Stage: Postdoc
