LINCC Frameworks - Scalable analysis across large datasets (Caplar)
Type: Talk
Session: Transient and Variable Star Science II
Author: Neven Caplar
Abstract: Rubin's Year 1 dataset will be O(100 TB) in size. Fully exploring such a catalog is a challenge, yet a significant use case for these data is whole-dataset science (statistics, searching, mapping). LINCC Frameworks is an ambitious program that develops state-of-the-art analysis techniques capable of meeting the scale and complexity demanded by the data produced by the Vera C. Rubin Observatory. This presentation introduces LSDB, a software package that facilitates large-scale analytics across multiple datasets. LSDB's efficacy lies in its ability to shard large datasets and index sources in HEALPix space, with partitioning that takes the local catalog density into account. LSDB integrates with the TAPE project for time-domain data analysis, enabling users to parallelize calculations through the Dask framework. I will also present several use cases in which the software is used to search for rare events and anomalies in existing datasets, and showcase how it can be extended to Rubin-size datasets. LINCC Frameworks also sponsors incubators, three-month programs to scale up existing software projects. I will discuss how we can help you with your project and explain the opportunities to engage with the LSDB and TAPE projects.
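
Illustrative sketch (not part of the abstract, and not LSDB's actual code or API): the idea of density-aware sharding in HEALPix space can be shown in a few lines of plain Python. The sketch assumes sources have already been assigned pixel indices in the HEALPix nested scheme at some high order, and relies only on the nested-scheme property that pixel p at order k has children 4p to 4p+3 at order k+1. The function name `adaptive_partitions` and the `threshold` parameter are hypothetical, chosen for this example.

```python
from collections import Counter


def adaptive_partitions(pixels, high_order, threshold):
    """Group sources into HEALPix pixels of varying order.

    pixels     : nested-scheme pixel index of each source at `high_order`
    high_order : order at which `pixels` are given
    threshold  : hypothetical target for the maximum sources per partition

    Returns a dict mapping (order, pixel) to the number of sources in it.
    Dense regions end up in small (high-order) pixels, sparse regions in
    large (low-order) pixels.
    """
    # Count sources per pixel at every order from 0 to high_order.
    # In the nested scheme, the parent at a coarser order is found by
    # dropping two bits per order.
    counts = [
        Counter(p >> (2 * (high_order - order)) for p in pixels)
        for order in range(high_order + 1)
    ]

    partitions = {}

    def visit(order, pix):
        n = counts[order].get(pix, 0)
        if n == 0:
            return  # empty region: no partition is written
        if n <= threshold or order == high_order:
            # Small enough, or cannot be split any further.
            partitions[(order, pix)] = n
            return
        for child in range(4 * pix, 4 * pix + 4):
            visit(order + 1, child)

    for base_pixel in range(12):  # HEALPix has 12 base pixels at order 0
        visit(0, base_pixel)
    return partitions


if __name__ == "__main__":
    demo = [0, 0, 0, 1, 5, 5, 16, 100]  # order-2 nested pixel indices
    for key, n in sorted(adaptive_partitions(demo, 2, 2).items()):
        print(key, n)
```

Each resulting partition could then be processed independently, which is what makes this kind of layout a natural fit for a task-parallel framework such as Dask.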