Space astronomy science platforms focus week

UTC
Virtual

Zoom link TBC
Nigel Hambly (Institute for Astronomy, University of Edinburgh)
Description

Astronomy has entered an era of "big data" science. Traditional access methods, which extract small subsets of data and transfer them to client-side applications for exploration, no longer scale to the size and ambition of current and future large survey missions, both ground- and space-based.

In common with many international astronomical survey projects, the ESA Gaia and Euclid teams are developing code-to-data platform facilities that will enable scaled-up exploitation of datasets with volumes of tens to hundreds of terabytes.

The aim of this virtual meeting is to give the community an opportunity to hear about the latest developments in code-to-data platforms for ESA Gaia and Euclid, and to see examples of the kind of science exploitation that they are making possible. In addition, time is allocated in the programme for one-on-one surgery sessions to which users can bring their specific usage requirements. Astronomers and developers familiar with the platforms will provide an easy introduction to developing workflows on the available services.

The meeting will be held entirely online. Although activities are planned throughout the week, it will not be necessary to commit an entire week of time. Attendees will be free to dip in and out of presentations as they prefer, and the surgery days are free for attendees to familiarise themselves with, and work on, the platform facilities as their time allows. Registration is free, but attendance may be limited on a first-come, first-served basis depending on the level of demand. Early-stage researchers and PhD students are particularly encouraged to attend.

Invited (and confirmed) speakers:

Jos de Bruijne and Pedro Garcia-Lario (ESA), Andre Moitinho (University of Lisbon), Anthony Brown (University of Leiden), Marc Drobek (German Center for Astrophysics (DZA)), Andy Taylor (University of Edinburgh), Steven Gough-Kelly (University of Lancashire), Alfred Castro and Sagar Malhotra (University of Barcelona)

Organising team:

Nigel Hambly, Brendan O'Brien, Malcolm Illingworth, Simon Harnqvist, Bob Mann (University of Edinburgh); Mark Taylor (University of Bristol); Xavier Luri (University of Barcelona); Nic Walton (University of Cambridge)

ACKNOWLEDGEMENTS:

Compute resources for UK Cloud-based science platform development for ESA Gaia and Euclid are provided by the UK Science and Technology Facilities Council (STFC) under the auspices of the IRIS initiative. Staff resources are provisioned via grants from the STFC and the Horizon Europe SPACIOUS programme.

    • 10:00 - 11:00
      Current facilities: Session block 1

      Gaia DMP, SPACIOUS, ESA DataLabs

      • 10:00
        Introduction 30m

        Science platform developments for space astronomy will be reviewed as a collaboration of initiatives, in the context of the focus week. A broad overview touching on some of the later contributions will be given, and the schedule will be summarised, emphasising the opportunities to engage with developers in the surgery sessions on the third day and in the stand-up "bring and share" session on the final day.

        Speaker: Dr Nigel Hambly (Institute for Astronomy, University of Edinburgh)
      • 10:30
        ESA DataLabs: Gaia and Euclid DataLabs, present and planned 30m

        In this presentation, I will introduce ESA Datalabs (https://datalabs.esa.int). ESA Datalabs is a generic science platform in development by ESA that allows scientists to bring their code to ESA science mission data. The platform is operating in “Public Moderated Beta” mode, yet has already been instrumental in various projects linked to, for instance, XMM-Newton, JWST, and Euclid. This presentation will focus mostly on applications for Euclid DR1 (October 2026, ~2 PB) and Gaia DR4 (December 2026, ~500 TB).

        Speaker: Dr Jos de Bruijne (ESA)
    • 11:30 - 12:30
      Current facilities: Session block 2

      Gaia DMP, SPACIOUS, ESA DataLabs

      • 11:30
        The SPACIOUS platform: design features from an end-user perspective 30m
        Speaker: Dr Brendan O'Brien (IfA, University of Edinburgh)
      • 12:00
        Gaia visualisation services on the SPACIOUS platform 30m

        Interactive exploration of Gaia’s two-billion-row tables requires specialised visual analytics pipelines that precompute multiresolution representations and provide low-latency access on standard user hardware. Building on the Gaia Archive Visualisation Service (GAVS), we have extended this framework within SPACIOUS through a fully containerised, cloud-deployable architecture and a redesigned data-production pipeline operating natively on Parquet. The refactored workflow improves memory efficiency, removes legacy format dependencies (gbin), and enables automated generation of visualisation products. A major enhancement is support for arbitrary user-selected attribute combinations, replacing the fixed histogram and scatter-plot variable sets available in the current GAVS. This required updates to preprocessing, indexing, and metadata handling to maintain fast retrieval across the full parameter space of Gaia tables. We also provide notebook integration (pyGAVS), allowing direct interaction with the visualisation server from Jupyter environments and supporting reproducible and shareable workflows. With these developments, SPACIOUS provides flexible, scalable, high-performance visual analytics services for Gaia data releases and other large tabular datasets.

        Speaker: Prof. Andre Moitinho (University of Lisbon)
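
The idea of precomputed multiresolution representations mentioned in the visualisation abstract above can be illustrated with a toy count pyramid. This is only a minimal sketch in pure Python, not the GAVS or SPACIOUS pipeline: the function names, the synthetic data and the 2x2 coarsening scheme are all assumptions made for illustration.

```python
import random


def histogram2d(xs, ys, bins, x_range, y_range):
    """Count points into a bins x bins grid (row = y bin, column = x bin)."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        if not (x0 <= x < x1 and y0 <= y < y1):
            continue  # skip points outside the plotted window
        col = min(bins - 1, int((x - x0) / (x1 - x0) * bins))
        row = min(bins - 1, int((y - y0) / (y1 - y0) * bins))
        grid[row][col] += 1
    return grid


def coarsen(grid):
    """Halve the resolution by summing each 2x2 block of cells."""
    half = len(grid) // 2
    return [
        [
            grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
            + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
            for c in range(half)
        ]
        for r in range(half)
    ]


def build_pyramid(xs, ys, finest_bins, x_range, y_range):
    """Return count grids from finest to coarsest (finest_bins: power of two)."""
    level = histogram2d(xs, ys, finest_bins, x_range, y_range)
    levels = [level]
    while len(level) > 1:
        level = coarsen(level)
        levels.append(level)
    return levels


# Synthetic stand-in for two table columns (hypothetical, not Gaia data).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

pyramid = build_pyramid(xs, ys, 64, (-4.0, 4.0), (-4.0, 4.0))
for level in pyramid:
    print(f"{len(level):>2} x {len(level):<2} cells, {sum(map(sum, level))} points")
```

In a scheme of this kind, a client could fetch the coarsest level first and request finer levels only for the region being zoomed into, so the amount of data transferred stays small regardless of the table size.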
    • 14:00 - 15:30
      Current facilities: Session block 3

      Gaia DMP, SPACIOUS, ESA DataLabs

      • 14:00
        Gaia data analysis services on the SPACIOUS platform: GUASOM 30m

        This talk provides an overview of the data analysis services available on the platform, focusing particularly on GUASOM, a tool that allows you to visualize Self-Organizing Maps, a powerful clustering technique. This technique groups similar stars according to their nature, acting as a dimensionality reduction technique while preserving the topological order of the input data.

        Speaker: Marco Alvarez
      • 14:30
        SPACIOUS opportunities in the coming year 30m
        Speaker: Prof. Xavier Luri (University of Barcelona)
      • 15:00
        Euclid data releases and their exploitation 30m
        Speaker: Prof. Andy Taylor (University of Edinburgh)
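
For readers unfamiliar with the Self-Organizing Map technique named in the GUASOM abstract above, the following is a minimal sketch of how such a map is trained. It is not GUASOM's implementation: the function names, the decay schedules, the 4x4 grid and the two synthetic groups of two-feature "stars" are assumptions chosen only to show the technique.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma_end=0.5, seed=0):
    """Train a rows x cols Self-Organizing Map on a list of feature vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly
            sigma = sigma0 * (sigma_end / sigma0) ** frac  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                # Nodes close to the winner on the grid are pulled hardest,
                # which is what preserves the topological order of the input.
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Two synthetic groups of objects with two scaled features each (hypothetical).
rng = random.Random(1)
group_a = [[rng.gauss(0.2, 0.03), rng.gauss(0.2, 0.03)] for _ in range(100)]
group_b = [[rng.gauss(0.8, 0.03), rng.gauss(0.8, 0.03)] for _ in range(100)]

som = train_som(group_a + group_b, rows=4, cols=4)
node_a = best_matching_unit(som, [0.2, 0.2])
node_b = best_matching_unit(som, [0.8, 0.8])
print("group A maps to node", node_a, "and group B maps to node", node_b)
```

Each input vector is represented by the grid position of its best-matching node, so a high-dimensional catalogue is reduced to a two-dimensional map on which similar objects land on the same or neighbouring nodes.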
    • 16:00 - 17:00
      Future facilities: Session block

      Science platforms planned for the future

      • 16:00
        Towards a Gaia Science Exploitation and Data Analysis Platform: Usage Scenarios and Requirements 30m

        I will describe our current plans for a Gaia Science Exploitation Platform, at least at the conceptual level: the design, the types of user profile that we expect, the usage scenarios, and the main science and technical requirements. The platform would rely in principle on the functionalities provided (or to be provided) by the ESA DataLabs framework, potentially complemented by external cloud resources.

        Speaker: Pedro Garcia-Lario (ESA)
      • 16:30
        From terminal to platform: challenges on building up a Cloud Science Platform in an HPC world at DZA 30m
        Speaker: Marc Drobek (German Center for Astrophysics (DZA))
    • 09:30 - 11:00
      Example usages: Session block 1

      Presentations from existing users on how they are using existing science platforms

      • 09:30
        The Gaia survey selection function and the GaiaUnlimited project 30m

        I will present the GaiaUnlimited project, which ran between 2021 and 2024 and aimed to determine the Gaia survey selection function and provide corresponding data and tools. I will summarise the complications in determining the Gaia selection function and review how they were handled. I will then give an overview of the various tools developed within this project and show some example science cases addressed with them. Finally, I will point out the current computational challenges we face in connection with the selection function for Gaia DR4, and the role of SPACIOUS.

        Speaker: Prof. Anthony Brown (University of Leiden)
      • 10:00
        Heteroscedastic Observations of Large-Amplitude Long-Period Variables 30m

        Mira variables have been shown to follow period-age relations and are useful in studying the evolution of the Milky Way. Gaia is a unique facility in its capabilities and contributions to broad areas of Milky Way science. I will present our work on characterising long-period variables in the Gaia archive, the challenges of heteroscedastic observations and how the Gaia UK Data Mining Platform has facilitated a deeper understanding of these sources.

        Speaker: Steven Gough-Kelly (Jeremiah Horrocks Institute, University of Lancashire)
      • 10:30
        Star Clusters in the Gaia Era 30m

        In the past decade, Gaia has doubled the known population of Milky Way star clusters. With more than 1.5 billion stars expected to receive astrometric solutions in Gaia DR4, in addition to epoch photometry and astrometry, platforms like SPACIOUS will be essential for analyzing these increasingly large datasets. In this talk, I will provide an overview of the clustering algorithms widely used in astronomy and recent all-sky searches for open clusters in Gaia data. I will also show an example of recovering some well-known open clusters with HDBSCAN. I will conclude by highlighting key computational challenges in scaling these methods to the full Gaia catalogue—particularly the large-scale merging and deduplication of cluster candidates—and present some preliminary attempts using SPACIOUS.

        Speaker: Dr Sagar Malhotra (University of Barcelona)
    • 11:30 - 12:30
      Example usages: Session block 2

      Presentations from existing users on how they are using existing science platforms

      • 11:30
        Deriving the Galactic Star Formation History using Gaia data (science) 30m

        We present an application of the SPACIOUS platform (https://spacious.ub.edu) to run the Besançon Galaxy Model Fast Approximate Simulations (BGM FASt), a tool aimed at deriving Galactic parameters by comparing observed and simulated catalogs. SPACIOUS allows us to work efficiently with both large observational datasets and our own simulated catalogs. In particular, we can retrieve and handle a Gaia sample of more than 16 million stars, and upload and process our BGM synthetic catalog within the same environment. By iteratively comparing observed color-magnitude diagrams with those generated by BGM FASt from a set of input parameters, we can derive the star formation history and the initial mass function of the Milky Way in the Solar neighbourhood and beyond, two fundamental quantities for describing and understanding the origin and evolution of our Galaxy. The platform is prepared for parallelization through Apache Spark. This enables us to distribute our computations, including user-defined functions, across multiple CPUs without having to implement parallelization manually. As a result, the full workflow, from data access to model evaluation, runs in a scalable and reproducible way.

        Speaker: Dr Marc del Alcázar-Julià (University of Barcelona)
      • 12:00
        Deriving the Galactic Star Formation History using Gaia data (demo) 30m
        Speaker: Dr Alfred Castro (University of Barcelona)
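
The comparison of observed and simulated color-magnitude diagrams described in the star formation history abstract above can be sketched as follows. This is an illustrative toy, not BGM FASt: the binned Poisson deviance is one commonly used distance between count diagrams and is assumed here, and the function names, the floor value for empty model cells and the mock catalogues are all hypothetical.

```python
import math
import random


def hess_diagram(colours, mags, colour_range, mag_range, bins):
    """Bin stars into a bins x bins grid of counts (row = magnitude bin)."""
    (c0, c1), (m0, m1) = colour_range, mag_range
    counts = [[0] * bins for _ in range(bins)]
    for c, m in zip(colours, mags):
        if c0 <= c < c1 and m0 <= m < m1:
            row = min(bins - 1, int((m - m0) / (m1 - m0) * bins))
            col = min(bins - 1, int((c - c0) / (c1 - c0) * bins))
            counts[row][col] += 1
    return counts


def poisson_deviance(observed, model, floor=1e-3):
    """Poisson deviance 2 * sum(m - o + o * ln(o / m)) over all cells."""
    total = 0.0
    for obs_row, mod_row in zip(observed, model):
        for o, m in zip(obs_row, mod_row):
            if o == 0 and m == 0:
                continue  # empty in both diagrams: no contribution
            m_eff = max(m, floor)  # assumed floor avoids log of zero
            total += m_eff - o
            if o > 0:
                total += o * math.log(o / m_eff)
    return 2.0 * total


def mock_catalogue(n, mean_colour, seed):
    """Toy catalogue: Gaussian colours, uniform absolute magnitudes."""
    rng = random.Random(seed)
    colours = [rng.gauss(mean_colour, 0.2) for _ in range(n)]
    mags = [rng.uniform(0.0, 10.0) for _ in range(n)]
    return colours, mags


COLOUR_RANGE, MAG_RANGE, BINS = (0.0, 2.0), (0.0, 10.0), 20

obs_colours, obs_mags = mock_catalogue(5000, 0.8, seed=1)
observed = hess_diagram(obs_colours, obs_mags, COLOUR_RANGE, MAG_RANGE, BINS)

# Evaluate a small grid of trial parameters and keep the closest simulation.
scores = {}
for trial_mean in (0.6, 0.8, 1.0, 1.2):
    sim_colours, sim_mags = mock_catalogue(5000, trial_mean, seed=2)
    simulated = hess_diagram(sim_colours, sim_mags, COLOUR_RANGE, MAG_RANGE, BINS)
    scores[trial_mean] = poisson_deviance(observed, simulated)

best = min(scores, key=scores.get)
print("best-matching trial mean colour:", best)
```

Each trial evaluation is independent of the others, which is the property that makes this kind of parameter search a natural fit for distribution across workers.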
    • 14:00 - 15:30
      Example usages: contributed talks session 1

      Presentations from existing users on how they are using existing science platforms

      • 14:00
        An ML-based calibration framework for radio astronomy 30m

        Radiometers are central to radio astronomy but suffer from instrumental effects such as impedance mismatches between the antenna and receiver. Traditional calibration schemes like Dicke switching rely on mechanical or thermal reference loads, making them complex and less reliable in space environments. We present a machine learning–based calibration framework that models and removes instrumental distortions using neural networks trained on known signals. This method eliminates the need for active switching, improving stability, reducing mass and power requirements, and enabling simpler, more robust radiometer designs. Applied to experiments targeting the global 21-cm signal, it achieves the precision required for cosmological measurements and next-generation space missions.

        Speaker: Sam Leeney
      • 14:30
        Studies of white dwarf stars from Gaia and the Virtual Observatory 30m

        The Gaia mission has revolutionized our knowledge in many fields of astronomy. From the beginning, Gaia and the Virtual Observatory have proven to be a pairing of great value. Our group has extensively exploited this pairing for the study of white dwarf stellar evolution. Here, I will review the studies we have done so far and the main results obtained.

        Speaker: Fran Jimenez-Esteban (CAB (CSIC-INTA) Madrid)
      • 15:00
        The angular momentum spiral of the Milky Way disc in Gaia 20m

        Data from the Gaia mission show prominent phase-space spirals that are the signatures of disequilibrium in the Milky Way (MW) disc. In this work, we present a novel perspective on the phase-space spiral in angular momentum (AM) space. Using Gaia DR3, we detect a prominent AM spiral in the solar neighbourhood. We demonstrate that the spiral detected in the z − v_z phase-space projection can be straightforwardly derived from the AM spiral. We also find that MW stars on orbits with low or even moderate eccentricity follow close-to-elliptical trajectories in AM, which allows us to develop generative models for the spiral analytically, for example by tilting the MW disc at an earlier time. A ‘chi-by-eye’ fit of this simple model successfully produces a winding spiral that varies with L_z and generally matches the winding and amplitude of the data across most L_z bins. Modelling the phase spiral in AM is a promising method to constrain both past perturbations and the MW potential simultaneously. Our AM framework simplifies the interpretation of the spiral and offers a robust approach to modelling disequilibrium in the MW disc using all six dimensions of phase space.

        Speaker: Dr Rashid Yaaqib (University of Edinburgh)
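
As a reminder of the quantity behind the angular momentum abstract above, the specific angular momentum of a star is L = r × v, which uses all six phase-space coordinates. The sketch below computes it for one hypothetical star; the coordinate frame, sign convention, units (kpc and km/s) and the numerical values are assumptions for illustration and are not taken from the talk.

```python
import math


def angular_momentum(position, velocity):
    """Specific angular momentum L = r x v (per unit mass)."""
    x, y, z = position
    vx, vy, vz = velocity
    return (y * vz - z * vy, z * vx - x * vz, x * vy - y * vx)


def tilt_angle_deg(ang_mom):
    """Angle between the angular momentum vector and the z axis, in degrees."""
    lx, ly, lz = ang_mom
    return math.degrees(math.atan2(math.hypot(lx, ly), lz))


# Hypothetical star near the Sun, slightly above the plane and moving upwards.
star_r = (8.0, 0.0, 0.1)      # kpc, assumed Galactocentric Cartesian position
star_v = (0.0, 220.0, 10.0)   # km/s, assumed Galactocentric velocity

L = angular_momentum(star_r, star_v)
print("L (kpc km/s):", L)
print("tilt from z axis (deg):", tilt_angle_deg(L))
```

A star on a perfectly planar circular orbit has L_x = L_y = 0, so non-zero in-plane components of L directly trace vertical motion out of the disc.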
    • 16:00 - 17:00
      Example usages: contributed talks session 2

      Presentations from existing users on how they are using existing science platforms

    • 09:30 - 17:30
      Surgery: Session block

      One-on-one surgery sessions, one hour each, available for attendees to sign up for as they prefer

    • 09:30 - 11:00
      Bring & share: Session block 1

      Following on from the surgery sessions, familiarisation with platform services and (hopefully) some hands-on playing, the workshop week rounds off with a session in which participants share their usage examples, experiences, feedback and future requirements.

    • 11:30 - 12:30
      Workshop summary and open discussion 1h

      A summary of the week and a chance for an open discussion of science platform design and future requirements

      Speaker: Dr Nigel Hambly (Institute for Astronomy, University of Edinburgh)