WeatherGenerator


The project will build the WeatherGenerator – the world’s best generative Foundation Model of the Earth system – that will serve as a new Digital Twin for Destination Earth. The WeatherGenerator will be based on representation learning and create a general and versatile tool that models the dynamics of the Earth system based on a large variety of Earth system data. The WeatherGenerator will be task-independent and will improve results for a wide range of machine learning applications when compared to task-specific machine learning tools. It will also be more resilient for climate applications when the underlying data distributions are changing, and it will lead to a significant reduction in computational costs and faster turnaround times. To achieve this, the project will:
(1) Collect and use the most important datasets of Earth system science including data from Digital Twins of Destination Earth, selected observations, analysis and reanalysis datasets, and output of conventional Earth system models.
(2) Build the WeatherGenerator as a novel representation-learning-based machine learning tool that exploits the full potential of Europe’s largest supercomputers.
(3) Engage with the wider community via services and apply the WeatherGenerator for 22 selected applications that can be integrated into the Destination Earth framework.

The applications include global and local predictions, local downscaling, data assimilation, model post-processing, and impact applications in the domains of renewable energy, water, health and food. The project consortium that will build the WeatherGenerator consists of experts in machine learning, supercomputing and Earth system sciences, and includes industry, SMEs, and leading operational weather centers. The WeatherGenerator will lead to key innovations in weather and climate science and machine learning, enabling Europe to establish and defend leadership in machine-learning-based Earth system modelling.

Duration
48 Months from 01/02/2025 to 31/01/2029
Funded by
  • European Research Executive Agency (REA)

Coordinating organization
  • ECMWF - European Centre for Medium-Range Weather Forecasts

General aims

1. Collect and use the leading datasets of Earth system science including selected observations, analysis and reanalysis datasets, output of conventional simulations of Earth system models, and data generated in the Digital Twin experiments of DestinE.
2. Build the WeatherGenerator based on representation learning that can use the full potential of Europe’s largest supercomputers during training – the pre-exascale and exascale machines of EuroHPC as well as the Alps supercomputer – to create a Foundation Model of the Earth system.
3. Apply the WeatherGenerator in a wide range of application domains including global and limited area simulations for weather and climate, local downscaling, data assimilation, model post-processing, data compression, and impact applications in the domains of renewable energy, water, health and food.
4. Engage with the wider scientific community, commercial entities and DestinE stakeholders to generate impact and uptake of the WeatherGenerator, and provide services to enable external groups within Europe to apply the WeatherGenerator for their applications.

CMCC role
The WeatherGenerator project is organised into themes. CMCC is involved in the following themes and tasks:

Theme 1, WP1: CMCC contributes to the following tasks:
Task 1.2: Dataset collection: The high-quality datasets that are used to train the atmosphere and land components of the WeatherGenerator are collected and curated.
Task 1.3: Dataset optimisation: This task will apply known data volume reduction methods such as filtering, compression, and auto-encoding, adapted for use in tasks in WP3 and WP4 where efficient access to local subsets of the data is required.
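As a rough illustration of why compression and "efficient access to local subsets" have to be designed together, the sketch below compresses a gridded field tile by tile, so that reading one location only decompresses the tile that contains it. This is not the project's method: the tile size, the use of `zlib`, and the function names `compress_tiles` and `read_point` are all assumptions made for this example.

```python
import zlib
from array import array

TILE = 4  # hypothetical tile edge length, in grid points


def compress_tiles(grid, tile=TILE):
    """Compress a 2D grid (list of equal-length rows of floats) tile by tile."""
    n_rows, n_cols = len(grid), len(grid[0])
    tiles = {}
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            # Flatten the tile row by row into 32-bit floats.
            block = array("f")
            for row in grid[r0:r0 + tile]:
                block.extend(row[c0:c0 + tile])
            tiles[(r0 // tile, c0 // tile)] = zlib.compress(block.tobytes())
    return {"shape": (n_rows, n_cols), "tile": tile, "tiles": tiles}


def read_point(store, r, c):
    """Read one grid value, decompressing only the tile that contains it."""
    tile = store["tile"]
    _, n_cols = store["shape"]
    tr, tc = r // tile, c // tile
    # Tiles on the right edge of the grid may be narrower than `tile`.
    width = min(tile, n_cols - tc * tile)
    block = array("f")
    block.frombytes(zlib.decompress(store["tiles"][(tr, tc)]))
    return block[(r - tr * tile) * width + (c - tc * tile)]


# Toy 6 x 10 field whose values encode their own position.
grid = [[float(r * 10 + c) for c in range(10)] for r in range(6)]
store = compress_tiles(grid)
value = read_point(store, 5, 9)
```

Compressing the whole field as a single stream would usually give a better compression ratio, but every local read would then pay for decompressing everything. Independent tiles trade some ratio for cheap local access.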

Theme 2, WP3-WP: CMCC contributes to the following tasks:
Task 3.2: Development of encoder architecture and training: The encoder of WeGen-Atmo provides the “assimilation engine” that combines the different data streams into a consistent representation. Different options will be explored for the embedding networks of the different data streams, including linear layers, graph neural networks and small transformers.
Task 3.3: Development of decoder architecture and training: Three architectures will be investigated and compared for the decoder: a transformer-based approach with an ensemble of prediction heads, a conditional diffusion model, and a conditional consistency model.
Task 3.5: HPC implementation and optimisation: In this task, we will adapt the model architecture and training of WeGen-Atmo for HPC machines. We will develop model parallelism (for example, attention heads distributed across nodes) and data parallelism (for example, different nodes per data stream).
Task 3.8: Evaluation and interpretation: Scientific evaluation of the WeatherGenerator is critical for its effectiveness and its acceptance by domain scientists. It will build on existing methodologies from Earth system science, adapted to the challenges and peculiarities of a data-driven Earth system model. In particular, the scientific evaluation will include the use of methods from explainable AI adapted to the very large model sizes of the WeatherGenerator.
Task 4.5: Advanced training and fine-tuning protocols: In this task, training strategies to fine-tune the WeatherGenerator for applications will be explored and implemented. This includes reinforcement learning-based training, for example for unbiased, very long-term predictions on multi-year time scales. We will also develop the online learning strategy for model output of the Extreme and Climate Digital Twins of DestinE, with a focus on preventing catastrophic forgetting while keeping the training as efficient as possible.
Task 4.9: Evaluation and Interpretation: In this task, we will continue to evaluate the capabilities of the WeatherGenerator by comparing task-specific machine learning solutions and conventional methods with those supported and enabled by the WeatherGenerator, with a specific focus on data assimilation and the evaluation of extreme events such as heatwaves, severe precipitation events, tropical cyclones and droughts.
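To make the encoder idea in Task 3.2 more concrete, here is a minimal sketch of the simplest option named there: one linear embedding layer per data stream, mapping observations with different channel counts into a shared representation size. It is purely illustrative and not the WeGen-Atmo architecture. The stream names, the latent size `LATENT_DIM`, the random weights standing in for trained layers, and the helpers `make_linear`, `embed` and `encode` are all hypothetical.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (out_dim x in_dim); stands in for a trained layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def embed(weights, vector):
    """Apply a bias-free linear layer: weights @ vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def encode(streams, layers):
    """Embed every observation of every stream into the shared latent size."""
    tokens = []
    for name, observations in streams.items():
        for obs in observations:
            tokens.append(embed(layers[name], obs))
    return tokens


rng = random.Random(0)
LATENT_DIM = 8  # hypothetical shared representation size

# Hypothetical data streams with different numbers of channels per observation.
streams = {
    "surface_station": [[280.1, 1013.2], [275.4, 1009.8]],
    "satellite": [[0.3, 0.5, 0.1, 0.9]],
}

# One embedding layer per stream, sized to that stream's channel count.
layers = {
    name: make_linear(len(obs[0]), LATENT_DIM, rng) for name, obs in streams.items()
}
tokens = encode(streams, layers)
```

After embedding, all observations are vectors of the same length regardless of their source, which is what would let a downstream network process heterogeneous streams as one sequence. The graph neural network and small transformer options mentioned in the task would replace `embed` with a richer per-stream network.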

Theme 3, WP5+WP6: CMCC contributes to the following tasks:
Task 5.7: Climate-prediction application – Months 1-12: provide an ensemble of downscaled forecasts with more realistic extremes in a decadal forecast range (AP13). Long-term projections (such as AMIP) and climate prediction (such as the decadal prediction systems DCPP) will be used as input to the AIFS. The outcome will be used as a baseline for the development of this application with the WeatherGenerator. The assessment of the extreme events will be performed in hindcast mode against the DCPP predictions and observations.
Task 5.8: Arctic sea ice application – Months 1-12: This task defines a machine learning application for predicting Arctic sea-ice (AP14), enhancing our understanding of its dynamics. A specific tail neural network will be trained by using satellite data and model-based data of the Arctic region.
Task 6.1: Introduce WeatherGenerator output data into application workflows
Task 6.2: Develop machine learning solutions based on tail networks

Expected results

  • A new machine-learned Digital Twin that will enrich DestinE services
  • Improved skill and improved efficiency when using the WeatherGenerator for many different machine learning applications in Earth system science
  • Better predictions for weather across forecast lead-times from days, to weeks, to seasons
  • More trustworthiness in machine learned predictions for climate as the WeatherGenerator has been stress tested in many areas of Earth sciences including for multi-year predictions
  • Curated datasets for the training of large-scale machine learning applications for weather and climate that combine observations, model output, DestinE Digital Twins and (re-)analysis data
  • Better results and cheaper production of renewable energy, in particular for wind power and hydrology
  • Better predictions for floods, heatwaves and crop failures when using the WeatherGenerator
  • Published open-source code, data and publications that make the WeatherGenerator available for the community, and summarise the success stories when using the WeatherGenerator in real-world applications

Partners
FZJ – FORSCHUNGSZENTRUM JULICH GMBH
MetNor – METEOROLOGISK INSTITUTT
MPG – MAX-PLANCK-GESELLSCHAFT ZUR FORDERUNG DER WISSENSCHAFTEN EV
KNMI – KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT-KNMI
METEO-FRANCE – METEO-FRANCE
SMHI – SVERIGES METEOROLOGISKA OCH HYDROLOGISKA INSTITUT
MET OFFICE – MET OFFICE
ESCIENCE – STICHTING NETHERLANDS ESCIENCE CENTER
BULUTTAN – BULUTTAN METEOROLOJI VE TEKNOLOJI AS
KAJO – KAJO SRO
LT – LATEST THINKING GMBH
STATKRAFT – STATKRAFT ENERGI AS
ETH Zürich – EIDGENOESSISCHE TECHNISCHE HOCHSCHULE ZUERICH
MeteoSwiss – EIDGENOESSISCHES DEPARTEMENT DES INNERN

