DeepMind open-sourced a dataset and trained model snapshot for Deep Generative Models of Rainfall (DGMR), an AI system for short-term precipitation forecasts. In evaluations conducted by 58 expert meteorologists comparing it to other existing methods, DGMR was ranked first in accuracy and usefulness in 89% of test cases.
The model and several experiments were described in an article published in Nature. DeepMind developed DGMR in collaboration with the UK Met Office to perform nowcasting: short-term, high-resolution predictions of precipitation. Using a deep learning technique called generative modeling, DGMR learns to generate “radar movies”: given a short series of radar images of rainfall, it predicts the radar images that follow, and thus the amount and location of future precipitation. According to DeepMind:
We think this is an exciting area of research and we hope our paper will serve as a foundation for new work by providing data and verification methods that make it possible to both provide competitive verification and operational utility. We also hope this collaboration with the Met Office will promote greater integration of machine learning and environmental science, and better support decision-making in our changing climate.
Nowcasts often support decision-making in areas such as air traffic control and energy management; thus, their accuracy has economic and safety implications. Current methods, such as STEPS and PySTEPS, often use numerical approaches to solve physics equations that describe weather behavior. These systems model the uncertainty of their predictions by producing ensembles of predictions. More recently, researchers have developed deep learning models that are trained on datasets of radar observations; however, the DeepMind team notes that these models have limited operational usefulness, as they are “unable to provide simultaneously consistent predictions across multiple spatial and temporal aggregations.”
The DGMR model is based on a conditional generative adversarial network (GAN). The generator network takes in four observed radar frames as context and generates output predictions for the next 18 frames. The generator is trained along with two discriminator networks which learn to tell the difference between real radar data and generated data; one discriminator focuses on spatial consistency within frames, and the other on temporal consistency across a sequence of frames. The entire system is trained on historical data from radar observations in the UK, from the years 2016 to 2019. The trained model can generate a prediction in “just over a second” using a single NVIDIA V100 GPU.
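The data flow described above can be illustrated with a toy sketch. This is not DeepMind's code: toy_generator is a hypothetical stand-in that simply repeats the last observed frame with added noise, and the 64x64 grid size is an assumption made for illustration. Only the frame counts (four in, 18 out) come from the description above.

```python
import numpy as np

CONTEXT_FRAMES = 4    # observed radar frames given to the generator
FORECAST_FRAMES = 18  # future frames the generator produces
GRID = (64, 64)       # assumed grid size, chosen only for this sketch


def toy_generator(context, rng):
    """Hypothetical stand-in for a trained generator.

    Repeats the last observed frame and perturbs it with random noise,
    mimicking how a conditional generator maps context plus a random
    latent sample to a sequence of future frames.
    """
    assert context.shape == (CONTEXT_FRAMES,) + GRID
    last = context[-1]
    noise = rng.normal(scale=0.1, size=(FORECAST_FRAMES,) + GRID)
    # Rainfall intensity cannot be negative, so clip at zero.
    return np.clip(last[None, :, :] + noise, 0.0, None)


rng = np.random.default_rng(0)
context = rng.random((CONTEXT_FRAMES,) + GRID)
forecast = toy_generator(context, rng)

# A spatial discriminator would score frames one at a time ...
single_frame = forecast[0]
# ... while a temporal discriminator would score a whole sequence.
sequence = np.concatenate([context, forecast], axis=0)

print(forecast.shape, single_frame.shape, sequence.shape)
```

In a real GAN, both the generator and the discriminators would be neural networks trained against each other; the sketch only shows which arrays each component would see.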
To evaluate DGMR’s performance, DeepMind compared it to three baseline models: PySTEPS, UNet, and MetNet. Besides the general ranking of accuracy and value, a group of expert meteorologists also judged the models’ predictions for a single “meteorologically challenging event”. In this case study, 93% of the meteorologists picked DGMR’s results as their first choice. The DeepMind team also evaluated the models on several metrics, including critical success index (CSI), radially averaged power spectral density (PSD), and continuous ranked probability score (CRPS); on these metrics, DGMR compared “competitively” to the baselines.
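Of the metrics mentioned above, CSI is the simplest to compute: it is the number of correctly forecast rain cells (hits) divided by the sum of hits, misses, and false alarms. The following is a minimal sketch using the standard definition; the function name, the threshold value, and the tiny example grids are illustrative assumptions, not taken from the paper's evaluation code.

```python
import numpy as np


def critical_success_index(forecast, observed, threshold):
    """CSI = hits / (hits + misses + false alarms) at a rain-rate threshold."""
    f = np.asarray(forecast) >= threshold   # cells where rain was forecast
    o = np.asarray(observed) >= threshold   # cells where rain was observed
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    # With no forecast or observed events, the score is undefined.
    return float(hits / denom) if denom > 0 else float("nan")


# Made-up rain rates (mm/h) on a 2x3 grid.
observed = [[0.0, 2.0, 5.0],
            [0.0, 0.0, 3.0]]
forecast = [[0.0, 3.0, 0.0],
            [1.0, 0.0, 4.0]]

csi = critical_success_index(forecast, observed, threshold=1.0)
print(csi)  # 2 hits, 1 miss, 1 false alarm -> 0.5
```

A CSI of 1.0 means every rain cell was forecast with no false alarms; correct forecasts of "no rain" do not count toward the score.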
AI-based weather forecasting is an active research area. InfoQ previously covered an AI model for predicting electrical outages caused by storms, as well as a model for solving partial differential equations, which could be used for modeling climate. Google recently announced MetNet-2, which “substantially improves on the performance” of MetNet.
In a discussion about DGMR on Reddit, one commenter questioned the usefulness of the approach. Another pointed out:
The GAN is basically just hallucinating plausible details on top of the L1 prediction, but the fact is, this still leads to a higher predictive skill and value! Is the method really garbage if it has higher predictive performance on multiple metrics than other leading deep networks and statistical baselines? Furthermore, there is a ton of research into avoiding GAN mode-dropping that can be integrated into this baseline approach. That seems like a pretty promising way to gain even more performance!
The trained DGMR model and dataset are available on GitHub.