# Introduction to Earth2Studio 

In this notebook, we introduce the Earth2Studio Python package and run a simple inference workflow that generates a basic deterministic forecast using one of the built-in models of Earth-2 Inference Studio.


#### Contents of the Notebook

- [Earth2Studio](#Earth2Studio)
- [Simple Deterministic Inference](#Simple-Deterministic-Inference)
    - [Set Up](#Set-Up)
    - [Execute the Workflow](#Execute-the-Workflow)
    - [Post Processing](#Post-Processing)

#### Learning Outcomes

- Earth2Studio Key Features
- How to instantiate a built-in prognostic model
- Creating a data source and IO object
- Running a simple built-in workflow
- Post-processing results

## Earth2Studio

Earth2Studio is a Python package built to empower researchers, scientists, and enthusiasts in the fields of weather and climate science with the latest artificial intelligence models/capabilities. With an intuitive design and comprehensive feature set, this package serves as a robust toolkit for exploring this AI revolution in the weather and climate science domain.

### Package Design

The goal of this package is to enable users to extrapolate and build beyond what is implemented in it. The design philosophy of Earth2Studio embodies a modular architecture where the inference workflow acts as a flexible adhesive, binding together various specialized software components with well-defined interfaces. Each component within the package serves a distinct purpose in typical inference workflows.

<div style="display: flex; justify-content: center; gap: 10px;">
  <figure style="text-align: center;">
    <img src="https://raw.githubusercontent.com/openhackathons-org/End-to-End-AI-for-Science/main/workspace/python/jupyter_notebook/Earth2Studio/images/arch.png" style="width: 100%; height: auto;">
    <figcaption>Model architecture overview.</figcaption>
  </figure>
</div>

By viewing the inference workflow as a dynamic connector, Earth2Studio facilitates integration of these components, allowing researchers to easily swap out or augment functionalities to suit their specific needs. We recognize that many users will have their own custom workflow needs, and we encourage them to use the provided features as a starting point for building their own.

<div style="display: flex; justify-content: center; gap: 10px;">
  <figure style="text-align: center;">
    <img src="https://raw.githubusercontent.com/openhackathons-org/End-to-End-AI-for-Science/main/workspace/python/jupyter_notebook/Earth2Studio/images/samples.png" style="width: 100%; height: auto;">
  </figure>
</div>

Significant importance is placed on the interfaces that connect the components to the workflow. These are simple Python protocols that all variants of a particular component must share. This enables not only a consistent API but also the generalization of workflows.
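
To make the idea of a shared interface concrete, here is a minimal sketch using `typing.Protocol`. The names `ToyPrognostic`, `Persistence`, `Doubler`, and `run_one_step` are hypothetical stand-ins and are not the Earth2Studio protocol definitions. The point is only that any object with a matching method signature can be passed to the same workflow code.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ToyPrognostic(Protocol):
    """Hypothetical interface: anything with a matching `step` method qualifies."""

    def step(self, state: list[float]) -> list[float]: ...


class Persistence:
    """Toy model that returns the state unchanged."""

    def step(self, state: list[float]) -> list[float]:
        return list(state)


class Doubler:
    """Toy model that doubles every value."""

    def step(self, state: list[float]) -> list[float]:
        return [2.0 * x for x in state]


def run_one_step(model: ToyPrognostic, state: list[float]) -> list[float]:
    """Workflow code depends only on the protocol, not on a concrete class."""
    return model.step(state)


state = [1.0, 2.0]
persisted = run_one_step(Persistence(), state)
doubled = run_one_step(Doubler(), state)
```

Neither toy model inherits from `ToyPrognostic`; they satisfy it structurally, which is what makes components easy to swap.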

### Key Features

While Earth2Studio contains a large collection of general utilities, functions, and tooling, the following six are considered the core. For more information on these features, see the dedicated documentation for each.

- **Built-in Workflows**: Multiple built-in inference workflows to accelerate your development and research.
- **Prognostic Models**: Support for the latest AI weather forecast models offered under a coherent interface.
- **Diagnostic Models**: Diagnostic models for mapping to other quantities of interest.
- **Datasources**: Datasources to connect on-prem and remote data stores to inference workflows.
- **IO**: Simple, yet powerful IO utilities to export data for post-processing.
- **Statistical Operators**: Statistical methods to fuse directly into your inference workflow for more complex uncertainty analysis.


## Simple Deterministic Inference

<div style="display: flex; justify-content: center; gap: 10px;">
  <figure style="text-align: center;">
    <img src="https://raw.githubusercontent.com/openhackathons-org/End-to-End-AI-for-Science/main/workspace/python/jupyter_notebook/Earth2Studio/images/deterministic.png" style="width: 100%; height: auto;">
  </figure>
</div>

### **Set Up**
All workflows inside Earth2Studio require constructed components to be handed to them. In this example, let's take a look at the most basic: `earth2studio.run.deterministic`.



Let us look at the built-in models, data sources, and IO backends that are available in the Earth2Studio `0.2.0` release.


#### Prognostic Model: 

Prognostic models are a class of models that perform time integration and are thus typically used to generate forecast predictions.

The prognostic models available as of `0.2.0` are:
- **models.px.DLWP** : Deep learning weather prediction (DLWP) prognostic model.
- **models.px.FCN** : FourCastNet global prognostic model.
- **models.px.FengWu** : FengWu (operational) weather model, consisting of a single auto-regressive model with a time-step size of 6 hours.
- **models.px.FuXi** : FuXi weather model, consisting of three auto-regressive U-net transformer models with a time-step size of 6 hours.
- **models.px.Pangu24** : Pangu Weather 24 hour model.
- **models.px.Pangu6** : Pangu Weather 6 hour model.
- **models.px.Pangu3** : Pangu Weather 3 hour model.
- **models.px.Persistence** : Persistence model that generates a forecast by applying the identity operator on the initial condition and indexing the lead time by 6 hours.
- **models.px.SFNO** : Spherical Fourier Operator Network global prognostic model.
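
Time integration here means applying the model repeatedly, feeding each output back in as the next input. The sketch below illustrates that loop with a toy persistence-style model. `toy_persistence`, `rollout`, and the dictionary state are illustrative assumptions, not Earth2Studio code.

```python
from datetime import datetime, timedelta
from typing import Callable

State = dict[str, float]


def toy_persistence(state: State) -> State:
    """Identity operator: the forecast equals the input state."""
    return dict(state)


def rollout(
    model: Callable[[State], State],
    initial: State,
    start: datetime,
    nsteps: int,
    step_hours: int = 6,
) -> list[tuple[datetime, State]]:
    """Apply `model` autoregressively and record the valid time of each step."""
    results = [(start, dict(initial))]
    state = initial
    for i in range(1, nsteps + 1):
        state = model(state)  # output of the previous step is the next input
        results.append((start + timedelta(hours=step_hours * i), state))
    return results


forecast = rollout(toy_persistence, {"t2m": 280.0}, datetime(2024, 1, 1), nsteps=4)
```

Replacing `toy_persistence` with any other callable of the same shape changes the forecast without touching the loop.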


#### Data source : 

Data sources are used for downloading, caching, and reading data from different weather and climate data APIs into Xarray data arrays. They fetch initial conditions for inference and validation data for scoring.

The data sources available as of `0.2.0` are:
- **data.ARCO** : Analysis-Ready, Cloud Optimized (ARCO) is a data store of ERA5 re-analysis data curated by Google.
- **data.CDS** : The Climate Data Store (CDS) serving ERA5 re-analysis data.
- **data.GFS** : The Global Forecast System (GFS) initial state data source provided on an equirectangular grid.
- **data.HRRR** : High-Resolution Rapid Refresh (HRRR) is a North-American weather forecast model with hourly data-assimilation developed by NOAA.
- **data.IFS** : The integrated forecast system (IFS) initial state data source provided on an equirectangular grid.
- **data.IMERG** : The Integrated Multi-satellitE Retrievals (IMERG) for GPM.
- **data.Random(domain_coords)** : Randomly generated, normally distributed data.
- **data.WB2ERA5** : ERA5 reanalysis data with several derived variables on a 0.25 degree lat-lon grid from 1959 to 2023 (inclusive) at 6 hour intervals on 13 pressure levels.
- **data.WB2ERA5_121x240** : ERA5 reanalysis data with several derived variables downsampled to a 1.5 degree lat-lon grid from 1959 to 2023 (inclusive) at 6 hour intervals on 13 pressure levels.
- **data.WB2ERA5_32x64** : ERA5 reanalysis data with several derived variables downsampled to a 5.625 degree lat-lon grid from 1959 to 2023 (inclusive) at 6 hour intervals on 13 pressure levels.
- **data.WB2Climatology** : Climatology provided by WeatherBench2.
- **data.DataArrayFile** : A local xarray dataarray file data source.
- **data.DataSetFile** : A local xarray dataset file data source.

#### IO Backend: 

The IO backends are used for saving the inference results for further post-processing.

The IO backends available as of `0.2.0` are:
- **io.KVBackend** : A key-value (dict) backend.
- **io.NetCDF4Backend** : A backend that supports the NetCDF4 format.
- **io.XarrayBackend** : An xarray backed IO object.
- **io.ZarrBackend** : A backend that supports the zarr format.


For this example, we will be using the following:

- **Prognostic Model**: Use the built-in FourCastNet model: `earth2studio.models.px.FCN`.
- **Datasource**: Pull data from the GFS data API: `earth2studio.data.GFS`.
- **IO Backend**: Save the outputs into a Zarr store: `earth2studio.io.ZarrBackend`.

In [None]:
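import os

# Keep downloaded checkpoints and cached data under ./outputs/cache
os.environ["EARTH2STUDIO_CACHE"] = os.path.join(os.getcwd(), "outputs", "cache")
os.makedirs("outputs", exist_ok=True)

# Load environment variables from a local .env file, if one exists
from dotenv import load_dotenv
load_dotenv()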

from earth2studio.data import GFS
from earth2studio.io import ZarrBackend
from earth2studio.models.px import FCN

# Prognostic Model - Load the default model package, which downloads the checkpoint from NGC
package = FCN.load_default_package()
model = FCN.load_model(package)

# Data Source - Create the data source
data = GFS()

# IO Backend - Create the IO handler, store in memory
io = ZarrBackend()

### **Execute the Workflow**

With all components initialized, running the workflow is a single line of Python code.
The workflow returns the provided IO object to the user, which can then be used for
post-processing. Let us look at the API for deterministic inference:

```python 

def deterministic(
    time: list[str] | list[datetime] | list[np.datetime64],
    nsteps: int,
    prognostic: PrognosticModel,
    data: DataSource,
    io: IOBackend,
    output_coords: CoordSystem = OrderedDict({}),
    device: torch.device | None = None,
) -> IOBackend:
    """Built in deterministic workflow.
    This workflow creates a deterministic inference pipeline to produce a forecast
    prediction using a prognostic model.

    Parameters
    ----------
    time : list[str] | list[datetime] | list[np.datetime64]
        List of string, datetimes or np.datetime64
    nsteps : int
        Number of forecast steps
    prognostic : PrognosticModel
        Prognostic model
    data : DataSource
        Data source
    io : IOBackend
        IO object
    output_coords: CoordSystem, optional
        IO output coordinate system override, by default OrderedDict({})
    device : torch.device, optional
        Device to run inference on, by default None

    Returns
    -------
    IOBackend
        Output IO object
    """
```
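
Read as a recipe, the signature suggests three stages: fetch the initial state from `data`, advance it `nsteps` times with `prognostic`, and write every step to `io`. The sketch below mimics that flow with toy stand-ins. `toy_deterministic`, `ToySource`, `ToyModel`, and `ToyIO` are hypothetical and far simpler than the real components. The sketch also assumes that the initial state is stored as step 0, which is consistent with the indexing used later in this notebook.

```python
class ToySource:
    """Stand-in data source: returns a fixed initial state for any time."""

    def fetch(self, time: str) -> dict[str, float]:
        return {"t2m": 280.0}


class ToyModel:
    """Stand-in prognostic model: warms the state by 0.5 K per step."""

    def step(self, state: dict[str, float]) -> dict[str, float]:
        return {name: value + 0.5 for name, value in state.items()}


class ToyIO:
    """Stand-in IO backend: keeps every written record in a list."""

    def __init__(self) -> None:
        self.records: list[tuple[str, int, dict[str, float]]] = []

    def write(self, time: str, step: int, state: dict[str, float]) -> None:
        self.records.append((time, step, dict(state)))


def toy_deterministic(time, nsteps, prognostic, data, io):
    """Fetch, roll out, and store; then hand the IO object back."""
    for t in time:
        state = data.fetch(t)
        io.write(t, 0, state)  # assumption: initial state is stored as step 0
        for step in range(1, nsteps + 1):
            state = prognostic.step(state)
            io.write(t, step, state)
    return io


toy_io = toy_deterministic(["2024-01-01"], 20, ToyModel(), ToySource(), ToyIO())
```

With 20 steps the toy IO object ends up holding 21 records per initial time: the initial state plus one record per forecast step.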

For the forecast, we will predict 20 forecast steps, which corresponds to 5 days.
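
The arithmetic behind that statement is small enough to check directly. The snippet assumes the 6 hour step size quoted for FCN in this notebook; `STEP_HOURS` and the other variable names are chosen for illustration only.

```python
from datetime import datetime, timedelta

STEP_HOURS = 6  # step size of the FCN model as described in this notebook
nsteps = 20
start = datetime(2024, 1, 1)

# Total forecast length and the valid time of every step, including step 0
total = timedelta(hours=STEP_HOURS * nsteps)
valid_times = [start + timedelta(hours=STEP_HOURS * i) for i in range(nsteps + 1)]

print(f"Forecast length: {total.days} days")
print(f"Last valid time: {valid_times[-1]:%Y-%m-%d %H:%M}")
```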

In [None]:
import earth2studio.run as run

nsteps = 20 # Each step has a lead time of 6 hours. 
io = run.deterministic(["2024-01-01"], nsteps, model, data, io)

print(io.root.tree())

### **Post Processing**
The last step is to post-process our results. Cartopy is a great library for plotting
fields on projections of a sphere. Here we will just plot the temperature at 2 meters
(t2m) 1 day into the forecast.
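
Picking "1 day into the forecast" means converting a lead time into an index along the lead-time axis. A minimal sketch of that conversion follows, assuming 6 hour steps with index 0 holding the initial state. `lead_time_to_step` is a hypothetical helper and not part of Earth2Studio.

```python
def lead_time_to_step(lead_hours: int, step_hours: int = 6) -> int:
    """Return the index of the forecast step that matches a lead time in hours."""
    if lead_hours < 0:
        raise ValueError("lead time must be non-negative")
    step, remainder = divmod(lead_hours, step_hours)
    if remainder:
        raise ValueError(f"{lead_hours} h is not a multiple of {step_hours} h")
    return step


one_day = lead_time_to_step(24)
```

Rejecting lead times that fall between steps avoids silently plotting a different time than the one requested.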

Note that the Zarr IO backend provides additional APIs for interacting with the stored data.



In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

forecast = "2024-01-01"
variable = "t2m"
step = 4  # lead time = 4 x 6 = 24 hrs

plt.close("all")
# Create a Robinson projection
projection = ccrs.Robinson()

# Create a figure and axes with the specified projection
fig, ax = plt.subplots(subplot_kw={"projection": projection}, figsize=(10, 6))

# Plot the field using pcolormesh
im = ax.pcolormesh(
    io["lon"][:],
    io["lat"][:],
    io[variable][0, step],
    transform=ccrs.PlateCarree(),
    cmap="Spectral_r",
)

# Set title
ax.set_title(f"{forecast} - Lead time: {6*step}hrs")

# Add coastlines and gridlines
ax.coastlines()
ax.gridlines()
plt.savefig("outputs/01_t2m_prediction.jpg")

Let us now create a simple GIF that cycles through all the forecast steps using the script below.

Note that the script below takes approximately 10 minutes to produce its output.

In [None]:
import numpy as np
import matplotlib.animation as animation
from IPython.display import HTML

forecast = "2024-01-01"
variable = "t2m"
num_timesteps = 20  # Number of time steps to create GIF

# Create a Robinson projection
projection = ccrs.Robinson()

# Create a figure and axes with the specified projection
fig, ax = plt.subplots(subplot_kw={"projection": projection}, figsize=(10, 6))

# Create a function to update the frame
def update(frame):
    ax.clear()  # Clear the axis for new plot
    im = ax.pcolormesh(
        io["lon"][:],
        io["lat"][:],
        io[variable][0, frame],
        transform=ccrs.PlateCarree(),
        cmap="Spectral_r",
    )
    ax.set_title(f"{forecast} - Lead time: {6 * frame } hrs")
    ax.coastlines()
    ax.gridlines()
    return im,

# Create an animation
ani = animation.FuncAnimation(fig, update, frames=num_timesteps, blit=False)

# Save as GIF and display it. Note that this cell takes around 10 minutes or more to display the output
ani.save('outputs/t2m_prediction_animation.gif')
print("Animation of 20 Timesteps")
HTML(ani.to_html5_video())

# Important: Free up GPU Memory!

Run the cell below to free up GPU memory after running inference and before moving to the next notebook. The cell terminates the notebook kernel, which releases the memory it holds.

In [None]:
import os
os._exit(0)  # Exit the kernel process immediately with status 0

We have now looked at a plot of t2m at the 4th step (each step is 6 hours in the FCN model). In the next notebook, we extend this by adding a diagnostic model to FCN.

--- 

Don't forget to check out additional [Open Hackathons Resources](https://www.openhackathons.org/s/technical-resources) and join our [OpenACC and Hackathons Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.

---

# Licensing

Copyright © 2023 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.
