Introduction: Access to PO.DAAC datasets in the cloud

PO.DAAC is in the process of moving its data holdings to the cloud. The Cloud Data page at PO.DAAC offers access to cloud-based datasets as well as resources to help guide data users in discovering, accessing, and utilizing cloud data.

The Cloud Datasets section provides a listing page for cloud-archived datasets, with closer integration of tools and services. The Resources section shares information, updates, data recipes, and other materials that help users discover, access, and use datasets from and within the Earthdata Cloud. The Migration section offers information on the transition timeline and datasets, what to expect, and migration-specific FAQs and tutorials. For questions on what this transition means, please see the FAQ section.

During this transition to the cloud, this Cloud Data page will evolve and be continuously updated with new content and data; please check back regularly.

Below are several guides meant to help users get started with, or continue exploring, NASA Earthdata Cloud data, tools, and services.

1. What is the NASA Earthdata Cloud?


In the new paradigm, data storage and the DAAC-provided tools and services built on top of the data are co-located in the Earthdata Cloud (hosted in the AWS cloud). So what does this mean for you, the user of the data?

  • PO.DAAC will provide the same level of service to users, while handling large volumes of data, by leveraging the scalability of the cloud.

  • PO.DAAC will provide services that are co-located with the data in the cloud to minimize the amount of data downloaded. This lets you select and access only the data you are interested in, making the data more analysis-ready, whether the next step in your workflow is to download the data and work locally or to continue working in the cloud.

  • Users are not required to move their workflows to the cloud in order to access PO.DAAC data hosted in the Earthdata Cloud (in AWS). Users do, however, have to update any existing PO.DAAC data access end-points to point to the new Earthdata Cloud end-points in order to access the data.

    • The Earthdata Cloud end-points can be found on the respective cloud-enabled dataset landing page, under Data Access.

    • Traditional end-points include whole-file download, OPeNDAP, and virtual directory browsing.

  • Data download from the Earthdata Cloud archive will continue to be freely available to users.

  • While data download from the Earthdata Cloud archive is freely available, in some cases it may be beneficial for users to move their science and application workflows to the cloud. With the Big Data era upon us, the cloud offers a scalable and effective way to address storage, network, and data movement concerns while offering a great deal of flexibility to the user. Particularly when working with large data volumes, data access and processing are more efficient if workflows run in the cloud, "next to the data", which avoids having to download large data volumes.

2. Access Pathways

Three example pathways for interacting with and accessing data (and services) from and within the NASA Earthdata Cloud are illustrated in the diagram below:

  • Working locally, after downloading data to your local machine, servers, or cluster (green arrows and icons)

  • Within the Cloud: Set up your own AWS EC2 cloud instance, or virtual machine, in the cloud next to the data* (orange arrows and icons)

  • Within the Cloud: Through shareable cloud environments, such as Binder or JupyterHub, set up in an AWS cloud region* (blue arrows and icons)

Note that each of these may have a range of cost models.


*PO.DAAC and other EOSDIS data are being stored in the us-west-2 region of the AWS cloud. We recommend setting up your cloud computing environment in the same region as the data for free and easy in-cloud access.
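
The region recommendation above can be written as a small check. This is a minimal sketch, not PO.DAAC tooling: the function name `choose_access_mode` and the labels it returns are hypothetical, and it assumes you already know which AWS region (if any) your compute environment runs in. Treating every other location as the download pathway is a simplifying assumption made for illustration.

```python
# Region where, per the note above, PO.DAAC and other EOSDIS data are stored.
EARTHDATA_CLOUD_REGION = "us-west-2"


def choose_access_mode(compute_region):
    """Return a hypothetical label for the access pathway to use.

    compute_region: AWS region name of your compute environment,
    or None when working on a local machine outside AWS.
    """
    if compute_region == EARTHDATA_CLOUD_REGION:
        # Same region as the archive: work in the cloud, next to the data.
        return "in-cloud"
    # Local machine or a different region: fall back to downloading.
    return "download"


print(choose_access_mode("us-west-2"))
print(choose_access_mode(None))
```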

 

 

 

COSTING - What is free and what do I have to budget for, now that data is archived in the cloud?

  • Downloading data from the Earthdata Cloud archive in AWS to your local computer environment or local storage (e.g. servers) is and will continue to be free for the user.

  • Accessing the data directly in the cloud (from S3 in the us-west-2 region) is free. Users will need a NASA Earthdata Login account and AWS credentials to access the data, but there is no cost associated with these authentication steps, which are in place for security reasons.

  • Accessing data in the cloud via EOSDIS or DAAC cloud-based tools and services such as the CMR API, Harmony API, and OPeNDAP API (from S3 in the us-west-2 region) is free to the user. Having the tools and services “next to the data” in the cloud enables DAACs to perform data reduction and transformation more efficiently on behalf of the user, so users only access the data they need.

  • Cloud computing environments (i.e. virtual machines in the cloud) for working with data in the cloud beyond direct access or access via the provided services, such as data analysis or running models with the data, are the user’s responsibility and should be considered in budgeting. That is, the user would need to set up a cloud compute environment (such as an EC2 instance or JupyterLab) and is responsible for any storage and computing costs.

    • This means that even though direct data access in the cloud is free to the user, they would first need a cloud computing environment/machine from which to execute the data access step and then continue their analysis.

    • Depending on whether that cloud environment is provided by the user, the user’s institution, or a community hub such as Pangeo or the NASA Openscapes JupyterLab sandbox, this element of the workflow may require the user to take responsibility for budgeting and ongoing costs.
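
As an illustration of the authentication step for direct in-cloud access, the sketch below turns a set of temporary AWS credentials into keyword arguments for an S3 client. It is a sketch under stated assumptions, not an official recipe: the JSON field names (`accessKeyId`, `secretAccessKey`, `sessionToken`) are assumed to match what the credentials service returns after Earthdata Login, the helper `to_boto3_kwargs` is hypothetical, and the sample values are placeholders rather than real credentials. Check the dataset's Data Access information for the actual credentials end-point and response format.

```python
import json


def to_boto3_kwargs(credentials_json, region="us-west-2"):
    """Map a temporary-credentials JSON document to boto3 client arguments.

    The input field names are an assumption about the response format;
    the output keys are the standard boto3 credential parameter names.
    """
    creds = json.loads(credentials_json)
    return {
        "aws_access_key_id": creds["accessKeyId"],
        "aws_secret_access_key": creds["secretAccessKey"],
        "aws_session_token": creds["sessionToken"],
        "region_name": region,
    }


# Placeholder values for illustration only; these are not real credentials.
sample = json.dumps({
    "accessKeyId": "EXAMPLE_KEY_ID",
    "secretAccessKey": "EXAMPLE_SECRET",
    "sessionToken": "EXAMPLE_TOKEN",
})

kwargs = to_boto3_kwargs(sample)
print(sorted(kwargs))
```

With boto3 installed in a cloud environment in the same region as the data, these arguments could then be passed on as `boto3.client("s3", **kwargs)`.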

 

3. Getting Started

The following are conceptual roadmaps for users getting started with NASA Earth Observations cloud-archived data. They outline the steps a user would take to get set up with Earthdata Cloud data, for both in-cloud and local (e.g. laptop) workflows. If you are a current user of PO.DAAC data, changing your data access end-point to the new PO.DAAC Earthdata Cloud end-point is an important step in updating your workflow to continue accessing these data. Please also see our Migration section for more details.

 

In-Cloud Workflow

Published Google Slide

 

Local Workflow (i.e. download data)

Published Google Slide

 

4. Tools and Services Roadmap

Below is a practical guide for learning about and selecting helpful tools or services for a given use case, focusing on how to find and access cloud-archived data from a local compute environment (e.g. laptop) or from a cloud computing workspace, with accompanying example tutorials. Once you follow your desired pathway, click on the respective blue notebook icon to get to the tutorial. Note: these pathways are not exhaustive. There are many ways to accomplish these common steps, but these are some of our recommendations.

Published Google Slide
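
As one illustration of the "find" step in such a pathway, the sketch below builds a granule search URL for the CMR API, which works the same way from a laptop or from a cloud workspace. It is a minimal sketch under assumptions: the helper `build_granule_search_url` is hypothetical, `EXAMPLE_SHORT_NAME` is a placeholder for a real dataset short name, and the provider ID `POCLOUD` is assumed to be the one used for PO.DAAC cloud holdings. The code only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Public CMR granule search end-point (JSON response format).
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.json"


def build_granule_search_url(short_name, start, end, page_size=10):
    """Build a CMR granule search URL for one dataset and time range.

    start and end are ISO 8601 timestamps, e.g. "2021-01-01T00:00:00Z".
    """
    params = {
        "short_name": short_name,
        "provider": "POCLOUD",  # assumed provider ID for PO.DAAC cloud data
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    return f"{CMR_GRANULE_SEARCH}?{urlencode(params)}"


url = build_granule_search_url(
    "EXAMPLE_SHORT_NAME",  # placeholder, not a real dataset
    "2021-01-01T00:00:00Z",
    "2021-01-02T00:00:00Z",
    page_size=5,
)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the matching granules in the response carry the links used for download or in-cloud access.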

 

5. Tools and Services Cheat Sheet

The following is a practical reference guide for users who are starting to take the conceptual pieces and implement them in their own workflows. It can help in selecting from the available tools to implement the pathways described in section 2, Access Pathways, above.

Published Google Slide

 

Terminology References

Published Google Slide

 

 
