A Data Management Infrastructure for Climate Modeling Research
A. Chervenak, C. Kesselman
University of Southern California/Information Sciences Institute

I. Foster, S. Tuecke, W. Allcock, V. Nefedova, D. Quesnel
Argonne National Laboratory

B. Drach, D. Williams
Lawrence Livermore National Laboratory

A. Sim, A. Shoshani
Lawrence Berkeley National Laboratory

Description:
We will demonstrate our infrastructure for secure, high-performance data transfer and replication for large-scale climate modeling data sets. Climate modeling data sets typically consist of many files and can reach many gigabytes in size. These files may be replicated at various locations. When a climate modeling researcher requests a particular view of the data, we initiate a secure transfer of the relevant files from the replica that is predicted to offer the best performance.
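The replica choice described above can be sketched as a comparison of predicted transfer times. This is a minimal sketch under stated assumptions: "best performance" is taken to mean the shortest estimated transfer time, estimated with a simple cost model (a fixed startup latency plus file size divided by forecast bandwidth). The `Replica` record, the site names, and the forecast numbers are hypothetical; in the described system the forecasts would come from performance and information services rather than being hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file (hypothetical record layout)."""
    site: str
    bandwidth_mb_s: float  # forecast throughput in MB/s (assumed input)
    latency_s: float       # forecast startup cost in seconds (assumed input)


def estimated_seconds(replica: Replica, size_mb: float) -> float:
    # Assumed cost model: fixed startup latency plus size / bandwidth.
    return replica.latency_s + size_mb / replica.bandwidth_mb_s


def select_replica(replicas: list[Replica], size_mb: float) -> Replica:
    """Return the replica with the smallest estimated transfer time."""
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: estimated_seconds(r, size_mb))


if __name__ == "__main__":
    replicas = [
        Replica("site-a", bandwidth_mb_s=40.0, latency_s=0.05),
        Replica("site-b", bandwidth_mb_s=12.0, latency_s=0.01),
    ]
    print(select_replica(replicas, size_mb=2000.0).site)
    print(select_replica(replicas, size_mb=0.1).site)
```

With these numbers, site-a wins for a 2000 MB file (about 50 s versus about 167 s), while site-b's lower startup latency makes it the better choice for a very small file; ranking replicas by bandwidth alone would miss that case.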

Our application includes several components. First, a user specifies at a high level the characteristics of the desired data (for example, precipitation amounts for a certain time period and region). A metadata infrastructure maps between these high-level attributes and file names. Next, we use a replication management infrastructure to find all physical locations of the desired files. We select among these physical locations by consulting performance and information services such as the Network Weather Service and the Globus Metacomputing Directory Service to predict the relative performance of transfers from each location. Once a particular physical replica is selected, we initiate secure, high-performance data transfer between the source and destination sites. Finally, the desired data is presented graphically to the user.
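The first two lookup stages in this pipeline (high-level attributes to file names, then file names to physical locations) can be sketched as two separate catalog queries. This is a minimal sketch assuming in-memory catalogs; the `FileEntry` record, the `METADATA` and `REPLICAS` tables, the file names, and the site paths are all hypothetical and do not describe the actual metadata or replica management services. Choosing among the returned locations is a separate step that is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical metadata record describing one logical file."""
    logical_name: str
    variable: str
    year: int
    region: str


# Hypothetical metadata catalog: high-level attributes for each logical file.
METADATA = [
    FileEntry("pr_1990_na.nc", "precipitation", 1990, "north_america"),
    FileEntry("pr_1991_na.nc", "precipitation", 1991, "north_america"),
    FileEntry("tas_1990_na.nc", "temperature", 1990, "north_america"),
]

# Hypothetical replica catalog: logical file name -> known physical locations.
REPLICAS = {
    "pr_1990_na.nc": ["site-a.example.org/climate/pr_1990_na.nc",
                      "site-b.example.org/archive/pr_1990_na.nc"],
    "pr_1991_na.nc": ["site-b.example.org/archive/pr_1991_na.nc"],
}


def find_logical_files(variable: str, first_year: int, last_year: int,
                       region: str) -> list[str]:
    """Stage 1: map high-level attributes to logical file names."""
    return [e.logical_name for e in METADATA
            if e.variable == variable
            and first_year <= e.year <= last_year
            and e.region == region]


def find_physical_locations(logical_names: list[str]) -> dict[str, list[str]]:
    """Stage 2: map each logical file name to its registered replicas."""
    return {name: REPLICAS.get(name, []) for name in logical_names}


if __name__ == "__main__":
    names = find_logical_files("precipitation", 1990, 1991, "north_america")
    for name, locations in find_physical_locations(names).items():
        print(name, locations)
```

Keeping the two catalogs separate means a file can gain or lose replicas without any change to its metadata; a logical file with no registered replica comes back with an empty list, which a request manager would have to report to the user.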

This project is joint work by three groups. Researchers at Lawrence Berkeley National Laboratory created a request manager that calls low-level services and selects among replicas. Scientists at Lawrence Livermore National Laboratory provided the user interface and visualization output for the application, as well as the metadata service that maps between high-level attributes and files. Finally, the Globus project team at USC Information Sciences Institute and Argonne National Laboratory provided basic grid services, including replica management, information services, and secure, efficient data transfer.