CUMULVS and Globus: Opening New Doors for Visualization, Computational
Steering, and Fault Tolerance of High-Performance Scientific Simulations
James Arthur Kohl
Oak Ridge National Laboratory
Hosted by Nicholas Karonis
10:30 AM, September 21, 1999
Building 221, Room A-216

Abstract
Scientific simulation continues to proliferate as an alternative to
expensive physical prototypes and laboratory experiments. Such software-based research and
development provides a cost-effective means of exploring a wide range of input datasets
and variations in physical parameters. Combined with ubiquitous network connectivity,
these online experiments also provide a platform for collaboration with remotely located
researchers, a feat not possible with traditional physical prototypes or experiments.
Developing these advanced computer simulations requires substantial infrastructure.
Teams of scientists need to observe the ongoing progress of a simulation and
share in its control, and the user environment must withstand or recover from system
faults and failures. Handling these issues efficiently requires expertise in computer
science and a level of effort that application scientists are typically unwilling to expend. The
CUMULVS system (Collaborative User-Migration, User Library for Visualization and Steering)
provides an infrastructure for interacting with parallel and distributed simulations
on the fly. Using CUMULVS, a team of geographically distributed researchers can each
dynamically attach their own front-end viewer program to the same running simulation. With
their viewers, they can collaboratively monitor and control the simulation via interactive
visualization and computational steering functions. Visual feedback from the
simulation can provide the insight needed to alter the course of the computation and steer it
toward the desired solution. CUMULVS also provides a simple, user-directed checkpointing
mechanism that periodically saves the state of the simulation program for task migration or
recovery from failures. Given semantic information supplied by the application, CUMULVS can
migrate and restart tasks across heterogeneous system architectures.
CUMULVS is being ported to the Globus system, running on top of the Nexus communication
substrate. This will increase the usefulness and applicability of CUMULVS by making it
available to a larger user base. Nexus offers a rich and powerful interface for
high-performance message-passing, and a callback mechanism for fault tolerance. The data
management offered by the Metacomputing Directory Service (MDS) in Globus can be applied
to application discovery. Together, CUMULVS, Nexus, and MDS will add capabilities to
Globus to support state-of-the-art simulation science.