mcs | publications | abstracts

2005 Abstracts of MCS Reports and Preprints

Reports

S. J. Benson and Y. Ye, "DSDP5 User Guide - Software for Semidefinite Programming," Technical Memorandum ANL/MCS-TM-277, September 2005. DSDP implements the dual-scaling algorithm for semidefinite programming.  The source code if this interior-point solver, written entirely in ANSI C, is freely available.  The solver can be used as a subroutine library, as a function within the MATLAB environment, or as an executable that reads and writes to files.  Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming.  Although the solver is written with semidefinite programming in mind, it can also be used for linear programming and other constraint cones.
D. Buntinas and W. Gropp, "Understanding the Requirements Imposed by Programming Model Middleware on a Common Communication Subsystem," Technical Memorandum ANL/MCS-TM-284, July 2005. In high-performance parallel computing, most programming-model middleware libraries and runtime systems use a communication subsystem to abstract the lower-level network layer.  The functionality required of a communication subsystem depends largely on the middleware libraries, and runtime systems typically implement their own communication subsystems that are specially tuned for the middleware, rather than use an existing communication subsystem. This situation leads to duplicated effort and prevents different middleware libraries from being used by the same application in hybrid programming models.  In this report we describe features required by various middleware libraries as well as some desirable features that would make it easier to port a middleware library to the communication subsystem and allow the middleware to make use of high-performance features provided by some networking layers.  We show that none of the communication subsystems that we evaluate support all of the features.
E. M. Gertz and J. D. Griffin, "Support Vector Machine Classifiers for Large Data Sets," Technical Memorandum ANL/MCS-TM-289, October 2005. This report concerns the generation of support vector machine classifiers for solving the pattern recognition problem in machine learning. Several methods are proposed based on interior-point methods for convex quadratic programming. Software implementations are developed by adapting the object-oriented packaging OOQP to the problem structure and by using the software package PETSc to perform time-intensive computations in a distributed setting.  Linear systems arising from classification problems with moderately large numbers of features are solved by using two techniques---one a parallel direct solver, the other a Krylov-subspace method incorporating novel preconditioning strategies. Numerical results are provided, and computational experience is discussed.
D. Buntinas, G. Mercier, and W. Gropp, "Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem," Technical Memorandum ANL/MCS-TM-292, November 2005. This paper presents a new low-level communication subsystem called Nemesis.  Nemesis has been designed and implemented to be scalable and efficient both in the intranode communication context using shared-memory and in the internode communication case using high-performance networks and is natively multimethod-enabled.  Nemesis has been integrated in MPICH2 as a CH3 channel and delivers better performance than other dedicated communication channels in MPICH2.  Furthermore, the resulting MPICH2 architecture outperforms other MPI implementations in point-to-point benchmarks.
Preprints
P. F. Fischer and C. P. Tzanos, "Filtered Simulations of Turbulence in a Reactor Rod Bundle Flow," Preprint ANL/MCS-P1213-0105, January 2005.

Although RANS (Reynolds averaging of the Navier-Stokes equations) turbulence models provide adequately accurate solutions for many engineering problems at a reasonable computational cost, there is no single RANS model of universal applicability. Moreover, in many important applications the RANS model predictions for flow distributions, heat transfer coefficients, and thermal mixing can be significantly off. Examples of such applications are flows in pressurized water reactor rod bundles with mixing vanes and flows in gas cooled reactors where buoyancy effects are significant. Because direct numerical simulation (DNS) of turbulence for large Reynolds numbers and for system-size scales of interest are not still computationally practical, LES (large eddy simulation) and coarse DNS models of turbulence, which are computationally less demanding than DNS, hold the promise to remedy the shortcomings of the RANS models in many important applications.  LES is predicated on a scale separation mechanism, usually in the form of a filter, to isolate the resolved (simulated) scales from the subgrid scales (SGS). An essential part of LES is to account for the effects of the missing SGS terms; these typically appear as enhanced diffusion in the Navier-Stokes equations.

G. von Laszewski and M. Sosonkin, "A Grid Certificate Authority for Community and Ad-hoc Grids," Preprint ANL/MCS-P1216-0105, January 2005. One of the most important issues in Grid computing is to provide a secure environment that allows administrators to contribute their resources and users to utilize them.  Currently diverse methods are required to obtain certificates for the different Grids.  In this paper we showcase a prototype of a tool that simplifies the tasks associated with maintaining a Grid certificate authority and simplifies the application process for the user to interact with multiple certificate authorities.
W. McCune, R. Padmanabhan, M. A. Rose, and R. Veroff, "Automated Discovery of Single Axioms for Ortholattices," Algebra Universalis 52 (2004), pp. 541-549.  Also Preprint ANL/MCS-P1218-0105, January 2005. We present short single axioms for ortholattices, orthomodular lattices, and modular ortholattices, all in terms of the Sheffer stroke.  The ortholattice axioms is the shortest possible.  We also give multiequation bases in terms of the Sheffer stroke and in terms of join, meet, and complementation.  Proofs are omitted but are available in an associated technical report and on the Web.  We used computers extensively to find candidates, reject candidates, and search for proofs that candidates are single axioms.
N. Desai, A. Lusk, R. Bradshaw, and E. Lusk, "MPISH: A Parallel Shell for MPI Programs," Preprint ANL/MCS-P1219-0105, January 2005. While previous work has shown MPI to provide capabilities for system software, actual adoption has not widely occurred.  We discuss process management shortcomings in MPI implementations and their impact on MPI usability for system software and management tasks.  We introduce MPISH, a parallel shell designed to address these issues.
M. M. Strout, B. Kreaseck, and P. Hovland, "Data-Flow Analysis for MPI Programs," Preprint ANL/MCS-P1220-0105, January 2005. Message passing via MPI is widely used in single-program, multiple-data (SPMD) parallel programs.  Data-flow analysis frameworks that respect the semantics of message-passing SPMD programs are needed to obtain more accurate and in some cases correct analysis results for such programs.  We qualitatively evaluate various approaches for performing data-flow analysis on SPMD MPI programs and present a method for performing interprocedural data-flow analysis on the MPI-ICFG representation.  The MPI-ICFG is an interprocedural control-flow graph (ICFG) augmented with communication edges between possible send and receive pairs.  We discuss in detail two analyses that potentially benefit from propagating information over communication edges: reaching constants and activity analysis.  Constants can be shared in SPMD programs without communicating them: therefore, performing reaching constants over the MPI-ICFG is useful mainly for illustrative purposes.  Activity analysis is a domain-specific analysis used to reduce the computation and storage requirements for automatically differentiated MPI programs.  Our experimental results show that activity analysis performed over the MPI-ICFG has a convergence rate comparable to a more conservative version of the analysis performed on an ICFG.  Also, using the MPI-ICFG data-flow analysis framework improves the precision of activity analysis and significantly reduces memory requirements for the automatically differentiated versions of some parallel benchmarks, including some of the NAS Parallel Benchmarks.
B. Norris and I. Veljkovic, "Performance Monitoring and Analysis Components in Adaptive PDE-Based Simulations," Preprint ANL/MCS-P1221-0105, January 2005. Newton-Krylov nonlinear and linear solver components for performance monitoring using the TAU performance tools.  To reduce the performance monitoring and component adaptation overhead, we employ two databases.  The first is created and destroyed during runtime and stores performance data for code segments of interest, as well as various application-specific performance events in the currently running application instance.  The second database is persistent and contains performance data from various applications and different instances of the same application.  It can also contain performance information derived through offline analysis of raw data.  We describe a prototype implementation of this infrastructure and show how adaptive linear solver algorithms are employed in a driven cavity flow simulation code.
   
   
A. P. Craig, R. L. Jacob, B. Kauffman, T. Bettge, J. Larson, E. Ong, C. Ding, Y. He, "Cp16: The New Extensible, High-Performance Parallel Coupler for the Community Climate System Model," Preprint ANL/MCS-P1222-0205, February 2005. Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in the forcing or boundary conditions.  The Community Climate System Model (CCSM) is a widely used state-of-the-art climate model that has released several versions to the climate community over the past ten years.  Like many climate models, CCSM employs a coupler, a functional unit that coordinates the exchange fo data between parts of the climate system such as the atmosphere and ocean.  This paper describes the new coupler, cp16, contained in the latest version of CCSM, CCSM3.  Cp16 introduces distributed-memory parallelism to the coupler, a class library for important coupler functions, and a strandardized interface for component models.  Cp16 is implemented entirely in Fortran90 and uses the Model Coupling Toolkit as the base for most of its classes.  Cp16 gives improved performance over previous versions and scales well on multiple platforms.
A. Zagaris, H. G. Kaper, T. J. Kaper, "Two Perspectives on Reduction of Ordinary Differential Equations," Preprint ANL/MCS-P1223-0205, February 2005. This article is concerned with general nonlinear evolution equations x' = g(x) in RN involving multiple time scales, where fast dynamics take the orbits close to an invariant low-dimensional manifold and slow dynamics take over as the state approaches the manifold.  Reduction techniques offer a systematic way to identify the slow manifold and reduce the original equation to an autonomous equation on the slow manifold.  The focus in this article is on two particular reduction techniques, namely, computational singular perturbation (CSP) proposed by Lam and Goussis [Twenty-Second Symposium (International) on Combustion, The University of Washington, Seattle, Washington, August 14-19, 1988, (The Combustion Institute, Pittsburgh, 1988), pp. 931-941] and the zero-derivative principle (ZDP) proposed recently by Gear and Kevrekidis [Constraint-defined Manifolds: A Legacy-code Approach to Low-dimensional Computation, SIAM J. Sci. Comp., to appear].  It is shown that the tangent bundle to the state space offers a unifying framework for CSP and ZDP.  Both techniques generate coordinate systems in the tangent bundle that are natural for the approximation of the slow manifold.  Viewed from this more general perspective, both CSP and ZDP generate, at each iteration, approximate normal forms for the system under examination.
E. Lusk, N. Desai, R. Bradshaw, A. Lusk, R. Butler, "An Interoperability Approach to System Software, Tools, and Libraries for Clusters," Preprint ANL/MCS-P1224-0205, February 2005. Systems software for clusters typically derives from a multiplicity of sources: the kernel itself, software associated with a particular distribution, site-specific purchased or open-source software, and assorted home-grown tools and procedures that attempt glue everything together to meet the needs of the users and administrators of a particular cluster.  Whether a cluster is a general-purpose resource serving multiple users or dedicated to a single application, getting everything to work together is a challenge.  The challenge is partially met by special software distributions for clusters such as OSCAR or ROCKS.  Here we discuss another approach (although it is not inconsistent with existing distributions), in which a small number of concepts are deployed to facilitate the customized integration of various software tools for cluster management, operation, and user jobs.  The concepts include (1) a component approach to basic system software such as schedulers, queue managers, process managers, and monitors; (2) a software development kit for constructing networks of system software components, either from scratch or by wrapping ``foreign" software; and  (3) the use of explicit parallelism in building system tools for high performance.  We illustrate this approach with a description of a mid-sized general-purpose cluster operated entirely by software built this way.
R. Jacob, J. Larson, and E. Ong, "MxN Communication and Parallel Interpolation in CCSM3 Using the Model Coupling Toolkit," Preprint ANL/MCS-P1225-0205, Feburary 2005. The Model Coupling Toolkit (MCT) is a software library for constructing parallel coupled models from individual parallel models.  MCT was created to address the challenges of creating a parallel coupler for the Community Climate System Model (CCSM).  Each of the submodels that make up CCSM is a separate parallel application with its own domain decomposition, running on its own set of processors.  This application contains multiple instances of the MxN problem, the problem of transferring data between two parallel programs running on disjoint sets of processors.  CCSM also requires efficient data transfer to facilitate its interpolation algorithms.  MCT was created as a generalized solution to handle these and other common functions in parallel coupled models.  Here we describe MCT's implementation of the data transfer infrastructure needed for a parallel coupled model.  The performance of MCT scales satisfactorily as processors are added to the system.  However, the types of decompositions used in the submodels can affect performance.  MCT's infrastructure provides a flexible and high-performing set of tools for enabling interoperability between parallel applications.
U. Naumann and J. Utke, "Source Templates for the Automatic Generation of Adjoint Code through Static Call Graph Reversal," Preprint ANL/MCS-P1226-0205, February 2005. We present a new approach to the automatic generation of adjoint codes using automatic differentiation by source transformation.  Our method relies on static checkpointing techniques applied to an extended version of the program's call graph.  A code template is provided to implement a control structure governing the execution of the adjoint and augmented forward versions of each subroutine in the program.  These code variants are generated automatically by algorithms that are independent of the programming language of the original code.  The major advantage of this new approach is its flexibility with respect to various reversal schemes.
D. Sulakhe, A. Rodriguez, M. D'Souza, M. Wilde, V. Nefedova, I. Foster, N. Maltsev, "GNARE: An Environment for Grid-Based High-Throughput Genome Analysis," Proc. CCGRID05, IEEE, pp. 455-462, 2005.Also Preprint ANL/MCS-P1227-0205, February 2005.

Recent progress in genomics and experimental biology has brought exponential growth of the biological information available for computational analysis in public genomics databases. However, applying the potentially enormous scientific value of this information to the understanding of biological systems requires computing and data storage technology of an unprecedented scale. The Grid, with its aggregated and distributed computational and storage infrastructure, offers an ideal platform for high-throughput bioinformatics analysis. To leverage this platform, we have developed the Genome Analysis Research Environment (GNARE) – a scalable computational system for the high-throughput analysis of genomes, which provides an integrated database and computational backend for data-driven bioinformatics applications. GNARE efficiently automates the major steps of genome analysis, including acquisition of data from multiple genomic databases; data analysis by a diverse set of bioinformatics tools; and storage of results and annotations.

High-throughput computations in GNARE are performed by using distributed heterogeneous Grid computing resources such as Grid2003, TeraGrid, and the DOE Science Grid. Multistep genome analysis workflows involving massive data processing, the use of application-specific tools and algorithms, and updating of an integrated database to provide interactive Web access to results are all expressed and controlled by a “virtual data” model that transparently maps computational workflows to distributed Grid resources.

This paper describes how Grid technologies such as Globus, Condor, and the Gryphyn Virtual Data System were applied in the development of GNARE. We focus on our approach to Grid resource allocation and to the use of GNARE as a computational framework for developing bioinformatics applications.

H. Lee, M. Hereld, R. Stevens, W. van Drongelen, "Epileptiform Activity Patterns in Coupled Neuronal Networks, Preprint ANL/MCS-P1228-0205, February 2005.

We evaluate occurrence of seizure-like activity in computational network models based on the histology of mammalian cortex. In the simulations, seizure onset activity in a network (active patch) is connected to a larger patch of cells (passive patch). The following scenarios were investigated: (1) the active- and passive patches are contiguous with all connectivity intact; (2) active and passive populations are separated (≥1mm interdistance) with long-range connectivity dominating between them.  We find that linking the active patch with a passive patch of sufficient excitatory susceptibility reduces pathological network bursting behavior otherwise present in the active patch, even over a separation of several mm.  In addition, the balance of excitatory and inhibitory strength in the passive patch plays a crucial role in its ability to restrain network bursting in the active patch: unexpectedly, if the activity in the passive patch is too low, then the active patch can drive network bursting in both patches.

M. Hereld, R. Stevens, J. Teller, W. van Drongelen, H. Lee, "Large Neural Simulations on Large Parallel Computers," Preprint ANL/MCS-P1229-0205, February 2005. Simulations of biologically realistic neurons in large densely connected networks pose many problems to application programmers, particularly on distributed memory computers.  We discuss simulations of hundreds of thousands to millions of cells in a model of neocortex in the context of new computing platforms with many tens of thousands to hundreds of thousands of processing elements.  We are developing a performance model for this simulation so that we can gauge its performance on these platforms in terms of memory usage, time to setup and execute the simulation, and to estimate the practical limits to the size and simulation timescale available to us in our simulation experiments.  Recent results from runs on a BlueGene/L computer, which could ultimately scale to over one hundred thousand processors, are described.
J. Utke, U. Naumann, M. Fagan, N. Tallent, M. Strout, P. Heimbach, C. Hill, and C. Wunsch, "OpenAD/F: A Modular, Open-Source Tool for Automatic Differentiation of Fortran Codes," Preprint ANL/MCS-P1230-0205, February 2005. The OpenAD/F tool allows the evaluation of derivatives of functions defined by a Fortran program.  The derivative evaluation is performed by a Fortran code resulting from the analysis and transformation of the original program that defines the function of interest.  OpenAD/F has been designed with a particular emphasis on modularity, flexibility, and the use of open source components.  While the code transformation follows the basic principles of automatic differentiation, the tool implements new algorithmic approaches at various levels, for example, for basic block preaccumulation and call graph reversal.  Unlike most other automatic differentiation tools, OpenAD/F uses components provided by the Open=AD framework, which supports a comparatively easy extension of the code transformations in a language-independent fashion.  It uses code analysis results implemented in the OpenAnalysis component.  The interface to the language-independent transformation engine is an XML-based format, specified through an XML schema.  The implemented transformation algorithms allow efficient derivative computations utilizing locally optimized cross-country sequences of vertex, edge, and face elimination steps.  Specifically, for the generation of adjoint codes, OpenAD/F supports various code reversal schemes with hierarchical checkpointing at the subroutine level.  As an example from geophysical fluid dynamics a nonlinear time-dependent scalable, yet simple, barotropic ocean model is considered.  OpenAD/F's reverse mode is applied to compute sensitivities of some of the model's transport properties with respect to gridded fields such as bottom topography as independent (control) variables.
K. Keahey, I. Foster, T. Freeman, X. Zhang, and D. Galron, "Virtual Workspaces in the Grid," Preprint ANL/MCS-P1231-0205, February 2005. This paper introduces the concept of the virtual workspace, a configurable execution environment that can be used to describe, dynamically deploy, and manage a customizable execution environment in the Grid.  We describe workspaces, show how they fit into the Grid architecture, and discuss our initial experiences using this system with applications.
R. Thakur, W. Gropp, B. Toonen, "Optimizing the Synchronization Operations in MPI One-Sided Communication," Preprint ANL/MCS-P1232-0205, February 2005. One-sided communication in MPI requires the use of one of three different synchronization mechanisms, which indicate when the one-sided operation can be started and when the operation is completed.  Efficient implementation of the synchronization mechanisms is critical to achieving good performance with one-sided communication.  Our performance measurements, however, indicate that in many MPI implementations, the synchronization functions add significant overhead, resulting in one-sided communication performing much worse than point-to-point communication for short- and medium-sized messages.  In this paper, we describe our efforts to minimize the overhead of synchronization in our implementation of one-si9ded communication in MPICH2.  We describe our optimizations for all three synchronization mechanisms defined in MPI: fence, post-start-complete-wait, and lock-unlock.  Our performance results demonstrate that, for short messages, MPICH2 performs six times faster than LAM for fence synchronization and 50% faster for post-start-complete-wait synchronization, and it performs more than twice as fast as Sun MPI for all three synchronization methods.
M. Min, "Disctoninuous Galerkin Method Based on Quadrilateral Mesh for Maxwell's Equations," Preprint ANL/MCS-P1233-0205, February 2005. Discontinuous Galerkin (DG) method based on quadrilateral spectral element discretizations are applied to the electromagnetic wave time-domain simulations in free space.  The 2D Maxwell's equations in transverse-magnetic mode are described in conservation form.  Numerical flux is used for the communication at the interface between elements and boundary condition.  Computational results on the field distribution are demonstrated, including h- and p-convergence in maximum norm with this method.  This work is our first step toward two- and three-dimensional nanophotonic simulations using this higher order method.
L. Yang, J. M. Schopf, and I. Foster, "Improving Parallel Data Transfer Times Using Predicted Variances in Shared Networks," Preprint ANL/MCS-P1234-0305, March 2005. It is increasingly common to use multiple distributed storage systems as a single data store within which large datasets may be replicated.  Thus, we face the problem of how to access replicated data efficiently.  Multiple-source parallel transfers can reduce access times by transferring data from several replicas in parallel.  However, we then face the problem of deciding which data to fetch from which replicas.  We propose a Tuned Conservative scheduling technique that uses predicted means and variances for network performance to make data selection decisions.  This stochastic scheduling technique adjusts the amount of data fetched on a link according to not only the link performance but the expected variance in that performance.  We incorporate our technique into the striped GridFTP server from the Globus Toolkit, and demonstrate that the technique can produce data transfer times that are significantly faster and less variable than those of other techniques.
R. Ross, R. Latham, W. Gropp, R. Thakur, B. Toonen, "Implementing MPI-IO Atomic Mode Without File System Support," Preprint ANL/MCS-P1235-0305, March 2005. The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of any number of different underlying storage targets.  These different targets vary widely in their capabilities, and in some cases, additional effort is needed within ROMIO to support the complete MPI-IO semantics.  One aspect of the interface that can be problematic to implement is the MPI-IO atomic mode.  This mode requires enforcing strict consistency semantics.  For some file systems, native locks may be used to enforce these semantics, but not all file systems have lock support.  In this work, we describe two algorithms for implementing efficient mutex locks using MPI-1 and MPI-2 capabilities.  We then show how these algorithms may be used to implement a portable MPI-IO atomic mode for ROMIO.  We evaluate the performance of these algorithms and show that they impose little additional overhead on the system.  Because of the low-overhead nature of these algorithms, they are likely useful in a variety of situations where distributed locks are needed in the MPI-2 environment.
H. G. Kaper and S. Tipei, "DISSCO: A Unified Approach to Sound Synthesis and Composition," Preprint ANL/MCS-P1236-0305, March 2005. DISSCO (Digital Instrument for Sound Synthesis and Composition) represents a unified and comprehensive approach to sound synthesis and composition---unified in the sense that its components share a common formal approach and use similar tools, comprehensive in the sense that they deliver a final product (a musical "event") that does not require further processing.  DISSCO consists of two parts: LASS, a C++ Library for Additive Sound Synthesis, and CMOD, a C++ Composition Module.  Release 1.0 of DISSCO is available as open-source software.  This article discusses some underlying ideas of music composition and sound synthesis and describes implementation details in the context of LASS and CMOD.
M. Hereld and R. L. Stevens, "Pixel-Aligned Warping for Multi-Projector Tiled Displays," Preprint ANL/MCS-P1237-0305, March 2005. Multi-projector tiled displays offer a scalable path to high-resolution display systems, often in a large format.  Such displays have been employed in a range of applications from scientific visualization to collaborative virtual environments.  Images projected onto these systems have always looked less sharp than expected owing to the image warping that is required to achieve geometrical alignment of the partial images to form the seamless whole.  Similar degradation of image quality accompanies keystone correction techniques and image warping onto non-planar surfaces used in single projector applications.  In this paper we discuss the underlying cause of the degradation of apparent image resolution and we introduce a new class of warping techniques, called pixel-aligned warping, that tend to preserve image sharpness at the expense of strict adherence to underlying geometrical constraints that define the desired warp.  We demonstrate how such techniques can be used to significantly improve the apparent sharpness of the final image.
T. Disz, M. Kubal, R. Olson, R. Overbeek, R. Stevens, "Challenges in Large Scale Distributed Computing: Bioinformatics," Preprint ANL/MCS-P1238-0305, March 2005. The amount of genomic data available for study is increasing at a rate similar to that of Moore's Law.  This deluge of data is challenging bioinformaticians to develop newer, faster, and better algorithms for analysis and examination of this data.  The growing availability of large scale computing grids coupled with high-performance networking is challenging computer scientists to develop better, faster methods of e3xploiting parallelism in these biological computations and deploying them across computing grids. In this paper, we describe two computations that are required to be run frequently and which require large amounts of computing resource to complete in a reasonable time.  The data for these computations are very large and the sequential computational time can exceed thousands of hours.  We show the importance and relevance of these computations, the nature of the data and parallelism and we show how we are meeting the challenge of efficiently distributing and managing these computations in the SEED project.
L. F. Diachin, P. Knupp, T. Munson, S. Shontz, "A Comparison of Two Optimization Methods for Mesh Quality Improvement," Preprint ANL/MCS-P1239-0305, March 2005. We compare inexact Newton and coordinate descent optimization methods for improving the quality of a mesh by repositioning the vertices, where the overall quality is measured by the harmonic mean of the mean-ratio metric.  The effects of problem size, element size, heterogeneity, and various vertex displacement schemes on the performance of these algorithms are assessed for a series of tetrahedral meshes.
L. yang, J. M. Schopf, and I. Foster, "Improving Parallel Data Transfer Times Using Predicted Variances in Shared Networks," Preprint ANL/MCS-P1240-0305, March 2005. It is increasingly common to use multiple distributed storage systems as a single data store within which large datasets may be replicated.  Thus, e face the problem of how to access replicated data efficiently.  Multiple-source parallel transfers can reduce access times by transferring data from several replicas in parallel.  However, we then face the problem of deciding which data to fetch from which replicas.  We propose a Tuned Conservative scheduling technique that uses predicted means and variances for network performance to make data selection decisions.  This stochastic scheduling technique adjusts the amount of data fetched on a link according to not only the link performance but the expected variance in that performance.  We incorporate our technique into the striped GridFTP server from the Globus Toolkit, and demonstrate that the technique can produce data transfer times that are significantly faster and less variable than those of other techniques.
J. N. Lyness, "Extrapolation Expansions for Hanging-Chad-type Galerkin Integrals with Plane Triangular Elements," Preprint ANL/MCS-P1241-0305, March 2005. Applications of three-dimensional Galerkin boundary element methods require the numerical evaluation of many four-dimensional integrals.  In this paper we explore the possibility of using extrapolation quadrature.  To do so, one needs appropriate error functional expansions.  The treatment here is limited to integration over a region T1 × T2, where T1 and T2 are planar triangular elements in a hanging-chad configuration; that is, they have one vortex in common but are otherwise disjoint.  We derive error expansions for product trapezoidal rules valid for integrands having an |r12|-1 factor.  This factor gives rise to a weak singularity at the common vertex.
M. Anitescu, P. Tseng, S. Wright, "Elastic-Mode Algorithms for Mathematical Programs with Equilibrium Constraints: Global Convergence and Stationarity Properties," Preprint ANL/MCS-P1242-0405, April 2005. The elastic-mode formulation of the problem of minimizing a nonlinear function subject to equilibrium constraints has appealing local properties in that, for a finite value of the penalty parameter, local solutions satisfying first- and second-order necessary optimality conditions for the original problem are also first- and second-order points of the elastic-mode formulation.  Here we study global convergence properties of methods based on this formulation, which involve generating an (exact or inexact) first- or second-order point of the formulation, for nondecreasing values of the penalty parameter.  Under certain regularity conditions on the active constraints, we establish finite or asymptotic convergence to points having a certain stationarity property (such as strong stationarity, M-stationarity, or C-stationarity).  Numerical experience with these approaches is discussed.  In particular, our analysis and the numerical evidence show that exact complementarity can be achieved finitely even when the elastic-mode formulation is solved inexactly.
S. Leyffer and T. Munson, "Solving Multi-Leader-Follower Games," Preprint ANL/MCS-P1243-0405, April 2005. Multi-leader-follower games arise when modeling competition between two or more dominant firms and lead in a natural way to equilibrium problems with equilibrium constraints (EPECs).  We examine a variety of nonlinear optimization and nonlinear complementarity formulations of EPECs.  We distinguish two broad cases: problems where the leaders can cost-differentiate and problems with price-consistent followers.  We demonstrate the practical viability of our approach by solving a range of medium-sized test problems.
J. N. Lyness and T. Sorevik, "Five-Dimensional K-Optimal Lattice Rules," Preprint ANL/MCS-P1244-0105, January 2005. A major search program is described that has been used to determine a set of five-dimensional K-optimal lattice rules of enhanced trigonometric degrees up to 12.  The program involved a distributed search, in which approximately 190 CPU years were shared between more than 1,400 computers in many parts of the world.
D. Rosenberg, A. Fournier, P. Fischer, and N. Pouquet, "Geophysical-Astrophysical Spectral-Element Adaptive Refinement (GASpAR): Object-oriented h-adaptive code for geophysical fluid dynamics simulation," Preprint ANL/MCS-P1245-0405, April 2005. We present an object-oriented geophysical and astrophysical spectral-element adaptive refinement (GASpAR) code for application to turbulent flows.  Like most spectral-element codes, GASpAR combines finite-element efficiency with spectral-method accuracy.  It is also designed to be flexible enough for a range of geophysics and astrophysics applications where turbulence or other complex multiscale problems arise.  For extensibility and flexibility the code is designed in an object-oriented manner.  The computational core is based on spectral-element operators, which are represented as objects.  The formalism accommodates both conforming and non-conforming elements and their associated data structures for handling interelement communications in a parallel environment.  Many aspects of this code are a synthesis of existing methods; however, we focus on a new formulation of dynamic adaptive refinement (DARe) of nonconforming h-type.  This paper presents the code and its algorithms, we do not consider parallel efficiency metrics or performance.  As a demonstration of the code we offer several two-dimensional test cases that we propose as standard test problems for comparable DARe codes.  The suitability of these test problems for turbulent flow simulation is considered.
M. Anitescu, "Mathematical Programs with Equilibrium Constraints and Applications to Control," Preprint ANL/MCS-P1247-0405, April 2005. We discuss recent advances in mathematical programs with equilibrium constraints (MPECs).  We describe the challenges posed by these problems and the current algorithmic solutions.  We emphasize in particular the use of the elastic mode approach.  We also present initial investigations in applications of MPECs to control problems.
J. M. Schopf, M. D'Arcy, N. Miller, L. Pearlman, I. Foster, C. Kesselman, "Monitoring and Discovery in a Web Services Framework: Functionality and Performance of the Globus Toolkit's MDS4," Preprint ANL/MCS-P1248-0405, April 2005.

The Globus Toolkit’s Monitoring and Discovery System, (MDS) defines and implements mechanisms for service and resource discovery and monitoring in distributed environments. We introduce here MDS4, the new monitoring and discovery system component included in Globus Toolkit version 4. MDS4 is distinguished from previous similar systems by its extensive use of interfaces and behaviors defined in the new WS-Resource Framework and WS-Notification specifications, and by its deep integration into essentially every component of the Globus Toolkit. We describe the MDS4 architecture and the relevant Web Service interfaces and behaviors to allow users to discover resources and services, monitor resource and service states, receive updates on current status, and visualize monitoring results. We also describe how MDS4 can be used to implement large-scale distributed monitoring and distributed systems, and present preliminary experimental results that provide insights into the performance that can be achieved via the use of these mechanisms.

I. Foster, "Service-Oriented Science," Preprint ANL/MCS-P1249-0405, April 2005. New information architectures enable new approaches to publishing and accessing valuable data and programs.  So-called service-oriented architectures define standard interfaces and protocols that allow developers to encapsulate information tools as services that clients can access without knowledge of, or control over, their internal workings.  Thus, tools formerly only accessible to the specialist can be made available to all, previously manual data processing and analysis tasks can be automated by having services access services.  Such service-oriented approaches to science are already being applied successfully, in some cases at substantial scales, but significant further effort is required before these approaches are applied routinely across many disciplines.  Grid technologies can accelerate the development and adoption of service-oriented science, by enabling a separation of concerns between discipline-specific content and domain-independent software and hardware infrastructure.
K. Amin, G. von Laszewski, M. Hategan, R. Al-Ali, O. Rana, D. Walker, "An Abstraction Model for a Grid Execution Framework," Preprint ANL/MCS-P1250-0405, April 2005. Computational Grids have been identified as one of the paradigms revolutionizing the discipline of distributed computing.  The contributions within the Grid community have resulted in new Grid technologies and continuous improvements to Grid standards and protocols.  Though crucial to the success of the Grid approach, such an incremental evolution of Grid standards has become a primary cause of frustration for scientific and commercial application communities aspiring to adopt the Grid paradigm.  Motivated by our rich experience and the need to decouple the application development and the Grid technology development processes, we propose an abstraction-based Grid middleware layer as part of the Java CoG Kit.  In this paper, we showcase our abstraction model and verify its extensibility by integrating it with an advanced quality-of-service-based execution framework. 
M. Anitescu and W. J. Layton, "Sensitivities in Large Eddy Simulation and Improved Estimates of Turbulent Flow Functionals," SIAM J. Sci. Comput. 29, no. 4 (2007), pp. 1650-1667. Also Preprint ANL/MCS-P1251-0505, May 2005. We consider a prototypical problem: simulate velocity and pressure in a turbulent flow using large eddy simulation (LES) and then use the results to estimate the force the underlying turbulent flow exerts on an immersed body.  For eddy viscosity-type LES models we develop the appropriate continuous sensitivity equation and show how it can improve the functional estimate by using the sensitivity with respect to the user-selected length scale to incorporate the effects of the underlying unresolved small-scale fluctuations on the functionals sought.
G. von Laszewski, "The Grid-Idea and Its Evolution," Preprint ANL/MCS-P1253-0505, May 2005. In this paper we review the essence of the Grid-Idea.  Specifically, we explore the changing definition of the Grid and follow its evolution over the past decade.  This evolution is motivated by the gradual expansion of management issues that must be addressed to make production Grids a reality and to meet user requirements for increased functionality.  Additionally, we focus on the evolutionary p0ath of the Globus Toolkit taken to address the increasing needs of the community.  We also discuss the evolutionary inclusion of commodity technologies as illustrated by the Java Commodity Grid Project.
C. S. Verma, P. F. Fischer, S. E. Lee, and F. Loth, "An All-Hex Meshing Strategy for Bifurcation Geometries in Vascular Flow Simulation," Preprint ANL/MCS-P1254-0505, May 2005. We develop an automated all-hex meshing strategy for bifurcation geometries arising in subject-specific computational hemodynamics modeling.  The key components of our approach are the use of a natural coordinate system, derived from solutions to Laplace's equation, that follows the tubular vessels (arteries, veins, or grafts) and the use of a tripartitioned-based mesh topology that leads to balanced high-quality meshes in each of the branches.  The method is designed for situations where the required number of hexahedral elements is relatively small (~ 1000-4000), as is the case when spectral elements are employed in simulations at transitional Reynolds numbers or when finite elements are employed in viscous dominated regimes.
M. M. Strout, J. Mellor-Crummey, P. Hovland, "Representation-Independent Program Analysis," Preprint ANL/MCS-P1255-0605, June 2005. Program analysis has many applications in software engineering and high-performance computation, such as program understanding, debugging, testing, reverse engineering, and optimization.  A ubiquitous compiler infrastructure does not exist; therefore, program analysis is essentially reimplemented for each compiler infrastructure.  The goal of the OpenAnalysis toolkit is to separate analysis from the intermediate representation (IR) in a way that allows the orthogonal development of compiler infrastructures and program analysis.  Separation of analysis from specific IRs will allow faster development of compiler infrastructures, the ability to share and compare analysis implementations, and in general quicker breakthroughs and evolution in the area of program analysis.  This paper presents how we are separating analysis implementations from IRs with analysis-specific, IR-independent interfaces.  Analysis-specific IR interfaces for alias/pointer analysis algorithms and reaching constants illustrate that an IR interface designed for language dependence is capable of providing enough information to support the implementation of a broad range of analysis algorithms and also represent constructs within many imperative programming languages.
G. von Laszewski and D. Kodeboyina, "A Repository Service for Grid Workflow Components," Preprint ANL/MCS-P1256-0605, June 2005. As part of the Java CoG Kit we have defined a sophisticated workflow framework.  This workflow framework projects an integrated approach towards executing tasks in Grid and non-Grid environments.  One of the services needed is a convenient service to store, retrieve, and modify workflow components defined by the community similar to systems such as the comprehensive perl archive network.  The availability of such a service will not only allow the definition of components useful for the greater grid community, but it will also be possible that it can be reused to support dynamically changing workflows managed by collaborative groups.  In this paper, we present a simple extensible framework to design, build, and deploy a workflow repository service.  This repository is intended to be used in ad-hoc Grids or in community Grids.
K. Amin and G. von Laszewski, "Quality Assured Ad Hoc Grids," Preprint ANL/MCS-P1257-0605, June 2005. This paper presents an integrated architecture for ad hoc Grids developed within the Java CoG Kit project.  It provides an overview of the key component frameworks that collectively build the ad hoc Grid architecture.  Further. it outlines a formal model that can be formally evaluated.  The paper also presents an enhancement to the Java CoG Kit to address requirements posed by ad hoc Grids.  It integrates into the Java CoG Kit commodity technologies such as Jxta and ClassAds.
D. E. Stewart and M. Anitescu, "Optimal Control of Systems with Discontinuous Differential Equations," Preprint ANL/MCS-P1258-0605, June 2005. In this paper we discuss the problem of verifying and computing optimal controls of systems whose dynamics is governed by differential systems with discontinuous right hand side.  In our work, we are motivated by optimal control of mechanical systems with Coulomb friction, which exhibit such right-hand side.  Notwithstanding the impressive development of nonsmooth and set-valued analysis, these systems have not been closely studied either computationally or analytically.  First, we show that even when the solution crosses and does not stay on the discontinuity, differentiating the results of a simulation gives gradients that have errors of a size independent of the step-size.  This means that the strategy of "optimize the discretization" will usually fail for problems of this kind.  We approximate the discontinuous right-hand side for the differential equations or inclusions by a smooth right-hand side.  For these smoothed approximations, we show that the resulting gradients approach the true gradients provided the start and end points of the trajectory do not lie on the discontinuity, and that using Euler's method where the step size is "sufficiently small" in comparison with the smoothing parameter.  Numerical results are presented for a crude model of car racing which involves Coulomb friction and slip showing that this approach is practical and can handle problems of considerable complexity.
G. von Laszewski and M. Hategan, "Workflow Concepts of the Java CoG Kit," J. of Grid Comput. 3, nos. 304, Sept. 2005.  Also Preprint ANL/MCS-P1259-0605, June 2005. Many scientific simulations and experiments require the coordination of numerous tasks posed by interdisciplinary research teams.  Grids can provide access to the necessary high-end resources to conduct such tasks.  The complex tasks and their interactions must be supported through convenient tools.  To address this issue, we introduce a number of Grid abstractions that make the development of Grid middleware-independent tools possible and allow for the integration of a number of commodity tools.  Our vision is implemented through an integrated approach based on a layered architecture that bridges the gap between Grid middleware and scientific applications.  Our abstractions include specialized services, a Grid workflow engine and language, and Grid faces---graphical abstractions that can be employed in science portals and standalone applications.
T. Munson, "Optimizing the Quality of Mesh Elements," Preprint ANL/MCS-P1260-0605, June 2005. Discretization methods are common techniques for computing approximate solutions to partial differential equations.  These methods decompose the given domain into a finite set of elements, triangles or tetrahedrons, for example, to produce a mesh used within the approximation scheme.  Several factors affect the accuracy of the solution obtained: the degree of the approximation scheme and the number of elements in the mesh, and the quality of the mesh.  Optimizing the quality of the mesh prior to computing the approximate solution can improve the condition number of the linear systems solved, reduce the time taken to compute the solution, and increase the numerical accuracy.
Y. Zhao, M. Wilde, I. Foster, J. Voeckler, J. Dobson, T. Jordan, E. Quigg, "Virtual Data Grid Middleware Services for Data-Intensive Science," Preprint ANL/MCS-P1261-0605, June 2005. The GriPhyN Virtual Data System provides a suite of components and services for data-intensive sciences that enables scientists to systematically and efficiently describe, discover, and share large scale data and computation resources.  We describe the design and implementation of such middleware services in terms of a virtual data system interface called Chiron, and present virtual data integration examples from the QuarkNet education project and from functional-MRI-based neuroscience research.  The Chiron interface also serves as an online "educator" for virtual data applications.
R. Ross, R. Thakur, A. Choudhary, "Achievements and Challenges for I/O in Computational Science," J. of Physics: Conference Series 16 (2005), pp. 501-509.  Also Preprint ANL/MC-P1262-0605, June 2005. The data access needs of computational science applications continue to grow, both in terms of the rates of I/O necessary to match compute capabilities, and in terms of the features required of I/O systems.  Particularly, wide-area access to data, and moving data between systems, has become a priority.  Key achievements in I/O for computational science have enabled many applications to effectively use I/O resources.  However, growing application requirements challenge I/O system developers to create solutions that will make I/O systems easier to use, improve performance, and increase the manageability.  In this work we outline the recent achievements and current status of I/O systems for computational science.  We then enumerate key challenges for I/O systems in the near future and discuss ongoing efforts that address these challenges.
H. Zhang, B. Smith, M. Sternberg, and P. Zapol, "SIPs: Shift-and-Invert Parallel Spectral Transformations," ACM TOMS 33, no. 2 (2007), pp. 1-19.  Also Preprint ANL/MCS-P1263-0605, June 2005. SIPs is a new efficient and robust software package implementing multiple shift-and-invert spectral transformations on parallel computers.  Built on top of SLEPc and PETSc, it can compute very large number of eigenpairs for sparse generalized Hermitian eigenvalue problems.  The development of SIPs is motivated by applications in nanoscale materials modeling, in which the growing size of the matrices and the pathological eigenvalue distribution challenge the efficiency and robustness of the solver.  In this paper, we develop a parallel eigevnalue algorithm that is based on the idea of distributed spectrum slicing.  We describe SIPs' object-oriented software design and implementation techniques, and demonstrate its numerical performance on an advanced distributed computer.
P. D. Hovland, B. Norris, M. M. Strout, S. Bhowmick, and J. Utke, "Sensitivity Analysis and Design Optimization through Automatic Differentiation," Preprint ANL/MCS-P1264-0605, June 2005. Automatic, or algorithmic, differentiation (AD) is a technique for transforming a program or subprogram that can be interpreted as computing a mathematical function, including arbitrarily complex simulation codes, into one that computes the derivatives of that function.  We describe the implementation and application of automatic differentiation tools.  We highlight recent advances in the combinatorial algorithms and compiler technology that underlie successful implementation of automatic differentiation tools.  We discuss applications of automatic differentiation in design optimization and sensitivity analysis.  We also describe ongoing research in the design of language-independent source transformation infrastructures and memory management for automatic differentiation algorithms.
L. Wos and M. Spinks, "The Arrival of Automated Reasoning," Preprint ANL/MCS-P1265-0605, June 2005. For some, the object of automated reasoning is the design and implementation of a program that offers sufficient power to enable one to contribute new and significant results to mathematics and to logic, as well as elsewhere.  One measure of success rests with the number and quality of the results obtained with the assistance of the program in focus.  A less obvious measure (heavily in focus here) rests with the ability of a novice, in the domain under investigation, to make significant contributions to one or more fields of science by relying heavily on a given reasoning program.  For example, if one who is totally unfamiliar with the area of study but skilled in automated reasoning can discover with an automated reasoning program impressive proofs, previously unknown axiom dependencies, and far more, then the field of automated reasoning has indeed arrived.  This article details such---how one novice, with much experience with W. McCune's program OTTER but no knowledge of the domains under investigation, obtained startling results in the study of areas of logic that include the BCSK logic and various extensions of that logic.  Among those results was the discovery of a variety weaker than has been studied from twhat we know, a variety that appears to merit serious study, as, for example, does the study of semigroups when compared with that of the study of groups.  A quite different result concerns the discovery of a most unexpected dependency in two extensions of the BCSK logic.
W. Gropp and R. Thakur, "An Evaluation of Implementation Options for MPI One-Sided Communication," Preprint ANL/MCS-P1266-0605, June 2005. MPI defines one-sided communication operations---put, get, and accumulate---together with three different synchronization mechanisms that define the semantics associated with the initiation and completion of these operations.  In this paper, we analyze the requirements imposed by the MPI Standard on any implementation of one-sided communication.  We discuss options for implementing the synchronization mechanisms and analyze the cost associated with each.  An MPI implementer can use this information to select the implementation method that is best suited (has the lowest cost) for a particular machine environment.  We also report on experiments that we ran on a Linux cluster and a Sun SMP to determine the gap between the performance that could be achievable and what is actually achieved with MPI
R. Thakur, R. Ross, R. Latham, "Implementing Byte-Range Locks Using MPI One-Sided Communication," Preprint ANL/MCS-P1267-0605, June 2005. We present an algorithm for implementing byte-range locks using MPI passive-target one-sided communication.  This algorithm is useful in any scenario in which multiple processes of a parallel program need to acquire exclusive access to a range of bytes.  One application of this algorithm is for implementing MPI-IO's atomic-access mode in the absence of atomicity guarantees from the underlying file system.  Another application is for implementing data sieving, a technique for optimizing noncontiguous writes by doing an atomic read-modify-write of a large, contiguous block of data.  This byte-range locking algorithm can be used instead of POSIX fcntl file locks on file systems that do not support fcntl locks, on file systems where fcntl locks are unreliable, and on file systems where fcntl locks perform poorly.  Our performance results demonstrate that the algorithm has low overhead and significantly outperforms fcntl locks on NFS file systems on a Linux cluster and on a Sun SMP.
P. F. Fischer, "Anistropic Diffusion in a Toroidal Geometry," J. Phys. Conf. Series 16 (2005), pp. 446-455.  Also Preprint ANL/MCS-P1268-0605, June 2005. As part of the Department of Energy's applications oriented SciDAC project, three model problems have been proposed by the Center for Extended Magnetohydrodynamics Modeling to test the potential of numerical algorithms for challenging magnetohydrodynamics (MHD) problems that are required for future fusion development.  The first of these, anisotropic diffusion in a toroidal geometry, is considered in this note.
N. Desai, E. Lusk, R. Bradshaw, "MPISH2: Unix Integration for MPI Programs," Preprint ANL/MCS-P1269-0705, July 2005. While MPI is the most common mechanism for expressing parallelism, MPI programs remain poorly integrated in Unix environments.  We introduce MPISH2, an MPI process manager analogous to serial Unix shells.  It provides better integration capabilities for MPI programs by providing a uniform execution mechanism for parallel and serial programs, exposing return codes and standard I/O stream information.
D. Buntinas and W. Gropp, "Designing a Common Communication Subsystem," Preprint ANL/MMCS-P1270-0705, July 2005. Communication subsystems are used in high-performance parallel computing systems to abstract the lower network layer.  By using a communication subsystem, an upper middleware library or runtime system can be more easily ported to different interconnects.  However, by abstracting the network layer, the designer will typically make the communication subsystem more specialized for that particular middleware library, and less general, making it ineffective for supporting middleware for other programming models.  In previous work we analyzed the requirements of various programming model middleware and the communication subsystems that support them.  We found that although there are no mutually exclusive requirement, none of the existing communication subsystems could efficiently support the programming model middleware we considered.  In this paper, we describe our design of a common communication subsystem, called CCS, that can efficiently support those programming model middleware.
R. Latham, R. Ross, R. Thakur, B. Toonen, "Implementing MPI-IO Shared File Pointers without File System Support," Preprint ANL/MCS-P1272-0705, July 2005. The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of any number of different underlying storage targets.  These targets vary widely in their capabilities, and in some cases additional effort is needed within ROMIO to support all MPI-IO semantics.  The MPI-2 standard defines a class of file access routines that use a shared file pointer.  These routines require communication internal to the MPI-IO implementation in order to allow processes to atomically update this shared value.  We discuss a technique that leverages MPI-2 one-sided operations and can be used to implement this concept without requiring any features from the underlying file system.  We then demonstrate through a simulation that our algorithm adds reasonable overhead for independent accesses and very small overhead for collective accesses.
C. Falzone, A. Chan, E. Lusk, and W. Gropp, "Collective Error Detection for MPI Collective Operations," Preprint ANL/MCS-P1273-0705, July 2005. An MPI profiling library is a standard mechanism for intercepting MPI calls by applications.  Profiling libraries are so named because they are commonly used to gather performance data on MPI programs.  Here we present a profiling library whose purpose is to detect user errors in the use of MPI's collective operations.  While some errors can be detected locally (by a single process), other errors involving the consistency of arguments passed to MPI collective functions must be tested for in a collective fashion.  While the idea of using such a profiling library does not originate here, we take the idea further than it has been taken before (we detect more errors) and offer an open-source library that can be used with any MPI implementation.  We describe the tests carried out, provide some details of the implementation, illustrate the usage of the library, and present performance tests.
B. Allcock, A. Chervenak, I. Foster, C. Kesselman, M. Livny, "Data Grid Tools: Enabling Science on Big Distributed Data," Preprint ANL/MCS-P1274-0705, July 2005. A particularly demanding and important challenge that we face as we attempt to construct the distributed computing machinery required to support SciDAC goals is the efficient, high-performance, reliable, secure, and policy-aware management of large-scale data movement.  This problem is fundamental to diverse application domains including experimental physics (high energy physics, nuclear physics, light sources), simulation science (climate, computational chemistry, fusion, astrophysics), and large-scale collaboration.  In each case, highly distributed user communities require high-speed access to valuable data, whether for visualization or analysis.  The quantities of data involved (terabytes to petabytes), the scale of the demand (hundreds or thousands of users, data-intensive analyses, real-time constraints), and the complexity of the infrastructure that must be managed (networks, tertiary storage systems, network caches, computers, visualization systems) make the problem extremely challenging.  Data management tools developed under the auspices of the SciDAC Data Grid Middleware project have become the de facto standard for data management in projects worldwide.  Day in and day out, these tools provide the "plumbing" that allows scientists to do more science on an unprecedented scale in production environments.
W. Allcock, J. Bresnahan, R. Kattimuthu, M. Link, C. Dumitrescu, I. Raicu, I. Foster, "The Globus Striped GridFTP Framework and Server," Preprint ANL/MCS-P1275-0705, July 2005. The GridFTP extensions to the File Transfer protocol define a general-purpose mechanism for secure, reliable, high-performance data movement.  We report here on the Globus striped GridFTP framework, a set of client and server libraries designed to support the construction of data-intensive tools and applications.  We describe the design of both this framework and a striped GridFTP server constructed within the framework.  We show that this server is faster than other FTP servers in both single-process and striped configurations, achieving, for example, speeds of 27.3 Gbit/s memory-to-memory and 17 Gbit/s disk-to-disk over a 60 millisecond round trip time, 30 Gbit/s network.  In another experiment, we show that this server can support 1800 concurrent clients without excessive load.  We argue that this combination of performance and modular structure make the Globus GridFTP framework both a good foundation on which to build tools and applications, and a unique testbed for the study of innovative data management techniques and network protocols.
B. Norris, "Software Architecture Issues in Scientific Component Development," Preprint ANL/MCS-P1276-0705, July 2005. Commercial component-based software engineering practices, such as the CORBA component model, Enterprise JavaBeans, and COM, are well established in the business computing community.  These practices present an approach for managing the increasing complexity of scientific software development, which has motivated the Common Component Architecture (CCA), a component specification targeted at high-performance scientific application development.  The CCA is an approach to component development that is minimal in terms of the complexity of component interface requirements and imposes a minimal performance penalty.  While the lightweight specification has enabled the development of a number of high-performance scientific components in several domains, the software design process for developing component-based scientific codes is not yet well defined.  This fact, coupled with the fact that component-based approaches are still new to the scientific community, may lead to an ad hoc design process, potentially resulting in code that is harder to maintain, extend, and test and may negatively affect performance.  We explore some concepts and approaches based on widely accepted software architecture design principles and discuss their potential application in the development of high-performance scientific component applications.  We particularly emphasize those principles and approaches that contribute to making CCA-based applications easier to design, implement, and maintain, as well as enabling dynamic adaptivity with the goal of maximizing performance.
S. Leyffer, "The Return of the Active Set Method," Preprint ANL/MCS-P1277-0805, August 2005. For solving nonlinear optimization problems, two competing iterative approaches are available: active set methods and interior-point methods.  Current implementations of interior methods often outperform active set methods in terms of speed.  On the other hand, active set methods are more robust and better suited for warm starts, which are important for solving integer optimization problems.  Consequently, we have recently become interested in new active set approaches, which are reviewed in this note.
D. Negrut, R. Rampalli, G. Ottarsson, A. Sajdak, "On the use of the HHT Method in the Context of Index 3 Differential Algebraic Equations of Multibody Dynamics," Preprint ANL/MCS-P1278-0805, August 2005. The paper presents theoretical and implementation aspects related to a numerical integrator used for the simulation of large mechanical systems with flexible bodies and contact/impact.  The proposed method is based on the Hilber-Hughes-Taylor (HHT) implicit formula and is tailored to answer the challenges posed by the numerical solution of index 3 Differential Algebraic Equations that govern the time evolution of a multibody system.  One of the salient attributes of the algorithm is the good conditioning of the Jacobian matrix associated with the implicit integrator.  Error estimation, integration step-size control, and nonlinear system stopping criteria are discussed in detail.  The simulations of an engine model, a model with contacts, and a model with flexible bodies confirm a 2 to 3/, X speed-up factor when compared with the integrators currently available in MSC.ADAMS, a commercial mechanical system simulation package.
S. Bhowmick, D. Kaushik, L. McInnes, B. Norris, P. Raghavan, "Parallel Adaptive Solvers in Compressible PETSc-FUN3D Simulations," Preprint ANL/MCS-P1279-0805, August 2005. We consider parallel, three-dimensional transonic Euler flow using the PETSc-FUN3D application, which employs pseudo-transient Newton-Krylov methods.
D. Negrut, M. Anitescu, T. Munson, and P. Zapol, "Simulating Nanoscale Processes in Solids using DFT and the Quasicontinuum Method," Preprint ANL/MCS-P1280-0805, August 2005. A framework is proposed for the investigation of chemical and mechanical properties of nanostructures.  The methodology is based on a two-step approach to compute the electronic density distribution in and around a nanostructure, and then the equilibrium configuration of its nuclei.  The Electronic Problem embeds interpolation and coupled cross-domain optimization techniques through a process called electronic reconstruction.  In the second stage of the solution, the Ionic Problem  repositions the nuclei of the nanostructure given the electronic density in the domain.  The new ionic configuration is the solution of a nonlinear system based on a first-order optimality condition when minimizing the total energy associated with the nanostructure.  The overall goal is a substantial increase in the dimension of the nanostructures that can be simulated by using approaches that include accurate DFT computation.  This increase stems from the fact that during the solution of the Electronic Problem, expensive DFT calculations are limited to a small number of subdomains.  For the Ionic Problem, computational gains result from approximating the position of the nuclei in terms of a reduced number of representative nuclei following the quasicontinuum paradigm.
L. Yang, J. M. Schopf, C. L. Dumitrescu, and I. Foster, "Statistical Data Reduction for Efficient Application Performance Monitoring," Preprint ANL/MCS-P1281-0805, August 2005. There is a growing need for systems that can monitor and analyze app0lication performance data automatically in order to delivery reliable and sustained performance to applications.  However, the continuously growing complexity of high performance computer systems and applications makes this process difficult.  We introduce a statistical data reduction method that can be used to guide the selection of system metrics that are both necessary and sufficient to describe observed application behavior, thus reducing the instrumentation perturbation and data volume to be managed.  To evaluate our strategy, we applied it to one  CPU-bound Grid application using cluster machines and GridFTP data transfer in a wide area testbed.  A comparative study shows that our strategy produces better results than other techniques.  It can reduce the number of system metrics to be managed by about 80%, while still capturing enough information for performance predictions.
J. G. Kim, E. C. Hunke, and W. H. Lipscomb, "A Sensitivity Analysis and Parameter Tuning Scheme for Global Sea Ice Modeling," Preprint ANL/MCS-P1282-0805, August 2005. Automatic differentiation (AD) is used to perform a multiple parameter sensitivity analysis for the Los Alamos sea ice model (CICE).  Numerical experiments are run using 6-hourly, 1997 forcing data with a two-hour time step, and the AD-based sensitivity scheme is validated by comparison with derivatives calculated using the conventional finite difference approach.  Twenty-two thermodynamic and dynamic parameters are selected for simultaneous analysis.  Of these, the most important for controlling the simulated average sea ice thickness is ice density; albedos and emissivity predominate in summer, while ice thickness is most sensitive to the ice conductivity in winter.  The ice-ocean drag parameter and maximum ice salinity significantly affect the simulation year-round.  Gradient information computed buy the AD-based sea ice code is then used in an experiment designed to assess the efficacy of this technique for tuning the parameters against observational data.  Preliminary results, obtained with a bound-constrained minimization method and with simulated observational data, show that satisfactory convergence is obtained.
L. McInnes, B. Norris, I. Veljkovic, "Computational Quality of Service in Parallel CFD," Preprint ANL/MCS-P1283-0805, August 2005. This paper presents an overview of new infrastructure for automated performance gathering and analysis of high-performance components, which is a key facet of our CQoS research, with emphasis on using these capabilities in parallel CFD simulations, such as flow in a driven cavity and compressible Euler flow.
T. S. Munson and P. D. Hovland, "The FeasNewt Benchmark," Preprint ANL/MCS-P1284-0805, August, 2005. We describe the FeasNewt mesh-quality optimization benchmark.  The performance of the code is dominated by three phases---gradient evaluation, Hessian evaluation and assembly, and sparse matrix-vector products---that have very different mixtures of floating-point operations and memory access patterns. The code includes an optional runtime data- and iteration-reordering phase, making it suitable for research on irregular memory access patterns.  Mesh-quality optimization (or ``mesh smoothing'') is an important ingredient in the solution of nonlinear partial differential equations (PDEs) as well as an excellent surrogate for finite-element or finite-volume PDE solvers.
W. E. Allcock, "Programming with the Globus Toolkit GridFTP Client Library," Preprint ANL/MCS-P1285-0705, July 2005. The Globus Toolkit GridFTP client library hides the developer from the complexities of the protocol and makes client development relatively easy.  It is, however, an asynchronous programming model unfamiliar to many people.
K. Amin, G. von Laszewski, M. Sosonkin, A. R. Mikler, and M. Hategan, "Ad Hoc Grid Security Infrastructure," Preprint ANL/MCS-P1286-0905, September 2005. This paper describes an ad hoc Grid security infrastructure developed as a part of the Java CoG Kit project.  It supports several requirements specific to the sporadic nature of ad hoc Grids.  It focuses on identity management, identity verification, and authorization control in spontaneous Grid collaborations without pre-established policies or environments.  It adopts established community standards, with modifications where needed.  This paper also discusses the integratino of eh ad hoc Grid security infrastructure in an ad hoc Grid implementation.  The implementation supports secure collaboration in ad hoc Grids using commodity technologies such as the java CoG Kit, JXTA, GSI, and XACML.
R. Padmanabhan, W. McCune, and R. Veroff, "Lattice Laws Forcing Distributivity Under Unique Complementation," Preprint ANL/MCS-P1287-0905, September 2005. We give several new lattice identities valid in nonmodular lattices such that a uniquely complemented lattice satisfying any of these identities is necessarily Boolean.  Since some of these identities are consequences of modularity as well, these results generalize the classical result of Birkhoff and von Neumann that every uniquely complemented modular lattice is Boolean.  In particular, every uniquely complemented lattice in M V V(M5), the least nonmodular variety, is Boolean.
N.Desai, R.Bradshaw, S.Matott, S.Bittner, S.Coghlan, R.Evard, C.Lueninghoener, T.Leggett, J.-P.Navarro, G.Rackow, C.Stacey, T.Stacey, "A Case Study in Configuration Management Tool Deployment," Preprint ANL/MCS-P1288-0905, September 2005. While configuration management systems are generally regarded as useful, their deployment process is not well understood or documented.  In this paper, we present a case study in configuration management tool deployment.  We describe the motivating factors and both the technical considerations and the social issues involved in this process.  Our discussion includes an analysis of the overall effect on the system management model and the tasks performed by administrators.
S. J. Benson and Y. Ye, "DSDP5: Software for Semidefinite Programming," Preprint ANL/MCS-P1289-0905, September 2005. DSDP implements the dual-scaling algorithm for semidefinite programming.  The source code for this interior-point algorithm, written entirely in ANSI C, is freely available.  The solver can be used as a subroutine library, as a function within the Matlab environment, or as an executable that reads and writes to data files.  Initiated in 1997, DSDP has developed into an efficient and robust general-purpose solver for semidefinite programming.  Its features include a convergence proof with polynomially bounded worst-case complexity, primal and dual feasible solutions when they exist, certificates of infeasibility when solutions do not exist, initial points that can be feasible or infeasible, relatively low memory requirements for an interior-point method, sparse and low-rank data structures, extensibility that allows applications t customize the solver and improve its performance, a subroutine library that enables it to be linked to larger applications, scalable performance for large problems on parallel architectures, and a well-documented interface and examples of its use.  The package has been used in many applications and tested for efficiency, robustness, and ease of use.
S. Leyffer, "A Note on Multiobjective Optimization and Complementarity Constraints," Preprint ANL/MCS-P1290-0905, September 2005. We propose a new approach to convex nonlinear multiobjective optimization that captures the geometry of the Pareto set by generating a discrete set of Pareto points optimally.  We show that the problem of finding an optimal representation of the Pareto surface can be formulated as a mathematical program with complementarity constraints.  The complementarity constraints arise from modeling the set of Pareto points, and the objective maximizes some quality measure of this discrete set.  We present encouraging numerical experience on a range of test problem collected from the literature.
I. Foster, V. Nefedova, L. Liming, R. Ananthakrishnan, R. Madduri, L. Pearlman, O. Mulmo, M. Ahsant, "Streamlining Grid Operations: Definition and Deployment of a Portal-based User Registration Service," Preprint ANL/MCS-P1291-0905, September 2005.

Manual management of public key credentials can be a significant and often off-putting obstacle to Grid use, particularly for casual users. We describe the Portal-based User Registration Service (PURSE), a set of tools for automating user registration, credential creation, and credential management tasks. PURSE provides the system developer with a set of customizable components, suitable for portal integration, that can be used to address the full lifecycle of Grid credential management. We describe the PURSE design and its use in portals for two systems, the Earth System Grid data access system and the Swegrid computational Grid. In both cases, the user is entirely freed from the need to create or manage public key credentials, thus simplifying the Grid experience and reducing opportunities for error. We argue that this capturing of common use cases in a reusable “solution” can be a model for how Grid ease-of-use can be addressed in other domains as well.

 

S. J. Benson, "Parallel Semidefinite Programming and Combinatorial Optimization," Preprint ANL/MCS-P1292-0905, September 2005. The use of semidefinite programming in combinatorial optimization continues to grow.  This growth can be attributed to at least three factors:  new semidefinite relaxations that provide tractable bounds to hard combinatorial problems, algorithmic advances in the solution of semidefinite programs (SDP), and the emergence of parallel computing.
M. Yan, G. Leaf, H. Kaper, R. Camley, and M. Grimsditch, "Spin Wave Modes in a Cobalt Square Vortex," Preprint ANL/MCS-P1293-1005, October 2005. Spin wave modes in a thin submicron cobalt square with a closure domain structure are obtained by using a micromagnetic equation of motion approach.  In addition to modes with amplitude over the whole sample, some low-frequency modes, localized at the center, corners, and diagonals of the square, are also found.  In analogy with the modes found in a circular vortex, the nonlocalized modes can be broadly classified into radial-like and azimuthal-like modes, and their frequencies can be understood qualitatively in terms of the dispersion relation of spin wave modes of an unconfined film.  Other modes that can be interpreted as the combination of radial and azimuthal modes are also observed.
X. Zhang, J. L. Freschl, and J. M. Schopf, "Scalability Analysis of Three Monitoring and Information Systems: MDS2, R-GMA, and Hawkeye," Preprint ANL/MCS-P1294-1005, October 2005. Monitoring and information system (MIS) implementations provide data about available resources and services within a distributed system, or Grid.  A comprehensive performance evaluation of an MIS can aid in detecting potential bottlenecks, advise in deployment, and help improve future system development.  In this paper, we analyze and compare the performance of three implementations in a quantitative manner:  the Globus Toolkit® Monitoring and Discovery Service (MDS2), the European DataGrid Relational Grid Monitoring Architecture (R-GMA), and the Condor project's Hawkeye.  We use the NetLogger toolkit to instrument the main service components of each MIS and conduct four sets of experiments to benchmark their scalability with respect to the number of users, the number of resources, and the amount of data collected.  Our study provides quantitative measurements comparable across all systems.  We also find performance bottlenecks and identify how they relate to the design goals, underlying architectures, and implementation technologies of the corresponding MIS, and we present guidelines for deploying monitoring and information systems in practice.

 

 

M. G. Knepley and D. A. Karpeev, "Flexible Representation of Computational Meshes," Preprint ANL/MCS-P1295-1005, October 2005. A new representation of computational meshes is proposed in terms of a covering relation defined by discrete topological objects we call sieves.  Fields over a mesh are handled locally by using the notion of refinement, dual to covering, and are later reassembled.  In this approach fields are modeled by sections of a fiber bundle over a sieve.  This approach cleanly separates the topology of the mesh from its geometry and other value-storage mechanisms.  With these abstractions, finite element calculations are expressed using algorithms that are independent of mesh dimension, global topology, element shapes, and the finite element itself.  Extensions and other applications are discussed.
N. Maltsev, E. Glass, D. Sulakhe, A. Rodriguez, M. H. Syed, T. Bompada, Y. Zhang, and M. D'Souza, "PUMA2 - Grid-Based High-Throughput Analysis of Genomes and Metabolic Pathways," Nucleic Acids Research 34 (2006), pp. D369-D372.  Also Preprint ANL/MCS-P1296-1005, October 2005. The PUMA2 system (available at http://compbio.mcs.anl.gov/puma2) is an interactive, integrated bioinformatics environment for high-throughput genetic sequence analysis and metabolic reconstructions from sequence data. PUMA2 provides a framework for comparative and evolutionary analysis of genomic data and metabolic networks in the context of taxonomic and phenotypic information. Grid infrastructure is used to perform computationally intensive tasks. PUMA2 currently contains precomputed analysis of 213 prokaryotic, 22 eukaryotic, 650 mitochondrial, and 1493 viral genomes and automated metabolic reconstructions for more than 200 organisms. Genomic data is annotated with information integrated from over 20 sequence, structural, and metabolic databases and ontologies. PUMA2 supports both automated and interactive expert-driven annotation of genomes, using a variety of publicly available bioinformatics tools. It also contains a suite of unique PUMA2 tools for automated assignment of gene function, evolutionary analysis of protein families, and comparative analysis of metabolic pathways. PUMA2 allows users to submit batch sequence data for automated functional analysis and construction of metabolic models. The results of these analyses are made available to the users in the PUMA2 environment for further interactive sequence analysis and annotation.
G. Leaf, H. Kaper, M. Yan, V. Novosad, P. Vavassori, R. E. Camley, M. Grimsditch, "Dynamic Origin of Stripe Domains," Preprint ANL/MCS-P1299-1005, October 2005. We investigate stripe domain formation in nanometer sized Co bars.  The magnetic equilibrium states and the magnetic spin wave frequencies are obtained from micromagnetic-like simulations.  We find that the lowest frequency standing wave mode has the same spatial structure as the stripe domains at remanence, and it goes soft at the field where the stripe domains emerge.  We show, therefore, that the final domain structure at remanence, which is not the configuration with lowest energy, is predicted from a high-field analysis of the frequencies of the standing spin waves.
M. Anitescu, D. Negrut, P. Zapol, A. El-Azab, "A Note on the Regularity of Reduced Models Obtained by Quasi-continuum-like Approaches," Preprint ANL/MCS-P1303-1105, November 2005. This work investigates model reduction techniques that are based on a quasi-continuum-like approach.  These techniques reduce a large optimization problem to either a system of nonlinear equations or another optimization problem that are expressed in a smaller number of degrees of freedom.  The reduction is based on the observations that many of the