mcs | publications | abstracts

2002 Abstracts of MCS Reports and Preprints

Reports

C. J. Anderson, R. W. Arritt, E. S. Takle, Z. Pan, W. J. Gutowski, Jr., F. O. Otieno, R. da Silva, D. Caya, S.-C. Chen, J. H. Christensen, D. Lüthi, M. A. Gaertner, C. Gallardo, F. Giorgi, S.-Y. Hong, C. Jones, H.-M. H. Juang, J. J. Katzfey, W. M. Lapenta, R. Laprise, J. W. Larson, G. E. Liston, J. L. McGregor, R. A. Pielke, Sr., J. O. Roads, J. A. Taylor, "Hydrological Processes in Regional Climate Model Simulations of the Central United States Flood of June-July 1993," J. Hydrometeor. 4 (2003), pp. 584-598.  Also Climate and Global Change Report ANL/CGC-012-0402, April 2002.  paper.pdf, figures.pdf, table1.pdf, tables2,3.pdf 

 

Regional climate model (RCM) simulations of hydroclimate for the central U.S. are sensitive to RCM design, yet comparison of RCM results under common experimental conditions is rare.  Thus, the degree of and sources for inter-model variability are not well known.  We have compared 60-d simulations of 1993 June-July from thirteen RCM simulations to each other and observations.  Boundary data and initial conditions were supplied by the Project to Intercompare Regional Climate Simulations (PIRCS) experiment 1b.  We have examined water vapor conservation and precipitation characteristics in each RCM for a 100x100 subregion of the Upper Mississippi River Basin (UMRB), containing the region of maximum 60-d accumulated precipitation in all RCMs and station reports.  Results showed that gross features of hydroclimate were well simulated in all RCMs.  Specifically, the RCMs produced positive precipitation minus evaporation (P-E>0), and RCM recycling ratios were within the range estimated from observations.  The range of P-E in RCMs enveloped the range of estimates of observed P-E, but most RCMs produced P-E below the estimated observed range.  We found sensitivity of RCM E to radiation parameterization, including clouds, but inter-model variability of E was spread evenly about estimates of observed E suggesting little, if any, common errors of E among the simulations.  In contrast, most RCMs produced P that was below the range of P from observations; thus a common dry bias of the simulations accounted for the low values of simulated P-E compared to observations.  Daily cycles of terms in the water vapor conservation equation revealed that P in most RCMs is driven by tghe dynamics of atmospheric circulation.  In most simulations, nocturnal maxima of P and C (convergence) occurred simultaneously, consistent with observations of P and climatological studies of water vapor conservation.  Three of the four driest RCMs had maximum P in the afternoon, while the time of maximum C was variable, suggesting that in these RCMs, afternoon destabilization by insolation strongly influenced the precipitation process......
   
P. D. Hovland and B. Norris, "Users' Guide to ADIC 1.1," Technical Memorandum ANL/MCS-TM-225, July 2002. This guide describes the use of the Automatic Differentiation in C (ADIC) system. ADIC is a suite of tools and libraries that automates the process of generating derivatives for scientific programs. In the context of solving PDEs, optimizations, sensitivity analysis, and inverse problems, researchers often require the derivatives f /x of a function f expressed as a program with respect to some input parameter(s) x. Automatic differentiation (AD) techniques augment the program with derivative computation by applying the chain rule of calculus to elementary operations in an automated fashion. ADIC uses sophisticated compiler techniques to augment the input C programs with derivative computation capability in an automatic fashion. It also provides a finer control of derivative code generation process via control scripts and pragmas. Another significant capability of ADIC is its component architecture, AIF, that allows ADIC's capability to be extended via plug-in modules.

For more information about ADIC, see http://www.mcs.anl.gov/adic.

I. R. Judson, "Access Grid Node Minimum Requirements," Technical Memorandum ANL/MCS-TM-257, April 2002. The Access Grid is a group-to-group collaborative system developed at Argonne National Laboratory.  The system is designed to support high-fidelity, high-bandwidth interactions.  This document specifies the minimum requirements for a space to be considered an Access Grid Node.
S. A. Mickelson, J. A. Taylor, and M. Dvorak, "Simplifying the Task of Generating Climate Simulations and Visualizations," Global Change Report ANL/CGC-013-0402, April 2002. To fully exploit the use of the MM5 modeling system, the scientist must spend several months studying the system, thus losing valuable research time.  To solve this problem, we have created a graphical user interface, called Espresso, that allows users to change these values without having to examine the code.  This approach dramatically increases the usability of the model and speeds productivity.  We have also modified Vis5D to run on tiled displays.  Using such displays allows us to view our climate data at much higher resolution.
J. Taylor, M. Dvorak, and S. Mickelson, "Developing Grid Based Infrastructure for Climate Modeling," Global Change Report ANL/CGC-010-0102, January 2002. In this paper we discuss the development of a high performance climate modeling system as an example of the application of Grid based technology to climate modeling.  The climate simulation system at Argonne currently includes a scientific modeling interface (Espresso) written in Java which incorporates Globus middleware to facilitate climate simulations on the Grid.  The climate modeling system also includes a high performance version of MM5v3.4 modified for long climate simulations on our 512 processor Linux cluster (Chiba City), an interactive web based tool to facilitate analysis and collaboration via the web, and an enhanced version of the Cave5D software capable of visualizing large climate data sets.  We plan to incorporate other climate modeling systems such as the Fast Ocean Atmosphere Model (FOAM) and the National Center for Atmospheric Research's (NCAR) Community Climate Systems Model (CCSM) within Espresso to facilitate their application on computational grids.
M. Dvorak, J. Taylor, and S. Mickelson, "Designing a Flexible Grid Enabled Scientific Modeling Interface," Global Change Report ANL/CGC-011-0102, January 2002 (pdf) (postscript) The Espresso Scientific Modeling Interface (Espresso) is a scientific model productivity tool developed for climate modelers.  Espresso was designed to be an extensible interface to both scientific models and Grid resources.  It also aims to be a contemporary piece of software that relies on Globus.org's Java CoG Kit for a Grid toolkit, Sun's Java 2 API and is configured using XML.  This article covers the design and implementation of Espresso's Grid functionality and how it interacts with existing scientific models.  We give specific examples of how we have designed Espresso to perform climate simulations using the PSU/NCAR MM5 atmospheric model.  Plans to incorporate the CCSM and FOAM climate models are also discussed.

Preprints

P. Hovland, B. Norris, and B. Smith, "Making Automatic Differentiation Truly Automatic: Coupling PETSc with ADIC," Preprint ANL/MCS-P922-0102, January 2002. Despite its name, automatic differentiation (AD) is often far from an automatic process.  Often one must specify independent and dependent variables, indicate the derivative quantities to be computed, and perhaps even provide information about the structure of the Jacobians or Hessians being computed.  However, when AD is used in conjunction with a toolkit with well-defined interfaces, many of these issues do not arise.  We describe recent research into coupling the ADIC automatic differentiation tool with PETSc, a toolkit for the parallel numerical solution of PDEs.  This research leverages the interfaces and objects of PETSc to make the AD process very nearly transparent.
   
L. Wos, "A Spectrum of Applications of Automated Reasoning," Preprint ANL/MCS-P923-0102, January 2002. The likelihood of an automated reasoning program being of substantial assistant for a wide spectrum of applications rests with the nature of the options and parameters it offers on which to base needed strategies and methodologies.  This article focuses on such a spectrum, featuring W. McCune's program OTTER, discussing widely varied successes in answering open questions, and touching on some of the strategies and methodologies that played a key role.  The applications include finding a first proof, discovering single axioms, locating improved axiom systems, and simplifying existing proofs.  The last application is directly pertinent to the recently found (by R. Thiele) Hilbert's twenty-fourth problem---which is extremely amenable to attack with the appropriate automated reasoning program---a problem concerned with proof simplification.  The methodologies include those for seeking shorter proofs and for finding proofs that avoid unwanted lemmas or classes of term, a specific option for seeking proofs with smaller equational or formula complexity, and a different option to address the variable richness of a proof.  The type of proof one obtains with the use of OTTER is Hilbert-style axiomatic, including details that permit one sometimes to gain new insights.  We include questions still open and challenges that merit consideration.
   
L. Wos, "The Flowering of Automated Reasoning," Preprint ANL/MCS-P924-0102, January 2002. This article celebrates the obvious joy the role automated reasoning now plays for mathematics and logic.  Simultaneously, this article evidences the realization of a dream thought impossible just four decades ago by almost all.  But there were believers, including Joerg Siekmann to whom this article is dedicated in honor of his sixtieth birthday.  Indeed, today (in the year 2001), a researcher can enlist the aid of an automated reasoning program often with the reward of a new proof or a better proof in some significant aspect.  The contributions to mathematics and logic made with an automated reasoning assistant are many, diverse, often significant, and of the type Hilbert would indeed have found most pleasurable.  The proofs discovered by W. McCune's OTTER (as well as other programs) are Hilbert-style axiom twenty-fourth problem (recently unearthed by Rudiger Thiele), which focuses on the completion of simpler proofs.  In that regard, as well as others, I offer challenges and open questions, frequently providing appropriate clauses to provide a beginning.
   
U. Naumann, "Reducing the Memory Requirement in Reverse Mode Automatic Differentiation by Solving TBR Flow Equations," Preprint ANL/MCS-P925-0102, January 2002. The fast computation of gradients in reverse mode Automatic Differentiation (AD) requires the generation of adjoint versions of every statement in the original code.  Due to the resulting reversal of the control flow, certain intermediate values have to be made available in reverse order to compute the local partial derivatives.  This can be achieved by storing these values or by recomputing them when they become required.  In any case one is interested in minimizing the size of this set.  Following an extensive introduction of the "To-Be-Recorded" (TBR) problem we will present flow equations for propagating the TBR status of variables in the context of reverse mode AD of structured programs.
S. Vazhkudai, J. M. Schopf, I. Foster, "Predicting the Performance of Wide Area Data Transfers," Preprint ANL/MCS-P927-0102, Jan. 2002. As Data Grids become more commonplace, large data sets are being replicated and distributed to multiple sites, leading to the problem of determining which replica can be accessed most efficiently.  The answer to this question can depend on many factors, including physical characteristics of the resources and the load behavior on the CPUs, networks, and storage devices that are part of the end-to-end path linking possible sources and sinks.  We develop a predictive framework that combines (1) integrated instrumentation that collects information about the end-to-end performance of past transfers, (2) predictors to estimate future transfer times, and (3) a data delivery infrastructure that provides users with access to both the raw data and our predictions.  We evaluate the performance of our predictors by applying them to log data collected from a wide area testbed.  These preliminary results provide insights into the effectiveness of using predictors in this situation.
   
G. von Laszewski, E. Blau, M. Gletzinger, J. Gawor, P. lane, S. Martin, M. Russell, "Software, Component, and Service Deployment in Computational Grids," Preprint ANL/MCS-P928-0102, January 2002. Grids comprise an infrastructure that enables scientists to use a diverse set of distributed remote services and resources as part of complex scientific problem-solving processes.  We analyze some of the challenges involved in deploying software and components transparently in Grids.  We report on three practical solutions used by the Globus Project.  Lessons learned from this experience lead us to believe that it is necessary to support a variety of software and component deployment strategies.  These strategies are  based on the hosting environment.
   
E. T. Ong, J. W. Larson, R. L. Jacob, "A Real Application of the Model Coupling Toolkit," Preprint ANL/MCS-P929-0102, January 2002. The high degree of computational complexity of atmosphere and ocean general circulation models, land-surface models, and dynamical sea-ice models makes coupled climate modeling a grand-challenge problem in high-performance computing.  On distributed-memory parallel computers, a coupled model comprises multiple message-passing-parallel models, each of which must exchange data among themselves or through a special component called a coupler.  The Model Coupling Toolkit (MCT) is a set of Fortran90 objects that can be used to easily create low bandwidth parallel data exchange algorithms and other functions of a parallel coupler.  In this paper we describe the MCT, how it was employed to implement some of the important functions found in the flux coupler for the Parallel Climate Model (PCM), and compare the performance of MCT-based PCM functions with their PCM counterparts.
B. Norris, S. Balay, S. Benson, L. Freitag, P. Hovland, L. McInnes, B. Smith, "Parallel Components for PDEs and Optimization: Some Issues and Experiences," Preprint ANL/MCS-P932-0202, Feb. 2002. High-performance simulations in computational science often involve the combined software contributions of multidisciplinary teams of scientists, engineers, mathematicians, and computer scientists.  One goal of component-based software engineering in large-scale scientific simulations is to help manage such complexity by enabling better interoperability among codes developed by different groups.  This paper discusses recent work on building component interfaces and implementations in parallel numerical toolkits for mesh manipulations, discretization, linear algebra, and optimization.  We consider several motivating applications involving partial differential equations and unconstrained minimization to demonstrate this approach and evaluate performance.
   
U. Naumann, S. Forth. M. Tadjouddine, J. Phyce, "A Standard Interface for Elimination Sequences in Automatic Differentiation," Preprint ANL/MCS-P934-0205, February 2002. We propose a standard format for exchanging linearized computational graphs and vertex, edge, and face elimination sequences between various AD software tools.
   
L. Hascoët, U. Naumann, and V. Pascual, "TBR Analysis in Reverse-Mode Automatic Differentiation," Future Generation Comput. Syst. 21(8) (2005), pp. 1401-1417.  Also Preprint ANL/MCS-P936-0202, February 2002. The automatic generation of adjoints of mathematical models that are implemented as computer programs is receiving an increased attention in the scientific and engineering communities.  Reverse-mode automatic differentiation is of particular interest for large-scale optimization problems.  It allows the computation of gradients at a small constant multiple of the cost for evaluating the objective function itself, that is independent of the number of input parameters.  Source-to-source transformation tools for automatic differentiation are available to generate adjoint codes based on the adjoint version of every statement can be be built by applying simple differentiation rules.  Therefore, a reversal of the control flow of the original program becomes necessary.  To guarantee correctness, certain values that are computed and overwritten in the original program must be made available in the adjoint program.  They can be determined by performing a static data flow analysis, the so-called TBR analysis.  Overestimation of this set must be kept minimal to get efficient adjoint codes.  For many real-world applications the applicability of source-to-source transformation tools for automatic differentiation cannot be achieved without this efficiency.
   
I. Foster, "The Grid: A New Infrastructure for 21st Century Science," Physics Today (to appear).  Also Preprint ANL/MCS-P937-0202, February 2002. In this article, I review the technology trends that motivate the Grid, explain the types of resource sharing enabled by Grids and their significance for science, and identify key supporting technologies.
T. Iliescu and P. Fischer, "Large Eddy Simulation of Turbulent Channel Flows by the Rational LES Model," Preprint ANL/MCS-P938-0302, March 2002. The rational large eddy simulation (RLES) model is applied to turbulent channel flows.  This approximate deconvolution model is based on a rational (subdiagonal Padé) approximation of the Fourier transform of the Gaussian filter and is proposed as an alternative to the gradient (also known as the nonlinear or tensor-diffusivity) model.  We used a spectral element code to perform large eddy simulations of incompressible channel flows at Reynolds numbers based on the friction velocity and the channel half-width Ret = 180 and Ret = 395.  We compared the RLES model with the gradient model.  The RLES results showed a clear improvement over those corresponding to the gradient model, comparing well with the fine direct numerical simulation.  For comparison, we also present results corresponding to a classical subgrid-scale eddy-viscosity model such as he standard Smagorinsky model.
   
L. Freitag, T. Leurent, P. Knupp, D. Melander, "MESQUITE Design: Issues in the Development of a Mesh Quality Improvement Toolkit, Preprint ANL/MCS-P940-0302, March 2002. Poor mesh quality is known to adversely affect both solution efficiency and accuracy.  There has been considerable research on a wide variety of mesh improvement algorithms, but the impact of these algorithms on applications has been limited because they are typically embedded in particular meshing software packages.  To rectify this situation, we are developing a stand-alone mesh quality improvement toolkit called MESQUITE.  In this paper, we describe the motivation goals, and design of MESQUITE and give some computational results using the underlying algorithms that show the benefit of such a package.
   
D. Brown and L. Freitag, "Creating Interoperable Meshing and Discretization Software: Teh Terascale Simulation Tools and Technology Center," Preprint ANL/MCS-P941-0302, March 2002. We present an overview of the technical objectives of the Terascale Simulation Tools and Technologies center.  The primary goal of this multi-institution collaboration is to develop technologies that enable application scientists to easily use multiple mesh and discretization strategies within a single simulation on terascale computers.  The discussion focuses on our efforts to create interoperable mesh generation tools, high-order discretization techniques, and adaptive meshing strategies.
   
N. T. Karonis, B. Toonen, I. Foster, "MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface," Preprint ANL/MCS-P942-0402, April 2002. Application development for distributed computing "Grids'' can benefit from tools that variously hide or enable application-level management of critical aspects of the heterogeneous environment. As part of an investigation of these issues, we have developed MPICH-G2, a Grid-enabled implementation of the Message Passing Interface (MPI) that allows a user to run MPI programs across multiple computers, at the same or different sites, using the same  commands that would be used on a parallel computer. This library extends the Argonne MPICH implementation of MPI to use services provided by the Globus Toolkit for authentication, authorization, resource allocation, executable staging, and I/O, as well as for process creation, monitoring, and control. Various performance critical operations, including startup and collective operations, are configured to exploit network topology information. The library also exploits MPI constructs for performance management; for example, the MPI communicator construct is used for application-level discovery of, and adaptation to, both network topology and network quality-of-service mechanisms. We describe the MPICH-G2 design and implementation, present performance results, and review application experiences, including record-setting distributed simulations.
U. Naumann, "Optimal Accumulation of Jacobian Matrices by Elimination Methods on the Dual Computational Graph," Preprint ANL/MCS-P943-0402, April 2002. The accumulation of the Jacobian matrix F' of a vector function 
F: R[sup n] -> R[sup m] can be regarded as a transformation of its 
linearized computational graph into a subgraph of the directed complete bipartite graph K[sub n,m].  This transformation can be performed by applying different elimination techniques that may lead to varying costs for computing F'.  This paper introduces face elimination as the basic technique for accumulating Jacobian matrices by using a minimal number of arithmetic operations.  Its superiority over both edge and vertex elimination methods is shown. The intention is to establish the conceptual basis for the ongoing development of algorithms for optimizing the computation of Jacobian matrices. 
U. Naumann, "On Optimal Jacobian Accumulation for Single Expression use Programs," Preprint ANL/MCS-P944-0402, April 2002. ADIFOR and ADIC, the widely used software tools for Automatic Differentiation, use assignment-level reverse mode to compute local gradients of scalar assignment.  This pre-accumulation often results in very efficient forward-mode derivative code.  Scalar assignments belong to the class of Single Expression Use (SEU) programs.  There, the values of all intermediate variables are read exactly once.  Based on several theoretical results, we derive an algorithm for generating optimal Jacobian code for SEU programs.  The number of floating-point operations performed during the accumulation of the Jacobian is minimized.
G. von Laszewski, "Grid Computing: Enabling a Vision for Collaborative Research," Preprint ANL/MCS-P945-0302, March 2002. In this paper we provide a motivation for Grid computing based on a vision to enable a collaborative research environment.  Our vision goes beyond the connection of hardware resources.  We argue that with an infrastructure such as the Grid, new modalities for collaborative research are enabled.  We provide an overview showing why Grid research is difficult, and we present a number of management-related issues that must be addressed to make Grids a reality.  We list projects that provide solutions to subsets of these issues.
E. D. Dolan, R. Fourer, J. J. Moré, T. S. Munson, "The NEOS Server for Optimization: Version 4 and Beyond," SIAM News 35 (2002), pp. 5, 8-9.  Also Preprint ANL/MCS-P947-0202, February 2002. We describe developments associated with Version 4 of the NEOS Server and note that these developments have led to an exponential growth in the number of job submissions.  We also provide an overview of some of the research and educational uses for the NEOS Server and discuss future research challenges.
 N. T. Karonis, B. de Supinski, I. Foster, W. Gropp, E. Lusk, S. Lacour, "A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids," Preprint ANL/MCS-P948-0402, April 2002. The efficient implementation of collective communication operations has received much attention. Initial efforts produced "optimal'' trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids,'' with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware  trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts have significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based upon a multilayer view of the network. By creating multilevel topology-aware} trees we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit™-enabled version of the
popular MPICH implementation of the MPI standard.  Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution.  We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and a topology-aware two-layer implementation.
S. Vazhkudai and J. M. Schopf, "Predicting Sporadic Grid Data Transfers," Preprint ANL/MCS-P949-0402, April 2002. The increasingly common practice of (1) replicating datasets and (2) using resources as distributed data stores in Grid environments has led to the problem of determining which replica can be accessed most efficiently.  Due to diverse performance characteristics and load variations of several components in the end-to-end path linking these various locations, selecting a replica location from among many requires accurate prediction information of end-to-end data transfer times between the sources and sinks.  In this paper, we present a prediction system that is based on combining end-to-end application throughput observations and network load variations, drawing from their merits of capturing whole system performance and variations in load patterns respectively.  We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times.  We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe performance gains of up to 10% in prediction accuracy when compared to approaches based on past system behavior in isolation.
P. D. Hovland, U. Naumann, B. Norris, "An XML-Based Platform for Semantic Transformation of Numerical Programs," Preprint ANL/MCS-P950-0402, April 2002. We describe a simple component architecture for the development of tools for mathematically based semantic transformations of scientific software.  This architecture consists of compiler-based, language-specific front- and back-ends for source transformation, loosely coupled with one or more language-independent "plug-in" transformation modules.  The coupling mechanism between the front- and back-ends and transformation modules is provided by the XML Abstract Interface Form (XAIF).  XAIF provides an abstract, language-independent representation of language constructs common in imperative languages, such as C and Fortran.  We describe the use of this architecture in the construction of tools for automatic differentiation (AD) of programs written in Fortran 77 and ANSI C.  The XAID is particularly well suited for performing the source transformations needed for AD.  Differentiation modules typically operate within the scope of statements or basic blocks, working at a level where procedural languages are very similar.  Thus, it is possible to specify a common interface format for mathematically-based semantic transformations that need not represent the union of all languages.
W. Allcock, J. Bester, J. Bresnahan, I. Foster, J. Gawor, J. A. Insley, J. M. Link, M. E. Papka, "GridMapper: A Tool for Visualizing the Behavior of Large-Scale Distributed Systems," Preprint ANL/MCS-P951-0502, May 2002. Grid applications can combine the use of compute, storage, network, and other resources.  These resources are often geographically distributed, adding to application complexity and thus the difficulty of understanding application performance.  We present GridMapper, a tool for monitoring and visualizing the behavior of such distributed systems.  GridMapper builds on basic mechanisms for registering, discovering, and accessing performance information sources, as well as for mapping from domain names to physical locations.  The visualization system itself then supports the automatic layout of distributed sets of such sources and animation of their activities.  We use a set of examples to illustrate how the system can provide valuable insights into the behavior and performance of a range of different applications.
   
G. von Laszewski, I. Foster, J. Gawor, A. Schreiber, C. J. Pena, "InforGram: A Grid Service that Supports Both Infomation Queries and Job Execution," Preprint ANL/MCS-P952-0502, May 2002. The research described in this paper is performed as part of the Globus Project.  It introduces a new Grid service called InfoGram that combines the ability of serving as information service and as a job execution service.  Previously, both services were architected and implemented within the Globus Toolkit as two different services with different wire protocols.  Our service demonstrates a significant simplification of the architecture while treating job submissions and information queries alike.  The advantage of our service is that it provides backwards compatibility to existing Grid services, while t the same time providing forwards compatibility to the emerging Web services world.  Part of the work conducted within this effort is already reused by the current Open Grid Services Architecture prototype implementation.
   
L. Freitag, P. Knupp, T. Munson, S. Shontz, "A Comparison of Optimization Software for Mesh Shape-Quality Improvement Problems," Preprint ANL/MCS-P953-0502, May 2002. Simplicial mesh shape-quality can be improved by optimizing an objective function based on tetrahedral shape measures.  If the objective function is formulated in terms of all elements in a given mesh rather than a local patch, one is confronted with a large-scale, nonlinear, constrained numerical optimization problem.  We investigate the use of six general-purpose state-of-the-art solvers and two custom-developed methods to solve the resulting large-scale problem.  The performance of each method is evaluated in terms of robustness, time to solution, convergence properties, and scalability on seveal two- and three-dimensional test cases.
I. Foster, J. Vöckler, M. Wilde, Y. Zhao, "Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation," Preprint ANL/MCS-P954-0502, May 2002. Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures.  We hypothesize that explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called "virtual data").  To explore this idea, we have developed the Chimera virtual data system, which combines a virtual data catalog, for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database.  We couple the Chimera system with distributed "Data Grid" services to enable on-demand execution of computation schedules constructed from database queries.  We have applied this system to two challenge problem, the reconstruction of simulated collision event data from a high-energy physics experiment, and the search of digital sky survey data for galactic clusters, with promising results.
W. McCune, R. Padmanabhan, R. Veroff, "Yet Another Single Law for Lattices," Algebra Universalis 50, no. 2 (2003), pp. 165-169.  Also Preprint ANL/MCS-P955-0402, April 2002. In this note we show that the equational theory of all lattices is defined by a single absorption law.  The identity of length 29 with 8 variables is shorter than previously known such equations defining lattices.
W. Gropp, "Building Library Components That Can Use Any MPI Implementation," Preprint ANL/MCS-P956-0502, May 2002. The Message Passing Interface (MPI) standard for programming parallel computers is widely used for building both programs and libraries.  Two of the strengths of MPI are its support for libraries and the existence of multiple implementations on many platforms.  These two strengths are in conflict, however, when an application wants to use libraries built with different MPI implementations.  This paper describes several solutions to this problem, based on minor changes to the API.  These solutions also suggest design considerations for other standards, particularly those that expect to have multiple implementations and to be used in concert with other libraries.
J. J. More' and T. S. Munson, "Computing Mountain Passes and Transition States," Preprint ANL/MCS-P957-0502, May 2002.  To appear in Mathematical Programming. We propose the elastic string algorithm for computing mountain passes in finite-dimensional problems. We analyze the convergence properties and numerical performance of this algorithm for benchmark problems in chemistry and discretizations of infinite-dimensional variational problems. We show that any limit point of the elastic string algorithm is a path that crosses a critical point at which the Hessian matrix is not positive definite.
   
M. Hereld, I. R. Judson, and R. Stevens, "DottyToto: A Measurement Engine for Aligning Multi-Projector Display Systems," Preprint ANL/MCS-P958-0502, May 2002. Tiled displays systems built by combining the images from arrays of projectors can provide huge numbers of pixel elements to applications needing to visually represent lots of information. Such applications are already coming into wide usage and include large scientific visualizations, collaborative virtual environments, and rich multimedia spaces. It is, however, difficult to create the illusion of a unified seamless display for a variety of reasons including optical distortion of the individual projector images due to imperfections in the lenses and basic alignment of the projectors. In this paper we describe an efficient and optimized measurement process using inexpensive components that is tolerant of a wide range of imperfections in components and measurement setup (lighting conditions, camera optics, etc.). Our method nonetheless is capable of accurate and detailed measurement of the layout of all projector images, including the generation of a detailed model of the distortions in each projector optical system. It performs these measurements on the entire array of projectors at once. Once the detailed mapping between projector pixels and mural pixels is measured, the resulting relations can be used in any of a number of ways to improve the appearance of images projected on the display.
L. Wos and R. Thiele, "Hilbert's New Problem," Preprint ANL/MCS-P959-0102, Jan. 2002. Throughout the twentieth century, the worlds of logic and mathematics were well aware of Hilbert's twenty-three problems and the challenge they offered.  Although not known until very recently, there existed yet one more challenge offered by Hilbert, his twenty-fourth problem.  This problem focuses on finding simpler proofs, on the criteria for measuring simplicity, and on the "development of a theory of the method of proof in mathematics in general".  Of the three themes of Hilbert's twenty-fourth problem, the first two are central to this article.  We visit various areas of logic, showing that some of the studies of the masters are indeed strongly connected to this newly discovered problem.  We also demonstrate that the use of an automated reasoning program (specifically, W. McCune's OTTER) enables one to address this challenging problem.  We offer questions that remain unanswered.
M. W. Knop, P. A. Dinda, and J. M. Schopf, "Windows Performance Monitoring and Data Reduction using WatchTower," Preprint ANL/MCS-P960-0602, June 2002. We describe and evaluate WatchTower, a system that simplifies the collection of Windows performance counter data for monitoring and usage profiling of Windows machines.  WatchTower has overheads similar to those of Microsoft's perfmon tool, but it is easily embedded into other software.  Furthermore, we use standard statistical tools to reduce the sheer volume of performance data to a manageable subset that nonetheless usefully captures the behavior of the machine and its interactive user.
K. Keahey, T. Fredian, Q. Peng, D. P. Schissel, M. Thompson, I. Foster, M. Greenwald, D. McCune, "Computational Grids in Action: The National Fusion Collaboratory," Future Generation Computer Systems 18 (2002), 1005-1015.  Also Preprint ANL/MCS-P961-0602, June 2002. The National Fusion Collaboratory (NFC) project was created to advance scientific understanding and innovation in magnetic fusion research by enabling more efficient use of existing experimental facilities through more effective integration of experiment, theory, and modeling.  To achieve this objective, NFC introduced the concept of "network services", which build on top of computational Grids and provide Fusion codes, together with their maintenance and hardware resources as a service to the community.  This mode of operation requires the development of new authorization and enforcement capabilities.  In addition, the nature of Fusion experiments places strident quality of service requirements on codes run during the experimental cycle.  In this paper, we describe Grid computing requirements of the Fusion community, and present our first experiments in meeting those requirements.
W. Gropp and E. Lusk, "Goals Guiding Design: PVM and MPI," Preprint ANL/MCS-P962-0502, May 2002. PVM and MPI, two systems for programming clusters, are often compared.  The comparisons usually start with the unspoken assumption that PVM and MPI represent different  solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences.  Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. For example, we show that the requirement for portability and performance across many platforms caused MPI to choose approaches different from those made by PVM, which is able to exploit the similarities of network-connected systems.
L. Wos, R. Veroff, and G. W. Pieper, Logical Basis for the Automation of Reasoning: Case Studies," Preprint ANL/MCS-P963-0102, January 2002 With the availability of computers in the late 1950s, researchers began to entertain the possibility of automating reasoning.  Immediately, two distinctly different approaches were considered as the possible basis for that automation.  In one approach, the intention was to study and then emulate prople; in the other, the plan was to rely on mathematical logic.  In this paper, we focus mainly on the latter, for logic has proved to provide an excellent basis for the automation of reasoning.
M. Russell, G. Allen, G. Daues, I. Foster, E. Seidel, J. Novotny, J. Shalf, G. von Laszewski, "The Astrophysics Simulation Collaboratory: A Science Portal Enabling Community Software Development," Preprint ANL/MCS-P965-0202, Feb. 2002. Grid Portals, based on standard Web technologies, are emerging as important and useful user interfaces to Computational and Data Grids.  Grid Portals enable Virtual Organizations, comprised of distributed researches, to collaborate and access resources more efficiently and seamlessly.  The Astrophysics Simulation Collaboratory (ASC) Grid Portal provides a framework to enable researchers in the field of numerical relativity to study astrophysical phenomenon by making use of the Cactus computational toolkit.  We examine user requirements and describe the design and implementation of the ASC Grid Portal.
L. Wos, D. Ulrich, and B. Fitelson, "XCB, the Last of the Shortest Single Axioms for the Classical Equivalential Calculus," Bull. Section Logic 32(3) (2003), pp. 120-134.  Also Preprint ANL/MCS-P966-0602, June 2002. It has long been an open question whether the formula XCB = EpEEEpqErqr is, with the rules of substitution and detachment, a single axiom for the classical equivalential calculus.  This paper answers that question affirmatively, thus completing a search for all such eleven-symbol single axioms that began seventy years ago.
R. Evard, N. Desai, J.-P. Navarro, and D. Nurmi, "Clusters as Large-Scale Development Facilities," Preprint ANL/MCS-P968-0602, June 2002. In this paper, we describe the use of a cluster as a generalized facility for development.  A development facility is a system used primarily for testing and development activities while being operated reliably for many users.  We are in the midst of a project to build and operate a large-scale development facility.  We discuss our motivation for using clusters in this way and compare the model with a classic computing facility.  We describe our experiences and findings from the first phase of this project.  Many of these observations are relevant to the design of standard clusters and to future development facilities.
J.-P. Navarro. R. Evard, D. Nurmi, N. Desai, "Scalable Cluster Administration - Chiba City I Approach and Lessons Learned," Preprint ANL/MCS-P969-0602, June 2002. Systems administrators of large clusters often need to perform the same administrative activity hundreds or thousands of times.  Often such activities are time-consuming, especially the tasks of installing and maintaining software.  By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control, cluster administrators can automate all administrative tasks.  Scalable clustger administration addresses the following challenge:  What systems design techniques can cluster builders use to automate cluster administration on very large clusters?  We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster; and we analyze the scalability, flexibility, and reliability benefits and limitations from that approach.
A. Ching, A. Choudhary, W.-K. Liao, R. Ross, W. Gropp, "Noncontiguous I/O through PVFS," Preprint ANL/MCS-P970-0702, July 2002. With the tremendous advances in processor and memory technology, I/O has risen to become the bottleneck in high-performance computing for many applications.  The development of parallel file systems has helped to ease the performance gap, but I/O still remains an area needing significant performance improvement.  Research has found that noncontiguous I/O access patterns in scientific applications combined with current file system methods to perform these accesses lead to unacceptable performance for large data sets.  To enhance performance of noncontiguous I/O, we have created list I/O, a native version of noncontiguous I/O.  We have used the Parallel Virtual File System (PVFS) to implement our ideas.  Our research and experimentation shows that list I/O outperforms current noncontiguous I/O access methods in most I/O situations and can substantially enhance the performance of real-world scientific applications.
L. Wos, D. Ulrich, and B. Fitelson, "Vanquishing the XCB Question: The Methodological Discovery of the Last Shortest Axiom for the Equivalential Calculus," JAR (to appear).  Also Preprint ANL/MCS-P971-0702, July 2002 With the inclusion of an effective methodology, this article answers in detail a question that, for a quarter of a century, remained open despite intense study by various researchers.  Is the formula XCB = e(x,e(e(e(x,y),e(z,y)),z)) a single axiom for the classical equivalential calculus when the rules of inference consist of detachment (modus ponens) and substitution?  Where the function e represents equivalence, this calculus can be axiomatized quite naturally with the formulas e(x,x), e(e(x,y),e(y,x)), and e(e(x,y),e(e(y,z),e(x,z))), which correspond to reflexivity, symmetry, and transitivity, respectively.  (We note that e(x,x) is dependent on the other two axioms.)  Heretofore, thirteen shortest single axioms for classical equivalence of length eleven had been discovered, and XCB was the only remaining formula of that length whose status was undetermined. To show that XCB is indeed such a single axiom, we focus on the rule of condensed detachment, a rule that captures detachment together with an appropriately general, but restricted, form of substitution. The proof we present in this paper consists of twenty-five applications of condensed detachment, completing with the deduction of transitivity followed by a deduction of symmetry. We also discuss some factors that may explain in part why XCB resisted relinquishing its treasure for so long. Our approach relied on diverse strategies applied by the automated reasoning program OTTER. Thus ends the search for shortest single axioms for the equivalential calculus.
G. von Laszewski, B. Ruscic, P. Wagstrom, S. Krishnan, K. Amin, S. Nijsure, S. Bittner, R. Pinzon, J. C. Hewson, M. L. Morton, M. Minkoff, A. Wagner, "A Grid Service-Based Active Thermochemical Table Framework," Preprint ANL/MCS-P972-0702, July 2002. In this paper we report our work on the integration of existing scientific applications using Grid Services.  We describe a general architecture that provides access to these applications via Web services-based application factories.  Furthermore, we demonstrate how such services can interact with each other.  These interactions enable a level of integration that assists the scientific application architect in leveraging applications running in heterogeneous runtime environments.  Our architecture is implemented by using existing infrastructures and middleware, such as Web services, the Globus Toolkit, and the Java CoG Kit.  We test our architecture on a thermochemistry application that provides a number of requirements, such as batch processing, interactive and collaborative steering, use of multiple platforms, visualization through large displays, and access via a portal framework.  Besides the innovative use of the Grid and Web services, we have also provided a novel algorithmic contribution to scientific disciplines that use thermochemical tables.  Specifically, we modified the original approach to constructing thermochemical tables to include an iterative process of refinement leading to increased accuracy; we are now implementing this approach.  We have designed a portal for accessing the set of services provided, which include the display of network dependencies between the reactions a chemist may be interested in and interactive querying of associated species data.
G. von Laszewski, M. Russell, I. Foster, J. Shalf, G. Allen, G. Daues, J. Novotny, E. Seidel, "Community Software Development with the Astrophysics Simulation Collaboratory," Preprint ANL/MCS-P974-0702, July 2002. We describe a Grid-based collaboratory that supports the collaborative development and use of advanced simulation codes.  Our implementation of this collaboratory uses a mix of Web technologies (for thin-client access) and Grid services (for secure remote access to, and management of, distributed resources).  Our collaboratory enables researchers in geographically disperse locations to share and access compute, storage, and code resources, without regard to institutional boundaries.  Specialized services support community code development via specialized Grid services, such as online code repositories.  We use this framework to construct the Astrophysics Simulation Collaboratory, a domain-specific collaboratory for the astrophysics simulation community.  This Grid-based collaboratory enables researchers in the field of numerical relativity to study astrophysical phenomena by using the Cactus computational toolkit.
P. M. Dickens and W. Gropp, "An Evaluation of a User-Level Data Transfer Mechanism for High-Performance Networks," Preprint ANL/MCS-P976-0602, June 2002.

In this paper, we describe FOBS: a simple user-level communication protocol designed to take advantage of the available bandwidth in a high-bandwidth, high-delay network environment.  We compare the performance of FOBS with that of TCP both with and without the so-called Large Window extensions designed to improve the performance of TCP in this type of network environment.  We show that FOBS can obtain on the order of 90% of the available bandwidth across both short- and long-haul high-performance network connections.  For the long-haul connection, the bandwidth obtained was 1.8 times higher than that of the optimized TCP algorithm.  Also, we demonstrate that the additional traffic placed on the network because of the greedy nature of the algorithm is quite reasonable, representing approximately 3% of the total data transferred.

T. Iliescu and P. Fischer, "Backscatter in the Rational LES Model," Preprint ANL/MCS-P977-0702, July 2002. This paper presents a comparison for the backscatter (the inverse transfer of energy from small to large scales) in the rational and the gradient large eddy simulation (LES) models.  We applied both LES models in the numerical simulation of turbulent channel flows at Ret = 180 and Ret = 395.  The rational LES model yielded improved results and was more stable numerically.
   
J. Annis, Y. Zhao, J. Voeckler, M. Wilde, S. Kent, I. Foster, "Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey," Preprint ANL/MCS-P978-0702, July 2002. The GriPhyN project is one of several major efforts working to enable large-scale data-intensive computation as a routine scientific tool.  GriPhyN focuses in particular on virtual data technologies that allow computational procedures and results to be exploited as community resources so that, for example, scientists cannot only run their own computations on raw data, but also discover computational procedures developed by others and data produced by these procedures.  A request to retrieve data on a particular cluster might thus either lead to the retrieval of the requested data from a local or remote database or the scheduling of a computation to produce the data.
   
J. N. Lyness and S. Joe, "The Number of Lattice Rules of Specified Upper Class and Rank," BIT Numerical Mathematics 43 (2003), pp. 413-426.  Also Preprint ANL/MCS-P979-0802, August 2002. The upper class of a lattice rule is a convenient entity for classification and other purposes.  The rank of a lattice rule is a basic characteristic, also used for classification.  By introducing a rank proportionality factor and obtaining certain recurrence relations, we show how many lattice rules of each rank exist in any prime upper class.  The Sylow p-decomposition may be used to obtain corresponding results for any upper class.
S. Krishnan, P. Wagstrom, G. von Laszewski, "GSFL: A Workflow Framework for Grid Services," Preprint ANL/MCS-P980-0802, Aug. 2002. The Open Grid Services Architecture (OGSA) is addressing the challenge of integrating services spread across distributed, heterogeneous, dynamic virtual organizations, using the concepts and technologies from both the Grid and Web service communities.  The Web service community has realized that Web services can reach their full potential only if there exists a mechanism to describe the various interactions between the services and dynamically compose new services out of existing ones.  This situation is true in the case of Grid services as well.  In this paper, we analyze existing technologies that address workflow for Web services, and try to leverage them for Grid services, which have different needs from standard Web services.  We discuss these special needs and present the Grid Services Flow Language (GSFL), which addresses them for Grid services within the OGSA framework.
S. Vazhkudai and J. M. Schopf, "Using Disk Throughput Data in Predictions of End-to-End Grid Data Transfers," Preprint ANL/MCS-P981-0802, August 2002. Data Grids provide an environment for communities of researchers to share, replicate, and manage access to copies of large datasets.  In such environments, fetching data from one of the several replica locations requires accurate predictions of end-to-end transfer times.  Predicting transfer time is significantly complicated due to the involvement of several shared components such as networks, disks, etc., in the end-to-end data path each of which experiences load variations that can significantly affect the throughput.  Of these, disk accesses are rapidly growing in cost and have not been previously considered, although on some machines they can be up to 30% of the transfer time.  In this paper, we present techniques to combine observations of end-to-end application behavior and disk I/O throughput load data.  We develop a set of regression models to derive predictions that characterize the effect of disk load variations on file transfer times.  We also include network component variations and apply these techniques to the logs of transfer data using the GridFTP server, part of the Globus Toolkit™.  We observe up to 9% improvement in prediction accuracy when compared with approaches based on past system behavior in isolation.
M. R. Paul, M. C. Cross, P. F. Fischer, "Rayleigh-Bénard Convection with a Radial Ramp in Plate Separation," Preprint ANL/MCS-P982-0802, August 2002. Pattern formation in Rayleigh-Bénard convection in a large-aspect-ratio cylinder with a radial ramp in the plate separation is studied analytically and numerically by performing numerical simulations of the Boussinesq equations. A horizontal mean flow and a vertical large scale counterflow are quantified and used to understand the pattern wavenumber. Our results suggest that the mean flow, generated by amplitude gradients, plays an important role in the roll compression observed as the control parameter is increased. Near threshold the mean flow has a quadrupole dependence with a single vortex in each quadrant, while away from threshold the mean flow exhibits an octupole dependence with a counter-rotating pair of vortices in each quadrant.  This is confirmed analytically by using the amplitude equation and Cross-Newell mean flow equation. By performing numerical experiments the large-scale counterflow is also found to aid in the roll compression away from threshold but to a much lesser degree. Our results yield an understanding of the pattern wavenumbers observed in experiment away from threshold and suggest that near threshold the mean flow and large scale counterflow are not responsible for the observed shift to smaller than critical wavenumbers.
R. K. Madduri, C. S. Hood, and W. E. Allcock, "Reliable File Transfer in Grid Environments," Preprint ANL/MCS-P983-0802, August 2002. Grid-based computing environments are becoming increasingly popular for scientific computing.  One of the key issues for scientific computing is the efficient transfer of large amounts of data across the Grid.  In this poster we present a Reliable File Transfer (RFT) service that significantly improves the efficiency of large-scale file transfer.  RFT can detect a variety of failures and restart the file transfer from the point of failure.  It also has capabilities for improving transfer performance through TCP tuning.
M. Anitescu and G. D. Hart, "A Fixed-Point Iteration Approach for Multibody Dynamics with Contact and Small Friction," Math. Program., Ser. B (to appear).  Also Preprint ANL/MCS-P985-0802, August 2002, revised February 2004. Acceleration-force setups for multi-rigid-body dynamics are known to be inconsistent for some configurations and sufficiently large friction coefficients (a Painleve paradox).  This difficulty is circumvented by time-stepping methods using impulse-velocity approaches, which solve complementarity problems with possibly nonconvex solution sets.  We show that very simple configurations involving two bodies may have a nonconvex solution set for any nonzero value of the friction coefficient.  We construct two fixed-point iteration algorithms that solve convex subproblems and that are guaranteed, for sufficiently small friction coefficients, to retrieve, at a linear convergence rate, the unique velocity solution of the nonconvex linear complementarity problem whenever the frictionless configuration can be disassembled.  In addition, we show that one step of one of the iterative algorithms provides an excellent approximation to the velocity solution of the original, possibly nonconvex, problem if for all contacts we have that either the friction coefficient is small or the slip velocity is small.
E. D. Dolan, R. Fourer, J.-P. Goux, and T. S. Munson, "Kestrel: A Callable Interface to the NEOS Server," Preprint ANL/MCS-P986-0802, August 2002. The NEOS Server provides access to a variety of optimization packages via the Internet.  The new Kestrel interface to the NEOS Server extends the server's capabilities by permitting local programs to request optimization services and retrieve results.  As a result, a locally running modeling system can have much the same access to remote NEOS solvers as to those installed locally.  Kestrel clients have been implemented for the AMPL and GAMS modeling environments.  Extensions to the client designs enable them to queue subproblems for execution and later retrieval of results, making possible a limited form of parallel processing.  The creation of the Kestrel interface has also led to enhancements in the NEOS Server for submitting binary files and for tracking progress via the Web.
A. Majumder and R. Stevens, "Color Non-Uniformity in Projection Based Displays: Analysis and Solutions," Preprint ANL/MCS-P989-0802, August, 2002. Large-area displays made up of several projectors show significant variation in color.  The causes of this color variation are many and complex.  The device-dependent reasons include the color variations within a single projector's field of view and across different projectors, while the device independent reasons include illumination conditions, nature of the display material, shape of the display surface, and color bleeding or inter-reflections.  In this paper, we study the device dependent color variation of a multi-projector display and then present a method to solve for it.  First, we identify the different projector parameters, like position, zoom, axis of alignment, lamp age and projector controls such as brightness, contrast and white balance, that cause such variation.  Next, we study the change in the luminance and chrominance characteristics of the projection based display with the change in these parameters.  From this elaborate analysis we find that the luminance varies significantly within and across projectors while chrominance variation is relatively small, especially across projectors of same brand.  Based on this observation, we present a method to do a per channel per pixel luminance matching.  Our method consists of a one-time calibration procedure when a luminance attenuation map (LAM) is generated.  This LAM is then used to correct any image to achieve photometric uniformity.  In the one-time calibration step, we first use a camera to measure the per channel luminance response of a multi-projector display and find the pixel with the most "limited" luminance response.  Then, for each projector, we generate a per channel LAM that assigns a weight to every pixel of the projector to scale the luminance response of that pixel to match the most limited response.  This LAM is then used to attenuate any image projected by the projector. This method can be extended to do the image correction in real time on traditional graphics pipeline by using alpha blending and color look-up-tables.  To the best of our knowledge, this is the first effort to match luminance across all the pixels of a multi-projector display.  Our results show that luminance matching can indeed achieve photometric uniformity.
B. Nitzberg and M. M. Schopf, "Current Activities in the Scheduling and Resource Management Area of the Global Grid Forum," special issue of Lecture Notes in Computer Science, Springer-Verlag (to appear).  Also Preprint ANL/MCS-P990-0902, Sept. 2002. The Global Grid Forum's Scheduling and Resource Management Area is actively pursuing the standards that are needed for interoperability of Grid resource management systems.  This includes work in defining architectures, language standards, APIs and protocols.  In this article we overview the state of the working groups and research groups in the area as of September 2002.
K. Keahey and V. Welch, "Fine-Grain Authorization for Resource Management in the Grid Environment," Preprint ANL/MCS-P991-0802, August 2002. In this document we describe our work-in-progress for enabling fine-grain authorization of resource management.  In particular, we address the needs of Virtual Organizations (VOs) to enforce their own policies in addition to those of the resource owners.
A. Majumder and R. Stevens, "LAM: Luminance Attenuation Map for Photometric Uniformity in Projection Based Displays," Preprint ANL/MCS-P992-0902, Sept. 2002. Large-area multi-projector displays show significant spatial variation in color, both within a single projector's field of view and across different projectors.  Recent research in this area has shown that the color variation is primarily due to luminance variation.  Luminance varies within a single projector's field of view, across different brands of projectors and with the variation in projector parameters.  Luminance variation is also introduced by overlap between adjacent projectors.  On the other hand, chrominance remains constant throughout a projector's field of view and varies little with the change in projector parameters, especially for projectors of the same brand.  Hence, matching luminance response of all the pixels of a multi-projector display should help us to achieve photometric uniformity.  In this paper, we present a method to do a per channel per pixel luminance matching.  Our method consists of a one-time calibration procedure when a luminance attenuation map (LAM) is generated.  This LAM is then used to correct any image to achieve photometric uniformity.  In the one-time calibration step, we first use a camera to measure the per channel luminance response of a multi-projector display and find the pixel with the most "limited" luminance response.  Then, for each projector, we generate a per channel LAM that assigns a weight to every pixel of the projector to scale the luminance response of that pixel to match with the most limited response.  This LAM is then used to attenuate any image projected by the projector.  This method can be extended to do the image correction in real time on traditional graphics pipeline by using alpha blending and color look-up-tables.  To the best of our knowledge, this is the first effort to match luminance across all the pixels of a multi-projector display.  Our results show that luminance matching can indeed achieve photometric uniformity.
S. Bhowmick, P. Raghavan, L. McInnes, and B. Norris, "Faster PDE-Based Simulations Using Robust Composite Linear Solvers," Future Generation Comput. Syst. 20 (2004), pp. 373-387.  Also Preprint ANL/MCS-P993-0902, Sept. 2002. Many large-scale scientific simulations require the solution of nonlinear partial differential equations (PDEs).  The effect solution of such nonlinear PDEs depends to a large extend on efficient and robust sparse linear system solution.  In this paper, we show how fast and reliable sparse linear solvers can be composed from several underlying linear solution methods.  We present a combinatorial framework for developing optimal composite solvers using metrics such as the execution times and failure rates of base solution schemes.  We demonstrate how such composites can be easily instantiated using advanced software environments.  Our experiments indicate that overall simulation time can be reduced through highly reliable linear system solution using composite solvers.
I. Foster, F. Siebenlist, S. Tuecke, V. Welch, "Security and Certification Issues in Grid Computing," Preprint ANL/MCS-P994-0902, Sept. 2002. Grid computing is concerned with the sharing and coordinated use of diverse resources in dynamic, distributed "virtual organizations."  The dynamic nature of Grid environments introduces challenging security concerns that demand new technical approaches.  In this brief overview, we review key Grid security issues and outline the technologies that are being developed to address those issues.  We focus in particular on work being done within the context of the Open Grid Services Architecture, a new initiative aimed at recasting key Grid concepts within a service-oriented framework.  This work involves a tight integration with Web services mechanisms and appears particularly relevant to the concerns of e-services.
M. Anitescu and G. D. Hart, "Solving Nonconvex Problems of Multibody Dynamics with Contact and Small Friction by Successive Convex Relaxation," Mech. Des. Struct. Mach. 31(3) (2003), pp. 335-356.  Also Preprint ANL/MCS-P995-0902, Sept. 2002. Time-stepping methods using impulse-velocity approaches are guaranteed to have a solution for any friction coefficient, but they may have nonconvex solution sets.  We present an example of a configuration with a nonconvex solution set for any nonzero value of the friction coefficient.  We construct an iterative algorithm that solves convex subproblems and that is guaranteed, for sufficiently small friction coefficients, to retrieve, at a linear convergence rate, the velocity solution of the nonconvex linear complementarity problem whenever the frictionless configuration can be disassembled.  In addition, we show that one step of the iterative algorithm provides an excellent approximation to the velocity solution of the original, possibly non-convex, problem if the product between the friction coefficient and the slip velocity is small.
L. Yang, I. Foster, and J. M. Schopf, "Homeostatic and Tendency-based CPU Load Predictions," Preprint ANL/MCS-P997-0902, Sept. 2002. The dynamic nature of a resource-sharing environment means that applications must be able to adapt their behavior in response to changes in system status.  Predictions of future system performance can be used to guide such adaptations.  In this paper, we present and evaluate several new one-step-ahead and low-overhead time series prediction strategies that track recent trends by giving more weight to recent data.  We present results that show that a dynamic tendency prediction model with different ascending and descending behavior performs best among all strategies studied.  A comparative study conducted on a set of 38 machine load traces shows that this new predictor achieves average prediction errors that are between 2% and 55% less (36% less on average) than those incurred by the predictors used within the popular Network Weather Service system.
L. McInnes, B. Norris, S. Bhowmick, P. Raghavan, "Adaptive Sparse Linear Solvers for Implicit CFD using Newton-Krylov Algorithms," Preprint ANL/MCS-P998-0902, Sept. 2002. We consider the simulation of three-dimensional transonic Euler flow using pseudo-transient Newton-Krylov methods.  The main computation involves solving a large, sparse linear system at each Newton (nonlinear) iteration.  We develop a technique for adaptively selecting the linear solver method to match better the numeric properties of the linear systems as they evolve during the course of the nonlinear iterations.  We show how such adaptive methods can be implemented using advanced software environments, leading to significant improvements in simulation time.
C. Liu, L. Yang, I. Foster, D. Angulo, "Design and Evaluation of a Resource Selection Framework for Grid Applications," Proc. Int'l Symp. on High-Performance Distributed Computing (HPDC-11), Edinburgh, July 2002.  Also Preprint ANL/MCS-P1001-1002, October 2002. While distributed, heterogeneous collections of computers ("Grids") can in principle be used as a computing platform, in practice the problems of first discovering and then configuring resources to meet application requirements are difficult problems.  We present a general-purpose resource selection framework that addresses these problems by defining a resource selection service for locating Grid resources that match application requirements.  At the heart of this framework is a simple but powerful declarative language based on a technique called set matching, which extends the condor matchmaking framework to support both single resource and multiple resource selection.  This framework also provides an open interface for loading application-specific mapping modules to personalize the resource selector.  We present results obtained when this framework is applied in the context of a computational astrophysics application.  These results demonstrate the effectiveness of our technique.
M. Anitescu and G. D. Hart, "A Constraint- Stabilized Time-Stepping Approach for Rigid Multibody Dynamics with Joints, Contact and Friction," Int. J. Numer. Meth. Engng 60 (2004), pp. 1335-2371.  Also Preprint ANL/MCS-P1002-1002, October 2002. We present a method for achieving constraint stabilization for a linear-complementarity-based time-stepping scheme for multi-rigid-body dynamics with joints, contact and friction.  The method requires the solution of only one linear complementarity problem per step.  We show that under certain assumptions, that include the limited differentiability of the mappings governing the noninterpenetration and joint constraints; the pointed friction cone assumption; and at most linear growth of the external forces, the velocity is bounded for a sufficiently small size of the timestep over a fixed time-interval and the geometrical constraint infeasibility at step (l) is bounded above by a constant multiple of the square of the time-step and the square of the norm of the current value of the velocity.  If, in addition, the velocity is uniformly bounded with respect to the time interval of the simulation, then the geometrical constraint infeasibility is bounded by the same bound irrespective of the time interval of the simulation.
G. X. Yu and E. Marland, "Establishment of a Knowledge Base for Function Annotations in High-Throughput Sequence Analysis," Preprint ANL/MCS-P1004-1002, Oct. 2002. Motivation: The rapid accumulation of sequence data and information describing regulatory and metabolic networks has triggered the development of integrated systems for genome sequence analysis.  However, a great deal of uncertainty exists in the annotations found in these systems because of the heterogeneities in the public databases and limitations in current computational approaches.  Conflicts in assignments based on different computational tools add additional uncertainty to the annotations, and the situation is compounded by a lack of tools for cross-verification.  These uncertainties have greatly affected the performance of genome analysis systems, specifically with regard to the accuracy of functional assignments to the genes.  In order to minimize the effect of these uncertainties, a biological knowledge base is needed to provide rules for guiding function annotations and a global reference system for cross-verification of the results obtained by analysis using different computational tools.

Results:  In this study, we have developed a rule-based knowledge system specifically for automated high-throughput genetic sequence analysis.  It includes 22,612 protein function groups and their evolutionary spaces (distributions), which are characterized by protein sequence conservations, the phylogenetic distribution of protein motifs and domains, and their relationships to biological functions.  Our knowledge base demonstrates that tremendous variations exist among protein functional groups.  Over half of the protein functional groups are highly diversified in sequence similarities (53.6%, and 51.4% in Blast and Blocks measurements, respectively).  With regard to protein relationships, we found that Pfam patterns have much higher resolution and broader coverage than Blocks families.  Out of 10,604 protein functional groups that Blocks covered, 811 (7.6%) can be uniquely identified.  In contrast, Pfam patterns cover 13,803 significant protein functional groups, and 1,899 (almost 14%) of them have unique identifiers.  However, most of the relationships between protein functions and protein families or Pfam patterns are complex.  Each of the protein families or Pfam patterns can correspond to multiple functions or vice versa.  Hence, these families or patterns need to be further defined or additional tools introduced so that each function can be identified through its own unique set of features.  One of the important applications of our knowledge base is cross-verification of protein function annotations obtained by different computational tools.  Additional applications of this knowledge base are discussed in the paper.

R. Thakur and W. Gropp, "Improving the Performance of MPI Collective Communication on Switched Networks," Preprint ANL/MCS-P1007-1102, November 2002. In this paper, we present new algorithms for improving the performance of collective communication operations in MPI.  Our target architecture isa cluster of machines connected by a switched network such as Myrinet or switched ethernet.  We have developed new algorithms for all the MPI collective communication operations, namely, scatter/gather/reduce, allgather/allreduce, broadcast, reduce-scatters, all-to-all, and scan.  We compare the performance of our new algorithms with the algorithms currently used in the latest version of MPICH on up to 256 nodes of a Myrinet-connected cluster.  For operations such as scatter/gather/reduce, allgather/allreduce, and reduce-scatter, we observe an improvement of up to a factor of 10 for short messages sizes.  For operations such as broadcast and reduce-scatter and for long messages sizes, the new algorithms are truly scalable: the time taken remains fairly constant as we increase the number of processes participating in the operation.
J. M. Restrepo and G. K. Leaf, "Noise Effects on Wave-Generated Transport Induced by Ideal Waves," J. Physical Oceanography 32 (2002), pp. 2334-2349.  Also Preprint ANL/MCS-P1008-0102, Jan. 2002. We consider the transport velocity in boundary layer flows driven by either noisy monochromatic progressive or standing waves.  The central issue addressed here is whether such flows are capable of sustaining a transport velocity when noise is present in the wave field and, if so, in what ways the noise affects the transport velocity, the mean wall shear stress, and the total mass flux.  Specifically, we address the effect of noise due to unresolved processes.  Our study is motivated by the fact that in the natural setting it is the norm rather than the exception that noise is present in the wave field.  We find that when noise is added to standing waves, the transport in the boundary layer leads to a nonzero mass flux.  On the other hand, noise due to progressive waves reduces the mass flux.  Further, we find that the drift velocity will have two components: a deterministic one and a diffusive one.
K. Amin, S. Nijsure, G. von Laszewski, "Open Collaborative Grid Service Architecture (OCGSA)," Preprint ANL/MCS-P1009-1102, Nov. 2002. In this paper we introduce a new architecture, called Open Collaborative Grid Services Architecture (OCGSA) that serves as the building block for designing collaborative services within Grids.  The framework is based on the Open Grid Services Architecture (OGSA) that combines aspects of Grids with Web services.  Using OGSA enables users to access advanced Grid features such as lifetime management, notification, and security that are otherwise not available in Web services.  Our framework provides high-level Grid services that can be integrated within future collaborative applications, hence making them automatically Grid aware.  Besides the definition of such useful advanced collaborative services, we use this architecture to validate the approach taken in OGSA and verify whether necessary collaborative services can be implemented with the help of OGSA.  Use of these services will also enable the creation of peer-to-peer collaborative infrastructures.
M. Vilayannur, R. B. Ross, P. H. Carns, R. Thakur, A. Sivasubramaniam, M. Kandemir, "Improving the Performance of the POSIX I/O Interface to PVFS," Preprint ANL/MCS-P1010-1102, Nov. 2002. The ever-increasing gap in performance between CPU/memory technologies and the I/O subsystem (disks, I/O buses) in modern workstations has exacerbated the I/O bottlenecks inherent in applications that access large disk resident data sets. A simultaneous development in recent times has seen the maturity of Linux-based off-the-shelf clusters of PCs for low-cost, high-performance computing solutions. A common technique to alleviate the I/O bottlenecks on such platforms is the use of parallel file systems. One such parallel file system is the Parallel Virtual File System (PVFS), which is a freely available tool to achieve high-performance I/O on Linux-based clusters.

In this paper, we describe some of the key performance and scalability improvements that we have implemented for the UNIX I/O interface to PVFS. To illustrate the performance gains, we present experimental results using Bonnie++, a commonly used file system benchmark to test file system throughput; a synthetic parallel I/O application for calculating aggregate read and write bandwidths; and a synthetic benchmark which calculates the time taken to untar the Linux kernel source tree to measure performance of large number of small file operations. We also compare the I/O performance of these techniques when using a Myrinet-based network and when using a fast Ethernet-based network for I/O-related communications. With these techniques we achieve aggregate read and write bandwidth as high as 550 MB/s with Myrinet and 160 MB/s with fast Ethernet.
J. N. Lyness and G. Monegato, "Asymptotic Expansions for Two-Dimensional Hypersingular Integrals," Preprint ANL/MCS-P1011-1102, November 2002. We define and examine two-dimensional hypersingular integrals on [0,1)2 and on [0,\infty)2 and relate their Hadamard finite-part (HFP) values to Mellin transforms.  The integrands have algebraic singularities of a possibly unintegrable nature on the axes and at the origin.  Extending our work on one-dimensional integrals report in 1998, we obtain variants of the classical Euler-Maclaurin expansion for various two-dimensional integrals.  In many cases, the constant term in the expansion (which is not necessarily the leading term) provides the value of the HFP integral.  These expansions may be used as the basis for the numerical evaluation of a class of HFP integrals by extrapolation.
   
J. N. Lyness and T. Sorevik, "Four-Dimensional Lattice Rules Generated by Skew-Circulant Matrices," Math. Comput. 73 (245), (2003), pp. 279-295.  Also Preprint ANL/MCS-P1012-0702, July 2002. We introduce the class of skew-circulant lattice rules.  These are s-dimensional lattice rules that may be generated by the rows of an s×s  skew-circulant matrix.  (This is a minor variant of the familiar circulant matrix.)  We present briefly some of the underlying theory of these matrices and rules.  We are particularly interested in finding rules of specified trigonometric degree d.  We describe some of the results of computer-based searches for optimal four-dimensional skew-circulant rules.  Besides determining optimal rules for δ = d+1 ≤ 47, we have constructed an infinite sequence of rules Q(4,δ) that has a limit ρ index of 27/34 ≈ 0.79.  This index is an efficiency measure, which cannot exceed 1, and is inversely proportional to the abscissa count.
   
J. N. Lyness, "Notes on Lattice Rules," Journal of Complexity (to appear).  Also Preprint ANL/MCS-P1013-1102, November 2002. An elementary introduction to lattices, integration lattices, and lattice rules is followed by a description of the role of the dual lattice in assessing the trigonometric degree of a lattice rule.  The connection with the classical lattice-packing problem is established:  Any s-dimensional cubature rule can be associated with an index \rho = (\delta)s/s!N, where \delta is the enhanced degree of the rule and N its abscissa count.  For lattice rules, this is the packing factor of the associated dual lattice with respect to the unit s-dimensional octahedron.  An individual cubature rule may be represented as a point on a plot of \rho against \delta.  Two of these plots are presented.  They convey a clear idea of the relative cost-effectiveness of various individual rules and sequences of rules.
L. Wos, "Double-Negation Elimination in Some Propositional Logics," Studia Logica (to appear).  Also Preprint ANL/MCS-P1014-1202, December 2002. This article answers two questions (posed in the literature), each concerning
the guaranteed existence of proofs free of double negation. A proof is free of double negation if none of its deduced steps contains a term of the form n(n(t)) for some term t, where n denotes negation.  The first question asks for conditions on the hypotheses that, if satisfied, guarantee the existence of a double-negation-free proof when the conclusion is free of double negation. The second question asks about the existence of an axiom system for classical propositional calculus whose use, for theorems with a conclusion free of double negation, guarantees the existence of a double-negation-free proof. After giving conditions that answer the first question, we answer the second question by focusing on the Lukasiewicz three-axiom system. We then extend our studies to infinite-valued sentential calculus and to intuitionistic logic and generalize the notion of being double-negation free. The double-negation proofs of interest rely exclusively on the inference rule condensed detachment, a rule that combines modus ponens with an appropriately general rule of substitution. The automated reasoning program OTTER played an indispensable role in this study.
M. P. Friedlander and M. A. Saunders, "A Globally Convergent Linearly Constrained Lagrangian Method for Nonlinear Optimization," SIAM J. Optimization 15(3) (2005), pp. 863-897.  Also Preprint ANL/MCS-P1015-1202, December 2002. For optimization problems with nonlinear constraints, linearly constrained Lagrangian (LCL) methods solve a sequence of subproblems of the form "minimize an augmented Lagrangian function subject to linearized constraints''.  Such methods converge rapidly near a solution but may not be reliable from arbitrary starting points. The well known software package MINOS has proven effective on many large problems. Its success motivates us to propose a variant of the LCL method that possesses three important properties: it is globally convergent, the subproblem constraints are always feasible, and the subproblems may be solved inexactly.  The new algorithm has been implemented in Matlab, with the option to use either the MINOS or SNOPT Fortran codes to solve the linearly constrained subproblems. Only first derivatives are required. We present numerical results on a nonlinear subset of the COPS, HS, and CUTE test problems, which include many large examples. The results demonstrate the robustness and efficiency of the stabilized LCL procedure.
   
L. Pearlman, V. Welch, I. Foster, C. Kesselman, S. Tuecke, "A Community Authorization Service for Group Collaboration," Preprint ANL/MCS-P1042-0502, May 2002. In "Grids" and "collaboratories," we find distributed communities of resource providers and resource consumers, within which often complex and dynamic policies govern who can use which resources for which purpose.  We propose a new approach to the representation, maintenance, and enforcement of such policies that provides a scalable mechanism for specifying and enforcing these policies.  Our approach allows resource provides to delegate some of the authority for maintaining fine-grained access control policies to communities, while still maintaining ultimate control over their resources.  We also describe a prototype implementation of this approach and an application in a data management context.