Project DataSpace
R. Grossman, E. Creel, M. Mazzucco, S. Connelly, A. Turinsky, H. Sivakumar, S. Wahlstrom
University of Illinois at Chicago

B. Hollebeek, P. Protopapas
PENN

R. Williams
CalTech

B. Irwin
NCAR

D. Rocke, T. Arons
University of California Davis

Y. Guo, S. Hedvall
Imperial College, London

P. Milne, G. Williams
ACSys, Canberra

G. Becker, R. Grossman, J. Hubshman, W. Martinez
Magnify Research, Inc.

Description:
Today the web provides an infrastructure for the remote viewing of multimedia documents, but it does not provide a similar infrastructure for remotely exploring data. DataSpace is our attempt to provide such an infrastructure. DataSpace supports a) remote data access, analysis, and mining, and b) distributed data analysis and mining. DataSpace has several components: 1) The DataSpace Transfer Protocol (dstp), a protocol for moving data over the web using both commodity and high performance networks. 2) The Predictive Model Markup Language (pmml), an XML language for handling some of the metadata required for DataSpace. 3) The Predictive Scoring and Update Protocol (psup), a protocol for event-driven, real-time scoring. 4) Open source dstp servers for making data easily available to visitors to the data web. 5) Open source dstp clients for viewing remote data and mining data that is distributed over the data web.
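To make the role of pmml as a metadata carrier more concrete, here is a minimal sketch of reading field descriptions from an XML fragment. The fragment is hypothetical: its element and attribute names (DataDictionary, DataField, name, optype, dataType) follow the data dictionary convention of published PMML releases, which may differ from the version used in DataSpace. The field names, the omitted XML namespace, and the helper read_data_fields are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment, loosely modeled on a PMML data dictionary.
# The XML namespace is omitted for brevity; field names are invented.
PMML_FRAGMENT = """\
<PMML version="illustrative">
  <DataDictionary numberOfFields="3">
    <DataField name="ra" optype="continuous" dataType="double"/>
    <DataField name="dec" optype="continuous" dataType="double"/>
    <DataField name="object_class" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""


def read_data_fields(xml_text):
    """Return (name, optype, dataType) for each DataField element found."""
    root = ET.fromstring(xml_text)
    fields = []
    for field in root.iter("DataField"):
        fields.append(
            (field.get("name"), field.get("optype"), field.get("dataType"))
        )
    return fields


if __name__ == "__main__":
    for name, optype, data_type in read_data_fields(PMML_FRAGMENT):
        print(f"{name}: {optype} ({data_type})")
```

A client that knows the name and type of each attribute in advance can decide how to parse and mine a remote data set before any data is transferred, which is the kind of metadata handling the description above refers to.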

The Terabyte Challenge, which is the testbed for Project DataSpace, will link 12 sites across 5 continents and demonstrate a variety of DataSpace applications which will publish, access, analyze, correlate and manipulate remote and distributed data.

For example, in the Sky Survey application, the dstp client downloads stellar object catalog data from a dstp server, creates a machine learning model based on pmml, and scores large amounts of data at high rates using high performance dstp applications we have developed. Last year, we were able to move 250 Mbits/sec (~113 GB/hr) over the testbed to the floor of the Supercomputing Conference in Portland with no network tuning. We expect even higher rates at this year's Supercomputing Conference using a new release of our software.
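The relationship between the two throughput figures quoted above can be checked with a short conversion. This sketch assumes decimal units (1 GB = 1000 MB) and 8 bits per byte; the function name is ours and is not part of any DataSpace software.

```python
def mbits_per_sec_to_gb_per_hour(mbits_per_sec):
    """Convert a rate in megabits per second to decimal gigabytes per hour."""
    megabytes_per_sec = mbits_per_sec / 8          # 8 bits per byte
    megabytes_per_hour = megabytes_per_sec * 3600  # 3600 seconds per hour
    return megabytes_per_hour / 1000               # 1000 MB per decimal GB


print(mbits_per_sec_to_gb_per_hour(250))  # prints 112.5
```

At 250 Mbits/sec this gives 112.5 GB/hr, which agrees with the rounded figure of roughly 113 GB/hr.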

The Network Storm application will demonstrate the flexibility of dstp. Application servers will be set up on 3 continents to collect network traffic data. Any dstp client can download the data and view the state of the network. Ultimately, the group plans to build an infrastructure that will predict network storms and allow for improved network traffic management.

We will also demonstrate several other high performance dstp applications.