Bioscience
Manuscript accepted February 2000

Evolution of a Multi-Site Network Information System:
the LTER Information Management Paradigm

K.S. Baker, B. Benson, D.L. Henshaw, D. Blodgett, J. Porter, S.G. Stafford

Karen S. Baker, Palmer LTER Data Manager
University of California, San Diego (UCSD)
Scripps Institution of Oceanography
2252 Sverdrup Hall
La Jolla, CA 92093-0218, USA
(kbaker@ucsd.edu)
Phone: 619-534-2350 Fax: 619-534-2997

Other Authors:

Barbara J. Benson, North Temperate Lakes LTER Data Manager
University of Wisconsin-Madison
Center for Limnology
680 N. Park Street
Madison, WI 53706-1492
(bjbenson@facstaff.wisc.edu)

Don L. Henshaw, Andrews Forest LTER Data Manager
USDA Forest Service
Pacific NW Research Station
3200 SW Jefferson Way
Corvallis, OR 97331
(henshaw@fsl.orst.edu)

Darrell Blodgett, Bonanza Creek LTER Data Manager
University of Alaska
Forest Soils Laboratory
305 O'Neill Building
Fairbanks, AK 99775-0740
(blodgett@taiga.lter.alaska.edu)

John H. Porter, Virginia Coast Reserve LTER Data Manager
University of Virginia
Department of Environmental Sciences
Clark Hall
Charlottesville, VA 22903
(jhp7e@virginia.edu)

Susan G. Stafford, Professor and Department Chair
Colorado State University
Department of Forest Sciences
College of Natural Resources
Ft. Collins, CO 80523-1470
(stafford@cnr.colostate.edu)

Introduction

Pelagic polar marine, temperate coniferous forest, urban watershed, coastal estuary, eastern deciduous forest, tropical rain forest, and tallgrass prairie -- these are just a few of the ecosystems represented in the LTER (Long-Term Ecological Research) Network of 21 sites (Franklin et al., 1990). These diverse ecosystems provide the broad perspective necessary for investigating complex phenomena such as climate change, biodiversity and soil dynamics and, thereby, for addressing environmental policy. The sharing and cognitive processes inherent in multi-investigator, multi-site research investigations of ecosystem studies are supported by the LTER Network through organizational infrastructure in general and data management in particular.

Many scientific studies today are no longer limited to a single time or location. Researchers examine many different temporal and spatial scales and often need to integrate data from related projects and additional research sites. Information systems support research and collaboration by facilitating data exchange and synthesis. The LTER Network promotes partnerships among ecologists and data managers to build information systems that electronically unite disparate types of information and also provide a coherent framework to store and access data sets.

The experience of LTER data managers in designing a networked information system for a field-based discipline provides the biological community with a model for developing multi-site information systems. Other examples that provide insight into the complex organizational and communication infrastructure needed for partnership efforts include those with a specific research focus, such as the Worm Community System and the Flora of North America Project (FNAP), as well as those with broader biological concerns, such as the Organization of Biological Field Stations (OBFS). Prior to the availability of internet connectivity, the Worm Community System was developed as a collaborative software environment to share information on the genetics, behavior and biology of the nematode C. elegans within a dispersed community of more than 1,400 researchers. Attention was given to both the design and the analysis of this system's structure and use (Star and Ruhleder, 1996). In contrast, the FNAP development occurred after internet technology was available. The FNAP, with the specific goal of identifying and cataloging all plant species, focuses on the use of online technology to create an electronic community of authors able to write, edit, review and publish cooperatively (Tomlinson et al., 1996). More recently, OBFS, representing a community of more than 150 biological field station members with diverse environmental concerns, has initiated plans to coordinate electronically, making use of community partnerships and pilot projects (Stafford and McKee, 1999).

The LTER Network Information System (NIS) is a cooperative effort that accommodates local site independence and a flexible, modular design. Here we describe the evolution of the NIS, highlighting important components and presenting specific examples of software modules that enable cross-site data integration. We also describe how the LTER NIS working style fosters cross-site communication, technology transfer, and an interactive, participatory approach to information management within our community.

Background

Scientists today have increased data gathering capabilities due to new instrumentation and storage capabilities. As a result, the fields of biology and ecology, like many experimental sciences, are data rich. This richness challenges us to organize data and ancillary information in ways that make them readily and meaningfully accessible. Long-term data management goals include assuring that data are available on a long-term basis and are easily accessible; minimizing interpretation difficulties by addressing data quality and assuring documentation; and facilitating synthesis of information by structuring data sets so that they are comparable (Gurtz, 1986; NRC, 1995; Porter and Callahan, 1994; Stafford et al., 1996). Meeting these goals is requisite for the success of long-term research, multidisciplinary projects, and cross-site studies.

The rapid increase in electronic connectivity, as well as in available information, has increased the breadth, pace, and expectations of science. There is a trend away from practices that sequester data in office notebooks and make them available only to local staff for a single scientific objective. There is a developing emphasis on cross-site scientific studies often involving disparate data sets. As a result, funding agencies are increasingly focusing on the need for data management. For example, NSF policy, which now requires data availability within a two- to three-year period after project completion, establishes the quality of data management techniques as a review criterion for LTER sites.

Data manager. With changes in expectations and funding, the responsibilities of local data-handling personnel have broadened. Figure 1 illustrates the flow of data from an individual or site to the ultimate destination, community use. In addition to fulfilling current data needs, data management now stresses future access and data integration (Stafford et al., 1994) with a heightened focus on two tasks: the interface with technology and the balance of scope. The data manager interface role may be described as facilitator, translator and/or converter (Stonebraker, 1994; Bowker et al., 1997; Kies et al., 1998), and involves knowledge of both the field-science arena and information technology in order to effectively integrate data and meta-data across studies (see Sidebar 1). A data manager brings an expanded vocabulary to this particular task. Vocabulary, describing changes introduced by technology (Kies et al., 1998; Kay, 1998), is a tool that can either launch or limit efforts (Sidebar 2). The site data manager's second task in broadened responsibilities involves the balance or flexibility needed to respond to local data needs and exceptions while taking into consideration community databases. Balances include specific-scientist versus general-user requirements, short-term versus long-term data handling and local versus generic design methods. It is a distinct advantage when a site's data manager takes part in computer-based data handling design because this individual is familiar with the specific needs of the local scientific community. The data manager's understanding of both explicit and implicit or unspoken site information helps establish the balances between competing needs when resources are limited. Achieving flexibility and balance requires planning and introduces complexity, but imbalance incurs a higher cost: data sets and information systems that go unused.

The partnership between science and data management has influenced studies such as ecosystem variability (Kratz et al., 1995), prairie climatology (Strebel et al., 1994), vegetation (Riera et al., 1998), climate (Greenland and Swift, 1991) and ice (Magnuson et al., 1999). The development of an ice phenology database by the Lake Ice Analysis Group (LIAG) for over 750 lakes and rivers of the Northern Hemisphere provides a specific example of how data management contributes to data structuring and facilitates cross-site research. This database, used to explore the usefulness of ice data as a paleoclimate indicator, was developed for a 1996 workshop with 28 international participants organized by the North Temperate Lakes (NTL) LTER site. Prior to the workshop, project scientists worked closely with the NTL information management staff to identify variables of scientific interest as well as formats for construction of standard data sets. These often time-consuming activities required close collaboration between scientists and data managers but were critical steps in database design and workshop preparation. Discussions defining the data set stimulated development of important descriptive information about the data, which increases the value of individual data sets by providing a broader context. Thus, a data set contribution, which might consist minimally of lake name and annual date of lake freeze, was augmented for submission to the ice phenology database with standard codes to describe the continent and the water body type in addition to specification of the latitude, longitude, elevation, mean depth, shore line length, population of largest city and ancillary weather data. Further discussions were necessary to decide whether to define 'ice covered' as total or partial cover and to define duration of the cover in the cases of thawing and refreezing. Instructions were developed to distinguish between the observation that the lake did not freeze and the case where an observation was not made.
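
The distinction between a lake that was observed not to freeze and a season with no observation can be made explicit in the record structure itself. The following is a minimal sketch only; it is not the LIAG database design, and the class names, status codes and the season convention are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FreezeStatus(Enum):
    """Hypothetical status codes; the actual LIAG codes are not given in the text."""
    FROZE = "froze"                # a freeze date was observed
    DID_NOT_FREEZE = "no_freeze"   # observers confirmed the lake stayed open
    NOT_OBSERVED = "missing"       # no observation was made that season


@dataclass
class IceRecord:
    lake_name: str
    season: int                        # year the winter season began (assumed convention)
    status: FreezeStatus
    freeze_date: Optional[str] = None  # ISO date string, present only when status is FROZE

    def __post_init__(self) -> None:
        # A freeze date must be present exactly when a freeze was observed.
        if (self.status is FreezeStatus.FROZE) != (self.freeze_date is not None):
            raise ValueError("freeze_date must be given if and only if status is FROZE")


def fraction_frozen(records: List[IceRecord]) -> Optional[float]:
    """Share of observed seasons with a freeze; unobserved seasons are excluded."""
    observed = [r for r in records if r.status is not FreezeStatus.NOT_OBSERVED]
    if not observed:
        return None
    return sum(r.status is FreezeStatus.FROZE for r in observed) / len(observed)
```

Keeping the two cases as separate codes, rather than leaving the date blank in both, prevents a missing observation from being counted as an ice-free winter in later analyses.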

Early LTER cross-site efforts made by the lead scientists in the 1980s to obtain data from sites often required an actual visit to a site. For the ice phenology workshop, specifications were distributed electronically to participants prior to the workshop. Actual data submissions required ongoing communications between information management personnel and contributors in a majority of cases. Not all research groups were able to submit data electronically (for instance, data were faxed) or in the specified standard format (for instance, the date was written with characters or Roman numerals rather than numbers). Information management staff were able to transform data efficiently into standardized formats. Further, data checking was performed by data managers to aid scientists in addressing quality assurance issues.
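
A transformation of this kind, in which dates written with month names or Roman numerals are converted to a single numeric form, might be sketched as follows. This is an illustration rather than the procedure used by the NTL staff: the function name, the accepted separators and the day-month-year ordering are all assumptions made here.

```python
import re
from datetime import date

ROMAN_MONTHS = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
                "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12}
MONTH_NAMES = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
               "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def normalize_date(text: str) -> date:
    """Parse 'day month year', where the month is a number, a Roman numeral or an English name.

    Day-first ordering is an assumption of this sketch, not a documented convention.
    """
    parts = [p for p in re.split(r"[\s./-]+", text.strip()) if p]
    if len(parts) != 3:
        raise ValueError(f"expected day, month and year in {text!r}")
    day_s, month_s, year_s = parts
    if month_s.isdigit():
        month = int(month_s)
    elif month_s.upper() in ROMAN_MONTHS:
        month = ROMAN_MONTHS[month_s.upper()]
    elif month_s[:3].lower() in MONTH_NAMES:
        month = MONTH_NAMES[month_s[:3].lower()]
    else:
        raise ValueError(f"unrecognized month in {text!r}")
    # date() itself rejects impossible values such as day 31 in a 30-day month.
    return date(int(year_s), month, int(day_s))
```

For example, normalize_date("12.XI.1985") and normalize_date("12 November 1985") yield the same date object, which can then be written in one standard format with isoformat().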

By the time of the workshop, a significant portion of the data were available for analysis on site. In earlier LTER cross-site efforts, scientists spent a significant amount of time integrating the diverse data sets into Excel spreadsheets that then supported subsequent analyses. In contrast, the ice phenology workshop data sets were incorporated into a relational database. At the workshop, data views were available, providing an initial point of departure for the group. Subsequently, the availability of the database at the workshop supported the generation of important research questions as the scientists interacted. For instance, researchers were able to readily select subsets of water bodies with lengths of record or spatial locations best suited to their particular question, thus maximizing individual productivity. A policy limiting data access addressed researcher concerns about data sharing, and following the workshop the data were accessible to the LIAG group via the World Wide Web, promoting continued updates and communications. The diverse studies possible from this single database resulted in a series of papers presented at the 27th Congress of the International Association of Theoretical and Applied Limnology in 1998 (Magnuson et al., in press). The preparations required to make data comparable, deal with multiple formats, structure data in a database form, and provide access to data and derived data resulted in a rich body of scientific publication and a robust database available for future queries. Without personnel and an adequate system for managing data, the magnitude of the effort needed to deal with the issues of large and complex data sets can become a barrier to undertaking cross-site research.

Data management developments. Scientific progress and efficiency increase when data are available electronically from colleagues, libraries and archives because individual research units can devote more time to analysis and synthesis (Ingersoll et al., 1997) when time given to data access can be minimized. However, availability does not necessarily imply utility. Useful data must be of known quality and so must be well described by meta-data, a term referring to documentation about the data. With organizations spending considerable resources on data collection, there is growing attention to the development and implementation of meta-data standards for data archival (Gross et al., 1995). For instance, the Federal Geographic Data Committee (FGDC, 1995) created meta-data standards as part of the National Information Infrastructure (NII) efforts. National and international master directory efforts are beginning to catalog the location of data sets. The Global Change Master Directory (GCMD), supported by the National Aeronautics and Space Administration (NASA), is one example of an earth science database directory.

Traditionally, peer-reviewed publications have been the science community's data archive, but publications preserve only a subset of the authors' data. Complete data sets, when available electronically, allow future researchers to address alternative hypotheses as well as unanticipated questions. Both the high cost of properly documenting data for electronic archives and the lack of a reward structure for supporting these efforts are significant deterrents to making data available (Michener et al., 1997). Incentives for openly exchanging data among ecologists, such as venues for publishing data sets and peer-reviewed papers about data sets, will help to alleviate this resistance (Olson et al., 1996).

In addition to publication and traditional short-term data storage by individuals, there are a variety of other data archival methods (Olson and McCord, 1998). Scientists have available national data archives which emphasize data set identification, lineage, storage and redistribution. The National Aeronautics and Space Administration Earth Observing System Data Information System (EOSDIS) is one example of a national archive system which includes interconnected Distributed Active Archive Centers. The National Oceanic and Atmospheric Administration (NOAA) is another example with interconnected centers including the National Environmental Satellite, Data, and Information Service (NESDIS), the National Oceanographic Data Center (NODC), and the National Climatic Data Center (NCDC).

As another alternative, individual research sites with information systems are becoming stewards of long-term data repositories. Local site storage has the advantage of keeping the resource (the data) close to the source (the scientist). Data stored locally are readily available for updates as data analysis and synthesis proceed. A local repository promotes continuing dialogue of the researcher with the data manager about database entry, quality assurance and data analysis. In addition to data sets, it may also promote inclusion of full data lineage, such as literature references and published texts, as well as informal documents such as project plans, proposals, newsletters, brochures, and preliminary results.

Other countries also have begun to focus on data management and to establish data sharing networks at the national level. Two examples are the Environmental Change Network (ECN) in the United Kingdom (Cuthbertson, 1993; Lane, 1997) and the Ecological Monitoring and Assessment Network (EMAN) in Canada (Canadian Global Change Program, 1995). With a growing recognition of the need to coordinate networks on a larger scale (IGBP, 1990), the International LTER (ILTER) network promotes development of national ecological networks and provides an international forum for global networking (Franklin, 1994) as well.

Data management paradigms. A variety of data management paradigms exist to address issues associated with the processing and synthesis of data. Regardless of approach, all of these paradigms are faced with the challenges of scaling up (Brown, 1994; Stafford, 1994; Robbins, 1995; Olson et al., 1999), dealing both with broader spatial (regional and global) and expanded temporal (from short-term time series to paleo records) data scales. Scaling up in a network arena means interfacing successfully with other sites and additional networks. A number of efforts have addressed the interface of field science with computer science (Gurtz, 1986; Stonebraker, 1994; Thorley and Trathan, 1994; Strebel et al., 1994; Michener et al., 1994; Robbins, 1995). The challenges associated with data synthesis have been addressed by partnerships focusing on discipline or theme (e.g., ecology, Gosz, 1994; oceanography, Flierl, 1992; worm community, Schatz, 1993, Kouzes et al., 1996, Star and Ruhleder, 1996; remote sensing in modeling, Vande Castle, 1991; prairie climatology, Strebel et al., 1998), on region (e.g., Antarctic, ICAIR, 1993), on task (e.g., management, Bannon, 1996) and on public policy (e.g., Antarctic ocean resources, CCAMLR).

Another pivotal issue in multi-contributor databases is the balance of local-versus-centralized management since the design as well as the implementation of data management functions may be carried out at either local or central locations. Some multi-national efforts, such as the Global Terrestrial Observing System (GTOS; Heal et al., 1993) and the Antarctic Data Centers in support of the Antarctic Treaty (SCAR/COMNAP, 1996; Agnew, 1997), have proposed a networked structure of sites with relatively centralized management for the coordination and distribution of data. In this context, an overview of the multi-national Antarctic BIOMASS data program (Thorley and Trathan, 1994) stressed, in hindsight, the need for a better integration of science and data management efforts. Such an integration is the focus of the National Atmospheric Deposition Program (NADP) (Aubertin et al., 1990), a U.S. multi-agency group funded to oversee the National Trends Network. The NADP is an example of an organization with centralized functions; individual sites are responsible for data collection using well defined procedures and field form completion, while a central laboratory is responsible for quality assurance and data analysis.

The general organizational structure and the specific data management schema of biology projects are often influenced by a combination of historical development, user community and scientific goals. With respect to data management protocols, a top-down approach can include specification of an initial unifying element. For example, the Human Genome Project (HGP, Pearson and Soll, 1991; Robbins, 1992, 1995) structurally unified part of the molecular biology community through the adoption of a single relational database software to address the well-defined gene sequencing databases. The Human Genome Project was able to project the type and amount of data involved before the project started and so carefully planned an electronic information system using or creating the necessary technology. The Sequoia Project (Dozier, 1992) is another example of initial project unification where coordination centered on the design, extension and adoption of an interactive working environment for earth science projects using specific object-oriented module concepts. Elements of the common work environment flourished past the project end, in contrast to the earth science collaborations, which remained dispersed. In fact, there are many situations where an abundance of research questions are not well defined at the outset. Platt (1988) suggests that the individuality of living things in the life sciences, responding continuously to past and current events, creates an abundance of research questions that simply cannot be well posed. For example, the LTER encompasses ecological databases in which the variable definitions themselves are under development both within the differing site system environments and at the network level. The definition of as basic a variable as productivity requires discussion when considering how productivity is measured in a rain forest as compared to a desert or an ocean. Organizational structures will differ from the a priori top-down model when the research variables and questions are in development.

Structures influence the behavior and expectations of research participants by defining how decisions are made. Decision-making strategies can be described as falling within the continuum from autocratic or top-down to democratic or bottom-up. LTER data management decisions are made at many levels permitting a variety of decision-making strategies and promoting negotiation between interested parties across sites, themes and disciplines. The response to a policy query about productivity may be initiated at the LTER network level while the working definition of productivity for a research investigation may be prompted both at the site level by a site-specific research question and at the multi-site level by a working group research question. The bottom-up organizational structure of LTER relies on the collaboration of sites to define research priorities as well as to unify data management protocols and develop specific network-level tools that synthesize information across sites.

Network Information System (NIS)

LTER data management focuses on the task of supporting local-site science efforts while providing opportunities for network coordination beneficial to participants at all levels (Figure 2). The groundwork for LTER support of local-site data management was initiated during two workshops (Gorentz, 1992) and further developed at LTER Data Manager Committee meetings held annually since 1988. Additional technology issues have been addressed historically by subcommittees at biannual science Coordinating Committee Meetings. By 1994, individual site data management had clearly evolved to a point at which LTER data managers could begin discussing the need for an integrated network information system (NIS). However, technological developments and system evolution were not primary motivators for the creation of the NIS. Instead, the NIS owes its beginning to a 1994 mandate of the LTER Coordinating Committee (the governing body for the U.S. LTER Network, Figure 2) for each site to make available at least one online data set. With the mandate for online data, data management efforts on a network level became imperative.

The LTER NIS focus is summarized in a strategic vision (see Sidebar 3) "to promote ecological science by fostering the synergy of information systems and scientific research" (LTER Data Management Report, 1995). Using this vision as a guide, an implementation plan in 1996 outlined the development of the NIS through 2002 (Brunt and Nottrott, 1996). Three features important to the development of the NIS have been the balance between local site responsibilities and centralized network responsibilities, the modular design, and the process of prototype development.

The data management systems across LTER sites have a considerable degree of diversity which reflects the distinct needs, resources and organizational structures of the individual sites. The range of systems derives from the critical need to facilitate local research and publication (Strebel et al., 1994). Some of the local data management systems predate site entry into the LTER network, which partially accounts for the wide variety of approaches (Baker, 1996; Baker, 1998; Benson, 1996; Briggs and Su, 1994; Gurtz, 1986; Ingersoll et al., 1997; Porter et al., 1996; Spycher et al., 1996; Veen et al., 1994). Two common organizational structures are centralized models with a full-time site-designated data manager supported by staff and student help or with a part-time site-designated data manager who works with individual research team data managers associated with that site.

Initial LTER network-wide data management efforts focused on mail list services, a personnel directory, a data access policy, an all-site bibliography (Chinn and Bledsoe, 1997), meta-data standards (Michener et al., 1997), and data catalogs (Michener et al., 1990; Porter et al., 1997). Subsequent cross-site data synthesis efforts have included a remote sensing sun photometer project (Vande Castle and Vermote, 1996), a soil roots database (C. Bledsoe, personal communication) and a climate database (Henshaw et al., 1998); these efforts have addressed data coordination from their inception. The growing availability of Internet tools (first Gopher and then the World Wide Web) has played an immediate and significant role in catalyzing the vision of an expanded, more integrated network-level information system.

Network Information System Modules

A modular framework was agreed upon for the LTER NIS so that individuals or working groups could develop prototypes independently. The components of the information system are illustrated in Figure 3. Modules may be divided into two content groups: support and research. The support modules consist solely of meta-data databases. These modules contain information about the research sites or the research data sets. Examples include a personnel directory, site description directory, bibliography and data set catalog. Support modules may be designed and developed independently by data managers. In contrast, research modules are composed of research data and related meta-data. Experience has shown that scientific questions are the best drivers for development of research module prototypes, and that development is best accomplished by having lead discipline specialists working in collaboration with data managers (Stafford et al., 1986). The ongoing collaboration addresses design questions regarding module goals, variable definitions and data structure. Research module examples include cross-site databases of climate, species lists, and net primary productivity. A generic module schema, developed originally for climate data (Figure 4), illustrates the interface of the individual site to the general scientific community with the mediating function of a central site. Prototype and operational components of the LTER NIS are listed in Table 1, and examples of both support and research modules are discussed below.

Climate Database. The methodology for collecting climatic data at LTER sites is generally standardized, following guidelines provided by the LTER Climate Committee (Greenland, 1997). The LTER climate database was one of the first proposed research modules for the LTER NIS since weather is an important parameter in many site and synthesis studies. The module objective is to provide current and comparable climate summaries for each site (Henshaw et al., 1998). Figure 4 illustrates how all data remain under local site control, with data involved in cross-site exchange being provided in a standard exchange format. This approach is well suited to the diversity of LTER sites, because it allows sites to store data in any desired format. A uniform resource locator (URL) for each site is required to identify the location of the exchange-formatted data. The method for producing the exchange format, known as the exchange filter, may take different forms and is determined by the site. The URL may link to a periodically updated static file or to a dynamic script or database program that generates the exchange data from the site's own database upon request.
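
An exchange filter of the kind described above can be quite small. The sketch below is illustrative only: the exchange column layout, the local field names (day, tmax_f, precip_in) and the unit conversions are assumptions introduced here, since the actual LTER climate exchange format is not reproduced in this article.

```python
import csv
import io

# Hypothetical exchange layout; the real LTER exchange format is not specified here.
EXCHANGE_COLUMNS = ["site", "station", "date", "variable", "value", "unit"]


def exchange_filter(site, station, local_rows):
    """Convert one site's locally formatted records into exchange-format CSV text.

    local_rows is an iterable of dicts with the assumed local keys
    'day' (ISO date), 'tmax_f' (degrees Fahrenheit) and 'precip_in' (inches).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(EXCHANGE_COLUMNS)
    for row in local_rows:
        # Local units are converted so that every site reports comparable values.
        tmax_c = round((row["tmax_f"] - 32.0) * 5.0 / 9.0, 1)
        precip_mm = round(row["precip_in"] * 25.4, 1)
        writer.writerow([site, station, row["day"], "air_temp_max", tmax_c, "degC"])
        writer.writerow([site, station, row["day"], "precipitation", precip_mm, "mm"])
    return out.getvalue()
```

A site could write this text to a static file on a schedule or return it from a script behind its URL; in either case the local storage format stays under site control, and only the filter needs to know about it.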

Web harvesting techniques are used to automatically collect the exchange-formatted data from the site-supplied URL on a periodic basis, and the data are deposited into a central relational database. Although quality assurance of the data is conducted locally, subsequent database validation is performed at the central location to check for errors in transmission and data composite. A web interface allows end users to download reports and view the database graphically.
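
The harvest-and-validate cycle can be sketched as follows. This is not the NIS harvester: it uses Python's standard library and an SQLite table in place of the production relational database, and the exchange layout, table name and function names are assumptions introduced for illustration.

```python
import csv
import io
import sqlite3
from datetime import date
from urllib.request import urlopen

# Hypothetical exchange layout, assumed for this sketch.
EXPECTED_HEADER = ["site", "station", "date", "variable", "value", "unit"]


def fetch_url(url):
    """Default fetcher: read the exchange file published at a site-supplied URL."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def validate(text):
    """Return (good_rows, problems) for one harvested exchange file."""
    reader = csv.reader(io.StringIO(text))
    if next(reader, None) != EXPECTED_HEADER:
        return [], ["unexpected or missing header (possibly a truncated transfer)"]
    good, problems = [], []
    for line_no, row in enumerate(reader, start=2):
        try:
            site, station, day, variable, value, unit = row  # wrong field count raises ValueError
            date.fromisoformat(day)                           # malformed date raises ValueError
            good.append((site, station, day, variable, float(value), unit))
        except ValueError as err:
            problems.append(f"line {line_no}: {err}")
    return good, problems


def harvest(site_urls, conn, fetch=fetch_url):
    """Load every site's exchange file into the central table; return problems per site."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS climate (site TEXT, station TEXT, date TEXT, "
        "variable TEXT, value REAL, unit TEXT, UNIQUE (site, station, date, variable))"
    )
    report = {}
    for site, url in site_urls.items():
        good, problems = validate(fetch(url))
        # Re-harvesting replaces earlier values instead of duplicating them.
        conn.executemany("INSERT OR REPLACE INTO climate VALUES (?, ?, ?, ?, ?, ?)", good)
        report[site] = problems
    conn.commit()
    return report
```

In this sketch, rows that fail the central checks are reported back rather than repaired, which leaves quality assurance decisions with the contributing site.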

Distribution format views are independent of the database storage structure. Distribution filters, implemented through use of small programs known as common gateway interface (CGI) scripts, create the distribution formats and graphical displays that the end user selects. Distribution filters are invoked through query forms within the web interface. Implementing many parallel formats allows for meeting diverse end-user requirements, and avoids discussions of which distribution format is most appropriate. For example, an initial Climate Committee forum recommended a single-variable matrix format while a subsequent climate workshop recommended a cross-site multi-variable matrix format (Bledsoe et al., 1996). Both formats have been implemented, and new distribution filters may be added without affecting previously established formats.
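
The separation of distribution formats from storage can be illustrated with a small registry of filter functions applied to the same stored rows. The sketch below stands in for the CGI scripts and does not reproduce them; the row layout and the exact shape of the two matrices are assumptions made here, since the recommended formats are not detailed in this article.

```python
from collections import defaultdict

# Stored rows are assumed to be (site, date, variable, value) tuples.


def single_variable_matrix(rows, variable):
    """One variable: a row per date and a column per site."""
    sites = sorted({site for site, _, var, _ in rows if var == variable})
    by_date = defaultdict(dict)
    for site, day, var, value in rows:
        if var == variable:
            by_date[day][site] = value
    table = [["date"] + sites]
    for day in sorted(by_date):
        table.append([day] + [by_date[day].get(s, "") for s in sites])
    return table


def multi_variable_matrix(rows):
    """All variables: a row per (site, date) and a column per variable."""
    variables = sorted({var for _, _, var, _ in rows})
    by_key = defaultdict(dict)
    for site, day, var, value in rows:
        by_key[(site, day)][var] = value
    table = [["site", "date"] + variables]
    for site, day in sorted(by_key):
        table.append([site, day] + [by_key[(site, day)].get(v, "") for v in variables])
    return table


# Adding a format means adding one entry here; existing filters are untouched.
DISTRIBUTION_FILTERS = {
    "single_variable": single_variable_matrix,
    "multi_variable": multi_variable_matrix,
}


def distribute(rows, fmt, **options):
    """Apply the distribution filter the end user selected."""
    return DISTRIBUTION_FILTERS[fmt](rows, **options)
```

A call such as distribute(rows, "single_variable", variable="air_temp_max") and a call such as distribute(rows, "multi_variable") read the same stored rows, so neither format constrains how the data are kept.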

The climate prototype was developed at the North Temperate Lakes LTER site using Oracle relational database software with access through a web interface (Stubbs and Benson, 1996). The final climate prototype was recently moved to the LTER Network Office, where it was implemented in a Microsoft SQL Server relational database. Portability issues, considered during the design phase, affect the ease of prototype transfer from development site to full production site. The sites participating in the climate module benefit by not having to develop similar presentation report capabilities for their local climate data. Individual site investigators as well as outside community users can use this network climate database to access both a single site's data and the data from a group of sites in a common format. Such benefits to local sites provide incentives that promote participation in research module development efforts. Currently, there are 15 contributing sites with a minimum data set defined to include the years 1991-1995, while ongoing development focuses on climate meta-data.

Data Catalog. A network-level LTER data catalog provides a method for locating data at the LTER sites by creating a table of contents of the more than 2000 data sets online. Catalog entries are derived from site meta-data documentation forms. These forms were developed independently at individual sites throughout the 1980s (Michener, 1986). Over the years the LTER data managers have addressed meta-data issues together in a sequence of working groups. The history of the LTER data catalog development illustrates how dramatically data access methods have changed. In 1990 site meta-data descriptions of selected data sets were compiled and published in a hardcopy bound volume (Michener et al., 1990) through the Network Office. In 1993 these published data set descriptions were made available electronically at a Network Office Gopher Internet site, while the actual data remained available at the local site. Recently, in 1997, the catalog was redesigned by the Virginia Coast Reserve LTER site as a module for the NIS, taking advantage of web search engines and web harvesting techniques (Porter, 1997).

The current LTER data catalog provides a centralized data table of contents for all the individual site online databases. Instead of harvesting the actual data sets, a central site harvests meta-data: individual site lists of local data sets including the data set name, the associated principal investigators, keywords, and local accession number. This list is provided dynamically as a script or statically as a file at a local URL and is harvested regularly into a centrally located database. The LTER data set lists are then prepared as hypertext mark-up language (HTML) files providing links to the actual local data sets and accompanying meta-data from the central site. A web search engine, WebGlimpse (Manber et al., 1997), is used to generate indexes which allow site-specific and cross-site catalog searches for data sets. The search engine provides keyword searching based on the locally provided keywords, as well as free-text searching of local site meta-data descriptions. Currently, the issues of keywords as part of a controlled-vocabulary search and of accession schemes have not been centrally addressed, but the data catalog prototype has generated further discussions illustrating how the design, evaluation, and feedback steps are an inherent part of module development.
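The harvesting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production catalog: the URLs, site codes, and tab-separated list layout are hypothetical, the `fetch` function reads from an in-memory stand-in rather than the network, and a simple SQL `LIKE` match stands in for the WebGlimpse index.

```python
import html
import sqlite3

# Stand-in for the site-provided lists. In practice each list would be fetched
# from a site URL; the URLs and the tab-separated layout are assumptions.
SITE_LISTS = {
    "http://site-a.example/datasets.txt":
        "A001\tLake temperature\tSmith; Jones\tlimnology, temperature\n"
        "A002\tIce cover duration\tSmith\tice, climate\n",
    "http://site-b.example/datasets.txt":
        "B010\tStream chemistry\tLee\thydrology, chemistry\n",
}

def fetch(url):
    """Hypothetical fetcher; replace with a real HTTP request for live URLs."""
    return SITE_LISTS[url]

def harvest(conn, site_urls):
    """Load each site's data set list into the central catalog table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "site TEXT, accession TEXT, name TEXT, investigators TEXT, "
        "keywords TEXT, PRIMARY KEY (site, accession))"
    )
    for site, url in site_urls.items():
        for line in fetch(url).splitlines():
            if not line.strip():
                continue
            accession, name, investigators, keywords = line.split("\t")
            # INSERT OR REPLACE lets a repeated harvest refresh existing entries.
            conn.execute(
                "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?, ?)",
                (site, accession, name, investigators, keywords),
            )
    conn.commit()

def catalog_html(conn, link_template):
    """Render the catalog as an HTML list linking back to the local sites."""
    rows = conn.execute(
        "SELECT site, accession, name FROM catalog ORDER BY site, accession"
    ).fetchall()
    items = []
    for site, accession, name in rows:
        url = link_template.format(site=site, accession=accession)
        label = f"{site} {accession}: {name}"
        items.append(
            '<li><a href="{}">{}</a></li>'.format(
                html.escape(url, quote=True), html.escape(label)
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

def search(conn, keyword):
    """Cross-site keyword match; a simple stand-in for a search-engine index."""
    rows = conn.execute(
        "SELECT site, accession FROM catalog WHERE keywords LIKE ? "
        "ORDER BY site, accession",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

Only the descriptive list travels to the central site; the links in the generated page still point to each site's own copy of the data, so the data themselves stay under local control.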

The data catalog was developed predominantly through expertise at a local site, but group discussion elicited product definition and group consensus. Four different prototypes, differing in technical complexity, were presented to the larger group for consideration. The four prototypes ranged from a simple catalog listing with links to local site data sets to the more sophisticated creation of a new catalog archive. Use of existing catalog archives was also considered, but updating proved difficult in this case. Each option was evaluated by data managers, within the context of each data manager's site; the consensus choice was the prototype based on web-crawling technologies, which is currently in production mode at the Network Office. The primary selection factors were ease of use and ease of update via automated processing. The data catalog includes site-specific as well as network-wide search capabilities, which benefit the individual site that does not want to invest in a locally-developed search capability.

Bibliography. The LTER all-site bibliography (Chinn and Bledsoe, 1997) is an example of an early NIS support module created using a variant of the harvest technique. In this case, individual site filters to convert local bibliographic files into a standardized, centrally-located database were centrally developed rather than site developed as in Figure 4. This scheme, however, inhibited further development because updating the database required manual steps, most of them performed at the central database. Such an arrangement distanced the site from the exchange filter used to convert their data. Such efforts provided a valuable opportunity for group education, but they also created a legacy of expectations since both module functionality and content may appear complete when this is actually true only momentarily. A working prototype can become incomplete with passing time unless there is provision to keep information current with some type of dynamic update. Design modification is underway to move the LTER bibliographic exchange filter to the local site and to ensure that update capabilities can be handled more readily. Originally the LTER database was made available online using the gopher protocol and a WAIS (Wide Area Information Server) search but transitioned in 1998 to a Network Office SQL server. Currently the all-site bibliography contains more than 12,000 entries while several index schemes are being tested and a new interface strategy is under development. The all-site bibliography is used both for cross-site theme searches and for documenting site publication.

Site Description Directory. The goal of the site description module is to provide a uniform presentation of site information about location, personnel and research. In this prototype, information is entered through a web form and saved to a central database. In order to ensure control at the site level, the entries are editable from the local level and easily available for download. All users may view the site descriptions while a passworded login is given to the local data manager who oversees each site's updates. For security, because of the non-local distribution of passwords, all input is saved to a temporary storage area and reviewed for integrity by the database manager prior to submission to the online database. A site description support module is critical for conducting cross-site research since it provides elemental site information such as site location (latitude, longitude, elevation), plot size and biome classification as well as overview information about climate, vegetation, soil, hydrology, geology and education. Ultimately, it provides an index into cross-site LTER common science themes and a mechanism to link with other site description directories that may exist in the future for other networks.
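The review step described above, in which web form input is held in temporary storage until the database manager checks it, might look something like the following sketch. The table names, field names, and the toy integrity check are assumptions made for illustration; the source does not describe the actual schema or review criteria.

```python
import sqlite3

def open_db():
    """Create a hypothetical staging table and the online description table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pending ("
        "id INTEGER PRIMARY KEY, site TEXT, field TEXT, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE site_description ("
        "site TEXT, field TEXT, value TEXT, PRIMARY KEY (site, field))"
    )
    return conn

def submit(conn, site, field, value):
    """Web form input goes to temporary storage, never straight online."""
    cur = conn.execute(
        "INSERT INTO pending (site, field, value) VALUES (?, ?, ?)",
        (site, field, value),
    )
    conn.commit()
    return cur.lastrowid

def looks_valid(field, value):
    """Toy integrity check (an assumption): latitude must be a number in range."""
    if field == "latitude":
        try:
            return -90.0 <= float(value) <= 90.0
        except ValueError:
            return False
    return bool(value.strip())

def review(conn, pending_id, approve):
    """Database manager decision: publish the entry or discard it."""
    row = conn.execute(
        "SELECT site, field, value FROM pending WHERE id = ?", (pending_id,)
    ).fetchone()
    if row is None:
        raise KeyError(pending_id)
    if approve:
        conn.execute("INSERT OR REPLACE INTO site_description VALUES (?, ?, ?)", row)
    conn.execute("DELETE FROM pending WHERE id = ?", (pending_id,))
    conn.commit()
```

In this sketch an automated check only informs the decision; the `approve` flag is still passed explicitly, mirroring the manual review by the database manager.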

This module uses web forms, which are most useful for information requiring one-time entry or very occasional update. Web form utility depends on how the input is stored and the tools available to work with the gathered information. In the case of the site description directory, the web form currently interfaces to a relational database (miniSQL) on a Unix platform but is being transferred to another database (Microsoft SQLServer) on an NT platform in preparation for migration to the production location. Some modules may use a hybrid mechanism, such as local site file harvesting supplemented with network level web form update, until local and network efforts interface seamlessly. For instance, the site description directory and personnel directory could require harvesting to accommodate substantial changes in a local database, but might generally be served by network database web form edits and additions.

Keys to Success of a Network Information System

The first step in developing a network information system is to understand in depth the use that the system is intended to serve. The concepts of information management may be addressed initially on a modest scale (Sidebar 4) as the requirements for data exchange and aggregation are defined. More comprehensive design elements used in the LTER NIS are highlighted in Sidebar 5. Design elements may be gradually incorporated in order to improve the process as it expands from a few scientists exchanging data to a larger group establishing a procedure for on-going exchanges. Elements include establishment of partnerships, definition of site responsibilities, creation of a modular structure and development of independent prototypes.

A support structure is needed to address the challenge of transforming individual site data into intercomparable data sets. The designation of a data manager and the development of a strategic vision at the local site level have important influences on local data management. Similarly, a network information system requires data managers in place at each site as well as designation of data managers at the network level. Agreement by sites upon a strategic vision for the network-level information system is critical. The strategic vision must encompass support of both local and cross-site ecological research.

Communications are a key factor in making a large multidisciplinary project feasible (IGBP, 1990). After all, the success of an enterprise is based upon the exchange of information in support of the participants. There is a developing literature on organizational learning, computer supported cooperative work and communities of practice (Jordan, 1996). Communication within the LTER network is promoted through a variety of informal means such as standardized lists, electronic mail lists, a network web site, surveys, working groups, and symposia in addition to the more formal committees, reports, proceedings, newsletters, and publications. Communication is essential to establishing and maintaining effective partnerships which are another key design element for any information system. A nested structure (Figure 2) facilitates active partnerships among individual site scientists, data managers, a coordinating office, and the broader ecological community. Partnerships interlock at many levels including within site and cross-site as well as network to network.

Communications specifically within the LTER data manager group have been encouraged through network office support and regular annual meetings. The concepts for online data handling are discussed frequently in working groups as well as plenary sessions of data management meetings prior to development. Surveys have been found to be effective tools for gathering and summarizing information for group discussion. Survey examples include site overviews, electronic and weather instrumentation, bibliographic software, site software, data access, data policy, web presentation elements, information system design, and information management support.

Development of modules has been a crucial feature of the LTER Network Information System design. Within the framework of a modular structure, tasks can be considered independently as long as each task is designed as a module that can interface with the NIS structure without disturbing other modules. An additional strength of modular design is that it is extensible: the information system can be extended or modified easily by adding, replacing, or deleting modules without compromising the NIS structure. This is made possible by having separate specifications for each part of the system (Brunt, 1998).
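The extensibility claim above can be made concrete with a small sketch. It assumes, purely for illustration, that every module satisfies one shared specification (here, a function from a query string to a list of result lines); the class and method names are hypothetical and do not describe the actual NIS software.

```python
from typing import Callable, Dict, List

# Assumed shared specification: a module maps a query string to result lines.
Module = Callable[[str], List[str]]

class NetworkInformationSystem:
    """Hypothetical registry in which modules are added, replaced, or deleted."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}

    def add(self, name: str, module: Module) -> None:
        if name in self._modules:
            raise ValueError(f"module already registered: {name}")
        self._modules[name] = module

    def replace(self, name: str, module: Module) -> None:
        if name not in self._modules:
            raise KeyError(name)
        self._modules[name] = module

    def remove(self, name: str) -> None:
        del self._modules[name]

    def names(self) -> List[str]:
        return sorted(self._modules)

    def query(self, name: str, text: str) -> List[str]:
        # Callers depend only on the shared specification, not on how a
        # particular module is implemented or where it was developed.
        return self._modules[name](text)
```

Because callers see only the shared specification, swapping one catalog prototype for another leaves the bibliography or climate modules untouched.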

Prototypes of modules are individual test designs that can be developed and implemented at any site but may migrate eventually to another location. Several prototypes may be proposed, tested and modified in response to the need for a single module. A working prototype catalyzes development by providing a functioning product for immediate review and focuses discussion on concrete issues. Prototypes are developed by individual or collaborating sites and implemented subsequently by a few cooperating sites with the understanding that decisions made during development are provisional and subject to further modification. Evaluation within a small group strengthens the model prior to large inclusive group discussions. Still, the pace of development gives a site in disagreement with some aspect of an adopted model the time needed to create alternative suggestions. A variety of local site initiatives demonstrating effective new approaches to data management are considered, but ultimately only those initiatives that scale to the full network, as well as being generally robust and sustainable, are adopted. Since these initial explorations are limited in time investment and development scale, there is less reluctance to accept an alternate, more effective solution in response to feedback. Iterative module development is similar to the concepts of iterative software interface design (Kies, Williges and Rosson, 1998). The local site contributions to network prototypes expand the pool of technical expertise for the network. In the best of cases, there is a leveraging of resources which saves any one site from having to attempt each task individually.

Software tool developments have accelerated and created an environment of rapid change in data management techniques. Network tools have been a dominant integrative force in the development of the LTER Network Information System. Prototype implementations use tools such as web pointers, which find web addresses; web harvesters, which provide automatic internet file retrieval; and web forms, which allow database information to be added and updated. With site funding support focused primarily on science, local computing environments have relied extensively on existing software. The use of existing software tools is possible because visionary tool development in computer science is funded elsewhere. The LTER data managers have promoted collaborative exchanges with the computer science community to identify appropriate new tools and to serve occasionally as a testbed in their development. Adoption of new software is influenced by the consensus decision-making process employed at the network data management level which acts as a conservative force in the adoption of new tools.

Different candidate hardware, software and methods can undergo an evolutionary evaluation much like a natural selection process where pieces that succeed are preserved and those that don't disappear over time (Kelly, 1994). The process is described by cybernetics in systems that have the capacity for learning and adapting. Although this feedback model does not promote linear progress, it offers flexibility because it simultaneously encompasses both design and evaluation as a part of development. Thus instrumentation and software adopted at one site may be documented in surveys, reported upon and discussed. Candidates that have not been discarded are refined, may spread through the network by example, and ultimately may be adopted at the network level. When a majority of sites have found a solution useful, it is a confirmation of the method's robustness. This feedback system is educational as sites are able to learn with and from each other. These communications foster replication of successful strategies and modification or avoidance of unsuccessful strategies.

Software choice will be influenced by the need to ensure site participation. Modules must be straight-forward to implement as well as provide benefits to the participating site. Module design can minimize interface difficulties thus maximizing the likelihood of site participation. Ensuring the ease of input and output of data to the system is a basic priority. Providing a mechanism for data exchange and distribution is an obvious benefit to each site. The implementation of translation filters accommodates this exchange while not dictating any changes in individual site data management. Alleviating the difficulties of data update by the sites is also critical for long-term maintenance and should be a transparent part of the database process. Even the simple web posting of prototype module participants provides a modest reward and encourages module implementation at the other sites.
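A translation filter of the kind mentioned above can be sketched as a small site-side function that converts whatever layout a site already uses into the agreed exchange format. Everything specific here is a labeled assumption: the two local layouts, the site codes, and the three-field exchange format are invented for illustration.

```python
import csv
import io

# Hypothetical agreed exchange format: one record per observation.
EXCHANGE_FIELDS = ["site", "date", "air_temp_c"]

def filter_site_a(text):
    """Site A (hypothetical layout): comma-separated 'YYYY-MM-DD,temp_F'."""
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        date, temp_f = row
        temp_c = round((float(temp_f) - 32) * 5 / 9, 1)
        yield {"site": "A", "date": date, "air_temp_c": temp_c}

def filter_site_b(text):
    """Site B (hypothetical layout): whitespace-separated 'DD/MM/YYYY temp_C'."""
    for line in text.splitlines():
        if not line.strip():
            continue
        day_month_year, temp_c = line.split()
        day, month, year = day_month_year.split("/")
        yield {"site": "B", "date": f"{year}-{month}-{day}", "air_temp_c": float(temp_c)}

def to_exchange_csv(records):
    """Write records from any site filter in the common exchange format."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=EXCHANGE_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Each site keeps its own file layout and units; only the filter knows about the exchange format, so a change in local practice means editing one small function rather than reorganizing local data management.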

Looking Ahead

As a community, the LTER may be considered a social system or even a cognitive ecosystem with its own unique infrastructure (Schatz, 1993; Star and Ruhleder, 1994; Tomlinson et al., 1998). The success of such a community system depends upon whether important issues can be recognized, communicated, and addressed (NRC, 1995; Spasser, 1997), issues such as whether the needs of those who work to support community systems and the benefits of such systems have been defined (Grudin, 1989). The LTER structure represents a complex interleaving of scientific, political, economic, technological, social, and educational issues. A broadening at selected sites to include social science and education components will bring new expectations and approaches. New developments will necessitate a re-examination of both system size and organizational structure.

In looking ahead to the growth of the LTER NIS, it is important to consider the potential for future expansion in light of LTER history. The NIS has accommodated comfortably an increase in participants with initial sites unified by funding from the same agency. The LTER network, which began as a group of six sites funded by NSF's Division of Environmental Biology (DEB), reached 16 sites in 1990. Several of the early LTER sites (H.J. Andrews Experimental Forest, Coweeta Hydrologic Laboratory, Jornada Experimental Range, Niwot Ridge and Shortgrass Steppe) were initiated under the NSF International Biological Program (IBP). The network enlarged to 20 sites with the addition of two Antarctic sites funded 1991-1994 through the Office of Polar Programs and two urban sites funded in 1997 by DEB, with funding in part also from the Directorate for Social, Behavioral and Economic Sciences (SBE) and the Directorate for Education and Human Resources (EHR). In 1998, a 21st site was funded through DEB as a conversion of a former Land Margin Ecosystem Research (LMER) site with plans for further LMER site additions.

As the network continues to grow and diversify, network participants must consider how the LTER model of nested structures (Figure 2) and consensus decision-making will be affected. We must be alert to whether additional communication mechanisms must be identified, whether the individual site will remain vested in LTER and whether a consistent base of support can be maintained given an increasing reliance on funding sources with differing requirements and agendas. We must be aware of both the inefficiency inherent in large group dynamics (Grudin, 1988) and the progress of computer supported cooperative work (Baecker, 1993; Bowker et al., 1997; McCarthy, 1994) in addition to the opportunistic network development balanced against centralized control. The LTER NIS is a multidatabase system (Sheth and Larson, 1990) which, in the continuum of federated database systems described by Robbins (1995), is closer to the loosely coupled end of the spectrum than the tightly coupled end. What changes will the NIS undergo? We will need to revisit the evaluation criteria themselves, which range from "Does the system support local science?" to "Does the system promote cross-disciplinary synthesis?" to "Does the system manage long-term data?" The growing need to address global ecological questions prompts us to ask how well the current LTER data management model will scale to more expansive networks and global database concerns and leads to the question "Will a paradigm shift occur?"

Conclusions

The Long-Term Ecological Research Network is a community focused on ecological systems and common goals for the long-term data the community generates. This is a community of more than 1100 scientists and more than 700 students in approximately 140 institutions. The LTER Network is a working model demonstrating how a data management structure can facilitate integrated science. Both the LTER program structure and the data management approach aim to integrate local and cross-site ecosystem research. Although early research methods emphasized the individual effort, contemporary paradigms are broadening to address community constructs (Jordan, 1996). As data management becomes recognized as an integral part of a group effort and single-investigator personal data systems interface with broader data and meta-data aggregations, issues such as documentation and data availability can be addressed in the larger context of integrated science. Unresolved issues such as module interoperability, data indexes, semantic capabilities and data policy remain the subject of community discussion and research.

The LTER development of information management is an adaptive process, not a rigid prescription. It is an approach that facilitates science immediately, fosters partnerships among data managers and scientists across the network, and creates technological interfaces. The structure of the LTER NIS allows exploration of rapidly changing technology in the field of computer science, avoiding the need to impose a single solution upon an entire network. Prototype development, carried out by independent subsets of interested sites in an arena of group and subgroup discussion, has proven an efficient mechanism for exploring and evaluating new methods. Local LTER site initiatives, not constrained by network considerations, can also identify emerging technology that will benefit scientific progress. Yet, an important feature is that the system is designed so that this experimentation does not impede the momentum of on-going research.

With an emphasis on communications, an ongoing feedback creates a synergy among LTER sites. This dynamic process permeates the spectrum of sites regardless of their stage of technological development. If one site is not able to maintain a web server initially, as long as the site is connected electronically, communications are possible and the network can provide initial basic services such as web page storage or remote sensing support. Group connectivity means that computer-mediated communication is available to create an extended learning environment. Internet connectivity is not absolutely required for a network, but it has been a determining factor in the success of the LTER network and has been identified as a goal for all sites (Brunt et al., 1990).

Finding an appropriate balance of local development in concert with network development has guided the evolution of the LTER Network Information System. Local versus network dynamics can help maintain a healthy tension (Stafford, personal communication; Star and Ruhleder, 1996) yet promote cohesiveness within the group of sites because of the participation by all in defining a system flexible enough to respond to today's rapid technological changes. From an organizational perspective, the decisions to maintain data management at a site level as well as to establish a Network Office that promotes and supports electronic connectivity on the Internet are significant contributors to the LTER program's success. Successfully identifying and allocating functions best maintained at the site and at the network-level is not a trivial assignment. Leadership and vision are necessary to ensure long-term viability of this complementary functioning.

The LTER NIS approach is appropriate for many groups of sites or countries faced with addressing both local needs and network co-operation. Such an approach is applicable also to a group faced with networking post hoc that wishes to build upon existing local data management efforts. Each LTER site has a unique research and administrative structure reflected in the local information system's balance of service, information management and computer science. This balance is defined by the vision of the organizing body, the available technical infrastructure, and the expertise of the local data manager. The LTER data management approach, which emphasizes partnerships, communications, and a modular structure developed through independent prototypes that are dynamically updated, has proven both productive and educational. The LTER Network Information System model permits site development to continue at the local level while drawing strength from the diversity of the sites in the network.

Acknowledgements: This work was funded by NSF Grants (OPP-96-32763 ksb/PAL, DEB-96-32853 bb/NTL, DEB-92-11769 db/BNZ, DEB-96-32921 dh&ss/AND, DEB-94-11974 jp/VCR, and DEB-96-34135/NET). The model presented builds upon contributions from the collaborative network of LTER site data managers as well as from the LTER scientists and administrators who have worked with vision and patience. Specific recognition is given to R. Stubbs at the North Temperate Lakes site for continued development of the climate database, to D. Rawls for insightful editing, and to J. Tom Callahan for his undeterred support of data management.

*****************************************************

Figure 1: Information Flow

Raw data travels through several transformations before it provides useful knowledge for a wide range of communities. This information flow begins with the collection of data at an individual site. When data sets are structured to be intercomparable, a central (often, relational) database can be established. Scientific communities then draw data from this relational base in formats suited to their analyses. Broader communities turn to interpretations of such analyses to obtain knowledge on pertinent topics. Such synthesis may take various forms: publications for scientists, policy statements for government needs, reports for business needs and media presentations for the general public.

Figure 2: LTER Network-Background and Nested Structure

The Long-Term Ecological Research Network was established in 1981 by the National Science Foundation as a collaborative research program bringing together individual sites mandated to create a data legacy for the needs of future researchers. The original vision, that a group of sites addressing broad ecosystem issues be coordinated into a network, remains in place today (Callahan, 1984). Each site consists of a multi-investigator research team exploring a specific ecosystem. These sites include forest, prairie, desert, tundra, agricultural, lake, river, wetland, estuary and marine environments, which are linked into a network through core study areas, cross-site studies and ongoing collaborations (Franklin et al., 1990; Franklin, 1994) as well as through electronic connectivity and information management (Brunt et al., 1990).

The LTER Network organization may be visualized as nested activities represented by ovals: individual site research with a data management component and the network with cross-site research coordination. Each site is represented on the Coordinating Committee (CC) from which the Executive Committee is elected. Each site is also represented on the Data Manager Committee which elects a coordinating committee called DataTask. The central driver for these nested activities remains the site-funded ecosystem research while cross-site activities have been supported by establishment of an LTER Network Office which maintains support personnel and facilities such as a computing center. In-reach activities may take the form of both intra-site communications (within site) and inter-site communications (between sites or cross-site). The central support mechanism facilitates in-reach within the LTER community of scientists and out-reach to other communities such as other scientific networks, technical experts, government agencies and the public. These broader communities (enclosed by dotted lines) represent another dimension of overlap. The unions of the loops represent joint LTER activities. For instance, there is coordination of LTER data management with LTER scientific pursuits through data manager representation at both the annual scientist Coordinating Committee Meetings and cross-site committee meetings. An annual Data Manager Committee meeting provides an essential forum for information exchange among the data managers themselves.

Figure 3: NIS

The Long Term Ecological Research Network Information System facilitates information flow from individual research databases through composite relational databases and into appropriately formatted out-reach uses by broader communities. NIS rests upon the foundation of research done at individual sites. This example shows how data management interfaces and assembles data into individual data modules such as the climate database or the data catalog. Using appropriate filters, data managers arrange for these data to be transmitted to the NIS central climate database and the network data catalog. The web presentation may migrate eventually from its original development location at a research site to the Network Office.

Figure 4: Module Schema

A generic module schema illustrating how data originating at a local site can be posted using exchange filters in an agreed upon exchange format to either a dynamic or static URL. Given a set of site identified URLs, a harvester from a central site can gather site data and place it in a relational database. The data can then be displayed for the general scientific community using distribution filters in a variety of distribution formats.

Add Refs:
Sheth and Larson, 1990
Platt and White, 1988
Tomlinson et al., 1998
Stanford, J. and A. McKee. 1999