USDA Forest Service
 

North Central Research Station

 

1992 Folwell Avenue
St. Paul, MN 55108

(651) 649-5000


Storing data and metadata

One objective of the publishing and archiving program is to minimize the effort required to maintain the data products over time. In part, this means avoiding the need to maintain versions of the data in multiple formats or to update formats periodically. Another objective is to make the data and metadata as accessible as possible across computing platforms, again while minimizing the use of multiple file formats. As it happens, an attractive new technology is available to help with both of these objectives: eXtensible Markup Language (XML), a cousin of HTML. Like HTML, XML consists of tags enclosing content. Unlike HTML, the tags are mostly defined by users to describe their own data. The XML standard defines how tags are organized and how to manipulate them. XML is stored as plain text in Unicode (typically encoded as UTF-8 or UTF-16), a character standard that extends ASCII. The power and simplicity of XML have led to implementations on every major platform. These features make XML an excellent candidate for long-term storage of data and metadata.
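
To make the idea of user-defined tags concrete, here is a minimal sketch in Python. The data set and its tag names (plots, plot, species, dbh_cm) are invented for illustration and do not come from the archive; the parsing uses the standard-library xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical data set: these tag names are invented for illustration only.
XML_TEXT = """<plots>
  <plot id="A1">
    <species>Acer saccharum</species>
    <dbh_cm>31.5</dbh_cm>
  </plot>
  <plot id="B2">
    <species>Quercus rubra</species>
    <dbh_cm>42.0</dbh_cm>
  </plot>
</plots>
"""


def read_plots(xml_text):
    """Return a list of (plot id, species, diameter in cm) tuples."""
    root = ET.fromstring(xml_text)
    records = []
    for plot in root.findall("plot"):
        records.append((
            plot.get("id"),                     # attribute on the <plot> tag
            plot.findtext("species"),           # text inside <species>
            float(plot.findtext("dbh_cm")),     # text converted to a number
        ))
    return records


if __name__ == "__main__":
    for record in read_plots(XML_TEXT):
        print(record)
```

Because the file is plain text with self-describing tags, the same few lines would work on any platform with an XML parser, which is the portability argument made above.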

XML is also strongly supported by major vendors – Microsoft, IBM, Oracle, Sun, etc. – and is being incorporated into desktop software like Microsoft Office as well as high-end database systems like SQL Server and Oracle, not to mention XML-centric databases such as Tamino. For example, a data set stored as XML can be directly opened by Access in Office 2003; other Office applications have some level of support for XML, and this is expected to improve over time.

For metadata, XML has the added advantages of being searchable (in a metadata clearinghouse, for example) and displayable via style sheets. The use of style sheets allows one to generate the XML once, but display it in different formats by simply changing the style sheet used. The impact of this can be seen by opening this metadata file in a simple text editor like Notepad. This is the raw XML. If you open the same file in Internet Explorer (IE) 4.5 or later, it will look different: easier to navigate, but still exposing the actual XML. One can also name the style sheet to use inside the XML file itself. The ability to insulate a regular data user from the raw XML can be seen by opening this version of the metadata file in IE (or another XML-capable browser). This style sheet started out as an ESRI/FGDC display for FGDC metadata and was extended to handle NBII metadata.
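
The two advantages described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the archive's actual tooling: the sample record is made up, its element names follow the FGDC short-name convention (idinfo, keywords, theme, themekey), and the style sheet file name fgdc_display.xsl is hypothetical. The first function searches the keywords; the second adds the standard xml-stylesheet processing instruction that tells a browser which style sheet to apply.

```python
import xml.etree.ElementTree as ET

# Made-up metadata record using FGDC-style short tag names.
METADATA = """<metadata>
  <idinfo>
    <citation><citeinfo><title>Example stand inventory</title></citeinfo></citation>
    <keywords>
      <theme>
        <themekey>forest inventory</themekey>
        <themekey>biomass</themekey>
      </theme>
    </keywords>
  </idinfo>
</metadata>
"""


def has_keyword(xml_text, keyword):
    """Return True if any <themekey> matches keyword, ignoring case."""
    root = ET.fromstring(xml_text)
    wanted = keyword.strip().lower()
    return any((el.text or "").strip().lower() == wanted
               for el in root.iter("themekey"))


def attach_stylesheet(xml_text, href):
    """Return xml_text with an xml-stylesheet instruction naming href.

    The instruction must follow the XML declaration when one is present.
    The href is assumed to need no escaping (no quotes or angle brackets).
    """
    pi = '<?xml-stylesheet type="text/xsl" href="{}"?>'.format(href)
    if xml_text.startswith("<?xml "):
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text


if __name__ == "__main__":
    print(has_keyword(METADATA, "biomass"))
    # "fgdc_display.xsl" is a hypothetical file name.
    print(attach_stylesheet(METADATA, "fgdc_display.xsl"))
```

Changing only the href value switches the display, while the metadata content itself is left untouched.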

Style sheets can also be used when it becomes necessary to move the archive to a new version of XML – write the translation once, apply it programmatically to all files in the archive, and the job is done. As time goes by, we may also be able to standardize some element tags for scientific data. Taking advantage of such standards is expected to be fairly simple, while yielding powerful results for data searching and re-use.
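
The "write once, apply to every file" idea can be sketched as follows. Note the substitution: the text describes a style sheet (XSLT) translation, but the Python standard library has no XSLT processor, so this sketch performs a simple tag-renaming translation with xml.etree.ElementTree instead. The tag mapping and directory layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping from old tag names to new, standardized ones.
TAG_MAP = {"dbh": "diameter_cm", "spp": "species"}


def translate(root, tag_map):
    """Rename elements in place according to tag_map; return how many changed."""
    renamed = 0
    for element in root.iter():
        if element.tag in tag_map:
            element.tag = tag_map[element.tag]
            renamed += 1
    return renamed


def migrate_archive(archive_dir, tag_map):
    """Apply the same translation to every .xml file under archive_dir.

    Returns the names of the files that were rewritten.
    """
    changed = []
    for path in sorted(Path(archive_dir).rglob("*.xml")):
        tree = ET.parse(str(path))
        if translate(tree.getroot(), tag_map):
            tree.write(str(path), encoding="utf-8", xml_declaration=True)
            changed.append(path.name)
    return changed
```

The translation logic lives in one function, so extending the archive to a new tag standard means editing the mapping rather than touching individual files.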

Last Modified: Monday, 21 March 2005

