Dataset Inventory Catalog Specification Version 0.7

last update: Dec 11, 2003 ($Date: 2005/03/30 05:40:31 $)


Overview

THREDDS Inventory Catalogs are designed to organize and describe collections of data. A dataset is a container for associated metadata and other datasets. Each dataset is either a collection dataset (i.e., contains other datasets) or an atomic dataset (i.e., has an access method).

An atomic dataset has no nested datasets and has an access URL whose service type is neither Resolver nor QueryCapability. To find out more about such a dataset, one must use a non-THREDDS protocol; we call this crossing the protocol boundary.

A collection dataset may be of the following types:



Notes from discussions on 12 December:
More discussion on 18 December:

Catalog Elements and Attributes

catalog Element

<xsd:element name="catalog" type="cat:catalogType">

<!-- Enforce dataset ID references:
1) Each dataset ID must be unique in the document.
2) Each dataset alias must reference a dataset ID in the document.
-->
<xsd:unique name="datasetID">
<xsd:selector xpath=".//cat:dataset"/>
<xsd:field xpath="@ID"/>
</xsd:unique>
<xsd:keyref name="datasetAlias" refer="cat:datasetID">
<xsd:selector xpath=".//cat:dataset"/>
<xsd:field xpath="@alias"/>
</xsd:keyref>

<!-- Enforce references to services:
1) Each service name must be unique and is required.
2) Each dataset that references a service (i.e., has a serviceName
attribute) must reference a service that exists.
3) Each access that references a service (i.e., has a serviceName
attribute) must reference a service that exists.
@todo Do we want unique service names? Currently, they need not be unique.
@todo This does not enforce the current scoping of service elements.
-->
<xsd:key name="serviceNameKey">
<xsd:selector xpath=".//cat:service" />
<xsd:field xpath="@name" />
</xsd:key>
<xsd:keyref name="datasetServiceName" refer="cat:serviceNameKey">
<xsd:selector xpath=".//cat:dataset" />
<xsd:field xpath="@serviceName" />
</xsd:keyref>
<xsd:keyref name="accessServiceName" refer="cat:serviceNameKey">
<xsd:selector xpath=".//cat:access" />
<xsd:field xpath="@serviceName" />
</xsd:keyref>
</xsd:element>

<xsd:complexType name="catalogType">
<xsd:sequence>
<xsd:element ref="cat:dataset" minOccurs="1" maxOccurs="1" />
</xsd:sequence>
<!--xsd:attribute name="name" type="xsd:string" use="required"/-->
<xsd:attribute name="version" type="xsd:token" default="0.7"/>
</xsd:complexType>

The catalog element is the top-level element and must contain exactly one top-level dataset. The version attribute allows migration between catalog schema versions and should be set to "0.7". The name of the top-level dataset is considered the name of the catalog and should be displayed to the user when selecting from catalogs. Here is an example catalog with a top-level dataset:

<?xml version="1.0" encoding="UTF-8"?>
<catalog version="0.7"
xmlns ="http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.0.7.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
<dataset name="My data collection" >
...
</dataset>
</catalog>
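Since the catalog's display name comes from its single top-level dataset, a client can read it directly. The following is a minimal Python sketch using the standard-library ElementTree parser; the helper names local and catalog_name are hypothetical, not part of any THREDDS library, and namespaces are stripped rather than matched so that it works with or without the xmlns declaration.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix so lookups work with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def catalog_name(xml_text):
    """Return the catalog's display name: the name of its one top-level dataset."""
    root = ET.fromstring(xml_text)
    if local(root.tag) != "catalog":
        raise ValueError("root element must be <catalog>")
    tops = [child for child in root if local(child.tag) == "dataset"]
    if len(tops) != 1:
        raise ValueError("a catalog must contain exactly one top-level dataset, "
                         "found %d" % len(tops))
    return tops[0].get("name")


example = """<catalog version="0.7"
    xmlns="http://www.unidata.ucar.edu/schemas/thredds/InvCatalog.0.7.xsd"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <dataset name="My data collection"/>
</catalog>"""

print(catalog_name(example))  # My data collection
```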

Several uniqueness and reference rules for other elements are enforced by parts of the schema snippet above. See the "dataset Element" section for more details on dataset elements referencing other dataset elements as well as service elements. See the "access Element" section for more details on access elements referencing service elements.
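The same uniqueness and reference rules can be checked outside a validating parser. Below is a rough Python sketch, assuming an ElementTree-based client; check_references is a hypothetical helper that reports duplicate dataset IDs, aliases that match no ID, and serviceName attributes on dataset or access elements that match no service. Like the schema (see its @todo notes), it ignores service scoping, and it does not check service-name uniqueness.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def check_references(xml_text):
    """Return a list of reference errors, mirroring the schema's identity constraints."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    errors = []

    # xsd:unique "datasetID": dataset IDs must be unique in the document.
    ids = set()
    for el in elements:
        dataset_id = el.get("ID")
        if local(el.tag) == "dataset" and dataset_id is not None:
            if dataset_id in ids:
                errors.append("duplicate dataset ID %r" % dataset_id)
            ids.add(dataset_id)

    # xsd:key "serviceNameKey": the set of declared service names.
    services = {el.get("name") for el in elements if local(el.tag) == "service"}

    for el in elements:
        kind = local(el.tag)
        # xsd:keyref "datasetAlias": an alias must name an existing dataset ID.
        alias = el.get("alias")
        if kind == "dataset" and alias is not None and alias not in ids:
            errors.append("alias %r matches no dataset ID" % alias)
        # xsd:keyref "datasetServiceName" and "accessServiceName".
        service_name = el.get("serviceName")
        if (kind in ("dataset", "access") and service_name is not None
                and service_name not in services):
            errors.append("%s refers to unknown service %r" % (kind, service_name))
    return errors
```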

dataset Element

<xsd:element name="dataset" type="cat:datasetType" />
<xsd:complexType name="datasetType">
<xsd:sequence>
<xsd:element ref="cat:service" minOccurs="0" maxOccurs="unbounded" />
<xsd:element ref="cat:documentation" minOccurs="0" maxOccurs="unbounded" />
<xsd:choice minOccurs="0" maxOccurs="unbounded" >
<xsd:element ref="cat:metadata" />
<xsd:element ref="cat:property" />
</xsd:choice>
<xsd:element ref="cat:access" minOccurs="0" maxOccurs="unbounded" />
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="cat:dataset" />
<xsd:element ref="cat:catalogRef" />
</xsd:choice>
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="required" />
<xsd:attribute name="dataType" type="cat:dataTypeEnum" />
<xsd:attribute name="authority" type="xsd:string" />
<xsd:attribute name="ID" type="xsd:token" />
<xsd:attribute name="alias" type="xsd:token" />
<xsd:attribute name="serviceName" type="xsd:token" />
<xsd:attribute name="urlPath" type="xsd:token" />
</xsd:complexType>

A dataset element represents a named logical set of data at a level of granularity appropriate for presentation to a user. The name of the dataset element (i.e., the value of the name attribute) should be a human-readable name that will be displayed to users. A dataset is considered an atomic dataset if it defines at least one access method; otherwise it is just a container for nested datasets. [If an atomic dataset is selected by a user, an event is sent to the client software. Should we separate out the object/library/widget actions from the communication layer? Content vs. presentation?] Multiple access methods specify different services for accessing the data. Choices among these different services should be filtered by client software or presented to the user for selection. There are a variety of ways to define an access method in a dataset element; they are described in detail in the "Constructing an Access Method" section below.

A dataset element contains 0 or more service elements, followed by 0 or more documentation, metadata, or property elements in any order, followed by 0 or more access elements, followed by 0 or more nested dataset or catalogRef elements. The data represented by a nested dataset element should be a subset, a specialization, or in some other sense "contained" within the data represented by its parent dataset element.
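This child ordering is the datasetType sequence from the schema. As an illustration only, the Python sketch below checks it by giving each allowed child a rank and requiring ranks never to decrease; the RANK table and children_in_order function are hypothetical names, and a validating XML Schema parser would normally do this job.

```python
import xml.etree.ElementTree as ET

# Position of each allowed child in the dataset content model. Elements that
# share a rank (documentation/metadata/property, dataset/catalogRef) may interleave.
RANK = {"service": 0, "documentation": 1, "metadata": 1, "property": 1,
        "access": 2, "dataset": 3, "catalogRef": 3}


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def children_in_order(dataset):
    """True if the children follow service*, (documentation|metadata|property)*,
    access*, (dataset|catalogRef)*."""
    last = 0
    for child in dataset:
        rank = RANK.get(local(child.tag))
        if rank is None or rank < last:
            return False  # unknown element, or an element after a later group
        last = rank
    return True


d = ET.fromstring('<dataset name="a"><dataset name="b"/>'
                  '<service name="s" serviceType="DODS" base="x/"/></dataset>')
print(children_in_order(d))  # False: the service comes after a nested dataset
```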

A dataset may have a dataType, specified within itself or in a containing collection, whose value comes from a controlled vocabulary.

If a dataset has an alias attribute, the value of the attribute must be the ID of another dataset within the same catalog; it may not refer to a dataset in another catalog referenced by a catalogRef element. When a dataset has an alias, any other properties of that dataset are ignored, and the dataset to which the alias refers is used in its place.
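A client might apply the alias rule as in this Python sketch (resolve_alias is a hypothetical helper). It searches only the current document, as required above, and follows a single alias; whether aliases may chain is not specified here, so the sketch does not follow them.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def resolve_alias(root, dataset):
    """Return the dataset to use in place of `dataset`.

    Without an alias a dataset stands for itself. With an alias, its other
    properties are ignored and the dataset with the matching ID is returned.
    Only this document is searched; catalogRef targets are never read.
    """
    alias = dataset.get("alias")
    if alias is None:
        return dataset
    for el in root.iter():
        if local(el.tag) == "dataset" and el.get("ID") == alias:
            return el  # single hop: an alias on the target is not followed
    raise ValueError("alias %r matches no dataset ID in this catalog" % alias)


root = ET.fromstring('<catalog><dataset name="top">'
                     '<dataset name="Latest run" ID="run42" urlPath="run42.nc"/>'
                     '<dataset name="Latest" alias="run42"/>'
                     '</dataset></catalog>')
latest = root[0][1]
print(resolve_alias(root, latest).get("name"))  # Latest run
```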

A dataset may have an authority specified within itself or in a containing collection. If a dataset has both an ID and an authority attribute, then the combination of the two should be globally unique for all time. If the same dataset is specified in multiple catalogs, then its authority/ID pair should be identical in each, if possible.

Many of the properties of a dataset become the default for contained datasets. This includes property elements, and dataType, authority, and serviceName attributes. Any documentation elements are displayed at the dataset itself when presenting the catalog to the user. Any metadata elements apply to all contained datasets.
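The inheritance of dataType, authority, and serviceName can be computed in a single top-down walk in which the nearest ancestor's value wins. The following Python sketch assumes ElementTree and a hypothetical helper, effective_attributes; for brevity it omits inherited property elements and metadata.

```python
import xml.etree.ElementTree as ET

# Attributes that a dataset passes down to the datasets it contains.
INHERITED_ATTRIBUTES = ("dataType", "authority", "serviceName")


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def effective_attributes(root):
    """Return [(dataset name, {attribute: value})] in document order, where each
    dict holds the values in effect after inheritance from ancestor datasets."""
    result = []

    def walk(element, inherited):
        if local(element.tag) == "dataset":
            current = dict(inherited)
            for attr in INHERITED_ATTRIBUTES:
                if element.get(attr) is not None:
                    current[attr] = element.get(attr)  # nearest value wins
            result.append((element.get("name"), current))
            inherited = current
        for child in element:
            walk(child, inherited)

    walk(root, {})
    return result


root = ET.fromstring('<catalog><dataset name="top" authority="edu.ucar.unidata" '
                     'dataType="Grid" serviceName="s1">'
                     '<dataset name="a"/><dataset name="b" dataType="Image"/>'
                     '</dataset></catalog>')
for name, attrs in effective_attributes(root):
    print(name, attrs)
```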
 
A dataset element can reference another dataset element: the ID attribute (if one is given) must be unique within the XML document, and the alias attribute must reference an existing ID attribute.

service Element

<xsd:element name="service" type="cat:serviceElemType" />
<xsd:complexType name="serviceElemType">
<xsd:sequence>
<xsd:element ref="cat:property" minOccurs="0" maxOccurs="unbounded" />
<xsd:element ref="cat:service" minOccurs="0" maxOccurs="unbounded" />
</xsd:sequence>
<xsd:attribute name="name" type="xsd:string" use="required" />
<xsd:attribute name="serviceType" type="cat:serviceTypeEnum" use="required" />
<!-- @todo What does "base" mean for a compound service? null value? -->
<xsd:attribute name="base" type="xsd:string" use="required" />
<xsd:attribute name="suffix" type="xsd:string" />
</xsd:complexType>

The service element ...

access Element

<xsd:element name="access" type="cat:accessType" />
<xsd:complexType name="accessType">
<xsd:attribute name="urlPath" type="xsd:token" use="required" />
<!-- @todo How can I restrict to serviceName OR serviceType, not both? -->
<xsd:attribute name="serviceName" type="xsd:string" />
<xsd:attribute name="serviceType" type="cat:serviceTypeEnum" />
</xsd:complexType>

An access element describes one method for accessing the data that the parent dataset represents. The access method accessing the data object that the dataset

documentation Element

<

A documentation element ...

metadata Element

<

A metadata element ...

catalogRef Element

<

A catalogRef element ...

property Element

<xsd:element name="property" type="cat:propertyType" />
<xsd:complexType name="propertyType">
<xsd:attribute name="name" type="xsd:string" />
<xsd:attribute name="value" type="xsd:string" />
</xsd:complexType>

A property element ...


Notes

Constructing an Access Method for a Dataset

There are a variety of ways to build an access method for a given dataset:

1) The access method can be defined as a combination of the urlPath and serviceName attributes of the given dataset element.

For example:

<dataset name="d1">
  <service name="s1" serviceType="DODS" base="http://s1/dods/" />
  <service name="s2" serviceType="DODS" base="http://s2/dods/" />

  <!-- This dataset's URL is "http://s1/dods/d1.1.nc" -->
  <dataset name="d1.1" serviceName="s1" urlPath="d1.1.nc" />

  <!-- This dataset's URL is "http://s2/dods/d1.2.nc" -->
  <dataset name="d1.2" serviceName="s2" urlPath="d1.2.nc" />
</dataset>

2) The access method can be defined as a combination of the urlPath attribute of the given dataset element and the serviceName attribute of an ancestor dataset (i.e., the service name value is inherited from or scoped within ancestor datasets).

This is convenient when all (or most) of the datasets in a parent dataset have the same service. For example:

<dataset name="d1" serviceName="s1">
  <service name="s1" serviceType="DODS" base="http://s1/dods/" />
  <service name="s2" serviceType="DODS" base="http://s2/dods/" />

  <dataset name="d1.1" urlPath="d1.1.nc" /> <!-- URL: "http://s1/dods/d1.1.nc" -->
  <dataset name="d1.2" urlPath="d1.2.nc" /> <!-- URL: "http://s1/dods/d1.2.nc" -->
  <dataset name="d1.3" urlPath="d1.3.nc" /> <!-- URL: "http://s1/dods/d1.3.nc" -->
  <dataset name="d1.4" urlPath="d1.4.nc" /> <!-- URL: "http://s1/dods/d1.4.nc" -->

  <dataset name="d1.5" serviceName="s2" urlPath="d1.5.nc" /> <!-- URL: "http://s2/dods/d1.5.nc" -->
</dataset>

3) The access method can be defined by an access element that is a child of the given dataset element. Each access element defines one access method, in one of two ways. First, an access method can be defined by a combination of the serviceType and urlPath attributes of the access element; in this case, the value of the urlPath attribute must be an absolute URL. Second, an access method can be defined by a combination of the serviceName and urlPath attributes of the access element; in this case, the urlPath attribute holds a relative URL, relative to the base URL of the service element referenced by the serviceName attribute.

An access method defined by the dataset element's urlPath attribute (1 and 2 above) is considered the default access method. The default access method should be the preferred access method when no filtering or user choice is possible.
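The three ways of constructing an access method can be combined in one pass over a catalog fragment. The Python sketch below is illustrative only: access_methods and join_url are hypothetical helpers, services are looked up by name across the whole fragment (scoping is ignored), and the URL is assumed to be base + urlPath + suffix with exactly one "/" between base and urlPath. The default method, when present, is listed first; the FTP access element in the usage example is made up.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def join_url(base, url_path, suffix=""):
    # Assumption: URL = base + urlPath + suffix, with exactly one "/" between
    # base and urlPath.
    return base.rstrip("/") + "/" + url_path.lstrip("/") + suffix


def access_methods(xml_fragment):
    """Map each dataset name to a list of (serviceType, URL) access methods.

    The default method (from the dataset's own urlPath) is listed first,
    followed by one entry per child access element.
    """
    root = ET.fromstring(xml_fragment)
    # Simplification: one flat name -> service table, ignoring scoping.
    services = {s.get("name"): s for s in root.iter() if local(s.tag) == "service"}

    def via_service(name, path):
        svc = services[name]
        return (svc.get("serviceType"),
                join_url(svc.get("base"), path, svc.get("suffix", "")))

    methods = {}

    def walk(element, inherited_service):
        if local(element.tag) == "dataset":
            # Ways 1 and 2: the dataset's own serviceName, else the nearest ancestor's.
            service_name = element.get("serviceName", inherited_service)
            found = []
            if element.get("urlPath") and service_name:
                found.append(via_service(service_name, element.get("urlPath")))
            # Way 3: explicit access elements.
            for acc in element:
                if local(acc.tag) != "access":
                    continue
                if acc.get("serviceName"):    # urlPath relative to that service's base
                    found.append(via_service(acc.get("serviceName"), acc.get("urlPath")))
                elif acc.get("serviceType"):  # anonymous service; urlPath is absolute
                    found.append((acc.get("serviceType"), acc.get("urlPath")))
            if found:
                methods[element.get("name")] = found
            inherited_service = service_name
        for child in element:
            walk(child, inherited_service)

    walk(root, None)
    return methods


example = """<dataset name="d1" serviceName="s1">
  <service name="s1" serviceType="DODS" base="http://s1/dods/" />
  <service name="s2" serviceType="DODS" base="http://s2/dods/" />
  <dataset name="d1.1" urlPath="d1.1.nc" />
  <dataset name="d1.5" serviceName="s2" urlPath="d1.5.nc">
    <access serviceType="FTP" urlPath="ftp://s3/pub/d1.5.nc" />
  </dataset>
</dataset>"""

print(access_methods(example)["d1.5"])
# [('DODS', 'http://s2/dods/d1.5.nc'), ('FTP', 'ftp://s3/pub/d1.5.nc')]
```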







OLD STUFF TO BE REVIEWED

Change Log

Catalog Elements and Attributes

Access Element

<!ELEMENT access EMPTY>
<!ATTLIST access
    urlPath CDATA #REQUIRED
    serviceName CDATA #IMPLIED
    serviceType (%ServiceType;) #IMPLIED
>
An access element specifies how a dataset can be accessed through a data service. It is typically used when there is more than one service available for a dataset.

Typically a serviceName is specified, which is the name of a service element in a parent element of the same catalog. Note that it may not refer to a service in another catalog referenced by a catalogRef element. The dataset URL is then formed from the service base and the access urlPath, and optionally the service suffix (see forming URLs).

If a serviceName is not specified, a serviceType must be specified, which creates an "anonymous service" of that type. In this case the urlPath must be absolute.
 

Catalog Element

<!ELEMENT catalog (dataset) >
<!ATTLIST catalog
    name CDATA #REQUIRED
    version CDATA #REQUIRED
    xmlns:xlink CDATA #FIXED "http://www.w3.org/1999/xlink"
    xmlns CDATA #FIXED "http://www.unidata.ucar.edu/thredds"
>
This is the top-level element. A catalog element contains exactly one top-level dataset. The name of the catalog should be displayed to the user when selecting among catalogs. The version allows DTD migration and should be set to "0.6".

The XLink and default namespaces are declared here, so technically they do not have to be declared in the catalog XML itself. However, Internet Explorer cannot handle namespaces declared in the DTD, so you should add the same two namespace declarations to the catalog element in the XML document itself (see example). This allows you to view the catalog in the IE browser. Netscape Navigator cannot yet view XML files (as of version 6.2.1).
 

CatalogRef Element

<!ELEMENT catalogRef EMPTY>
<!ATTLIST catalogRef
    xlink:type (simple) #FIXED "simple"
    xlink:href CDATA #REQUIRED
    xlink:title CDATA #REQUIRED
>
A catalogRef element refers to another catalog that becomes a dataset inside this catalog. This is used to maintain catalogs separately and to break up large catalogs. The referenced catalog should not be read until the user explicitly requests it, so that very large dataset collections can be represented with catalogRef elements without large delays in presenting them to the user. The referenced catalog is not textually substituted into the containing catalog, but remains a self-contained object. The referenced catalog must be a valid THREDDS catalog, but its version does not have to match that of the containing catalog.

The value of xlink:href is the URL of the referenced catalog. It may be absolute or relative to the catalog URL. The value of xlink:title is displayed as the name of the dataset that the user can click on to follow the XLink. Note that the XLink has a fixed type of "simple" that is part of the DTD, so it does not have to be specified in the catalog XML.

The dataset chooser software should present a catalogRef to the user seamlessly, for example by omitting the referenced catalog's top-level dataset from its presentation of the catalog when that dataset's name matches the catalogRef's xlink:title attribute.
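A client following these rules needs two small pieces: resolving xlink:href against the catalog URL without fetching the target, and collapsing the redundant top-level dataset once the referenced catalog has been loaded on request. The Python sketch below uses urllib.parse.urljoin from the standard library for the first; catalog_ref_target and entries_to_show are hypothetical names, and the URLs in the usage example are made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XLINK = "{http://www.w3.org/1999/xlink}"


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def catalog_ref_target(catalog_url, catalog_ref):
    """Return (title, absolute URL) for a catalogRef; the target is not fetched here."""
    href = catalog_ref.get(XLINK + "href")
    return catalog_ref.get(XLINK + "title"), urljoin(catalog_url, href)


def entries_to_show(ref_title, referenced_top):
    """Entries to present in place of a catalogRef once its catalog is loaded.

    If the referenced catalog's top-level dataset repeats the catalogRef title,
    show that dataset's children directly instead of an extra nested level.
    """
    if referenced_top.get("name") == ref_title:
        return [c for c in referenced_top if local(c.tag) in ("dataset", "catalogRef")]
    return [referenced_top]


ref = ET.fromstring('<catalogRef xmlns:xlink="http://www.w3.org/1999/xlink" '
                    'xlink:href="sub/catalog.xml" xlink:title="Model runs"/>')
print(catalog_ref_target("http://example.com/thredds/catalog.xml", ref))
# ('Model runs', 'http://example.com/thredds/sub/catalog.xml')
```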
 

Dataset Element

<!ENTITY % DataType "Grid | Image | Station">
<!ELEMENT dataset (service*, (documentation | metadata | property)*, access*, (dataset | catalogRef)*)>
<!ATTLIST dataset
    name CDATA #REQUIRED
    dataType (%DataType;) #IMPLIED
    authority CDATA #IMPLIED
    ID ID #IMPLIED
    alias IDREF #IMPLIED
    serviceName CDATA #IMPLIED
    urlPath CDATA #IMPLIED
>
A dataset element represents a logical set of data at a level of granularity appropriate for presentation to a user. A dataset is selectable if it contains at least one access path, otherwise it is just a container for nested datasets. If selectable, upon selection, an event is sent to the client software.

A dataset element contains 0 or more service elements, followed by 0 or more documentation, metadata, or property elements in any order, followed by 0 or more access elements, followed by 0 or more nested dataset or catalogRef elements. The data represented by a nested dataset element should be a subset, a specialization, or in some other sense "contained" within the data represented by its parent dataset element.

A dataset must have one or more access paths, specified implicitly through a urlPath attribute or explicitly in contained access elements. An access path should be thought of as a URL, but it is actually information from which a protocol-aware layer can construct URLs. When there is only one URL, this is typically specified in the dataset element itself. When there are multiple URLs, these may be specified in the dataset element and/or in contained access elements. Multiple URLs specify different services for accessing the dataset. Choices among these different services should be filtered by client software or presented to the user for selection. A URL specified in the dataset element itself is the default URL, which should be the preferred URL when no filtering or user choice is possible. Also see forming URLs.

A dataset may have a dataType, specified within itself or in a containing collection, whose value comes from a controlled vocabulary.

If a dataset has an alias attribute, the value of the attribute must be the ID of another dataset within the same catalog; it may not refer to a dataset in another catalog referenced by a catalogRef element. When a dataset has an alias, any other properties of that dataset are ignored, and the dataset to which the alias refers is used in its place.

A dataset may have an authority specified within itself or in a containing collection. If a dataset has both an ID and an authority attribute, then the combination of the two should be globally unique for all time. If the same dataset is specified in multiple catalogs, then its authority/ID pair should be identical in each, if possible.

Many of the properties of a dataset become the default for contained datasets. This includes property elements, and dataType, authority, and serviceName attributes. Any documentation elements are displayed at the dataset itself when presenting the catalog to the user. Any metadata elements apply to all contained datasets.
 

Documentation Element

<!ELEMENT documentation (#PCDATA)>
<!ATTLIST documentation
    xlink:type (simple) #FIXED "simple"
    xlink:href CDATA #IMPLIED
    xlink:title CDATA #IMPLIED
    xlink:show (new | replace | embed) "new"
>
A documentation element contains or refers to content that should be displayed to an end-user when making selections from the catalog. The content may be HTML or plain text. We call this kind of content "human readable" information.

The documentation element may contain arbitrary plain text content, which should be displayed inline at the position of the collection or the dataset element that contains it.

The documentation element may also contain an XLink to an HTML or plain text web page. This text should be either shown inline or displayed when the user activates the XLink, depending on the value of the xlink:show attribute, whose default is new. If the value of xlink:show is new, the content of the XLink should be displayed in a new window when the user selects it. If the value of xlink:show is embed, the content should be displayed inline, as if it were text content in the documentation element. If the value of xlink:show is replace, the content should replace the existing window. The value of xlink:title is used for new and replace, and should be displayed as the name that the user can click on to follow the XLink. The values of xlink:show and xlink:title are heuristics for the dataset choosing widget, which may not be able to fully implement them. These heuristics are intended to follow the XLink specification as closely as possible. Note that the XLink has a fixed type of "simple" that is part of the DTD, so it does not have to be specified in the XML.
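The xlink:show rules map naturally onto a small dispatch function. The Python sketch below is one possible reading, not a prescribed client behavior; documentation_action and its action labels are hypothetical. Because a parser that does not read the DTD will not supply the attribute default, the sketch applies the default value new itself.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"


def documentation_action(doc):
    """Return (action, payload, label) describing how to present one
    documentation element. Action names are illustrative only."""
    href = doc.get(XLINK + "href")
    if href is None:
        # Plain text content, shown inline with the enclosing dataset.
        return ("inline-text", (doc.text or "").strip(), None)
    # Apply the DTD default here; a parser that does not read the DTD won't.
    show = doc.get(XLINK + "show", "new")
    title = doc.get(XLINK + "title")
    if show == "embed":
        return ("embed-inline", href, None)     # fetch and display in place
    if show == "replace":
        return ("replace-window", href, title)  # title is the clickable label
    if show == "new":
        return ("new-window", href, title)
    raise ValueError("unexpected xlink:show value %r" % show)


doc = ET.fromstring('<documentation xmlns:xlink="http://www.w3.org/1999/xlink" '
                    'xlink:href="http://example.com/about.html" xlink:title="About this data"/>')
print(documentation_action(doc))
# ('new-window', 'http://example.com/about.html', 'About this data')
```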
 

Metadata Element

<!ENTITY % MetadataType "THREDDS | ADN | Aggregation | DublinCore | DIF | FGDC | LAS | Other">
<!ELEMENT metadata ANY>
<!ATTLIST metadata
    xlink:type (simple) #FIXED "simple"
    xlink:href CDATA #IMPLIED
    metadataType (%MetadataType;) #REQUIRED
>
A metadata element contains or refers to structured information about datasets, which is used by client programs to properly display or search for the dataset.  Typically, metadata is not displayed to an end-user when making selections from the catalog, although it may be useful to make it optionally available. We call this kind of content "machine readable" information.

The metadata element must contain a metadataType attribute whose value comes from a controlled vocabulary. The types and formats of the metadata are still being developed, and the current list should be considered experimental. Most are currently not operational.

The metadata content may be placed in the metadata element itself, or it may be pointed to through an XLink, but it may not have both. Generally when the metadata is referenced by an XLink, the information is not read until explicitly requested.
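The two metadata rules above (a metadataType from the controlled vocabulary, and inline content or an XLink but not both) can be checked as in this Python sketch; metadata_source is a hypothetical helper, and the vocabulary is copied from the MetadataType entity in the DTD above. A linked document is returned only as a URL, so it is not read until the caller asks for it.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"
METADATA_TYPES = {"THREDDS", "ADN", "Aggregation", "DublinCore", "DIF",
                  "FGDC", "LAS", "Other"}


def metadata_source(md):
    """Return ("link", href) or ("inline", element) for one metadata element."""
    md_type = md.get("metadataType")
    if md_type not in METADATA_TYPES:
        raise ValueError("missing or unknown metadataType %r" % md_type)
    href = md.get(XLINK + "href")
    has_inline = len(md) > 0 or bool((md.text or "").strip())
    if href and has_inline:
        raise ValueError("metadata may be inline or linked, not both")
    # Linked content is deferred: only the URL is returned, nothing is fetched.
    return ("link", href) if href else ("inline", md)


md = ET.fromstring('<metadata xmlns:xlink="http://www.w3.org/1999/xlink" '
                   'metadataType="DIF" xlink:href="http://example.com/dif.xml"/>')
print(metadata_source(md))  # ('link', 'http://example.com/dif.xml')
```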
 

Property Element

<!ELEMENT property EMPTY>
<!ATTLIST property
   name CDATA #REQUIRED
   value CDATA #REQUIRED
>
Property elements are arbitrary name/value pairs to associate with dataset, collection, or service elements. They will be used to create extended semantics and should be available to client applications, but are not typically displayed during dataset selection. Currently they have no specified semantics.

Service Element

<!ENTITY % ServiceType "DODS | ADDE | NetCDF | Catalog | FTP | WMS | WFS | WCS | WSDL | Compound | Other">
<!ELEMENT service (property*, service*)>
<!ATTLIST service
    name CDATA #REQUIRED
    serviceType (%ServiceType;) #REQUIRED
    base CDATA #REQUIRED
    suffix CDATA #IMPLIED
>


A service element represents a data service. It must have a name attribute, which must be unique within the catalog (note that catalogs referenced by a catalogRef contain their own ID namespaces), and a serviceType attribute whose value comes from a controlled vocabulary. It must also have a base attribute and may have an optional suffix attribute; these are used to construct the dataset URL (see constructing URLs). The base may be an absolute URL or relative to the catalog URL.
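Because a service base may be relative to the catalog URL, a client has to resolve it before building dataset URLs. The Python sketch below uses urllib.parse.urljoin for that step; dataset_url is a hypothetical helper, the service in the usage example is made up, and joining base, urlPath, and suffix by concatenation (with one "/" between base and urlPath) is an assumption, since the constructing-URLs rules are not reproduced here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def dataset_url(catalog_url, service, url_path):
    """Build a dataset URL from a service element and a urlPath.

    A relative base is resolved against the catalog URL; urljoin returns an
    absolute base unchanged. Assumption: the result is base + urlPath + suffix,
    with exactly one "/" between base and urlPath.
    """
    base = urljoin(catalog_url, service.get("base"))
    suffix = service.get("suffix") or ""
    return base.rstrip("/") + "/" + url_path.lstrip("/") + suffix


# Hypothetical service with a relative base and a suffix.
svc = ET.fromstring('<service name="local" serviceType="DODS" base="dods/" suffix=".html"/>')
print(dataset_url("http://example.com/thredds/catalog.xml", svc, "model/run1.nc"))
# http://example.com/thredds/dods/model/run1.nc.html
```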

A service element may contain 0 or more property elements. These property elements are made available to the application when a dataset is selected, but are not otherwise used.

The scope of a service element is its sibling elements and their descendants, excluding catalogs referenced by catalogRef elements. The service name should be unique within its scope.

A service element with serviceType="Compound" must have nested service elements, and services with type other than Compound may not have nested service elements. Nested service elements may be used directly by dataset or access elements. They are at the same scoping level as their parent service.
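The scoping and Compound rules can be sketched in Python as follows. find_service walks outward from an element's parent, so the nearest enclosing declaration wins (an assumption for repeated names, since names should be unique within a scope); services nested in a Compound service are treated as declared at their parent's level; compound_errors checks the nesting rule. All function names are hypothetical, and catalogs behind catalogRef are never consulted because ElementTree does not follow them.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def services_declared_in(element):
    """Services visible to element's children: its own service children plus
    services nested inside them (nested services share their parent's scope)."""
    found = {}
    for child in element:
        if local(child.tag) == "service":
            for svc in child.iter():  # the child itself, then nested members
                if local(svc.tag) == "service":
                    found.setdefault(svc.get("name"), svc)
    return found


def find_service(root, element, name):
    """Look up a service name for a dataset or access element, nearest scope first."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = element
    while node in parent_of:
        node = parent_of[node]
        svc = services_declared_in(node).get(name)
        if svc is not None:
            return svc
    return None


def compound_errors(root):
    """Compound services need nested services; other types may not have any."""
    errors = []
    for svc in root.iter():
        if local(svc.tag) != "service":
            continue
        nested = [c for c in svc if local(c.tag) == "service"]
        is_compound = svc.get("serviceType") == "Compound"
        if is_compound and not nested:
            errors.append("Compound service %r has no nested services" % svc.get("name"))
        if nested and not is_compound:
            errors.append("service %r is not Compound but has nested services"
                          % svc.get("name"))
    return errors
```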

Each dataset element must refer to one or more service elements that appear in a parent collection. Since typically there will be only a few service elements in a catalog but many dataset elements, a service element factors out the common properties of the data service for efficient representation within the catalog.

 

Miscellaneous

 

Validation Error Messages