Data and Data Management

The information that is associated with a fossil, which is known as specimen data, is almost as important as the fossil itself.

Specimen data can help confirm the identity and age of the fossil; provide information about the paleoenvironment of the site where the fossil was found; help other researchers find the site again, so that further collections can be made; guide curators, conservators, and preparators in making decisions about how best to treat the material; and provide information on the history of collecting. Without these data, the fossil is much less “valuable” to scientists than it would otherwise be. For this reason, collection managers spend almost as much time managing data as they do looking after the fossils themselves.

Sharing data is an important form of collection access. Researchers need to get access to this information in order to make the most effective use of the collection, while members of the public rely on interpretations of the data to understand specimens that are on display. However, it’s important to remember that collections data are valuable proprietary materials. Institutions need to keep this information secure and develop policies about what information to release and under what conditions it should be shared.

What are some important types of data?

Taxonomic data

This information relates to the identification of the fossil – what type of organism it is. The identification often changes with time as the fossil is worked on by different researchers. It may become more detailed (e.g., down to genus and species, rather than some higher taxonomic level) or may change because of new evidence that emerges during research. It is important to remember that taxonomic identifications are only opinions; for this reason, taxonomic data include not just the currently accepted name of the specimen, but also a record of all the different identifications that have been made since a specimen’s discovery – this is called a “taxonomic history.” Scientists may return to the taxonomic history many years later to get insights into why a previous researcher came to a particular conclusion about a specimen.
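One way to picture a taxonomic history is as an append-only list of identifications attached to the specimen record. The sketch below is illustrative only: the class names, field names, catalog number, and people are all hypothetical, and it assumes that the most recent identification is the currently accepted one, which your institution may decide differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Identification:
    """One taxonomic opinion about a specimen (hypothetical structure)."""
    name: str            # taxon name as determined
    identified_by: str   # who made the determination
    year: int            # when it was made
    remarks: str = ""    # reasoning or evidence, if recorded


@dataclass
class Specimen:
    catalog_number: str
    history: List[Identification] = field(default_factory=list)

    def add_identification(self, ident: Identification) -> None:
        # Append only: earlier opinions are never overwritten or deleted.
        self.history.append(ident)

    def current_name(self) -> Optional[str]:
        # Assumption: the most recent determination is the accepted name.
        if not self.history:
            return None
        return max(self.history, key=lambda i: i.year).name


specimen = Specimen("EX-0001")  # made-up catalog number
specimen.add_identification(
    Identification("Theropoda indet.", "A. Collector", 1985)
)
specimen.add_identification(
    Identification("Allosaurus sp.", "B. Researcher", 2003,
                   "based on newly prepared skull elements")
)

print(specimen.current_name())
for ident in specimen.history:
    print(ident.year, ident.name, ident.identified_by)
```

Because identifications are only ever added, the earlier opinion and its remarks remain available to anyone who later wants to understand how the name changed.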

Locality data

This is information about the place where the specimen was found – its locality. It usually includes not just the name of the locality, but also geographic information, such as the country, state, and county where the locality is situated. Sometimes there will be more detailed information, such as coordinate data; these may include latitude and longitude, UTM or GIS coordinates, or township and range information. There may be information that is intended to help find the site, such as a narrative description (e.g., “100 yards upstream from highway bridge, at base of bluff”), annotated maps, or aerial photographs. Locality data are particularly sensitive because of the need to protect fossil sites from illegal collecting, to protect the rights of landowners, or to ensure that sites remain secure from other researchers while the institution and its staff complete a multi-year research program. For this reason, it is a good idea to develop a policy that specifies how precise the information routinely provided in response to public enquiries can be – e.g., generally nothing more specific than county-level information.
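A policy like this can be applied consistently if public responses are built from an explicit list of permitted fields. The following is a minimal sketch, assuming a county-level policy and a locality record held as a simple dictionary; the field names and the sample values are made up for illustration.

```python
# Fields considered safe for routine public enquiries (assumed policy:
# nothing more specific than county level).
PUBLIC_LOCALITY_FIELDS = ("country", "state", "county")


def public_locality(record: dict) -> dict:
    """Return a copy of a locality record reduced to county-level detail."""
    return {key: record[key] for key in PUBLIC_LOCALITY_FIELDS if key in record}


# Made-up record; none of these values describe a real site.
full_record = {
    "locality_name": "Example Quarry",
    "country": "Example Country",
    "state": "Example State",
    "county": "Example County",
    "latitude": 12.3456,
    "longitude": -65.4321,
    "directions": "100 yards upstream from highway bridge, at base of bluff",
}

print(public_locality(full_record))
```

Listing the fields that may be released, rather than the fields that must be withheld, means that any new field added to the record later stays private until someone deliberately decides otherwise.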

Stratigraphic data

This is information about the geological context of the specimen. It might include the geological age of the rocks in which the fossil was found, the name of the rock formation, or more specific, narrative information about the position of the fossil within the site (e.g., “2 meters below the purple layer”). Stratigraphic data are important because they allow researchers to relate the fossil and the site where it was found to other fossils and fossil localities locally, nationally, or internationally.

Contextual information

This is data about the immediate surroundings of the fossil – for example, was it found in association with other specimens, what was the orientation of the specimen within a level or quarry, or were there differences between the matrix surrounding the fossil and the rocks in the rest of the site? Contextual information can be very important in understanding the environment in which the organism lived and died, as well as the taphonomy or paleoecology of the specimen.

Provenance data

This is information about how the specimen was obtained. It could be the name of the person who collected the specimen and the date when they found it. Alternatively, it might relate to a gift or donation, or a specimen purchase. As well as being important for dealing with later questions of ownership (see Acquiring), provenance data can provide a link between the specimen and archival material like field notes, journals, or correspondence, which may be an important source of additional information about the fossil.

Treatment history

This is a record of what has been done to a specimen since its discovery in terms of preparation, sampling, or repair. It might also include information about imaging, such as photographs taken, CT images, or laser surface scans. Treatment histories are vital because they let future workers determine what changes have been made to the specimen and what features of the specimen may have been altered or lost. They may also reveal the source of problems that can arise, for example from the use of inappropriate materials, which can guide preparators or conservators in making repairs or treatments.

Legacy data

These are pieces of information that are no longer in active use, but may still be important to our understanding of the specimen. For example, if a fossil is given a new catalog number, it is important to keep a record of the old number – it may have been cited in a publication, or referred to in important historical correspondence. Collectors’ field numbers are important pieces of legacy data; they link the specimen to field notes, which may provide essential locality or stratigraphic data that have not been recorded elsewhere.

What is primary versus secondary data?

The Darwin Core Standard

Because so many institutions collect data about specimens, and want to share these data efficiently, there are now major efforts within the scientific community to reach agreement on the basic types of information that should be collected. This “data about data” is known as “metadata,” and the list of agreed metadata for natural history specimens is called the Darwin Core standard (or DwC for short). The Darwin Core is defined as “a specification of data concepts and structure intended to support the retrieval and integration of primary data that documents the occurrence of organisms in space and time and the occurrence of organisms in biological collections.” You can find out more in the Darwin Core documentation published by Biodiversity Information Standards (TDWG), the body that maintains the standard.
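In practice, adopting the Darwin Core often means renaming the fields of an existing catalog to the agreed term names before sharing records. The sketch below shows the idea under stated assumptions: the in-house field names on the left are hypothetical, while the names on the right are Darwin Core terms. It is not a complete or validated export.

```python
# Mapping from in-house field names (hypothetical) to Darwin Core term names.
DWC_FIELD_MAP = {
    "catalog_no": "catalogNumber",
    "old_catalog_nos": "otherCatalogNumbers",
    "field_no": "recordNumber",
    "collector": "recordedBy",
    "date_collected": "eventDate",
    "taxon": "scientificName",
    "country": "country",
    "state": "stateProvince",
    "county": "county",
    "formation": "formation",
}


def to_darwin_core(record: dict) -> dict:
    """Rename known fields to Darwin Core terms; drop unmapped fields."""
    # basisOfRecord states what kind of record this is.
    dwc = {"basisOfRecord": "FossilSpecimen"}
    for local_name, dwc_term in DWC_FIELD_MAP.items():
        if local_name in record:
            dwc[dwc_term] = record[local_name]
    return dwc


# Made-up catalog record; "shelf" has no mapping, so it is not exported.
catalog_record = {
    "catalog_no": "EX-0001",
    "field_no": "AC-85-12",
    "taxon": "Allosaurus sp.",
    "state": "Example State",
    "shelf": "A3",
}

print(to_darwin_core(catalog_record))
```

Fields without a mapping, such as the storage location here, are simply left out, which also keeps internal information from being exported by accident.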

Another way to think about specimen data is to consider who generates the information. The ultimate source of data about a specimen is the information first recorded by the collector in the field. This is known as primary data; there can be only one primary data source and all subsequent transcriptions of the information, such as catalog records, specimen labels, etc., are secondary data. It is important to remember this distinction when doing work like cataloging, which involves transcribing primary information. You should always retain a “verbatim” copy of the information recorded by the collector; “correcting” spellings, or changing punctuation, may actually lead to you losing information that the collector was trying to communicate.
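One simple way to honor this distinction in a catalog is to store the collector's text and the cataloger's reading of it side by side, and never edit the former. The following is a rough sketch; the class, the function, and the sample field note are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscribedField:
    verbatim: str      # exactly as the collector wrote it; never edited
    interpreted: str   # cataloger's normalized reading (secondary data)
    note: str = ""     # why the interpretation differs, if it does


def transcribe(verbatim: str, interpreted: str = "", note: str = "") -> TranscribedField:
    # Default the interpretation to the verbatim text rather than guessing.
    return TranscribedField(verbatim, interpreted or verbatim, note)


# Made-up field note, including the collector's own spelling and punctuation.
locality = transcribe(
    "Smiths Crk., N. bank ?",
    interpreted="Smith Creek, north bank",
    note="Spelling normalized; the collector's question mark may signal doubt.",
)

print(locality.verbatim)
print(locality.interpreted)
```

Here the question mark survives in the verbatim field even though the tidied version drops it, so a later researcher can still see that the collector may have been unsure.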

What data should you share and under what conditions?

As we discussed above, sharing data is an important form of collection access. This needs to be balanced against the need to keep valuable proprietary information secure and ensure that it is not misused, such as in support of illegal collecting. A Data Use and Access policy can help guide decision making on which data are appropriate to share or keep private, and can set conditions on the responsibility of a researcher to reciprocate with results or publications based on the shared data.

The first and most important point to bear in mind is that if your institution has accessioned a specimen, you generally should assert and maintain copyright on all data and images (in any form) relevant to the specimen. This includes not only the data or images that you hold, but also those that others generate from the specimen, including scans (e.g., NMR, X-ray, or CT), measurements, and sampling data. While it can be argued that institutions have an obligation to allow others to use the collections, they are under no obligation to transfer the copyright to information or products that are generated from those collections.

Having said that, it’s generally desirable to allow data or images to be used free of charge for non-commercial scientific or educational uses. You can also specify that the institution be acknowledged as the source of the data. If you do this, it’s advisable to make the use conditional on your obtaining copies of any publications that cite the data provided: aside from helping you ensure that the specimens and information are being used responsibly, proof that your collection is being actively researched can be very helpful when arguing for support from your institution or granting bodies.

Commercial usage may encompass a variety of categories, for example: conventional or electronic publication of photographs or specimen information; use of locality data for environmental impact or other commercial surveys; or reproduction of fossils for sale or display. In these cases, it may be appropriate to charge for the use of the data, either in the form of a one-time fee or an ongoing license. The arrangements for commercial usage are complex and beyond the scope of this site; advice on them is best sought from experts in the relevant legal and financial aspects.

What data should I not share?

There are some categories of information that are sufficiently sensitive that you typically should either not share them with others, or do so under very restrictive conditions. These include:

  • Information on storage environments;
  • Storage rooms and specimen locations;
  • Personnel information;
  • Conservation and preparation techniques;
  • Detailed locality information;
  • Specimen status and value;
  • Unpublished information on research being undertaken by staff, students, and visitors. 

Finally, you should remember that once you have provided the information, you may have little or no real control over how it is used. You are also unlikely to want to, or be able to, provide the user with a warranty regarding the accuracy of the data, or their suitability for a particular purpose. For this reason, it’s helpful to ensure that the data are accompanied by a disclaimer of liability for any use that they might be put to later. Once again, legal advice should be sought regarding the wording of such a disclaimer.