Databasing

Although many large institutions still keep their primary catalog information on paper, most have been moving towards managing collection information in databases that can store not only specimen data but also associated media (e.g., specimen images), and that can make this information available and searchable by the academic community and the general public via the World Wide Web.

The choice, design, and building of a collection database is an enormous topic, far beyond the scope of this website. But because it is such an important topic, what follows is a quick overview of some of the basic things to keep in mind for any collection.

What is the difference between a flat-file database and a relational database?

For small institutional and private collections, it may be sufficient to store information in a readily available spreadsheet program such as Microsoft Excel, or in another flat-file database, where all the information is kept in a single large table. However, modern relational database programs, from basic, off-the-shelf packages such as Paradox, MySQL, or Microsoft Access to specialty programs such as Specify or EMu, allow powerful searches, cross-referencing of data, and association of specimen images and other media. In a relational database, information is kept in various tables linked together by a common field, such as the specimen’s accession number or catalog number. The advantage of this approach is that data need to be entered only once. For example, locality or excavation information entered into one table can be linked to multiple specimens.
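To make the "enter once, link many times" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module (an assumption; the relational packages named above work on the same principle). The table names, column names, catalog numbers, and site name are all hypothetical. One locality row is shared by two specimen rows through a common locality_id field, so correcting the locality once changes it for both specimens.

```python
import sqlite3

# In-memory database for illustration; a real catalog would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the table links

conn.executescript("""
CREATE TABLE locality (
    locality_id INTEGER PRIMARY KEY,
    site_name   TEXT NOT NULL,
    county      TEXT,
    state       TEXT
);
CREATE TABLE specimen (
    catalog_number TEXT PRIMARY KEY,
    taxon          TEXT NOT NULL,
    locality_id    INTEGER NOT NULL REFERENCES locality(locality_id)
);
""")

# The locality is entered once...
conn.execute("INSERT INTO locality VALUES (1, 'Quarry A', 'Example County', 'Example State')")
# ...and each specimen refers to it by its locality_id.
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [("EX-0001", "Taxon one", 1), ("EX-0002", "Taxon two", 1)],
)

# Correcting the site name in one place corrects it for every linked specimen.
conn.execute("UPDATE locality SET site_name = 'Quarry A, north face' WHERE locality_id = 1")

# A join reassembles the full records for display or searching.
rows = conn.execute("""
    SELECT s.catalog_number, s.taxon, l.site_name, l.state
    FROM specimen AS s
    JOIN locality AS l ON s.locality_id = l.locality_id
    ORDER BY s.catalog_number
""").fetchall()
for row in rows:
    print(row)
```

Running this prints one line per specimen, each showing the corrected site name, even though it was typed only once. In a flat file, the same correction would have to be repeated on every row that carries that locality.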

For an individual collector or small institution, it might be enough to keep the database on one computer that is shared by everyone who needs access to the information. In larger institutions, however, multiple staff members and researchers must be able to use the collections database at the same time; this requires moving from a stand-alone system to a client-server model, in which multiple computers are networked and each user, working at a client terminal, connects to a central server that holds the database. This ensures that all users are working with the same, up-to-date information. No matter what system you use, be sure to back up the database regularly and store copies in other safe places!

As mentioned in the section on Cataloging, each discrete unit of data in a database is a field. It is better to keep fields short and specific, so that they can be searched most effectively. It is always possible to combine fields later if you need to, but separating combined data down the road is much more time-consuming.
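The asymmetry between combining and separating fields shows up even in a tiny example. This is a minimal sketch, assuming Python; the field names and place names are illustrative only. Joining separate town, county, and state fields is trivial, whereas splitting a free-text locality string back into fields depends entirely on how consistently it was typed.

```python
FIELDS = ("town", "county", "state")


def combine(record):
    """Join separate locality fields into one display string, skipping blanks."""
    return ", ".join(record[f] for f in FIELDS if record.get(f))


def naive_split(text):
    """Guess fields from a combined string by splitting on commas.

    This only works when every string lists all three parts in the same order.
    """
    parts = [p.strip() for p in text.split(",")]
    return dict(zip(FIELDS, parts))


print(combine({"town": "Springfield", "county": "Greene County", "state": "Missouri"}))
print(naive_split("Greene Co., Missouri"))  # the county lands in the town field
print(naive_split("Springfield MO"))        # nothing is split at all
```

A string with a missing town is split into the wrong fields, and a string without commas is not split at all; in a real catalog each such record would have to be checked by hand.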

Should I buy a database system or develop one in-house? 

Programs like Access are commonly available as bundled software pre-installed on new computers, whilst others, such as MySQL, can be downloaded free of charge; they are easy enough to learn that most collectors can now develop relational databases on their own. For smaller institutions, developing your own database “in-house” may seem attractive because of its low start-up costs. However, it can be extremely time-consuming, and even after the design and development phases are complete, maintaining the database can demand significant money and staff time. Contracting with a database designer to build the system can shorten the setup time, but the developer may be unable or reluctant to provide long-term support, which reduces the long-term effectiveness and viability of your system.


Figure: Database applications like Microsoft® Access can be used to catalog and track specimens.

Commercial software packages are available for both individual collectors and institutions. A degree of customization is often possible, especially with some of the larger and more costly programs, which can sometimes be tailored to perform functions specific to a particular institution. For a more extensive discussion of how to select software, see the chapter on Computerized Systems in The New Museum Registration Methods. The Canadian Heritage Information Network (CHIN) also has an excellent website with extensive information on the steps necessary in planning and implementing a collections database.

Data Standards

One of the first things to think about before building a database is what types of information it will need to store. If you already have a card or paper catalog, it will provide a set of fields to work from. However, if you intend to share your data, it is important to make sure that you are collecting the same kinds of information, and storing them in the same way, as other institutional collections. Increasingly, museums and other institutions that hold natural history collections have been looking for ways to let researchers and the public use the World Wide Web to compare data from different collections via common “portal” sites; The Paleontology Portal is one of these sites. For this cross-collection searching to work effectively, different institutions need to use the same core set of data types, known as a “data standard.” In recent years, the natural history collections community has been developing a standard for natural history specimen data, including fossils, known as the “Darwin Core.”
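Sharing data under a standard usually means mapping your own field names onto the standard's terms. The sketch below, assuming Python and a CSV export, renames some hypothetical local fields to Darwin Core terms (catalogNumber, scientificName, eventDate, recordedBy, stateProvince, county, locality, decimalLatitude, decimalLongitude, plus institutionCode and basisOfRecord). It covers only a small subset of the standard; the local field names and the institution code are invented for illustration, and a real export should be checked against the current Darwin Core term list.

```python
import csv
import io

# Hypothetical local field names (left) mapped to Darwin Core term names (right).
LOCAL_TO_DWC = {
    "catalog_no": "catalogNumber",
    "taxon_name": "scientificName",
    "collected_on": "eventDate",      # ISO 8601 date, e.g. 1998-07-14
    "collector": "recordedBy",
    "state": "stateProvince",
    "county": "county",
    "site": "locality",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(records, institution_code):
    """Rename local fields to Darwin Core terms and return CSV text."""
    columns = ["institutionCode", "basisOfRecord"] + list(LOCAL_TO_DWC.values())
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for rec in records:
        # "FossilSpecimen" is the Darwin Core basisOfRecord value for fossils.
        row = {"institutionCode": institution_code, "basisOfRecord": "FossilSpecimen"}
        for local, dwc in LOCAL_TO_DWC.items():
            row[dwc] = rec.get(local, "")  # missing local fields export as blank
        writer.writerow(row)
    return out.getvalue()


sample = [{
    "catalog_no": "EX-0001", "taxon_name": "Taxon one", "collected_on": "1998-07-14",
    "collector": "A. Collector", "state": "Example State", "county": "Example County",
    "site": "Quarry A", "lat": 38.5, "lon": -90.1,
}]
print(to_darwin_core(sample, "EXMUS"))
```

Keeping the mapping in one table, rather than renaming fields throughout the database, lets the internal catalog keep its own field names while the export follows the shared standard.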

What kinds of data should I be storing?

Potential fields can include, but are not limited to:
  • Object/specimen name
  • Description
  • Collection date
  • Collection site, town, county, state
  • Habitat/depositional environment, latitude, longitude, elevation, depth
  • Collector
  • Identified by and date
  • Cataloger and date
  • Condition
  • Value (at collection, current)
  • Dimensions and weight
  • Corresponding image/photograph numbers
  • Restrictions on use and reproduction; publication citations
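Before any tables are built, each field in a list like the one above needs a written definition. Here is a minimal sketch, assuming Python, of such a "data dictionary" for a handful of the fields, together with a check that a record conforms to it. The field names, maximum lengths, and the condition pick-list are hypothetical examples, not a recommended standard.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldDef:
    """One data-dictionary entry describing how a field may be filled in."""
    name: str
    kind: type                                 # expected type: str or float here
    max_length: Optional[int] = None           # for text fields only
    allowed: Optional[Tuple[str, ...]] = None  # fixed pick-list, if any
    required: bool = False


# Hypothetical definitions for a few of the fields listed above.
DATA_DICTIONARY = (
    FieldDef("catalog_number", str, max_length=20, required=True),
    FieldDef("object_name", str, max_length=100, required=True),
    FieldDef("condition", str, allowed=("good", "fair", "poor", "fragmentary")),
    FieldDef("latitude", float),
    FieldDef("elevation_m", float),
)


def validate(record):
    """Return a list of problems with the record; an empty list means it conforms."""
    problems = []
    for f in DATA_DICTIONARY:
        value = record.get(f.name)
        if value is None or value == "":
            if f.required:
                problems.append(f"{f.name}: required")
            continue
        # Accept whole numbers in numeric fields as well as decimals.
        expected = (int, float) if f.kind is float else (f.kind,)
        if not isinstance(value, expected):
            problems.append(f"{f.name}: expected {f.kind.__name__}")
            continue
        if f.max_length is not None and len(value) > f.max_length:
            problems.append(f"{f.name}: longer than {f.max_length} characters")
        if f.allowed is not None and value not in f.allowed:
            problems.append(f"{f.name}: not in pick-list")
    return problems


print(validate({"catalog_number": "EX-0002", "object_name": "Tooth", "condition": "broken"}))
```

Offering condition as a fixed pick-list, rather than free text, is what keeps searches such as "all fragmentary specimens" reliable.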

Data Modeling

Whether you build the database yourself, hire a designer to build it for you, or purchase a specialist collection database software package, you will need to define the different types of information you will be collecting, how these data relate to each other, and how the database will store them. This process is known as “data modeling.” At its most basic level, data modeling involves creating a definition for every field in the database: what information will be entered in it; how long it can be; whether its contents will be numbers, letters, or a combination of both; whether users can type anything into it or must choose from a fixed list; and so on. The resulting list of field definitions is called a “data dictionary.” Once you have this, you can begin to think about how the different fields relate to each other: is it a one-to-one relationship (e.g., a specimen can have only one catalog number), a many-to-one relationship (e.g., many different specimens may be collected from a single locality), or a many-to-many relationship (e.g., a specimen may be collected by more than one collector, and a collector may collect many specimens)? These relationships will help you decide how many tables the database needs to store your data, and how those tables should be linked together. The end result is known as the data model.
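The three kinds of relationship translate fairly directly into table design: a one-to-one attribute such as the catalog number can simply be a column of the specimen table, a many-to-one link such as specimen to locality is a column holding the locality's key, and a many-to-many link needs a separate "junction" table with one row per pairing. The sketch below, assuming Python's sqlite3 module and hypothetical table and column names, shows the many-to-many case for specimens and collectors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # illustration only
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE specimen  (catalog_number TEXT PRIMARY KEY, taxon TEXT);
CREATE TABLE collector (collector_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row for each specimen/collector pairing.
CREATE TABLE specimen_collector (
    catalog_number TEXT    NOT NULL REFERENCES specimen(catalog_number),
    collector_id   INTEGER NOT NULL REFERENCES collector(collector_id),
    PRIMARY KEY (catalog_number, collector_id)
);
""")
conn.executemany("INSERT INTO specimen VALUES (?, ?)",
                 [("EX-0001", "Taxon one"), ("EX-0002", "Taxon two")])
conn.executemany("INSERT INTO collector VALUES (?, ?)",
                 [(1, "A. Collector"), (2, "B. Collector")])
# EX-0001 was collected by two people; collector 1 also collected EX-0002.
conn.executemany("INSERT INTO specimen_collector VALUES (?, ?)",
                 [("EX-0001", 1), ("EX-0001", 2), ("EX-0002", 1)])


def collectors_of(catalog_number):
    """Names of everyone who collected the given specimen."""
    rows = conn.execute(
        """SELECT c.name FROM collector AS c
           JOIN specimen_collector AS sc ON sc.collector_id = c.collector_id
           WHERE sc.catalog_number = ? ORDER BY c.name""",
        (catalog_number,),
    )
    return [name for (name,) in rows]


def specimens_by(collector_name):
    """Catalog numbers of every specimen the named collector helped collect."""
    rows = conn.execute(
        """SELECT sc.catalog_number FROM specimen_collector AS sc
           JOIN collector AS c ON c.collector_id = sc.collector_id
           WHERE c.name = ? ORDER BY sc.catalog_number""",
        (collector_name,),
    )
    return [number for (number,) in rows]


print(collectors_of("EX-0001"))       # ['A. Collector', 'B. Collector']
print(specimens_by("A. Collector"))   # ['EX-0001', 'EX-0002']
```

Each collector's name is stored once, and the composite primary key on the junction table prevents the same pairing from being entered twice.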

Data modeling requires you to think carefully about your collection and how it is used. It can be a tedious and time-consuming exercise, so there is often a temptation to skip it in the rush to get data digitized and online. However, it is no exaggeration to say that most of the problems that arise in database design projects stem from inadequate data modeling, so avoid this temptation.

Backing up your database 

The data contained in a collections database are of paramount importance, so it is essential to have procedures in place to protect against data loss or corruption. You should have a plan for backing up your database regularly onto media that, ideally, can be stored off-site. At its simplest, backing up may involve copying your database to an external drive, or burning a copy to disk, on a regular schedule; depending on how quickly you add data, this could be daily, weekly, or monthly. Automating the process removes the risk that you will forget to back up, and many external hard drives come with utilities that allow this. In a larger institution, you can usually arrange for your records to be backed up automatically to a computer in another location, which better protects the copies from physical damage at your own site.

What data should I put on the web?

Many databases now allow data records to be published easily on the web, where researchers and other interested parties can access them. The collaborative PaleoPortal project allows you to search the paleontology collections databases of many major institutions at the same time. This type of data sharing has great potential to stimulate research and innovative education projects, but care must be taken over what information is made available, especially to protect fossil sites. For more information on this topic, see Data and Data Management in the Sharing section of the website.
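One way to put this caution into practice is to publish a filtered copy of each record rather than the record itself. The sketch below, assuming Python, drops fields that a collection might choose to keep private and rounds coordinates to one decimal place (about 11 km of latitude). Which fields to withhold, and how coarse to make the coordinates, are policy decisions for each institution; the field names and defaults here are hypothetical.

```python
# Fields a collection might decide not to publish (hypothetical choices).
PRIVATE_FIELDS = {"site_name", "landowner", "value"}


def public_view(record, coord_decimals=1):
    """Return a copy of the record for publication: private fields removed,
    coordinates rounded. The original record is not modified."""
    public = {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}
    for key in ("latitude", "longitude"):
        if public.get(key) is not None:
            # One decimal place of a degree is roughly 11 km of latitude.
            public[key] = round(public[key], coord_decimals)
    return public


record = {
    "catalog_number": "EX-0001",
    "site_name": "Quarry A",
    "value": 500,
    "county": "Example County",
    "latitude": 38.56789,
    "longitude": -90.1234,
}
print(public_view(record))
```

Generating the public copy at export time means the full, precise record stays in the internal database for staff and approved researchers.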

For more information

Take a look at the following links and references:

  • Buck, Rebecca A., and Jean Allman Gilmore, eds. 1998. The New Museum Registration Methods. Washington, DC: American Association of Museums.
  • Canadian Heritage Information Network (CHIN). Website with extensive information on the steps necessary in planning and implementing a collections database.