Palmer LTER Metadata Guidebook

This guidebook was developed to assist PAL researchers in compiling the metadata needed to properly describe their datasets for archiving in the EDI Data Portal. We recommend using ezEML to assemble your data and metadata into a fully documented data package for submission (check out the PAL Guide to ezEML), but this guide goes into additional detail on how to standardize your data package with other PAL datasets.

Column Names & Definitions

Whenever possible, you should try to use the same column names and definitions as other PAL datasets. Please refer to the PAL Common Column Definitions list for suggestions.

This list is a work in progress. If you have any recommendations for changes or new standard columns, please reach out to the IM team.

Dataset Metadata

When compiling metadata for your data package, we recommend giving ezEML a whirl. It’s easy to use, and it will guide you through all the pieces you need to pull together. In addition, we have a PAL Starter Template already loaded in ezEML that will pre-populate many fields for you. (See the ezEML guide for more.)

Alternatively, if you’d prefer to use a Word template to compile your metadata, that’s ok too. Simply send the final version to the IM, who can help you review and publish it.

Either way, here are some tips for each key piece of metadata you will need to include with your data package.

Field	Notes	Responsibility
Title	Each dataset should have a descriptive title, much like a paper title, that should describe “what” the dataset includes, along with “where” and “when” it was collected. Ideally, the title should include a date range, e.g., 2020-2024. For example: Chlorophyll and phaeopigments from water column samples, collected at selected depths aboard Palmer LTER annual cruises off the coast of the Western Antarctic Peninsula, 1991 – 2020.	Creator
Data Tables (CSV)	A data package requires at least one data file to be included in either the Data Tables section, or the Other Entities section, or both. CSV files should be added as a Data Table. For each table/file, you will need a name (typically something very similar to the file name), and a description. You will then need to classify the type, name, description, and units for each column. * Type: Text, Numerical, DateTime or Categorical * Name: The column label as it appears in the data file. Do not use special characters. Ideally, units should not be included in the column name. Spaces are ok, but many people prefer CamelCase or underscore_case. * Definition: Describe what the column represents in detail. * Units: Only needed if the column is numeric. When possible, datasets should use units specified in the LTER Unit Directory. If you are using ezEML, these units are built in, but you can also specify custom units and definitions if needed. * Code Definitions: Only needed if the column is categorical. * Date Format: Ideally this should be either “yyyy-mm-dd”, or “yyyy-mm-dd hh:mm:ss” * Missing value code(s): Specify one or more codes if needed, e.g., -999. (Personally, I prefer blanks.) Other file types, including NetCDF, GeoPackage, zipped image archives, or other common formats should be uploaded to the Other Entities section (see below).	Creator with IM support
Creators	Creators will show up as the authors in the dataset citation. These are the individuals who have provided intellectual or other significant contributions to the creation of this dataset, much like the authors of a research paper. Therefore, creators should include those “authors” who are most responsible for the dataset. For our core datasets, “Palmer Station Antarctica LTER” is typically included as the first author, followed by the historical sequence of PIs. For short-term projects or derived datasets, the creators might lead with the post-doc, student, and/or technician with their PI as the second or final author. (It’s really up to the research team.) For each author, include the full first and last name, organization, ORCID, and email. Do not include physical addresses, as they are no longer as relevant.	Creator
Contacts	The Contact should be the primary author and/or the PAL IM as desired. Include name, ORCID and email.	Creator or IM
Associated Parties	Associated Parties can be used to give credit to students, field crew, lab techs, or others who helped with data collection, entry, or processing. (This has not been commonly used to date.) Include their full name, ORCID, and email. You will also need to specify the “role” the person had in creating the dataset.	Creator
Metadata Provider	The Metadata Provider should be the primary author and/or PAL IM as desired. Include name, ORCID, and email.	Creator or IM
Abstract	Include what, why, where, when, and how. Abstracts are typically 200-500 words.	Creator
Keywords	Each dataset should have 4-6 keywords. You can use ezEML’s built-in LTER Controlled Vocabulary list or add your own custom keywords. Each LTER dataset should also include at least one LTER Core Area from the following list. LTER Core Areas: Signature, Disturbance Patterns, Population Studies, Primary Production, Inorganic Matter, Organic Matter	Creator
Intellectual Rights	For PAL datasets, we typically use Creative Commons by Attribution (CC-BY), in keeping with LTER Data Policy.	IM
Geographic Coverage	All datasets should include at least one geographic point or bounding box. Two regions are provided in the starter template for the cruise (WAP) and Palmer Station areas. You can use either of these defaults or customize the bounding box to suit your dataset. Alternatively, if your dataset includes data from 12 or fewer stations, you can include those as individual points here instead of providing Lat/Lon info elsewhere in the dataset.	Creator / IM
Temporal Coverage	Specifying the years of coverage is typically sufficient (e.g. 2020-2024). This should match the dataset’s title.	Creator / IM
Taxonomic Coverage	When possible, you can add taxonomic links to the WoRMS (World Register of Marine Species) database. However, if the dataset includes more than a dozen or so species, a separate data table with an index of all the taxonomic IDs included would be more appropriate.	Creator / IM
Maintenance	Specify whether this dataset will be updated annually, or if further updates are not expected. For example, “This dataset is updated annually, following the completion of the LTER research season and austral summer.” This section should also be used to add version notes when the dataset is updated. For example: “Version 8 – Added data from the 2023-2024 field season, and changed the units of the depth column from decibars to m.”	Creator / IM
Publisher	Ignore this section. It will be filled in automatically when the dataset is published.	n/a
Publication Info	Ignore this section. It will be filled in automatically when the dataset is published.	n/a
Methods	Use this section to describe how the dataset was collected and processed. Be specific about the study design and field and lab methods for collecting and processing the data. Include instrument descriptions and protocol citations when possible. Methods should be more detailed than the abstract, describing the who and how, while the abstract covers the what, when, where and why. When possible, data methods papers should be cited in this section, including a shortened reference with DOI. If needed, more detailed methods can also be uploaded as an ancillary file (text or pdf) in the Other Entities section. This is useful when the methods are best presented as several pages of formatted text and a data methods paper does not yet exist.	Creator
Project	If your dataset is PAL related, you can simply copy this information from the starter template. However, if another project supported this dataset, or if PAL was a secondary project, you should update this section.	IM
Other Entities	The Other Entities section can be used to upload non-CSV data files (e.g. GeoPackage, NetCDF, or zipped image files), as well as ancillary files that are appropriate for providing extra context to your dataset (e.g. PDFs, zipped images, or software code, including instrument logs, maps, or processing code). If you need help determining the best format for your data files or how best to turn them into archived datasets, please reach out to the PAL IM.	Creator with IM support
Data Package ID	This will be provided by the IM.	IM
Citations	There are three kinds of citations a data package can include: * literatureCited: Used to reference external resources cited in the dataset, for example methods papers or other published protocols used to derive the data. * referencePublication: Used to reference a single “data paper” that describes how the dataset was created and initially used. (Typically, this cannot be included with the first version of a dataset as the paper would still be in review.) * usageCitation: Used for publications that reference the dataset, such as a journal article presenting research results derived from the data. (These are typically added using EDI’s data portal directly, since they are often published after the dataset, and typically it is impractical to update a dataset just to include new citations.)	Creator / IM

When creating a new version of an existing dataset, assuming the data file format hasn’t changed, you will likely need to spend the most time on updating the Title, Abstract and Methods.
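
Before submitting a CSV, it can help to run a quick check against the column conventions listed in the Data Tables row above. The following is only a minimal Python sketch, not an official PAL or ezEML tool: the function names, the sample column names (study_name, date_gmt, depth) and the sample rows are all made up for illustration. It flags column names containing special characters and date values that are not in “yyyy-mm-dd” or “yyyy-mm-dd hh:mm:ss” form, and it treats blanks as missing values by default.

```python
import csv
import io
import re
from datetime import datetime

# Assumed rules, distilled from the Data Tables notes (not an official validator).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_ ]+")      # letters, digits, underscores, spaces
DATE_FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")  # yyyy-mm-dd [hh:mm:ss]


def check_column_names(names):
    """Return the column names that contain special characters."""
    return [n for n in names if not NAME_PATTERN.fullmatch(n)]


def is_valid_date(value):
    """True if value is exactly yyyy-mm-dd or yyyy-mm-dd hh:mm:ss (zero-padded)."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        # Round-trip so that unpadded values such as 2020-1-5 are rejected.
        if parsed.strftime(fmt) == value:
            return True
    return False


def check_table(csv_text, date_columns, missing_codes=("",)):
    """Collect simple problems found in a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = [f"special characters in column name: {n!r}"
                for n in check_column_names(reader.fieldnames or [])]
    for line_number, row in enumerate(reader, start=2):  # header is line 1
        for col in date_columns:
            value = row.get(col) or ""
            if value in missing_codes:
                continue  # declared missing value code, nothing to check
            if not is_valid_date(value):
                problems.append(f"line {line_number}: bad date in {col!r}: {value!r}")
    return problems


# Invented sample rows, for illustration only.
sample = (
    "study_name,date_gmt,depth,chl (mg/m3)\n"
    "PAL2020,2020-01-05,10,0.52\n"
    "PAL2020,01/06/2020,20,\n"
)
for problem in check_table(sample, date_columns=["date_gmt"]):
    print(problem)
```

With this sample, the check reports the column name that carries its units in parentheses and the date on line 3 that is not in yyyy-mm-dd form. If your dataset uses a code such as -999 instead of blanks, pass it in via missing_codes.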

Some other considerations

Do you have other information or files to include in the data package?

If you have PDF documentation or text files with extended sampling protocols, analysis descriptions, processing code, pictures, or maps that cannot easily be fit into the “methods” metadata tag, you can include these files in the “Other Entities” section. You will need to provide a name and description for each file.

Does it make sense for your dataset to have more than one file?

Traditionally, most PAL datasets consisted of a single CSV file. But under certain circumstances, it might make sense to provide multiple files.

Large CSV files could be broken up into separate (e.g. annual or decadal) chunks, making them easier to download individually.
Data tables that have several columns repeated across multiple rows might make more sense to break up into related tables. For example, if CTD cast information is repeated for each bottle, you could create one table for indexing CTD casts, and a second that includes measurements from each bottle, referencing the CTD cast table.
In some instances, it might make more sense to have related tables included in the same dataset, instead of providing them as separate datasets. This would make sense if the tables would typically be accessed together, or when they were processed as part of the same workflow.
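
As a rough illustration of the CTD example above, here is one way a flat bottle file could be split into a cast index and a bottle table using only Python’s standard library. This is a sketch under assumptions: the column names (cast_id, date, latitude, longitude, bottle, depth, chlorophyll) and the sample rows are invented, so substitute the columns in your own file.

```python
import csv
import io

# Hypothetical column names; substitute the ones in your own file.
CAST_COLUMNS = ["cast_id", "date", "latitude", "longitude"]
BOTTLE_COLUMNS = ["cast_id", "bottle", "depth", "chlorophyll"]


def split_casts_and_bottles(csv_text):
    """Split one flat table into a cast index and a bottle table keyed by cast_id."""
    casts = {}    # cast_id -> cast-level row (dicts keep first-seen order)
    bottles = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        cast_row = {c: row[c] for c in CAST_COLUMNS}
        # setdefault stores the first row seen for a cast and returns the stored one,
        # so a mismatch means the repeated cast information disagrees between rows.
        if casts.setdefault(row["cast_id"], cast_row) != cast_row:
            raise ValueError(f"inconsistent cast information for {row['cast_id']}")
        bottles.append({c: row[c] for c in BOTTLE_COLUMNS})
    return list(casts.values()), bottles


def to_csv(rows, columns):
    """Serialize a list of dict rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample rows, for illustration only.
flat = (
    "cast_id,date,latitude,longitude,bottle,depth,chlorophyll\n"
    "C001,2020-01-05,-64.77,-64.05,1,5,0.52\n"
    "C001,2020-01-05,-64.77,-64.05,2,20,0.31\n"
    "C002,2020-01-06,-64.90,-64.40,1,5,0.47\n"
)
casts, bottles = split_casts_and_bottles(flat)
print(to_csv(casts, CAST_COLUMNS))      # one row per cast
print(to_csv(bottles, BOTTLE_COLUMNS))  # one row per bottle, linked by cast_id
```

Each cast then appears once in the cast table, and every bottle row points back to it through cast_id. The consistency check is a deliberate choice: if the repeated cast columns ever disagree, it is better to find out before the tables are archived.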

Was your dataset derived from other datasets?

If your dataset was created using existing PAL datasets, or data from other archives, you need to include the provenance information for those datasets. This would include the title, authors, and URL/DOI. This information should be included in the Methods section. If you are using ezEML, you can easily include links to another dataset available in EDI. But this can be tricky, so please ask the PAL IM for help.

More Information

For more guidance, please see: