ezEML Guidebook for PAL Datasets

A screenshot of the ezEML interface showing the title screen and the full menu of options.

As an LTER site, we archive most of our PAL datasets in the EDI data repository. To publish a dataset with EDI, you need to create or update an Ecological Metadata Language (EML) file with all the metadata needed to describe your dataset.

In the past, this meant writing an EML file by hand or with complex coding scripts (neither of which was a lot of fun). Luckily, EDI now provides a web-based wizard, called ezEML, that walks through each of the required or recommended metadata elements you should consider including.

This guide provides an overview of how to use ezEML to assemble your PAL dataset and appropriate metadata for archiving with EDI.

Please refer to the PAL Metadata Guide for additional guidance on how to align the required metadata fields with PAL standards. The guide also includes a list of common PAL dataset column definitions.

Login

You will need to use a Google, ORCID, GitHub, or Microsoft account to log into the site. This can be a personal or university account (it’s up to you).

New Datasets

If you are creating a new dataset, we recommend using the New from Template feature and choosing the PAL Starter Template from the LTER / PAL directory. This template includes pre-populated items for many of the fields.

Alternatively, you can start with a blank document, and copy items as needed from the PAL starter template using the import feature.

Or you can start by adapting an existing dataset using the Import/Export menu. “Fetch a Package from EDI” pulls a copy of an existing dataset from EDI, and “Import EML File (XML)” lets you upload an EML file of your own. Note that if your dataset is PAL related, you will likely want to refer to the PAL Starter Template to make sure you have the most up-to-date information, especially for the Project section.

Existing Datasets

If you are updating a dataset that has already been published to EDI, and you do not already have a working copy of it in ezEML, use the “Fetch a Package from EDI” option in the Import/Export menu to import the existing version.

Note, you will probably want to rename the dataset after importing it, because the default name is not very helpful.

Metadata Notes

Once you have a working document open, here are a few sections to pay attention to.

TITLE / ABSTRACT: If you started with the PAL template, you will want to replace the Title and Abstract with ones appropriate for your dataset.
- Each dataset should have a descriptive title, like a paper title, which should include the date range, e.g., 2020-2024.
DATA FILES: If your dataset includes one or more CSV files, upload them to the “Data Tables” section, and then work through the wizard to classify the 1) type, 2) name, 3) description, and 4) units for each column.
- There are a number of optional fields you can also include for each column, like missing value codes, numeric bounds, and column labels. The more you add, the more helpful your metadata will be to future users.
- If your dataset is a NetCDF, GeoPackage, zipped image archive, or other common format, upload it to the “Other Entities” section.
- The Other Entities tab can also be used to upload ancillary files, like PDFs, zipped images, or software code, that are appropriate for more fully describing your dataset. This could include instrument logs, maps, or processing code.
- If you need help determining the best format for your data files or how best to turn them into archived datasets, please reach out to Sage.
PEOPLE:
- Creators should include those “authors” who are most responsible for the dataset. For core datasets, this is typically the historical sequence of PIs starting with “Palmer Station Antarctica LTER”. For short-term projects or derived datasets, this might be the post-doc, student and/or research technician with their PI as the second or final author. (It’s really up to the research team.)
- The Contact and Metadata Provider should be the primary author, PI, and/or PAL IM as desired.
- Associated Parties can be used to add students, techs or others who helped with data collection and processing.
- For all of the above, please include the person’s full first and last name, organization, ORCID, and email. Do not include physical addresses, as they are not generally relevant anymore. If you include any associated parties, you will need to specify their role in creating the dataset. Remember, you can use the Import/Export menu to easily copy people records across and even within documents.
KEYWORDS: Each dataset should have 4-6 keywords, including at least 1 LTER Core Area.
- You can use ezEML’s built-in LTER keyword list or add your own.
- The template includes all of the core areas, so simply delete the ones you do not need.
- LTER Core Areas: Signature, Disturbance Patterns, Population Studies, Primary Production, Inorganic Matter, Organic Matter
RIGHTS: For PAL datasets, we typically use CC-BY.
GEOGRAPHIC COVERAGE: Two regions are provided in the template for the PAL cruise and Palmer Station areas. You can use either of these or customize the bounding box to suit your dataset. Alternatively, if your dataset includes data from 12 or fewer point stations, you can include those as individual points instead of providing Lat/Lon info elsewhere in the dataset.
TEMPORAL COVERAGE: Specifying the years of coverage is typically sufficient (e.g. 2020-2024). This should match the dataset’s title.
MAINTENANCE: Specify whether this dataset will be updated annually, or if further updates are not expected. This section is also used to add version notes when the dataset is updated.
PUBLISHER and PUBLICATION INFO: Ignore these sections.
METHODS: Use this section to describe how the dataset was collected and processed.
- Methods should be more detailed than the abstract, describing the who and how, while the abstract covers the what, when, where, and why.
- When possible, data methods papers should be cited in this section, including a shortened reference with DOI.
- If needed, more detailed methods can also be uploaded as an ancillary file (text or PDF) in the “Other Entities” section. This is useful when methods are best presented as several pages of formatted text and a data methods paper does not yet exist.
PROJECT: If your dataset is PAL related, and you started with the PAL starter template, you can probably leave this section alone. However, if another project supported this dataset, or if PAL was a secondary project, you should update this section accordingly.
DATA PACKAGE ID: If this is a new dataset, this will be provided by Sage. If it is an existing dataset being updated, simply increase the version number at the end.
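
For an existing dataset, the version bump described under DATA PACKAGE ID can be sketched in a few lines of Python. This is only an illustration: it assumes the ID has the form scope.identifier.revision, and both the function name bump_revision and the example ID are made up for this sketch.

```python
def bump_revision(package_id: str) -> str:
    """Return the package ID with its trailing revision number increased by one."""
    # Assumes the form scope.identifier.revision, e.g. "knb-lter-pal.999.3"
    # (a hypothetical ID used only for illustration).
    scope, identifier, revision = package_id.rsplit(".", 2)
    if not (identifier.isdigit() and revision.isdigit()):
        raise ValueError(f"unexpected package ID format: {package_id!r}")
    return f"{scope}.{identifier}.{int(revision) + 1}"


print(bump_revision("knb-lter-pal.999.3"))
```

In practice you just edit the last number in the ezEML Data Package ID field; the sketch only makes explicit that the scope and identifier stay the same between versions.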

While this is a big list, in the end, the title, abstract, and methods are the sections you’ll need to spend the most time on, as long as your data file is in good shape.
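
One way to check that a CSV file is in good shape before uploading it is to summarize its columns first. The sketch below uses only the Python standard library and is not part of ezEML. For each column it reports a guessed type, how many values are missing, and the numeric bounds, which are the same details the Data Tables wizard asks about. The function name profile_columns, the default list of missing value codes, and the sample data are all assumptions made for illustration; substitute the codes your dataset actually uses.

```python
import csv
import io


def profile_columns(csv_text, missing_codes=("", "NA", "NaN", "-999")):
    """Summarize each column: guessed type, missing-value count, numeric bounds."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = reader.fieldnames or []
    collected = {name: {"missing": 0, "values": []} for name in names}
    for row in reader:
        for name in names:
            value = (row[name] or "").strip()
            if value in missing_codes:
                collected[name]["missing"] += 1
            else:
                collected[name]["values"].append(value)

    report = {}
    for name, info in collected.items():
        try:
            numbers = [float(v) for v in info["values"]]
        except ValueError:
            numbers = None  # at least one value is not a number
        if numbers:
            report[name] = {
                "type": "numeric",
                "missing": info["missing"],
                "min": min(numbers),
                "max": max(numbers),
            }
        else:
            # Non-numeric columns (and columns with no values at all) are reported as text.
            report[name] = {"type": "text", "missing": info["missing"]}
    return report


# Hypothetical sample data, for illustration only.
sample = (
    "station,date,temperature_c\n"
    "E,2020-01-05,1.2\n"
    "B,2020-01-06,NA\n"
    "E,2020-01-07,-0.4\n"
)
for column, details in profile_columns(sample).items():
    print(column, details)
```

To run it against a real file, read the file's text and pass it in, for example profile_columns(open("my_data.csv", newline="").read()), where my_data.csv stands in for your own file name. Columns reported as text that you expected to be numeric usually point to a stray missing value code or a typo worth fixing before upload.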

Collaborative Editing

If desired, you can use the “Collaborate” feature to share your document with other authors and the PAL Information Manager (Sage). This currently works a bit like Google Docs back in the early 2010s, when only one person could really edit a shared document at a time.

In ezEML, the document is locked while another user is actively editing it, which prevents conflicting changes.

Publishing

When you are ready for your dataset to be reviewed, please use the “Collaborate” feature to share your dataset with sage@marine.rutgers.edu.

If this is a Palmer LTER dataset, do not share the dataset with EDI directly, as they will publish it under a generic ID and not within the PAL collection.

The PAL Information Manager (Sage) will review your dataset and metadata to make sure it conforms with PAL and LTER standards. Once it’s ready, the PAL IM will take care of publishing the dataset in the EDI archive. And then you can begin work on the next version 🙂

Additional References

If you ever have any questions about your dataset or what metadata is necessary to support it, please don’t hesitate to reach out to Sage, your friendly PAL Information Manager!

For more guidance, please see:

PAL Metadata Guide
Creating Metadata for a Data Package from EDI
EML Best Practices