ezEML Guidebook for PAL Datasets

A screenshot of the ezEML interface showing the title screen and the full menu of options.

As an LTER site, we archive most of our PAL datasets in the EDI data repository.  To publish a dataset with EDI, you need to create or update an Ecological Metadata Language (EML) file with all the metadata needed to describe your dataset.  In the past, this meant writing an EML file by hand or with complex coding scripts (neither of which was much fun).  Luckily, EDI now provides a web-based wizard, called ezEML, that walks you through each of the required or recommended metadata elements you need to include.

This guide highlights how you can use ezEML to assemble metadata for your PAL dataset.

Login

You will need a Google, ORCID, GitHub, or Microsoft account to log in to the site.

New Datasets

If you are creating a new dataset, we recommend using the New from Template feature and choosing the PAL Starter Template from the LTER / PAL directory.  This template includes pre-populated items for many of the fields.  

Alternatively, you can start with a blank document, and copy items as needed from the PAL starter template or other datasets using the Import/Export menu.

Existing Datasets

If you are updating a dataset that has already been published to EDI, and you do not already have a working copy of it in ezEML, use the “Fetch a Package from EDI” option in the Import/Export menu to import the existing version.  You will probably want to rename the dataset after importing it, because the default name is not very descriptive.

Metadata Notes

Once you have a working document open, here are a few sections to pay attention to.

  1. TITLE / ABSTRACT: If you started with the PAL template, you will want to replace the Title and Abstract with ones appropriate for your dataset.
    • Each dataset should have a descriptive title, like a paper title, which should include the date range, e.g., 2020-2024.
  2. DATA FILES: If your dataset includes one or more CSV files, upload them to the “Data Tables” section, and then work through the wizard to classify the 1) type, 2) name, 3) description, and 4) units for each column.  
    • There are a number of optional fields you can also include for each column, like missing value codes, numeric bounds, and column labels.  The more you add, the more helpful your metadata will be to future users.
    • If your dataset is a NetCDF, GeoPackage, zipped image archive, or other common format, upload it to the “Other Entities” section.  
    • The Other Entities tab can also be used to upload ancillary files, like PDFs, zipped images, or software code, that are appropriate for more fully describing your dataset.  This could include instrument logs, maps, or processing code.  
    • If you need help determining the best format for your data files or how best to turn them into archived datasets, please reach out to Sage.
  3. PEOPLE
    • Creators should include those “authors” who are most responsible for the dataset.  For core datasets, this is typically the historical sequence of PIs starting with “Palmer Station Antarctica LTER”.  For short-term projects or derived datasets, this might be the post-doc, student and/or research technician with their PI as the second or final author.  (It’s really up to the research team.)  
    • The Contact and Metadata Provider should be the primary author, PI, and/or PAL IM as desired.  
    • Associated Parties can be used to add students, techs or others who helped with data collection and processing.  
    • For all of the above, please include the person’s full first and last name, organization, ORCID, and email.  Do not include physical addresses, as they are not relevant anymore.  If you include any associated parties, you will also need to specify their role in creating the dataset.  Remember, you can use the Import/Export menu to easily copy people records across documents.
  4. KEYWORDS: Each dataset should have 4-6 keywords, including at least 1 LTER Core Area.  
    • You can use ezEML’s built-in LTER keyword list or add your own.  The template includes all of the core areas, so simply delete the ones you do not need.
    • LTER Core Areas: Signature, Disturbance Patterns, Population Studies, Primary Production, Inorganic Matter, Organic Matter
  5. RIGHTS: For PAL datasets, we typically use CC-BY.
  6. GEOGRAPHIC COVERAGE: Two regions are provided in the template for the PAL cruise and Palmer Station areas.  You can use either of these or customize the bounding box to suit your dataset. Alternatively, if your dataset includes data from 12 or fewer point stations, you can include those as individual points instead of providing Lat/Lon info elsewhere in the dataset.
  7. TEMPORAL COVERAGE: Specifying the years of coverage is typically sufficient (e.g. 2020-2024).  This should match the dataset’s title. 
  8. MAINTENANCE: Specify whether this dataset will be updated annually, or if further updates are not expected.  This section is also used to add version notes when the dataset is updated.
  9. PUBLISHER and PUBLICATION INFO: Ignore these sections.
  10. METHODS: Use this section to describe how the dataset was collected and processed.  
    • Methods should be more detailed than the abstract, describing the who and how, while the abstract covers the what, when, where, and why.  
    • When possible, data methods papers should be cited in this section, including a shortened reference with DOI.  
    • If needed, more detailed methods can also be uploaded as an ancillary file (text or PDF) in the “Other Entities” section.  This is useful when methods are best presented as several pages of formatted text and a data methods paper does not yet exist.
  11. PROJECT: If your dataset is PAL related, you can probably leave this section as it is in the current template.  However, if another project supported this dataset, or if PAL was a secondary project, you can update this section.
  12. DATA PACKAGE ID: This will be provided by Sage.
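Before working through the DATA FILES step above, it can help to gather the type, numeric bounds, and missing-value counts for each CSV column so you have them on hand when filling in the wizard.  The following is a minimal sketch in Python, not part of ezEML itself.  The function name, the list of missing-value codes, and the sample column names are all assumptions made for illustration.

```python
import csv
import io


def summarize_columns(csv_text, missing_codes=("", "NA", "NaN", "-999")):
    """Summarize each CSV column: guessed type, numeric bounds, missing count.

    The default missing_codes are hypothetical; use the codes your dataset uses.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        # Short rows yield None for absent cells, so treat those as empty strings.
        values = [(row[name] or "").strip() for row in rows]
        present = [v for v in values if v not in missing_codes]
        numbers = []
        for v in present:
            try:
                numbers.append(float(v))
            except ValueError:
                break  # one non-numeric value makes the whole column text
        is_numeric = bool(present) and len(numbers) == len(present)
        summary[name] = {
            "type": "numeric" if is_numeric else "text",
            "min": min(numbers) if is_numeric else None,
            "max": max(numbers) if is_numeric else None,
            "missing": len(values) - len(present),
        }
    return summary


# Hypothetical three-row table with one missing chlorophyll value.
sample = "station,depth_m,chl_ug_l\nE,5,1.2\nE,10,NA\nB,5,0.8\n"
for column, info in summarize_columns(sample).items():
    print(column, info)
```

The summary only suggests values for the wizard.  You still need to supply the description and units for each column yourself.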

The title, abstract, and methods are the ones you’ll probably need to spend the most time on.
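A few of the conventions in the list above can be checked mechanically before you share a draft: the title carries a date range, that range matches the temporal coverage, and there are 4-6 keywords with at least one LTER Core Area.  The sketch below is one hypothetical way to do that in Python.  It is not an ezEML feature, the function and variable names are assumptions, and the core area list is simply copied from this guide.

```python
import re

# Core areas as listed in this guide, lowercased for comparison.
CORE_AREAS = {
    "signature",
    "disturbance patterns",
    "population studies",
    "primary production",
    "inorganic matter",
    "organic matter",
}


def check_metadata(title, keywords, temporal_years):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    # Look for a year range such as 2020-2024 (hyphen or en dash).
    match = re.search(r"(\d{4})\s*[-–]\s*(\d{4})", title)
    if match is None:
        problems.append("title has no date range such as 2020-2024")
    elif (int(match.group(1)), int(match.group(2))) != tuple(temporal_years):
        problems.append("title date range does not match temporal coverage")
    if not 4 <= len(keywords) <= 6:
        problems.append("expected 4-6 keywords")
    if not any(k.strip().lower() in CORE_AREAS for k in keywords):
        problems.append("no LTER Core Area keyword")
    return problems


# Hypothetical draft values, for illustration only.
draft_problems = check_metadata(
    "Chlorophyll at Palmer Station, 2020-2024",
    ["chlorophyll", "phytoplankton", "Antarctica", "Primary Production"],
    (2020, 2024),
)
print(draft_problems or "no problems found")
```

A check like this only catches formatting slips.  It does not replace the review by the PAL Information Manager.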

Collaboration

If desired, you can use the “Collaborate” feature to share your document with other authors.  This currently works a bit like Google Docs in 2010, when only one person could really edit a shared document at a time.  In ezEML, the document is locked while another user is editing it to prevent errors.

Publishing

When you are ready for your dataset to be reviewed, please use the “Collaborate” feature to share your dataset with sage@marine.rutgers.edu.  If this is a Palmer LTER dataset, do not share the dataset with EDI directly, as they will publish it under a generic ID and not within the PAL collection.  

The PAL Information Manager (Sage) will review your dataset and metadata to make sure it conforms with PAL and LTER standards.  Once it’s ready, the PAL IM will take care of publishing the dataset in the EDI archive and you can begin work on the next version.

If you ever have any questions about your dataset or what metadata is necessary to support it, please don’t hesitate to reach out to Sage, your friendly PAL Information Manager!

Additional References

For more guidance, please see: