Validating Metadata

Just as you would analyze your data for accuracy, you must validate your metadata and manifest for completeness, correct values, and proper format. Validation keeps metadata consistent across data files, ensures that it works with our systems, and ultimately makes your data easier to discover.

DCC Validator

Sage has built an application called the dccvalidator that automatically validates metadata, streamlining the process of creating consistent, complete metadata.

The app performs different types of checks to ensure that:

  • All required columns from the templates are present

  • Individuals and specimens have unique identifiers

  • Values in columns that use controlled values are valid (for example, TRUE is used instead of Yes or Y)

  • If both the individual and biospecimen templates are used, each biospecimen is linked to an individual
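The first two checks, required columns and unique identifiers, can be sketched in a few lines of Python. This is not the dccvalidator's own code. It is a minimal illustration that assumes the metadata file is CSV text. The required column list (`species`, `sex`) is a hypothetical placeholder, because the real templates define their own columns. Only `individualID` comes from this article.

```python
import csv
import io
from collections import Counter

# Hypothetical required columns for an individual file; the real template
# defines its own list.
REQUIRED_INDIVIDUAL_COLUMNS = ["individualID", "species", "sex"]


def missing_columns(header, required):
    """Return the required column names that are absent from a header row."""
    present = set(header)
    return [col for col in required if col not in present]


def duplicate_ids(rows, id_column):
    """Return identifier values that appear more than once, sorted.

    Blank identifiers are ignored here; they are a separate problem.
    """
    counts = Counter(row[id_column] for row in rows if row.get(id_column))
    return sorted(value for value, n in counts.items() if n > 1)


def check_file(csv_text, required, id_column):
    """Run the column and uniqueness checks and return failure messages."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    failures = []
    missing = missing_columns(header, required)
    if missing:
        failures.append("Missing columns: " + ", ".join(missing))
    if id_column in header:
        dups = duplicate_ids(reader, id_column)
        if dups:
            failures.append(f"Duplicate {id_column} values: " + ", ".join(dups))
    return failures


# Hypothetical individual file: "sex" is missing and I1 appears twice.
sample = "individualID,species\nI1,Human\nI1,Human\nI2,Human\n"
print(check_file(sample, REQUIRED_INDIVIDUAL_COLUMNS, "individualID"))
# ['Missing columns: sex', 'Duplicate individualID values: I1']
```

Each check is its own function, so you can run it on its own against a file you are still editing.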

After the dccvalidator runs, you can view a summary of the files you uploaded, showing the number of individuals, specimens, and files. We visualize the data in each column according to its data type to help you spot unexpected missing values.
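If you want a rough preview of these numbers before uploading, a short script can count distinct identifiers and blank cells per column. This sketch is only an illustration: it prints counts rather than a visualization, it is not how the app computes its summary, and the `specimenID` and `tissue` columns in the example are hypothetical.

```python
import csv
import io


def count_distinct(csv_text, column):
    """Count distinct non-empty values in one column, e.g. individuals or specimens."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return len({row[column] for row in reader if row.get(column)})


def missing_by_column(csv_text):
    """Count empty cells in each column so unexpected gaps stand out."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in counts:
            value = row.get(name)
            # Short rows give None; whitespace-only cells also count as missing.
            if value is None or not value.strip():
                counts[name] += 1
    return counts


# Hypothetical biospecimen file with one blank tissue cell.
biospecimen = "individualID,specimenID,tissue\nI1,S1,brain\nI1,S2,\nI2,S3,blood\n"
print("individuals:", count_distinct(biospecimen, "individualID"))  # individuals: 2
print("specimens:", count_distinct(biospecimen, "specimenID"))  # specimens: 3
print(missing_by_column(biospecimen))
# {'individualID': 0, 'specimenID': 0, 'tissue': 1}
```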

How to use the dccvalidator

Prerequisites

To use the dccvalidator, you must:

Instructions

  1. Navigate to the Validator tab in the app

  2. Answer Does the study currently exist? by selecting Yes or No

    • If Yes, select your study from the dropdown menu

    • If No, manually enter your study name in the field provided

  3. From the list provided, select the Species used in your study

  4. From the list provided, select the Biospecimen Type used in your study

  5. Using the dropdown menu, select the Assay Type that matches your assay metadata

  6. For each metadata file field provided, click Browse and locate the file on your computer

    • If your file was successfully uploaded, its name will appear next to Browse (where it previously said No file selected)

    • Hover over the blue question marks next to each file type to reveal special requirements for that file

  7. Once all four of your files have been uploaded, click Validate. You must upload all four files at once.

  8. Any issues or potential issues with your files will appear in the Warnings and/or Failures boxes. Update your files based on those messages, then re-upload them. For help interpreting and troubleshooting some common error messages, see below.

  9. Passing validation does not yet mean you can upload data. First, email the AD Data Liaison (AMPAD_SageAdmin@synapse.org). The team will manually check any items that the dccvalidator does not cover. Once they have confirmed that your metadata is complete and correct, they will notify you with official permission to upload your data and metadata.

Common validation errors and how to fix them

Error: Missing columns in the individual metadata file

  • A column in your template has been deleted. To fix, add the missing column to your template
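For a large file, re-adding a deleted column by hand is tedious, so here is a small script that does it. It is a sketch only. It assumes CSV input and uses a hypothetical `TEMPLATE_COLUMNS` list that you would replace with your template's real columns. It appends missing columns at the end; this article does not say whether column order matters, so you may prefer to move them to their template positions.

```python
import csv
import io

# Hypothetical template columns; use the column list from your actual template.
TEMPLATE_COLUMNS = ["individualID", "species", "sex"]


def restore_missing_columns(csv_text, template_columns):
    """Return CSV text with any deleted template columns added back as empty cells.

    Existing columns keep their order and values; missing template columns
    are appended at the end.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    existing = list(reader.fieldnames or [])
    missing = [col for col in template_columns if col not in existing]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=existing + missing, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # DictReader stores surplus cells under the key None; drop them here.
        writer.writerow({k: v for k, v in row.items() if k is not None})
    return out.getvalue()


print(restore_missing_columns("individualID,sex\nI1,female\n", TEMPLATE_COLUMNS), end="")
# individualID,sex,species
# I1,female,
```

The restored column is empty, so you still need to fill in its values before re-validating.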

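Error: Some values in the biospecimen metadata are invalid

  • A column that uses controlled values contains a value that is not on its allowed list. To fix, replace the value with an accepted one (for example, TRUE instead of Yes or Y)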
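An invalid-value error usually means that a cell in a controlled-value column holds something outside that column's allowed list. To find the offending cells quickly, you can run the sketch below. The `ALLOWED_VALUES` mapping is a hypothetical example (the columns `isPostMortem` and `organ` and their values are placeholders). The real templates define which columns are controlled and what they accept. Matching is exact and case-sensitive.

```python
import csv
import io

# Hypothetical controlled vocabularies; the real templates define which
# columns are controlled and which values each one accepts.
ALLOWED_VALUES = {
    "isPostMortem": {"TRUE", "FALSE"},
    "organ": {"brain", "blood"},
}


def invalid_values(csv_text, allowed):
    """Return (row, column, value) for every cell outside its allowed set.

    Rows are numbered as in a spreadsheet (header = row 1), assuming one
    record per line. Blank cells are skipped; matching is case-sensitive.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        for column, values in allowed.items():
            value = row.get(column)
            if value and value not in values:
                problems.append((row_number, column, value))
    return problems


biospecimen = (
    "specimenID,isPostMortem,organ\n"
    "S1,TRUE,brain\n"
    "S2,Yes,brain\n"
    "S3,FALSE,liver\n"
)
for row_number, column, value in invalid_values(biospecimen, ALLOWED_VALUES):
    print(f"row {row_number}: {column} = {value!r} is not an allowed value")
# row 3: isPostMortem = 'Yes' is not an allowed value
# row 4: organ = 'liver' is not an allowed value
```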

Error: individualID values are mismatched between individual and biospecimen

  • An individual in your biospecimen file is not in the individual file. To fix, make sure all of the individualIDs in your biospecimen file are also in your individual file
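You can find the mismatched IDs yourself by taking a set difference between the two files. The sketch below assumes both files are CSV text with an `individualID` column. The `specimenID` column appears only as example data. Because the comparison is exact, a value with a trailing space or different capitalization counts as a mismatch.

```python
import csv
import io


def ids_in_column(csv_text, column):
    """Return the non-empty values of one column, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]


def unmatched_individual_ids(individual_csv, biospecimen_csv):
    """Return biospecimen individualIDs that are missing from the individual file.

    Each unmatched ID is listed once, in the order it first appears. The
    comparison is exact, so "I1 " (trailing space) does not match "I1".
    """
    known = set(ids_in_column(individual_csv, "individualID"))
    used = ids_in_column(biospecimen_csv, "individualID")
    return [value for value in dict.fromkeys(used) if value not in known]


individuals = "individualID,species\nI1,Human\nI2,Human\n"
biospecimens = "individualID,specimenID\nI1,S1\nI3,S2\nI3,S3\n"
print(unmatched_individual_ids(individuals, biospecimens))  # ['I3']
```

Every ID this returns either needs a row in the individual file or has a typo in the biospecimen file.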


Note: Some portions of the app submit metadata to Synapse so that curators at Sage can troubleshoot issues if needed. The metadata will not be accessible to anyone outside the Sage curation team.
