✏️
Alaska Region Interim Data Management User Guide
  • Alaska Region Interim Data Management User Guide
  • Background
    • Why Data Managment?
    • The Big Picture: Integrating Data Management with Project Management
    • Definition of Project and Product (aka Data Resources)
  • Four Fundamental Activities of Data Management
    • Establish Roles and Responsibilities
    • Quality Management
    • Security and Preservation
    • Documentation
  • Alaska Data Management 101
    • Workflow
    • File Organization and Best Practices
      • Best Practices in Naming Conventions
      • Best Practices for Version Control
      • Changelog Best Practices
    • Alaska Regional Data Repository
    • Data Management Policy
  • Plan
    • Why Data Planning?
    • Data Management Plan Templates
      • Data Standards in brief
    • Project & Data Management Integration
    • Considerations for Projects with External Partners
  • ACQUIRE
    • Common Data Types
      • Open Formats
      • Best Practices in Tabular Data
      • Best Practices in Databases
      • Best Practices in Geospatial Data
      • Best Practices with Collections of Similar Types of Data
      • Best Practices with Source Data
    • Quality Management Procedures
      • Incorporating Data Standards
      • Using Unique Identifiers
  • MAINTAIN
    • Update Metadata
  • Access & Share
    • Open Data Requirements
      • Obtaining a Digital Object Identifier (DOI)
      • Obtaining a URL
      • Sharing without a URL
  • Long-term Storage Options
    • Using the Regional Data Repository
    • Public Accessible Repositories
  • Records Schedule & Disposition
  • Data Management Actions Quick Guide
  • Glossary
Powered by GitBook
On this page
  • Tidy Data
  • Untidy Data

Was this helpful?

  1. ACQUIRE
  2. Common Data Types

Best Practices in Tabular Data

PreviousOpen FormatsNextBest Practices in Databases

Last updated 2 years ago

Was this helpful?

Much of the Region’s data is contained in spreadsheets (i.e., tabular data, most commonly MS Excel files). Multiple spreadsheets within the same file can contain data and derivatives of the data (tables, summaries, pivot tables, formulas, figures). Below are some best practices when dealing with this type of data.

  • One sheet in the file should contain a clean version of the data. Nothing else should be on this sheet. This is sometimes also referred to as tidy data. In tidy data:

    • The first row contains variable names and each column represents one variable. Variable names should use only letters, numbers, dashes, “-“, and underscores, “_”. Do not use spaces or any other characters. The variable name should include the unit where this is relevant (e.g., length_cm and weight_g).

    • Each row after the first row should represent one observation.

    • Avoid formatting information in this sheet (e.g., comma in the thousands place, font settings, border lines, colors, etc). If the formatting is there to convey some information; instead, add a new variable to record that information. For example, instead of highlighting a cell to indicate a possible error, add another column and assign an error code to the observation (e.g. 1=sensor failure, 2=lost data sheet).

    • For the purposes of tidy data, blank cells indicate that the data point is missing; some use -9999 to indicate missing data. “0” in a cell means that the data point was collected and it was “0”. The treatment of missing data should always be recorded in the data dictionary (see below).

    • The tidy data sheet, in addition to being part of the workbook, should also be saved in an open format (e.g. Text or CSV or TSV) using the same name as the Excel file (e.g., fish_data.xlsx and fish_data.csv) in the same archive folder as the Excel file.

  • One sheet in the file should provide a brief description of each variable in the tidy data. Each row of this sheet represents one variable. This is termed the data dictionary and will be part of the metadata record for the tidy data. An example and a template for the data dictionary is available .

  • Other sheets in the file can contain summaries of the data (pivot tables), graphical representations of the data (figures), or derived quantities from the data (formulae, macros, etc). Avoid including metadata on these sheets.

  • One sheet in the file should provide a brief description of each sheet in the file (what does it contain, any relevant information about its use) like a table of contents. Each row of this sheet represents one sheet of the file. First column is the sheet name, the second column is the sheet description.

  • Save the original workbook in the most recent format supported by the application. For example, save Excel files in .xlsx format rather than .xls format.

Tidy Data

Untidy Data

Why is this data untidy?

  • There are three different tables on one sheet

    • It is necessary for human intervention to tidy the spreadsheets up before they are machine-readable

  • One row is NOT one observation

  • One column is NOT one variable

    • Some of the values have indications of additional information

  • Totals and percentages are calculated in the sheet

    • Some of the rows have indications of formatting embedded

Resources to tidy your data

How to merge and tidy data with Excel
How to tidy data with Python
here
Tidy Data Example
Untidy Data Example