Tidy data
University of Edinburgh
Tabular (or rectangular) data (like spreadsheets).
Audio and/or video recordings.
Texts or transcripts.
Annotation (ELAN, TextGrids, …).
Images.
Prefer formats like Comma Separated Values (.csv) or Tab Separated Values (.tsv) over MS Excel files. For big data, use Parquet files.
If you use Excel files, keep one sheet per Excel file! (Don’t have data in multiple sheets within the same Excel file).
Include ONE TABLE per file. (You can transform and summarise data in R).
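As a rough illustration of the "one table per file, plain-text format" advice, here is a minimal Python sketch that writes a single table as CSV text and reads it back. The column names and values are invented for the example, and the table is kept in memory rather than saved to disk. The same round trip can be done in R.

```python
import csv
import io

# Hypothetical example table: one row per observation, one column per variable.
rows = [
    {"participant": "p01", "word": "cat", "duration_ms": 212},
    {"participant": "p01", "word": "dog", "duration_ms": 198},
    {"participant": "p02", "word": "cat", "duration_ms": 240},
]


def table_to_csv(rows):
    """Serialise ONE table as CSV text: a header row, then one line per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_table(text):
    """Read CSV text back into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = table_to_csv(rows)
round_trip = csv_to_table(csv_text)
print(csv_text)
```

Note that CSV stores everything as text, so numeric columns come back as strings and need converting after reading.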
Use explicit coding:
Don’t use colours to code your data! (Software like R will discard colours and any formatting).
Each variable to be coded should have its own column.
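To make the explicit-coding point concrete, suppose a spreadsheet marked excluded trials by colouring rows red. The sketch below, in Python, shows the same information stored in its own columns instead. The column names `excluded` and `exclusion_reason` and all values are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical data: the exclusion status is a column, not a cell colour,
# so it survives export to plain text and can be used for filtering.
csv_text = """participant,word,duration_ms,excluded,exclusion_reason
p01,cat,212,no,
p01,dog,198,yes,background_noise
p02,cat,240,no,
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def keep_included(rows):
    """Keep only the rows explicitly coded as not excluded."""
    return [row for row in rows if row["excluded"] == "no"]


rows = read_rows(csv_text)
included = keep_included(rows)
print([row["word"] for row in included])
```

Because the coding lives in the data itself, any software that reads the file can filter on it, and the reason for each exclusion is recorded alongside the row.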
Use clear labels: