Data Analysis for LEL - Week 3

File management

Stefano Coretta

University of Edinburgh

Research project management

Data Management Plan (DMP)

A Data Management Plan (DMP) covers data types and volume, capture, storage, integrity, confidentiality, retention and destruction, sharing and deposit.


Research Compendium

A research compendium accompanies, enhances, or is a scientific publication providing data, code, and documentation for reproducing a scientific workflow.

Research Compendium

A research compendium is a collection of all digital parts of a research project, including data, code, and texts (protocols, reports, questionnaires, metadata). The collection is created in such a way that reproducing all results is straightforward.

The Turing Way: Research Compendia

Organise files

  • Create one folder and make that the folder for your dissertation project.

  • In that folder, create folders for data/ and for scripts/ (and plots/, dissertation/, etc).

  • In data/ have a raw/ and derived/ folder:

    • Raw data (data that would be hard or impossible to recreate if lost, for example experiment data or manually annotated data) should be saved in data/raw/.

    • Derived data (data that is derived with scripts) should be saved in data/derived/.
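
The folder layout above can also be created with a short script. The following is a minimal sketch in Python, assuming a hypothetical project folder called my-dissertation; the function name create_project_skeleton is made up for illustration, and the list of subfolders should be adapted to your own project.

```python
from pathlib import Path


def create_project_skeleton(root):
    """Create the suggested folder layout under `root` and return the folders."""
    subfolders = [
        "data/raw",      # raw data: hard or impossible to recreate
        "data/derived",  # derived data: can be regenerated with scripts
        "scripts",
        "plots",
        "dissertation",
    ]
    created = []
    for sub in subfolders:
        folder = Path(root) / sub
        # parents=True also creates the project folder and data/ if missing;
        # exist_ok=True makes it safe to run the script more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling create_project_skeleton("my-dissertation") creates the folders inside the current working directory and leaves existing files untouched.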

Organise files: example

Back up

Make sure you have a backup system in place.

  • Save copies of the entire folder on an external hard drive.

  • Save copies of the entire folder in an online storage service (iCloud Drive, OneDrive, Dropbox, Google Drive, …).

    • If you work directly on the synced copy, make sure you also back up to a second, independent place, such as an external hard drive.

  • Use a version control system like git.
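
Copying the entire folder to a backup location can be scripted. This is a minimal sketch in Python, not a full backup system: the function name backup_project and the dated naming of the copy are assumptions made for illustration, and the backup location would be whatever drive or folder you use.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def backup_project(project, backup_root, on: Optional[date] = None):
    """Copy the whole project folder into backup_root under a dated name."""
    project = Path(project)
    stamp = (on or date.today()).isoformat()  # formatted as YYYY-MM-DD
    target = Path(backup_root) / f"{project.name}-{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup with the same date is never overwritten silently.
    shutil.copytree(project, target)
    return target
```

For example, backup_project("my-dissertation", "/path/to/external-drive") would create a dated copy of the folder at that (hypothetical) location.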

Research projects are dynamic

  • Be prepared to change how files and folders are organised after you start.

  • Projects evolve over time and sometimes you need to clean things up.

  • Use a good system to mark versions in your files. Two simple systems:

    • Use the full date in the file name
      • dissertation-2022-11-21.
      • dissertation-2023-03-01.
    • Or use a version number
      • Inspired by semantic versioning from programming, but it can be helpful with research files too!
      • dissertation-v1.0.
      • dissertation-v1.1.
      • dissertation-v2.0.
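
Both naming systems can be generated consistently with a small helper. The following Python sketch is only an illustration; the function names dated_name and versioned_name are hypothetical.

```python
from datetime import date


def dated_name(stem, on):
    """Append the full date in YYYY-MM-DD form to a file name stem."""
    return f"{stem}-{on.isoformat()}"


def versioned_name(stem, major, minor):
    """Append a version number such as v1.0 to a file name stem."""
    return f"{stem}-v{major}.{minor}"


names = [
    dated_name("dissertation", date(2023, 3, 1)),
    dated_name("dissertation", date(2022, 11, 21)),
]
# Dates written as year-month-day sort chronologically when sorted as text.
print(sorted(names))
print(versioned_name("dissertation", 1, 1))
```

Writing the date as year-month-day means that sorting files by name in a file browser also sorts them by date.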

File naming don’ts

Licensing

A license gives someone official permission to reuse something while protecting the intellectual property of the original creator.

Use open licenses to ensure the data/code can be used by other researchers.

The Creative Commons licenses are now common in research.

Activity

  • Discuss in small groups.

    • How have you organised your files so far?

    • Something you would like to change?

    • Something you would like to keep?