File management

Author

Stefano Coretta

Published

September 15, 2024

1 Data Management Plan (DMP)

A Data Management Plan (DMP) covers the types of data you will work with, their volume (size), how you will acquire or collect them, where and how you will store them, and matters of data integrity, confidentiality, retention and destruction, sharing and deposit.

The University of Edinburgh has two important resources:

Here is what a DMP project looks like in the UoE DMP Online tool.

2 Research Compendium

Research compendia comprise all the data, files and publications related to a research project. Here are some definitions:

A research compendium accompanies, enhances, or is a scientific publication providing data, code, and documentation for reproducing a scientific workflow.

Research Compendium

A research compendium is a collection of all digital parts of a research project including data, code, texts (protocols, reports, questionnaires, meta data). The collection is created in such a way that reproducing all results is straightforward.

The Turing Way: Research Compendia

It is important to think about how you will structure and keep up your research compendium, from the initial stages throughout the project and after completion.

It is usually helpful to have a single folder on a computer which contains everything. The following sections give you some tips on how to organise the folder.

3 Organising and naming files

Some of the following suggestions come from Organising files and File naming, which are part of the Open Science Framework (OSF) Support pages.

To organise your files:

  • Create one folder and make it the folder for your research project.

  • In that folder, create folders for data/ and for scripts/ (and plots/, dissertation/, etc).

  • In data/ have a raw/ and a derived/ folder:

    • Raw data (data that would be very unfortunate to lose; for example, experiment data, manually annotated data, etc.) should be saved in data/raw/.

    • Derived data (data that is derived with scripts) should be saved in data/derived/.
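The folder layout described above can also be created with a short script, which is handy when you start several projects. The following is a minimal sketch, not a required tool: the function name `create_project_skeleton` and the project name used in the usage note are hypothetical, and `plots/` is included only as one of the optional folders mentioned above.

```python
from pathlib import Path

# Folder names follow the suggestions above; plots/ is an optional extra.
SUBFOLDERS = ["data/raw", "data/derived", "scripts", "plots"]


def create_project_skeleton(root):
    """Create the project folder and its subfolders; return the folder paths."""
    root = Path(root)
    created = []
    for sub in SUBFOLDERS:
        folder = root / sub
        # parents=True also creates data/ and the project folder itself;
        # exist_ok=True makes it safe to run the function more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `create_project_skeleton("my-project")` would create `my-project/data/raw/`, `my-project/data/derived/`, `my-project/scripts/` and `my-project/plots/` in the current working directory. Existing files are left untouched.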

The figure below shows an example of an MSc research project.

You also want to think carefully about file naming. Check the File naming page for tips.

Avoid common pitfalls in naming your files, like the ones in the following example.

4 Back up

Make sure you have a backup system in place.

  • Save copies of the entire folder on an external hard drive.

  • Save copies of the entire folder in an online storage service (iCloud Drive, OneDrive, Dropbox, Google Drive, …).

    • But if you are working on that copy via syncing, make sure you have a second independent place you back up to, like a hard drive.
  • Use a version control system like git.
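Copying the entire folder can be automated. The sketch below zips the whole project folder into a backup location and puts the date in the archive name. It is only one possible approach: the function name `backup_project` and the paths in the usage note are hypothetical, and it assumes the backup location (for example a mounted external drive) is outside the project folder.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_project(project_dir, backup_dir, today=None):
    """Zip the whole project folder into backup_dir under a dated name."""
    project_dir = Path(project_dir).resolve()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # ISO dates (YYYY-MM-DD) sort chronologically when sorted by name.
    stamp = (today or date.today()).isoformat()
    archive_base = backup_dir / f"{project_dir.name}-{stamp}"

    # make_archive appends ".zip" and returns the path of the file it wrote.
    archive = shutil.make_archive(str(archive_base), "zip", root_dir=project_dir)
    return Path(archive)
```

For example, `backup_project("my-project", "/Volumes/backup-drive/my-project-backups")` would write a file such as `my-project-2024-09-15.zip` to the drive. A dated archive like this complements syncing and git; it does not replace them.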

5 Research projects are dynamic

  • Be prepared to change how files and folders are organised after you start.

  • Projects evolve over time and sometimes you need to clean things up.

  • Use a good system to mark versions in your files.

Two simple systems to mark file versions:

  • Use the full date in the file name
    • dissertation-2022-11-21.
    • dissertation-2023-03-01.
  • Or use a version number
    • This is inspired by semantic versioning from programming, but can be helpful with research files too!
    • dissertation-v1.0.
    • dissertation-v1.1.
    • dissertation-v2.0.
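Both naming systems are easy to generate and sort with a few lines of code. The following is an illustrative sketch; the helper names `dated_name`, `versioned_name` and `version_key` are hypothetical. It also shows why version numbers need a numeric sort key: sorted as plain text, `v1.10` would come before `v1.2`.

```python
import re
from datetime import date


def dated_name(stem, day, ext=""):
    """Build a name like dissertation-2022-11-21 from a stem and a date."""
    return f"{stem}-{day.isoformat()}{ext}"


def versioned_name(stem, major, minor, ext=""):
    """Build a name like dissertation-v1.1 from a stem and a version."""
    return f"{stem}-v{major}.{minor}{ext}"


_VERSION = re.compile(r"-v(\d+)\.(\d+)")


def version_key(name):
    """Return (major, minor) as integers so versions sort numerically."""
    match = _VERSION.search(name)
    if match is None:
        raise ValueError(f"no version number found in {name!r}")
    return int(match.group(1)), int(match.group(2))
```

Names with full ISO dates can be sorted as ordinary text, for instance with `sorted(names)`, while versioned names can be sorted with `sorted(names, key=version_key)`.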

6 Licensing

A license gives someone official permission to reuse something while protecting the intellectual property of the original creator.

Use open licenses to ensure the data/code can be used by other researchers.

The Creative Commons licenses are now common in research.