You can read them in any order you like (even jump from one to the other). Make sure you try the code out yourself and feel free to try the exercises as well (you can find the solutions here.
These chapters cover functions and methods that you will need to complete Exercise 2 of Summative Assessment 1.
2 Reading multiple files at once
Another important skill to learn is how to read multiple files at once and save the output into a single tibble/data frame.
This can be achieved with the list.files() function.
For example, let’s read individual files with tongue contours data from ultrasound tongue imaging (UTI). These files are in data/coretta2018/ultrasound/.
You see now the full path is return, relative to the Quarto Project directory.
In our case, we really just want to read the *-tongue-cart.tsv files, so we can specify a regular expression to list only those files that contain -tongue-cart.tsv.
There’s another catch. These files don’t have column headings! We need to supply them ourselves as a character vector to the col_names argument of read_tsv(). Alternatively you can set that to FALSE and automatic column names will be created for you.
Finally, we might want to create a new column on the fly which has the file path. This is helpful when the files you are reading don’t have a column that allows you to distinguish data from different files (in these files the first column do this for us).
You can create a new column with the path by specifying a name for this new column as the value of the id argument. With id = "file" a new column called file will be created with the path of the file.
files <-list.files("data/coretta2018/ultrasound",full.names =TRUE,pattern ="*-tongue-cart.tsv")# Column names of the first 14 columns. The rest of the columns are X and Y# coordinates of tongue contours of 42 points along the contour:# X1,Y1,X2,Y2,X3,Y3,...,X42,Y42.## Note that R automatically names unnamed columns with X followed by# the column number, so the 84 coordinate columns will be all named Xn.columns <-c("speaker","seconds","rec_date","prompt","label","TT_displacement_sm","TT_velocity","TT_velocity_abs","TD_displacement_sm","TD_velocity","TD_velocity_abs","TR_displacement_sm","TR_velocity","TR_velocity_abs")tongue <-read_tsv(files, id ="file", col_names = columns, na ="*")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 7598 Columns: 99
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): speaker, rec_date, prompt, label
dbl (94): seconds, TT_displacement_sm, TT_velocity, TT_velocity_abs, TD_disp...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.