An important skill to learn is how to read multiple files at once and save the output into a single tibble/data frame.
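This can be achieved by listing the files with the list.files() function and passing the resulting vector of paths to a reading function such as read_tsv().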
For example, let’s read individual files with tongue contour data from ultrasound tongue imaging (UTI). These files are in data/coretta2018/ultrasound/.
With full.names = TRUE, list.files() returns the full path of each file, relative to the Quarto project directory.
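To make the effect of full.names concrete, here is a minimal sketch that does not depend on the course data. It creates a throwaway directory with two empty files; the directory name and the file names are hypothetical and only serve the illustration.

```r
# Create a temporary directory with two empty, hypothetically named files.
demo_dir <- file.path(tempdir(), "full-names-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("it01-tongue-cart.tsv", "it02-tongue-cart.tsv"))
))

# Default (full.names = FALSE): only the file names are returned.
names_only <- list.files(demo_dir)
names_only

# full.names = TRUE: the directory path is prepended to each file name,
# so the result can be passed directly to a reading function.
full_paths <- list.files(demo_dir, full.names = TRUE)
full_paths
```

Without full.names = TRUE, the returned names would only be valid if the working directory were the data folder itself.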
In our case, we really just want to read the *-tongue-cart.tsv files, so we can pass a regular expression to the pattern argument to list only those files whose names contain -tongue-cart.tsv.
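Note that pattern is interpreted as a regular expression, not as a shell wildcard. The sketch below, which again uses a temporary directory and hypothetical file names, shows a stricter regular expression and, as an alternative, how glob2rx() from the utils package converts a wildcard into a regular expression.

```r
# Temporary directory with a mix of hypothetical files.
demo_dir <- file.path(tempdir(), "pattern-demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("it01-tongue-cart.tsv", "it01-audio.wav", "notes.txt"))
))

# "\\." matches a literal dot and "$" anchors the match at the end of the name.
by_regex <- list.files(demo_dir, pattern = "-tongue-cart\\.tsv$")
by_regex

# glob2rx() turns a wildcard pattern into the equivalent regular expression.
by_glob <- list.files(demo_dir, pattern = glob2rx("*-tongue-cart.tsv"))
by_glob
```

Both calls list the same single file here; the anchored regular expression simply guards against names that contain the string somewhere in the middle.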
There’s another catch. These files don’t have column headings! We need to supply them ourselves as a character vector to the col_names argument of read_tsv(). Alternatively, you can set col_names to FALSE and automatic column names will be created for you.
Finally, we might want to create a new column on the fly which has the file path. This is helpful when the files you are reading don’t have a column that allows you to distinguish data from different files (in these files the first column does this for us).
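The two options for col_names can be compared on a small, made-up example. This sketch assumes readr 2.0 or later, where literal data can be passed by wrapping a string in I(); the values and column names are invented for illustration.

```r
library(readr)

# Two rows of made-up tab-separated data without a header row.
raw <- I("it01\t0.5\t12.3\nit02\t0.7\t11.8\n")

# Option 1: supply the column names as a character vector.
named <- read_tsv(
  raw,
  col_names = c("speaker", "seconds", "value"),
  show_col_types = FALSE
)
names(named)

# Option 2: col_names = FALSE lets readr generate the names X1, X2, X3.
auto <- read_tsv(raw, col_names = FALSE, show_col_types = FALSE)
names(auto)
```

In both cases the first line of the input is treated as data rather than as a header, so no row is lost.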
You can create a new column with the path by specifying a name for this new column as the value of the id argument. With id = "file" a new column called file will be created with the path of the file.
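As a small self-contained sketch of the id argument, the code below writes two tiny header-less files to a temporary directory and reads them in one call. The file names, values and column names are hypothetical, and reading a vector of paths in a single call assumes readr 2.0 or later.

```r
library(readr)

# Write two tiny header-less TSV files to a temporary directory.
demo_dir <- file.path(tempdir(), "id-demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("it01\t0.5", file.path(demo_dir, "a-tongue-cart.tsv"))
writeLines("it02\t0.7", file.path(demo_dir, "b-tongue-cart.tsv"))

demo_files <- list.files(
  demo_dir,
  full.names = TRUE,
  pattern = "-tongue-cart\\.tsv$"
)

# id = "file" adds a column called file holding the path each row came from.
demo <- read_tsv(
  demo_files,
  id = "file",
  col_names = c("speaker", "seconds"),
  show_col_types = FALSE
)
demo
```

The resulting tibble has one row per input line, with the file column identifying the source file of each row.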
files <- list.files(
  "data/coretta2018/ultrasound",
  full.names = TRUE,
  pattern = "*-tongue-cart.tsv"
)

# Column names of the first 14 columns. The rest of the columns are X and Y
# coordinates of tongue contours of 42 points along the contour:
# X1,Y1,X2,Y2,X3,Y3,...,X42,Y42.
#
# Note that R automatically names unnamed columns with X followed by
# the column number, so the 84 coordinate columns will be all named Xn.
columns <- c(
  "speaker",
  "seconds",
  "rec_date",
  "prompt",
  "label",
  "TT_displacement_sm",
  "TT_velocity",
  "TT_velocity_abs",
  "TD_displacement_sm",
  "TD_velocity",
  "TD_velocity_abs",
  "TR_displacement_sm",
  "TR_velocity",
  "TR_velocity_abs"
)

tongue <- read_tsv(files, id = "file", col_names = columns, na = "*")
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
Rows: 7598 Columns: 99
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (4): speaker, rec_date, prompt, label
dbl (94): seconds, TT_displacement_sm, TT_velocity, TT_velocity_abs, TD_disp...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.