I'd like to import files (of different lengths) recursively from sub-directories and put them into one data.frame, having one column with the subdirectory name and one column with the file name (minus the extension):
e.g. folder structure:
IsolatedData
├── 00
│   ├── tap-4.out
│   └── cl_pressure.out
└── 15
    ├── tap-4.out
    └── cl_pressure.out
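For clarity, the result I'm after is shaped roughly like this (each file actually contributes many rows, the Pressure values here are just made up to illustrate, and the column names match my attempt below):

target <- data.frame(
  Pressure = c(101.3, 100.9, 98.7, 99.1),       # made-up values
  Angle    = c("00", "00", "15", "15"),          # sub-directory name
  Location = c("tap-4", "cl_pressure",
               "tap-4", "cl_pressure")           # file name minus ".out"
)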
So far I have:
setwd("~/Documents/IsolatedData")
l <- list.files(pattern = ".out$", recursive = TRUE)
p <- bind_rows(lapply(1:length(l), function(i) {
  chars <- strsplit(l[i], "/")
  cbind(data.frame(Pressure = read.table(l[i], header = FALSE, skip = 2,
                                          nrow = length(readLines(l[i])))),
        Angle = chars[[1]][1], Location = chars[[1]][1])
}), .id = "id")
But I get an error saying line 43 doesn't have 2 elements.
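For reference, that "did not have 2 elements" message comes from read.table(), and base R's count.fields() reports how many whitespace-separated fields each line of a file has, which helps locate the ragged line (the path below is just an example; substitute whichever file the error points at):

# hypothetical path -- use the file that read.table() chokes on
fields <- count.fields("00/tap-4.out")
table(fields)        # how many lines have each field count
which(fields != 2)   # line numbers that don't have exactly 2 fields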
I've also seen this one using dplyr, which looks neat, but I can't get it to work: http://www.machinegurning.com/rstats/map_df/
tbl <- list.files(recursive = TRUE, pattern = ".out$") %>%
  map_df(~ data_frame(x = .x), .id = "id")
Here's a workflow with the map functions from purrr within the tidyverse.
I generated a bunch of csv files to work with, mimicking your file structure with some simple data. I threw in 2 lines of junk data at the beginning of each file, since you said you were trying to skip the top 2 lines.
library(tidyverse)
setwd("~/_R/SO/nested")
walk(paste0("folder", 1:3), dir.create)
list.files() %>%
  walk(function(folderpath) {
    map(1:4, function(i) {
      df <- tibble(
        x1 = sample(letters[1:3], 10, replace = T),
        x2 = rnorm(10)
      )
      dummy <- tibble(
        x1 = c("junk line 1", "junk line 2"),
        x2 = c(0)
      )
      bind_rows(dummy, df) %>%
        write_csv(sprintf("%s/file%s.out", folderpath, i))
    })
  })
That produces the following file structure:
├── folder1
| ├── file1.out
| ├── file2.out
| ├── file3.out
| └── file4.out
├── folder2
| ├── file1.out
| ├── file2.out
| ├── file3.out
| └── file4.out
└── folder3
├── file1.out
├── file2.out
├── file3.out
└── file4.out
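Each generated file should then start with the header row followed by the two junk rows, which is why skip = 3 is used further down:

# peek at the first few lines of one of the generated files
readLines("folder1/file1.out", n = 3)
#> [1] "x1,x2"         "junk line 1,0" "junk line 2,0"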
Then I used list.files(recursive = T) to get a list of the paths to these files, used str_extract to pull out the folder name and file name from each path, read each csv file while skipping the dummy text, and added the folder and file names as columns so they end up in the dataframe. Since I did this with map_dfr, I get a single tibble back, where the dataframes from each iteration are all rbinded together.
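To make the two regexes concrete, here's what the str_extract() calls in the code below return for one path of the form that list.files(recursive = TRUE) produces (just an illustrative check; stringr is attached by library(tidyverse)):

path <- "folder1/file1.out"
str_extract(path, "^.+(?=/)")             # "folder1" -- everything before the /
str_extract(path, "(?<=/).+(?=\\.out$)")  # "file1"   -- between the / and ".out"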
all_data <- list.files(recursive = T) %>%
  map_dfr(function(path) {
    # any characters from beginning of path until /
    foldername <- str_extract(path, "^.+(?=/)")
    # any characters between / and .out at end
    filename <- str_extract(path, "(?<=/).+(?=\\.out$)")
    # skip = 3 to skip over names and first 2 lines
    # could instead use col_names = c("x1", "x2")
    read_csv(path, skip = 3, col_names = F) %>%
      mutate(folder = foldername, file = filename)
  })
head(all_data)
#> # A tibble: 6 x 4
#> X1 X2 folder file
#> <chr> <dbl> <chr> <chr>
#> 1 b 0.858 folder1 file1
#> 2 b 0.544 folder1 file1
#> 3 a -0.180 folder1 file1
#> 4 b 1.14 folder1 file1
#> 5 b 0.725 folder1 file1
#> 6 c 1.05 folder1 file1
Created on 2018-04-21 by the reprex package (v0.2.0).
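As a side note, the same path splitting can be done without regular expressions using base R's dirname(), basename(), and tools::file_path_sans_ext(). A rough sketch of that variant (untested, with the columns renamed to the Angle/Location names from your question):

all_data2 <- list.files(recursive = TRUE, pattern = "\\.out$") %>%
  map_dfr(function(path) {
    read_csv(path, skip = 3, col_names = FALSE) %>%
      mutate(
        Angle    = dirname(path),                             # sub-directory, e.g. "folder1"
        Location = tools::file_path_sans_ext(basename(path))  # file name without ".out"
      )
  })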