
Automating object naming when importing multiple files in R

Tags: object, r

So, I downloaded a dataset containing 900 txt files, one for each biological sample. What I want to do is merge all of this data into one data matrix in R.

txt_files <- list.files()

# read each txt file in turn; x is overwritten on every iteration
for (i in seq_along(txt_files)) {
  x <- read.table(file = txt_files[i], sep = "\t", header = TRUE, row.names = 1)
}

All files are in one folder, so I use list.files() to query all file names. Then I want to read each table into a separate R object (which is called x in this case). The problem is that I would like to name each object after the name of the actual file instead of x.

I've tried a couple of things and tried to search the internet, but haven't found a solution yet. One thing I did find was to use lapply to import them all into a data list.

data_list = lapply(txt_files, read.table, sep = "\t")

However, I don't think this will work for me, since the data matrices are no longer available as separate objects after this. I hope someone can help me.

Rianne Fijten asked Dec 27 '22 12:12


2 Answers

Giving separate names to related (especially sequential) objects is generally a bad idea. The next thing you'll want to do is loop over them, and that means constructing names by pasting bits together. It's a mess.

Store things in a list whenever possible. You've done that. I created a few CSV files:

> txt_files=c("f1.txt","f2.txt","f3.txt","f4.txt","f5.txt")
> data_list = lapply(txt_files, read.table, sep = ",")
> data_list[[1]]
  V1 V2 V3
1  1  2  3
> data_list[[3]]
  V1 V2 V3
1  1  2  3
2  5  4  3
3  1  2  3

So now I can loop over them with for(i in 1:length(txt_files)) and get the name of the file with txt_files[i] and so on:

> for(i in 1:length(txt_files)){
+ cat("File is ",txt_files[i],"\n")
+ print(summary(data_list[[i]]))
+ }

File is  f1.txt 
       V1          V2          V3   
 Min.   :1   Min.   :2   Min.   :3  
 1st Qu.:1   1st Qu.:2   1st Qu.:3  
 Median :1   Median :2   Median :3  
 Mean   :1   Mean   :2   Mean   :3  
 3rd Qu.:1   3rd Qu.:2   3rd Qu.:3  
 Max.   :1   Max.   :2   Max.   :3  
File is  f2.txt 
       V1          V2          V3   
 Min.   :1   Min.   :2   Min.   :3  
 1st Qu.:1   1st Qu.:2   1st Qu.:3  
 Median :1   Median :2   Median :3  
 Mean   :1   Mean   :2   Mean   :3  
 3rd Qu.:1   3rd Qu.:2   3rd Qu.:3  
 Max.   :1   Max.   :2   Max.   :3  
 ...

[etc]
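For anyone who wants to try the list approach end to end, here is a minimal sketch. It assumes tab-separated files with a header row and row names in the first column, as in the question. The temporary folder, the file names and the `gene`/`value` columns are made up for illustration.

```r
# Hypothetical setup: write three small tab-separated files to a temp folder
# so the sketch runs anywhere (file names and contents are made up).
tmp_dir <- file.path(tempdir(), "samples_sketch")
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in c("f1", "f2", "f3")) {
  df <- data.frame(gene = c("g1", "g2"), value = c(1, 2))
  write.table(df, file.path(tmp_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Restrict the listing to .txt files and keep full paths for reading
txt_files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# One data frame per file, all held in a single list
data_list <- lapply(txt_files, read.table,
                    sep = "\t", header = TRUE, row.names = 1)

# seq_along() is safe even when no files are found;
# 1:length(x) would give c(1, 0) for an empty vector
for (i in seq_along(txt_files)) {
  cat("File is", basename(txt_files[i]),
      "with", nrow(data_list[[i]]), "rows\n")
}
```

Every table stays available as `data_list[[i]]`, and the matching file name is always `txt_files[i]`, so nothing is lost by not creating 900 separate objects.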

Spacedman answered Dec 31 '22 13:12


You can do something like this:

names(data_list) <- txt_files

Or perhaps:

names(data_list) <- basename(txt_files)

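Or maybe use sapply instead of lapply: when its input is a character vector, sapply names the result after that vector (pass simplify = FALSE so the result stays a list of data frames).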
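A rough sketch of how these options might look in practice, including the merge into one matrix that the question is ultimately after. The temporary folder, the `sampleA`/`sampleB` file names and the `gene`/`value` columns are hypothetical, and the merge step assumes every file has the same rows in the same order.

```r
# Hypothetical setup: two tiny tab-separated files in a temp folder
tmp_dir <- file.path(tempdir(), "naming_sketch")
dir.create(tmp_dir, showWarnings = FALSE)
for (nm in c("sampleA", "sampleB")) {
  write.table(data.frame(gene = c("g1", "g2"), value = c(1, 2)),
              file.path(tmp_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
txt_files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Option 1: read with lapply, then name the elements after the files
# (basename() drops the folder, file_path_sans_ext() drops ".txt")
data_list <- lapply(txt_files, read.table,
                    sep = "\t", header = TRUE, row.names = 1)
names(data_list) <- tools::file_path_sans_ext(basename(txt_files))

# Each table is now reachable by the name of its file
data_list[["sampleA"]]
data_list$sampleB

# Option 2: sapply names the result after its character input;
# simplify = FALSE keeps it a list of data frames
by_path <- sapply(txt_files, read.table,
                  sep = "\t", header = TRUE, row.names = 1,
                  simplify = FALSE)

# Combine column-wise into one matrix, one column per sample
# (assumes identical rows in identical order across files)
merged <- do.call(cbind, lapply(data_list, function(d) d$value))
rownames(merged) <- rownames(data_list[[1]])
```

With names on the list, `data_list[["sampleA"]]` gives much the same convenience as a separate object called `sampleA`, while still allowing `lapply` over the whole set.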

Romain Francois answered Dec 31 '22 13:12