So, I downloaded a dataset containing 900 txt files, one for each biological sample. What I want to do is merge all of this data into one data matrix in R.
txt_files = list.files()
# read each txt file in turn; x is overwritten on every iteration
for (i in seq_along(txt_files)){
  x <- read.table(file=txt_files[i], sep="\t", header=TRUE, row.names=1)
}
All files are in one folder, so I use list.files() to query all file names. Then I want to read each table into a separate R object (which is called x in this case). The problem is that I would like to name each object after the name of the actual file instead of x.
I've tried a couple of things and tried to search the internet, but haven't found a solution yet. One thing I did find was to use lapply to import them all into a data list.
data_list = lapply(txt_files, read.table, sep = "\t")
However, I don't think this will be appropriate for me, since the individual data matrices are no longer available as separate objects after this. I hope someone can help me.
Giving connected (especially sequential) things their own variable names is, in general, a bad idea. The next thing you'll want to do is loop over these things, and that means constructing names by pasting bits together. It's a mess.
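To make the contrast concrete, here is a minimal sketch. The file names and the stand-in data frame are made up for illustration, and nothing is read from disk; the point is only how the later access looks in each style.

```r
# Hypothetical file names, used only for illustration
txt_files <- c("f1.txt", "f2.txt", "f3.txt")

# One object per file: every later access has to go through get()
for (f in txt_files) {
  # stand-in for read.table(f, ...)
  assign(f, data.frame(V1 = 1, V2 = 2, V3 = 3))
}
n_rows_by_name <- sapply(txt_files, function(f) nrow(get(f)))

# The same data kept in a list: plain indexing, no name construction
data_list <- lapply(txt_files, function(f) data.frame(V1 = 1, V2 = 2, V3 = 3))
n_rows_list <- sapply(data_list, nrow)
```

Both versions compute the same row counts, but the second needs neither assign() nor get().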
Store things in a list whenever possible. You've done that. I created a few CSV files:
> txt_files=c("f1.txt","f2.txt","f3.txt","f4.txt","f5.txt")
> data_list = lapply(txt_files, read.table, sep = ",")
> data_list[[1]]
  V1 V2 V3
1  1  2  3
> data_list[[3]]
  V1 V2 V3
1  1  2  3
2  5  4  3
3  1  2  3
So now I can loop over them with for(i in 1:length(txt_files)) and get the name of the file with txt_files[i] and so on:
> for(i in 1:length(txt_files)){
+ cat("File is ",txt_files[i],"\n")
+ print(summary(data_list[[i]]))
+ }
File is  f1.txt 
       V1          V2          V3   
 Min.   :1   Min.   :2   Min.   :3  
 1st Qu.:1   1st Qu.:2   1st Qu.:3  
 Median :1   Median :2   Median :3  
 Mean   :1   Mean   :2   Mean   :3  
 3rd Qu.:1   3rd Qu.:2   3rd Qu.:3  
 Max.   :1   Max.   :2   Max.   :3  
File is  f2.txt 
       V1          V2          V3   
 Min.   :1   Min.   :2   Min.   :3  
 1st Qu.:1   1st Qu.:2   1st Qu.:3  
 Median :1   Median :2   Median :3  
 Mean   :1   Mean   :2   Mean   :3  
 3rd Qu.:1   3rd Qu.:2   3rd Qu.:3  
 Max.   :1   Max.   :2   Max.   :3  
 ...
[etc]
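Once everything is in a list, merging into one matrix (the goal stated in the question) can be a single call. The sketch below rests on assumptions that the question does not confirm: each file has a header, the first column holds row identifiers, there is exactly one value column per file, and every file lists the same rows in the same order. The folder, file names and gene names are hypothetical.

```r
# Write three small hypothetical files to a temporary folder so the sketch runs as-is
tmp_dir <- tempfile("samples_")
dir.create(tmp_dir)
genes <- c("geneA", "geneB", "geneC")
for (k in 1:3) {
  write.table(data.frame(gene = genes, value = k * seq_along(genes)),
              file = file.path(tmp_dir, paste0("s", k, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read every file into a list and name each element after its file
txt_files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
data_list <- lapply(txt_files, read.table, sep = "\t", header = TRUE, row.names = 1)
names(data_list) <- tools::file_path_sans_ext(basename(txt_files))

# Assumption check: every file has the same row names in the same order
same_rows <- sapply(data_list, function(d) identical(rownames(d), rownames(data_list[[1]])))
stopifnot(all(same_rows))

# Take the single value column from each table; sapply binds them into a matrix
# whose columns are named after the list elements (one column per sample)
merged <- sapply(data_list, function(d) d[[1]])
rownames(merged) <- rownames(data_list[[1]])
```

If the files do not share identical row order, a key-based merge() over the list would be safer than binding columns positionally.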
You can do something like this:
names(data_list) <- txt_files
Or perhaps:
names(data_list) <- basename(txt_files)
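Or maybe use sapply instead of lapply, with simplify = FALSE: when given a character vector, sapply names the resulting list after its elements.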
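A small sketch of these options side by side, assuming comma-separated files without a header as in the earlier answer. The temporary folder and the two files are created here only so the example is self-contained.

```r
# Create two small hypothetical files in a temporary folder
tmp_dir <- tempfile("named_")
dir.create(tmp_dir)
writeLines("1,2,3", file.path(tmp_dir, "f1.txt"))
writeLines(c("1,2,3", "5,4,3"), file.path(tmp_dir, "f2.txt"))
txt_files <- list.files(tmp_dir, full.names = TRUE)

# Option 1: lapply, then name the elements after the files
data_list <- lapply(txt_files, read.table, sep = ",")
names(data_list) <- basename(txt_files)

# Option 2: sapply names the result after its character input (USE.NAMES = TRUE
# is the default); simplify = FALSE keeps it a plain list of data frames
data_list2 <- sapply(txt_files, read.table, sep = ",", simplify = FALSE)
names(data_list2) <- basename(names(data_list2))

# Each table is now reachable by file name
f2_rows <- nrow(data_list[["f2.txt"]])
```

Either way, data_list[["f2.txt"]] gives the table that the question wanted to store under the file's name, without creating 900 separate objects.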