
Import all txt files in folder, concatenate into data frame, use file names as variable in R?

I have a folder with 142 tab-delimited text files. Each file has 19 variables and a number of rows beneath (usually no more than 30, but it varies). I want to do several things with these files in R automatically, and I can't seem to get exactly what I want with my code. I am new to loops. I got both sections of code below from previous posts here on Stack Overflow, but I can't figure out how to combine what they do.

  1. I want to turn the filename into a variable when reading the files into R, so that each row has the identifying file name

  2. Concatenate all files (with filename variable and no header) into one dataframe with dimensions Yx19, where Y=however many resulting rows there are.

I am able to create a list of the 142 dataframes using this code:

myFiles = list.files(path="~/Documents/ForR/", pattern="*.txt")
data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
names(data) <- myFiles
for(i in myFiles)
    data[[i]]$Source = i
do.call(rbind, data)

I am able to create the dataframe I want with 19 variables, but the filename is not present:

files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
    dat <- read.csv(f, header=F, sep="\t", na.strings="", colClasses="character")
    DF <- rbind(DF, dat)
}

How do I add the file name (without .txt if possible) as a variable to the loop?

Ros1920 asked Jan 14 '14 06:01




1 Answer

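Add this line to the loop, so that each row carries the file name without the extension: dat$file <- unlist(strsplit(f,split=".",fixed=TRUE))[1]

# list.files() returns bare file names here, so this assumes the
# working directory is the folder that holds the files
files <- list.files(path="~/Documents/ForR/.", pattern=".txt")
DF <- NULL
for (f in files) {
    dat <- read.csv(f, header=FALSE, sep="\t", na.strings="", colClasses="character")
    # keep the part of the file name before the first "."
    dat$file <- unlist(strsplit(f,split=".",fixed=TRUE))[1]
    DF <- rbind(DF, dat)
}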
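
A variant of the same idea, as a minimal sketch rather than the only way to do it: it asks list.files() for full paths so it does not depend on the working directory, anchors the pattern to a literal .txt extension, and strips the extension with tools::file_path_sans_ext(). The temporary folder, the two sample files (a.txt, b.txt) and the helper name read_one are made up for illustration; swap in your own path.

```r
# Hypothetical demo folder with two small tab-delimited files (illustration only)
dir <- file.path(tempdir(), "ForR_demo")
dir.create(dir, showWarnings = FALSE)
writeLines(c("1\tx", "2\ty"), file.path(dir, "a.txt"))
writeLines(c("3\tz"), file.path(dir, "b.txt"))

# full.names = TRUE returns paths that read.table() can open from any
# working directory; "\\.txt$" matches only names ending in a literal ".txt"
files <- list.files(path = dir, pattern = "\\.txt$", full.names = TRUE)

# read_one is a hypothetical helper: read one file and tag its rows
read_one <- function(f) {
  dat <- read.table(f, header = FALSE, sep = "\t", na.strings = "",
                    colClasses = "character")
  # file name without folder or extension
  dat$file <- tools::file_path_sans_ext(basename(f))
  dat
}

# Read everything first, then bind once instead of growing DF inside a loop
DF <- do.call(rbind, lapply(files, read_one))
rownames(DF) <- NULL
print(DF)
```

Binding once at the end avoids copying DF on every iteration, which hardly matters for 142 small files but scales better.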

Alternatively, the row.names from the do.call should be in the format names(list)[n].i, where i is 1:number_of_rows_for_data.frame n, provided the list is named. So you can just make a column from the row.names:

data <- lapply(myFiles, read.table, sep="\t", header=FALSE)
# name the list elements so rbind can build the row names from them
names(data) <- myFiles
combined.data <- do.call(rbind, data)
combined.data$file_origin <- row.names(combined.data)
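
To see what those row names look like, and how the trailing row counter could be stripped again, here is a small sketch with made-up in-memory data frames standing in for the files (the names a.txt and b.txt are hypothetical). As far as I know, rbind() appends the .i counter only when a piece has more than one row, so the regular expression treats the counter as optional.

```r
# Made-up stand-ins for the data frames read from a.txt and b.txt
data <- list(
  "a.txt" = data.frame(V1 = c("1", "2"), V2 = c("x", "y")),
  "b.txt" = data.frame(V1 = c("3", "4"), V2 = c("z", "w"))
)

combined.data <- do.call(rbind, data)
print(row.names(combined.data))

# Drop the ".txt" extension together with the optional ".i" row counter
combined.data$file_origin <- sub("\\.txt(\\.[0-9]+)?$", "",
                                 row.names(combined.data))
row.names(combined.data) <- NULL
print(combined.data)
```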
JeremyS answered Nov 05 '22 19:11