
Inserting file names as column values in a data frame

Tags: dataframe, r

I have several .txt files. Each file contains comma-separated columns of data, and each has its own file name.

So far I have combined these files into one big data frame, using the following code:

files = list.files()
data2=lapply(files, read.table, header=FALSE, sep=",")
data_rbind <- do.call("rbind", data2) 
colnames(data_rbind)[c(1,2,3)]<-c("name", "sex", "amount")

This returns:

name sex amount

Anna F 24567

Emma F 23210

Isabelle F 31212

Amanda F 22631

I would like to add a fourth column that gives, for each row, the name of the file the row originally came from.

So, for example, if the first file 'example1.txt' contained the following:

Anna, F, 24567

Emma, F, 23210

Isabelle, F, 31212

And the second file 'example2.txt' contained the following:

Amanda, F, 22631

Sara, F, 41355

Katie, F, 2387

I would like to get the following:

Name Sex Amount Year

Anna F 24567 example1.txt

Emma F 23210 example1.txt

Isabelle F 31212 example1.txt

Amanda F 22631 example2.txt

Sara F 41355 example2.txt

Katie F 2387 example2.txt

Is this possible?

asked Oct 09 '14 08:10 by perriebtee



2 Answers

Try:

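files <- list.files()
# Read every file into its own data frame
data2 <- lapply(files, read.table, header = FALSE, sep = ",")
# Append the source file name as an extra column; seq_along() is safe if the list is empty
for (i in seq_along(data2)) {
  data2[[i]] <- cbind(data2[[i]], files[i])
}
data_rbind <- do.call("rbind", data2)
colnames(data_rbind)[1:4] <- c("name", "sex", "amount", "year")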
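For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea in base R. It recreates the two example files from the question in a temporary directory, so the directory and the helper name read_with_source are assumptions made for illustration, not part of the original code. It also sets strip.white and colClasses explicitly, on the assumption that the files have a space after each comma and because a column containing only F can otherwise be read as logical FALSE.

```r
# Recreate the two example files from the question in a temporary directory
# (the directory is an assumption made so the sketch can run anywhere).
tmp_dir <- tempfile("names_")
dir.create(tmp_dir)
writeLines(c("Anna, F, 24567", "Emma, F, 23210", "Isabelle, F, 31212"),
           file.path(tmp_dir, "example1.txt"))
writeLines(c("Amanda, F, 22631", "Sara, F, 41355", "Katie, F, 2387"),
           file.path(tmp_dir, "example2.txt"))

# Hypothetical helper: read one file and tag every row with its file name.
read_with_source <- function(path) {
  df <- read.table(path, header = FALSE, sep = ",",
                   strip.white = TRUE,  # drop the space after each comma
                   col.names = c("name", "sex", "amount"),
                   colClasses = c("character", "character", "integer"))
  df$year <- basename(path)  # file name without the directory part
  df
}

# list.files() returns paths in alphabetical order, so example1.txt comes first.
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)
data_rbind <- do.call(rbind, lapply(files, read_with_source))
print(data_rbind)
```

Tagging each data frame inside the reader function keeps the file name and its rows together, so nothing depends on two separate vectors staying in the same order.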
answered Oct 24 '22 06:10 by Ujjwal


You could also use:

nm1 <- c("Name", "Sex", "Amount", "Year")
files <- list.files(pattern = "^example")
files
# [1] "example1.txt" "example2.txt"

setNames(do.call(rbind, Map(`cbind`,
         lapply(files, read.table, sep = ","), V4 = files)), nm1)

#       Name Sex Amount         Year
# 1     Anna   F  24567 example1.txt
# 2     Emma   F  23210 example1.txt
# 3 Isabelle   F  31212 example1.txt
# 4   Amanda   F  22631 example2.txt
# 5     Sara   F  41355 example2.txt
# 6    Katie   F   2387 example2.txt

Or use rbindlist from the data.table package:

library(data.table)
setnames(rbindlist(Map(`cbind`, lapply(files, fread), files)), nm1)[]
#        Name Sex Amount         Year
# 1:     Anna   F  24567 example1.txt
# 2:     Emma   F  23210 example1.txt
# 3: Isabelle   F  31212 example1.txt
# 4:   Amanda   F  22631 example2.txt
# 5:     Sara   F  41355 example2.txt
# 6:    Katie   F   2387 example2.txt
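A variation on the data.table approach, sketched here under some assumptions: rbindlist has an idcol argument that fills a new column from the names of the input list, so naming the list after the files avoids the separate cbind step. The temporary directory and the variable names tables and combined are illustrative only, and colClasses is passed to fread so the Sex column is kept as text regardless of how T/F values are detected.

```r
library(data.table)

# Recreate the example files from the question in a temporary directory
# (an assumption made so the sketch is self-contained).
tmp_dir <- tempfile("names_")
dir.create(tmp_dir)
writeLines(c("Anna, F, 24567", "Emma, F, 23210", "Isabelle, F, 31212"),
           file.path(tmp_dir, "example1.txt"))
writeLines(c("Amanda, F, 22631", "Sara, F, 41355", "Katie, F, 2387"),
           file.path(tmp_dir, "example2.txt"))

files <- list.files(tmp_dir, pattern = "^example", full.names = TRUE)

# Read each file, then name each list element after its file.
tables <- lapply(files, fread, header = FALSE,
                 colClasses = c("character", "character", "integer"))
names(tables) <- basename(files)

# idcol adds a leading column holding the list names, i.e. the file names.
combined <- rbindlist(tables, idcol = "Year")
setnames(combined, c("Year", "Name", "Sex", "Amount"))
setcolorder(combined, c("Name", "Sex", "Amount", "Year"))
print(combined)
```

Because the file name travels with each table as the list name, the result stays correct even if the list is later filtered or reordered before binding.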
answered Oct 24 '22 07:10 by akrun