Merge many data frames from csv files, when ID column is implied?

I'd like to merge a bunch of data frames together (because it seems many operations are easier if you're only dealing with one, but correct me if I'm wrong).

Currently I have one data frame like this:

ID, var1, var2
A,  2,    2
B,  4,    5
.
.
Z,  3,    2

Each ID is on a single row with several single measurements.

I also have one csv file of repeated measurements for each ID, for example:

filename = ID_B.csv

time, var4, var5
0,    1,    2
1,    4,    5
2,    1,    6
...

What I'd like is:

ID, time, var1, var2, var4, var5
...
B,  0,    4,    5,    1,    2
B,  1,    4,    5,    4,    5
B,  2,    4,    5,    1,    6
...

I don't really care about the column order. The only solution I can think of is to add the ID column to each csv file then loop through them calling merge() several times. Is there a more elegant approach?

Peter asked Oct 13 '09 18:10


1 Answer

My understanding is that you need to extract the ID from the filename, and then merge the imported csv with the existing dataframe.

df1 <- read.csv(textConnection("ID, var1, var2
A,  2,    2
B,  4,    5"))

# assuming the imported csv-files are in the working directory
# (the dot is escaped and the pattern anchored so only ID_<letter>.csv matches)
filenames <- list.files(getwd(), pattern = "^ID_[A-Z]\\.csv$")

# extract ID from filename
ids <- gsub("^ID_([A-Z])\\.csv$", "\\1", filenames)

# import csv-files and append ID
library(plyr)
import <- mdply(filenames, read.csv)
import$ID <- ids[import$Var1]
import$Var1 <- NULL

# merge imported csv-files and the existing dataframe
merge(df1, import)  

Result:

  ID var1 var2 time var4 var5
1  B    4    5    0    1    2
2  B    4    5    1    4    5
3  B    4    5    2    1    6
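
The same idea can also be sketched in base R without plyr. This is only an illustration under stated assumptions: the temporary directory, the demo file written into it, and the helper name read_with_id are all made up here so the example runs on its own. In real use, csv_dir would point at the folder that holds the ID_*.csv files.

```r
# Wide table: one row per ID (values taken from the question).
df1 <- data.frame(ID = c("A", "B"), var1 = c(2, 4), var2 = c(2, 5))

# Demo setup (assumption): write one repeated-measures file to a temp directory.
csv_dir <- tempfile("csv_demo")
dir.create(csv_dir)
write.csv(
  data.frame(time = 0:2, var4 = c(1, 4, 1), var5 = c(2, 5, 6)),
  file.path(csv_dir, "ID_B.csv"),
  row.names = FALSE
)

# Find the files; full.names = TRUE returns paths that read.csv can open directly.
files <- list.files(csv_dir, pattern = "^ID_[A-Z]\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and tag its rows with the ID
# taken from the file name.
read_with_id <- function(path) {
  d <- read.csv(path, strip.white = TRUE)
  d$ID <- sub("^ID_([A-Z])\\.csv$", "\\1", basename(path))
  d
}

# Stack all files into one long data frame.
long <- do.call(rbind, lapply(files, read_with_id))

# A single merge on the shared ID column repeats var1/var2 for each time point.
result <- merge(df1, long, by = "ID")
print(result)
```

Because the ID is attached while each file is read, only one call to merge() is needed, however many csv files there are. merge() does an inner join by default, so IDs without a csv file (A in this demo) are dropped; passing all.x = TRUE would keep them with NA in the time, var4 and var5 columns.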
learnr answered Nov 03 '22 03:11