How do I stack data in R?

Question

I have 20 different .csv files and I need to somehow stack the data in R so that I can get an overall picture of it. At present I am copying and pasting the columns in Excel to make one big data set, but that takes a while, and I am sure there is a quicker and more efficient way of doing this in R.

Also, to make things worse, some of the variable names are not the same in each data set, e.g. VARIABLE1 is written as variable1 in some datasets. How would I rectify this in R, given that R is case sensitive?

Any help would be greatly appreciated. Thanks!

Arun · Accepted Answer

The easiest and fastest way to do this, if you are (or wish to be) familiar with the data.table package, is this (not tested):

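library(data.table)
in_pth <- "path_to_csv_files" # directory where the CSV files are located, not the files themselves
# the backslash is doubled because "\." is not a valid escape inside an R string
files <- list.files(in_pth, full.names=TRUE, recursive=FALSE, pattern="\\.csv$")
out <- rbindlist(lapply(files, fread)) # read each file with fread, then stack the results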
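
The code above stacks the files as they are. For the capitalisation problem in the question (VARIABLE1 vs variable1), one option is to normalise the column names right after reading and then bind by name. The following is only a sketch under assumptions: the two demo files, their contents, and the helper read_upper are made up for illustration, and the files are written to a temporary folder so the example is self-contained.

```r
library(data.table)

# Hypothetical demo data: same variables, different capitalisation and column order.
in_pth <- tempfile("csv_demo")
dir.create(in_pth)
fwrite(data.table(VARIABLE1 = 1:2, VARIABLE2 = c("a", "b")), file.path(in_pth, "01.csv"))
fwrite(data.table(variable2 = c("c", "d"), variable1 = 3:4), file.path(in_pth, "02.csv"))

files <- list.files(in_pth, full.names = TRUE, pattern = "\\.csv$")

# Hypothetical helper: read one file and upper-case its column names.
read_upper <- function(f) {
  d <- fread(f)
  setnames(d, toupper(names(d)))  # renames in place (by reference)
  d
}

# use.names = TRUE matches columns by name instead of by position.
out <- rbindlist(lapply(files, read_upper), use.names = TRUE)
print(out)
```

Upper-casing is an arbitrary choice here; tolower() would work just as well, as long as every file is treated the same way.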

`list.files` parameters:

full.names = TRUE returns the full path to each file. Suppose in_pth <- "c:/my_csv_folder" and the folder contains two files, 01.csv and 02.csv. Then full.names=TRUE returns c:/my_csv_folder/01.csv and c:/my_csv_folder/02.csv (the full paths) rather than just 01.csv and 02.csv. Note that inside an R string a Windows path needs forward slashes or doubled backslashes.
recursive = FALSE does not search inside directories within your in_pth folder. Suppose you have two more csv files in c:/my_csv_folder/another_folder. If you want to load those as well, set recursive=TRUE, which scans every subdirectory below in_pth.
pattern="\\.csv$": a regular expression that selects which files to load. If your folder contains text files (.txt) in addition to csv files, this pattern makes sure only the csv files are loaded. If the folder has only CSV files, it is not necessary.
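
To see how these three arguments interact, here is a small sketch that builds a throwaway folder and lists it in different ways. The folder layout and the file names (01.csv, 02.csv, notes.txt, another_folder/03.csv) are invented for the example.

```r
# Temporary folder with two CSVs, one text file, and one CSV in a subfolder.
in_pth <- tempfile("my_csv_folder")
dir.create(file.path(in_pth, "another_folder"), recursive = TRUE)
file.create(file.path(in_pth, c("01.csv", "02.csv", "notes.txt")))
file.create(file.path(in_pth, "another_folder", "03.csv"))

# Top level only; the pattern filters out notes.txt.
top_only <- list.files(in_pth, full.names = TRUE, recursive = FALSE, pattern = "\\.csv$")
print(basename(top_only))

# recursive = TRUE also picks up another_folder/03.csv.
all_csv <- list.files(in_pth, full.names = TRUE, recursive = TRUE, pattern = "\\.csv$")
print(all_csv)

# Without full.names, only the bare file names come back,
# which fread could not find unless in_pth were the working directory.
names_only <- list.files(in_pth, pattern = "\\.csv$")
print(names_only)
```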

data.table functions:

rbindlist avoids conflicts in column names by keeping the names of the first data.table. That is, if you have two data.tables dt1 and dt2 with column names x,y and a,b respectively, then rbindlist(list(dt1, dt2)) takes care of changing a,b to x,y, and rbindlist(list(dt2, dt1)) takes care of changing x,y to a,b. Note that rbindlist takes a single list of tables, and that columns are matched by position here, so the files need to have their columns in the same order.
fread takes care of columns, headers, separators etc., most often automatically, and is extremely fast (although it is still experimental, so you may want to check your output to be sure it is all fine, even if it seems stable).
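
The naming behaviour of rbindlist can be checked on two tiny tables. This is a sketch with made-up data; use.names = FALSE is passed explicitly as an assumption about what you want, because newer data.table versions may otherwise emit a message or warning when the names differ.

```r
library(data.table)

dt1 <- data.table(x = 1:2, y = c("a", "b"))
dt2 <- data.table(a = 3:4, b = c("c", "d"))

# Bind by position: the result takes its names from the first table in the list.
out1 <- rbindlist(list(dt1, dt2), use.names = FALSE)
print(names(out1))  # "x" "y"

# Swapping the order swaps which names win.
out2 <- rbindlist(list(dt2, dt1), use.names = FALSE)
print(names(out2))  # "a" "b"
```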

Tags:

merge

r

dataset

REnthusiast
