When importing CSV into R how to generate column with name of the CSV?

I have a large number of CSV files that I want to read into R. The column headings are the same in every CSV. At first I thought I would need to write a loop over the list of file names, but after searching I found a faster way. This reads in and combines all the CSVs correctly (as far as I know).

filenames <- list.files(path = ".", pattern = NULL, all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

library(plyr)
import.list <- llply(filenames, read.csv)

combined <- do.call("rbind", import.list)

The only problem is that I want to know which CSV a specific row of data comes from. I want a column labeled 'Source' that contains the name of the CSV that the row came from. For example, if the CSV was called Chicago_IL.csv, then once the data is in R the row would look something like this:

> City    State   Market  etc Source  
> Burbank IL      Western etc Chicago_IL
Arndt asked Mar 03 '11 21:03


1 Answer

You have already done all the hard work. With a fairly small modification this should be straightforward.

The logic is:

  1. Create a small helper function that reads an individual CSV and adds a column with the file name.
  2. Call this helper function for each file, using ldply() in place of llply().

The following should work:

# Read one CSV and record which file each row came from
read_csv_filename <- function(filename){
    ret <- read.csv(filename)
    ret$Source <- filename  # same value for every row of this file
    ret
}

# ldply() row-binds the data frames, so the result is already the combined data
combined <- ldply(filenames, read_csv_filename)

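Note that I have proposed another small improvement to your code: read.csv() returns a data.frame, so you can use ldply() rather than llply(). ldply() combines the results into a single data.frame, which makes the separate do.call("rbind", ...) step unnecessary.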
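One detail worth checking: the helper stores the file name exactly as it was passed in, so Source would read Chicago_IL.csv rather than the Chicago_IL shown in the question. Below is a minimal, self-contained sketch of one way to handle that. It is an illustration under some assumptions, not part of the original answer: it uses base R's lapply() and do.call(rbind, ...) in place of plyr so that it runs without extra packages, the demo directory, the Detroit_MI.csv file, and its contents are made up, and read_csv_source is a hypothetical name.

```r
# Hypothetical demo data: two small CSVs in a temporary directory.
demo_dir <- file.path(tempdir(), "csv_source_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(City = "Burbank", State = "IL"),
          file.path(demo_dir, "Chicago_IL.csv"), row.names = FALSE)
write.csv(data.frame(City = c("Troy", "Novi"), State = "MI"),
          file.path(demo_dir, "Detroit_MI.csv"), row.names = FALSE)

# Only pick up files ending in .csv; full.names = TRUE keeps the path
# so read.csv() can find files outside the working directory.
filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one CSV and tag every row with the file name,
# with the directory and the extension stripped.
read_csv_source <- function(filename) {
    ret <- read.csv(filename, stringsAsFactors = FALSE)
    ret$Source <- tools::file_path_sans_ext(basename(filename))
    ret
}

# Base R equivalent of ldply(): read each file, then row-bind the results.
combined <- do.call(rbind, lapply(filenames, read_csv_source))
print(combined)
```

With plyr loaded, the last step could presumably be written as ldply(filenames, read_csv_source) instead. Restricting list.files() to a .csv pattern also keeps other files in the directory from being passed to read.csv().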

Andrie answered Oct 19 '22 01:10