How to Import a CSV file containing multiple sections into R?

Tags: file, import, r, csv

I want to import the contents of a CSV file into R. The file contains multiple sections of data stacked vertically, separated by blank lines and asterisk banners. For example:

********************************************************
* SAMPLE DATA ******************************************
********************************************************
Name, DOB, Sex
Rod, 1/1/1970, M
Jane, 5/7/1980, F
Freddy, 9.12,1965, M

*******************************************************
*  Income Data ****************************************
*******************************************************
Name, Income
Rod, 10000
Jane, 15000
Freddy, 7500

I would like to import this into R as two separate data frames. Currently I'm manually cutting the CSV file up into smaller files, but I think I could do it using read.csv with its skip and nrows arguments, if I could work out where the section breaks are.

This gives me a logical TRUE for every blank line:

ifelse(readLines("DATA.csv")=="",TRUE,FALSE)

I'm hoping someone has already solved this problem.

PaulHurleyuk asked Apr 13 '10 10:04


1 Answer

In this case I would do something like:

# Import the raw lines:
data_raw <- readLines("test.txt")

# Find the separator line (the first blank line):
id_sep <- which(data_raw == "")[1]

# Create the line ranges of both data sets
# (each section starts with a 3-line asterisk banner):
data_1_range <- 4:(id_sep - 1)
data_2_range <- (id_sep + 4):length(data_raw)

# Parse each range of the raw data as CSV:
data_1 <- read.csv(text = data_raw[data_1_range], strip.white = TRUE)
data_2 <- read.csv(text = data_raw[data_2_range], strip.white = TRUE)

Note that your first sample section has an inconsistent structure (the "Freddy" row has four fields under a three-column header), so data_1 looks strange.
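
If the file has more than two sections, the same idea can be generalised. The following is only a sketch under some assumptions: `read_sections` is a hypothetical helper name, banner lines are assumed to be exactly those starting with `*`, and sections are assumed to be separated by at least one blank line. The sample lines reuse the Rod and Jane rows from the question.

```r
# Hypothetical helper: split banner-delimited CSV sections into a list of
# data frames. Assumes banner lines start with "*" and that sections are
# separated by one or more blank lines.
read_sections <- function(lines) {
  lines <- trimws(lines)

  # Drop the asterisk banner lines but keep blank lines as separators.
  lines <- lines[!startsWith(lines, "*")]

  # Every blank line bumps the section counter, so consecutive
  # non-blank lines share the same section id.
  is_blank <- lines == ""
  section_id <- cumsum(is_blank)

  # Group the non-blank lines by section id and parse each group as CSV.
  chunks <- split(lines[!is_blank], section_id[!is_blank])
  lapply(unname(chunks), function(chunk) {
    read.csv(text = chunk, strip.white = TRUE)
  })
}

# Sample input taken from the question (Rod and Jane rows only).
sample_lines <- c(
  "********************************************************",
  "* SAMPLE DATA ******************************************",
  "********************************************************",
  "Name, DOB, Sex",
  "Rod, 1/1/1970, M",
  "Jane, 5/7/1980, F",
  "",
  "*******************************************************",
  "*  Income Data ****************************************",
  "*******************************************************",
  "Name, Income",
  "Rod, 10000",
  "Jane, 15000"
)

sections <- read_sections(sample_lines)
```

For a real file you would call `read_sections(readLines("test.txt"))`. A data row that itself begins with `*` would be dropped by this sketch, so the banner test may need tightening for other files.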

Marek answered Sep 30 '22 00:09