Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read csv from specific row

Tags:

r

I have daily data starting from 1980 in csv file. But I want to read data only from 1985. Because the other dataset in another file starts from 1985. How can I skip reading the data before 1985 in R language?

like image 239
Jabed Avatar asked Jul 06 '11 06:07

Jabed


2 Answers

I think you want to take a look at ?read.csv to see all the options.

It's a bit hard to give an exact answer without seeing a sample of your data.

If your data doesn't have a header and you know which line the 1985 data starts on, you can just use something like...

impordata <- read.csv(file,skip=1825)

...to skip the first 1825 lines.

Otherwise you can always just subset the data after you've imported it if you have a year variable in your data.

impordata <- read.csv("skiplines.csv")
impordata <- subset(impordata,year>=1985)

If you don't know where the 1985 data starts, you can use grep to find the first instance of 1985 in your file's date variable and then only keep from that line onwards:

impordata <- read.csv("skiplines.csv")
impordata <- impordata[min(grep(1985,impordata$date)):nrow(impordata),]
like image 96
thelatemail Avatar answered Sep 20 '22 11:09

thelatemail


Here are a few alternatives. (You may wish to convert the first column to "Date" class afterwards and possibly convert the entire thing to a zoo object or other time series class object.)

# create test data
fn <- tempfile()
dd <- seq(as.Date("1980-01-01"), as.Date("1989-12-31"), by = "day")
DF <- data.frame(Date = dd, Value = seq_along(dd))
write.table(DF, file = fn, row.names = FALSE)

read.table + subset

# if file is small enough to fit in memory try this:

DF2 <- read.table(fn, header = TRUE, as.is = TRUE)
DF2 <- subset(DF2, Date >= "1985-01-01")

read.zoo

# or this which produces a zoo object and also automatically converts the 
# Date column to Date class.  Note that all columns other than the Date column
# should be numeric for it to be representable as a zoo object.
library(zoo)
z <- read.zoo(fn, header = TRUE)
zw <- window(z, start = "1985-01-01")

If your data is not in the same format as the example you will need to use additional arguments to read.zoo.

multiple read.table's

# if the data is very large read 1st row (DF.row1) and 1st column (DF.Date)
# and use those to set col.names= and skip=

DF.row1 <- read.table(fn, header = TRUE, nrow = 1)
nc <- ncol(DF.row1)
DF.Date <- read.table(fn, header = TRUE, as.is = TRUE, 
   colClasses = c(NA, rep("NULL", nc - 1)))
n1985 <- which.max(DF.Date$Date >= "1985-01-01")

DF3 <- read.table(fn, col.names = names(DF.row1), skip = n1985, as.is = TRUE)

sqldf

# this is probably the easiest if data set is large.

library(sqldf)
DF4 <- read.csv.sql(fn, sql = 'select * from file where Date >= "1985-01-01"')
like image 35
G. Grothendieck Avatar answered Sep 18 '22 11:09

G. Grothendieck