 

Creating a zoo object from a csv file (with a few inconsistencies) with R

Tags: r, csv, zoo

I am trying to create a zoo object in R from the following csv file: http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv

The problem seems to be that there are a few minor inconsistencies in the period from 2/27/2006 to 3/20/2006 (some extra commas and an "x") that lead to problems.

I am looking for a method that reads the complete csv file into R automatically. A new data point is added every business day, so with manual preprocessing I would have to re-edit the file by hand every day.

I am not sure whether these are the only problems with this file, but I am running out of ideas for how to create a zoo object from this time series. I think that with some more knowledge of R it should be possible.

vonjd asked Feb 21 '23 06:02


2 Answers

Use colClasses to tell read.zoo (which passes the argument on to read.table) that there are 4 fields, and use fill = TRUE so that the fields are filled in on any row where they are missing. Ignore the warning:

library(zoo)
URL <- "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv"
z <- read.zoo(URL, sep = ",", header = TRUE, format = "%m/%d/%Y", skip = 1, 
         fill = TRUE, colClasses = rep(NA, 4))
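
To see what padding ragged rows to four fields does, here is a minimal sketch on made-up rows (not the real CBOE data). It swaps in a different, base-R way of declaring the column count: the col.names argument of read.table rather than colClasses. The sample values and the column names junk1 and junk2 are assumptions for illustration only.

```r
# Made-up rows mimicking the described layout (NOT the real CBOE data):
# a title line, a header, then data rows, two of them with trailing debris.
txt <- c(
  "Hypothetical title line",
  "Date,SKEW",
  "2/24/2006,115.50",
  "2/27/2006,116.25,,",
  "2/28/2006,117.00,,x",
  "3/21/2006,118.75"
)

# Skip the title and header lines and name four columns explicitly, so that
# read.table() allocates four fields per row; fill = TRUE pads short rows.
# "junk1" and "junk2" are hypothetical names for the stray columns.
df <- read.table(text = txt, sep = ",", header = FALSE, skip = 2,
                 fill = TRUE, stringsAsFactors = FALSE,
                 col.names = c("Date", "SKEW", "junk1", "junk2"))

# Keep only the two real columns and convert the dates
dates <- as.Date(df$Date, format = "%m/%d/%Y")
skew <- df$SKEW

# Build the zoo object only if the zoo package is installed
if (requireNamespace("zoo", quietly = TRUE)) {
  z <- zoo::zoo(skew, order.by = dates)
}
```

Naming the columns up front matters because read.table otherwise infers the column count from the first few lines only, and the ragged rows in this file appear much later.
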
G. Grothendieck answered Feb 23 '23 20:02


It is a good idea to separate the cleaning and analysis steps. Since you mention that your dataset changes often, this cleaning must be automatic. Here is a solution for autocleaning.

#Read in the data without parsing it
lines <- readLines("Skewdailyprices.csv")

#The bad lines have more than two fields 
n_fields <- count.fields(
  "Skewdailyprices.csv", 
  sep = ",", 
  skip = 1
)

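#View the dubious lines
#(n_fields starts at the second line because of skip = 1, so drop the first line before indexing)
lines[-1][n_fields != 2]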

#Fix them
library(stringr) #can use gsub from base R if you prefer
lines <- str_replace(lines, ",,x?$", "")

#Write back out to file
writeLines(lines[-1], "Skewdailyprices_cleaned.csv")

#Read in the clean version
sdp <- read.zoo(
    "Skewdailyprices_cleaned.csv", 
    format = "%m/%d/%Y", 
    header = TRUE, 
    sep = ","
)
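
As a rough sketch of the same cleaning without stringr and without the intermediate file, the lines can be fixed with base sub() and parsed straight from memory. The sample lines below are made up to mimic the described layout (not the real CBOE data), and the zoo step only runs if the package is installed.

```r
# Made-up lines mimicking the described layout (NOT the real CBOE data).
lines <- c(
  "Hypothetical title line",
  "Date,SKEW",
  "2/24/2006,115.50",
  "2/27/2006,116.25,,",
  "2/28/2006,117.00,,x",
  "3/21/2006,118.75"
)

# Flag rows that end in ",," or ",,x"
bad <- grepl(",,x?$", lines)

# Base-R equivalent of the str_replace() call: strip the trailing debris
cleaned <- sub(",,x?$", "", lines)

# Drop the title line and parse from memory instead of writing a file
df <- read.table(text = cleaned[-1], sep = ",", header = TRUE,
                 stringsAsFactors = FALSE)
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")

# Build the zoo object only if the zoo package is installed
if (requireNamespace("zoo", quietly = TRUE)) {
  sdp <- zoo::zoo(df$SKEW, order.by = df$Date)
}
```

Inspecting lines[bad] before overwriting anything is a cheap check that the pattern only catches the rows you expect.
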
Richie Cotton answered Feb 23 '23 20:02