I am trying to create a zoo object in R from the following csv file: http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv
The problem seems to be that there are a few minor inconsistencies in the period from 2/27/2006 to 3/20/2006 (some extra commas and an "x") that lead to problems.
I am looking for a method that reads the complete csv file into R automatically. A new data point is added every business day, so with manual preprocessing I would have to re-edit the file by hand every day.
I am not sure whether these are the only problems with this file, but I am running out of ideas for how to create a zoo object from this time series. I think it should be possible with some more knowledge of R.
Use colClasses to tell it that there are 4 fields and use fill so it knows to fill them if they are missing on any row. Ignore the warning:
library(zoo)
URL <- "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv"
z <- read.zoo(URL, sep = ",", header = TRUE, format = "%m/%d/%Y", skip = 1,
              fill = TRUE, colClasses = rep(NA, 4))
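If you would rather not rely on ignoring a warning, one variant (a sketch of a different technique, not the call above) is to name the four columns yourself with col.names, which also makes read.table expect four fields on every row. The sample rows and the extra1/extra2 column names below are made up for illustration; only the shape of the malformed rows follows the question.

```r
library(zoo)

# Made-up sample in the shape described in the question: a header, one clean
# row, and two rows with trailing commas and a stray "x". Values are illustrative.
sample_text <- "Date,SKEW
2/24/2006,118.2
2/27/2006,117.9,,x
2/28/2006,118.4,,"

# Skip the header and supply four column names (extra1/extra2 are hypothetical),
# so read.table expects four fields and fill = TRUE pads the two-field rows.
df <- read.table(text = sample_text, sep = ",", skip = 1, header = FALSE,
                 fill = TRUE, col.names = c("Date", "SKEW", "extra1", "extra2"))

# Keep only the two real columns and index the series by date.
z <- zoo(df$SKEW, as.Date(as.character(df$Date), format = "%m/%d/%Y"))
```

For the real file the skip count would have to cover the line before the header as well (the call above uses skip = 1 together with header = TRUE), so skip = 2 is my assumption there.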
It is a good idea to separate the cleaning and analysis steps. Since you mention that your dataset changes often, this cleaning must be automatic. Here is a solution for autocleaning.
#Read in the data without parsing it
lines <- readLines("Skewdailyprices.csv")
#The bad lines have more than two fields
n_fields <- count.fields(
  "Skewdailyprices.csv",
  sep = ",",
  skip = 1
)
#View the dubious lines (n_fields skips line 1, so drop it from lines to keep the indices aligned)
lines[-1][n_fields != 2]
#Fix them
library(stringr) #can use gsub from base R if you prefer
lines <- str_replace(lines, ",,x?$", "")
#Write back out to file
writeLines(lines[-1], "Skewdailyprices_cleaned.csv")
#Read in the clean version
sdp <- read.zoo(
  "Skewdailyprices_cleaned.csv",
  format = "%m/%d/%Y",
  header = TRUE,
  sep = ","
)
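The same detect-and-fix idea can be sketched entirely in memory with base R, using sub in place of str_replace and skipping the intermediate file. The sample lines below are made up to mimic the malformed rows described in the question, so treat this as an illustration under that assumption rather than a tested run against the real file.

```r
library(zoo)

# Made-up lines in the shape described in the question (values are illustrative).
lines <- c(
  "Date,SKEW",
  "2/24/2006,118.2",
  "2/27/2006,117.9,,x",
  "2/28/2006,118.4,,"
)

# Count the fields per line from a text connection instead of a file on disk.
con <- textConnection(lines)
n_fields <- count.fields(con, sep = ",")
close(con)

# Lines that do not have exactly two fields are the suspicious ones.
bad <- which(n_fields != 2)

# Base R equivalent of the str_replace call: drop a trailing ",," or ",,x".
cleaned <- sub(",,x?$", "", lines)

# Parse the cleaned text directly and build the zoo series by hand.
df <- read.csv(text = cleaned, header = TRUE)
sdp <- zoo(df$SKEW, as.Date(as.character(df$Date), format = "%m/%d/%Y"))
```

Because nothing is skipped here, n_fields and lines have the same length, so the indices in bad point at the right lines.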