I'm trying to create a subset of a data frame and when I do so, R switches the formatting of the date column. Any idea why or how to fix this?
> head(spyPr2)
        Date   Open   High    Low  Close    Volume Adj.Close
1 12/30/2011 126.02 126.33 125.50 125.50  95599000    125.50
2 12/29/2011 125.24 126.25 124.86 126.12 123507200    126.12
3 12/28/2011 126.51 126.53 124.73 124.83 119107100    124.83
4 12/27/2011 126.17 126.82 126.06 126.49  86075700    126.49
5 12/23/2011 125.67 126.43 125.41 126.39  92187200    126.39
6 12/22/2011 124.63 125.40 124.23 125.27 119465400    125.27
> spyPr2$Date <- as.Date(spyPr2$Date, format = "%m/%d/%Y")
> head(spyPr2)
        Date   Open   High    Low  Close    Volume Adj.Close
1 2011-12-30 126.02 126.33 125.50 125.50  95599000    125.50
2 2011-12-29 125.24 126.25 124.86 126.12 123507200    126.12
3 2011-12-28 126.51 126.53 124.73 124.83 119107100    124.83
4 2011-12-27 126.17 126.82 126.06 126.49  86075700    126.49
5 2011-12-23 125.67 126.43 125.41 126.39  92187200    126.39
6 2011-12-22 124.63 125.40 124.23 125.27 119465400    125.27
> spyPr2 <- data.frame(cbind(spyPr2$Date, spyPr2$Close, spyPr2$Adj.Close))
> str(spyPr2)
'data.frame': 1638 obs. of  3 variables:
 $ X1: num  15338 15337 15336 15335 15331 ...
 $ X2: num  126 126 125 126 126 ...
 $ X3: num  126 126 125 126 126 ...
> head(spyPr2)
     X1     X2     X3
1 15338 125.50 125.50
2 15337 126.12 126.12
3 15336 124.83 124.83
4 15335 126.49 126.49
5 15331 126.39 126.39
6 15330 125.27 125.27
UPDATE:
> spyPr2 <- data.frame(cbind(spyPr2["Date"], spyPr2$Close, spyPr2$Adj.Close))
Error in `[.data.frame`(spyPr2, "Date") : undefined columns selected
> spyPr2 <- data.frame(cbind(spyPr2[,"Date"], spyPr2$Close, spyPr2$Adj.Close))
Error in `[.data.frame`(spyPr2, , "Date") : undefined columns selected
UPDATE 2:
structure(list(Date = structure(c(15338, 15337, 15336, 15335, 15331, 15330), class = "Date"),
    Open = c(126.02, 125.24, 126.51, 126.17, 125.67, 124.63),
    High = c(126.33, 126.25, 126.53, 126.82, 126.43, 125.4),
    Low = c(125.5, 124.86, 124.73, 126.06, 125.41, 124.23),
    Close = c(125.5, 126.12, 124.83, 126.49, 126.39, 125.27),
    Volume = c(95599000L, 123507200L, 119107100L, 86075700L, 92187200L, 119465400L),
    Adj.Close = c(125.5, 126.12, 124.83, 126.49, 126.39, 125.27)),
    .Names = c("Date", "Open", "High", "Low", "Close", "Volume", "Adj.Close"),
    row.names = c(NA, -6L), class = "data.frame")
The cbind function is used to combine vectors, matrices and/or data frames by columns.
The data.frame() function combines its arguments column-wise much like cbind(), but it always returns a data frame and lets you name each column as you supply it. Unlike a matrix, a data frame can hold string vectors and numeric vectors side by side in the same object.
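A small illustration of that difference, using made-up values rather than the question's data: cbind() on a character vector and a numeric vector must build a single-type matrix, so the numbers are coerced to character, whereas data.frame() keeps a separate type per column.

```r
# Hypothetical example values, not taken from the question
sym   <- c("SPY", "SPY")
close <- c(125.50, 126.12)

# cbind() on plain vectors builds a matrix: one type for every cell
m <- cbind(sym, close)
typeof(m)          # "character": the numbers were coerced to strings

# data.frame() keeps each column's own type and uses the names given
df <- data.frame(Symbol = sym, Close = close)
sapply(df, class)  # Close stays "numeric"; Symbol is "character" in R >= 4.0.0
                   # (older R versions turn it into a factor by default)
```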
The cbind() data frame method is just a wrapper for data.frame(..., check.names = FALSE). This means that it will split matrix columns in data frame arguments and, in R versions before 4.0.0, convert character columns to factors unless stringsAsFactors = FALSE is specified.
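According to the R documentation for cbind(), the data frame method is used when at least one argument is a data frame. A minimal sketch with a hypothetical two-row dat2: passing the Date column as a one-column data frame (dat2["Date"]) rather than as a bare vector is enough for cbind() to keep the "Date" class.

```r
# Hypothetical two-row version of the question's data
dat2 <- data.frame(Date  = as.Date(c("2011-12-30", "2011-12-29")),
                   Close = c(125.50, 126.12))

# All arguments are plain vectors: the internal code builds a numeric matrix
m <- cbind(dat2$Date, dat2$Close)
class(m[, 1])     # "numeric": the Date class is gone

# The first argument is a data frame, so cbind.data.frame() is dispatched
d <- cbind(dat2["Date"], Close = dat2$Close)
class(d)          # "data.frame"
class(d$Date)     # "Date"
```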
The obvious answer is: don't do subsetting like that! Use the appropriate tools. What is wrong with the following?
spyPr2.new <- spyPr2[, c("Date", "Close", "Adj.Close")]
To explain the behaviour you are seeing, you need to understand what $ returns and how cbind() works. cbind() is one of those oddities in R: method dispatch is not done through the usual S3 mechanism (UseMethod()) but is instead handled by special code buried in the internals of R. This is all the R code behind cbind():
> cbind
function (..., deparse.level = 1)
.Internal(cbind(deparse.level, ...))
<bytecode: 0x24fa0c0>
<environment: namespace:base>
Not much help, eh? There are, however, methods for data frames and "ts" objects:
> methods(cbind)
[1] cbind.data.frame cbind.ts*

   Non-visible functions are asterisked
Before I do the reveal, also note what $ returns (dat2 is your 6 lines of data after converting Date to a "Date" object):
> str(dat2$Date)
 Date[1:6], format: "2011-12-30" "2011-12-29" "2011-12-28" "2011-12-27" ...
This is a "Date" object, which is really just a special kind of vector: numbers with a class attribute.
> class(dat2$Date)
[1] "Date"
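To see what "special vector" means here, a short sketch using a Date vector built the same way as the question's column (only the first two dates, as a hypothetical stand-in): the storage is a double vector of days since 1970-01-01, and the class attribute is the only thing that makes it print as dates. unclass() shows the numbers that later appear in the matrix.

```r
# Hypothetical Date vector matching the first two rows of the question
d <- as.Date(c("12/30/2011", "12/29/2011"), format = "%m/%d/%Y")

typeof(d)       # "double": the underlying storage type
attributes(d)   # $class "Date": what makes it print as a date
unclass(d)      # 15338 15337: days since 1970-01-01
```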
The key thing is that it is not a data frame. So when you call cbind(), the internal code sees three plain vectors and creates a matrix.
> (c1 <- cbind(dat2$Date, dat2$Close, dat2$Adj.Close))
      [,1]   [,2]   [,3]
[1,] 15338 125.50 125.50
[2,] 15337 126.12 126.12
[3,] 15336 124.83 124.83
[4,] 15335 126.49 126.49
[5,] 15331 126.39 126.39
[6,] 15330 125.27 125.27
> class(c1)
[1] "matrix"
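A matrix is just an atomic vector with a dim attribute, so every cell shares one basic type (numeric, character, logical, ...) and there is no room for a per-column class such as "Date". The Date object is therefore reduced to its underlying numeric values, the number of days since 1970-01-01:
> as.numeric(dat2$Date)
[1] 15338 15337 15336 15335 15331 15330
so that cbind() can produce a numeric matrix.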
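If you have already ended up with such a numeric matrix, the dates are not lost, only their class. A minimal sketch, rebuilding a two-row stand-in for c1: as.Date() with an explicit origin turns the day counts back into dates, and the columns can then be put into a proper data frame.

```r
# Rebuild a small version of c1: a numeric matrix whose first column held dates
dates <- as.Date(c("2011-12-30", "2011-12-29"))
c1 <- cbind(dates, c(125.50, 126.12), c(125.50, 126.12))

# The first column is now plain day counts since 1970-01-01
c1[, 1]

# Restore the class; origin is given explicitly because R versions
# before 4.3.0 require it for numeric input
fixed <- data.frame(Date      = as.Date(c1[, 1], origin = "1970-01-01"),
                    Close     = c1[, 2],
                    Adj.Close = c1[, 3])
str(fixed)
```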
You can force the use of the data frame method by calling it explicitly. It knows how to handle "Date" objects, so it doesn't do any conversion:
> cbind.data.frame(dat2$Date, dat2$Close, dat2$Adj.Close)
   dat2$Date dat2$Close dat2$Adj.Close
1 2011-12-30     125.50         125.50
2 2011-12-29     126.12         126.12
3 2011-12-28     124.83         124.83
4 2011-12-27     126.49         126.49
5 2011-12-23     126.39         126.39
6 2011-12-22     125.27         125.27
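The column names in that output are the deparsed argument expressions (dat2$Date and so on). A sketch of the same call with named arguments, which gives clean names directly; data.frame() accepts the same named arguments and should give an equivalent result here. dat2 is a hypothetical two-row rebuild of the question's data.

```r
dat2 <- data.frame(Date      = as.Date(c("2011-12-30", "2011-12-29")),
                   Close     = c(125.50, 126.12),
                   Adj.Close = c(125.50, 126.12))

# Naming the arguments sets the column names directly
a <- cbind.data.frame(Date = dat2$Date, Close = dat2$Close,
                      Adj.Close = dat2$Adj.Close)
b <- data.frame(Date = dat2$Date, Close = dat2$Close,
                Adj.Close = dat2$Adj.Close)

names(a)          # "Date" "Close" "Adj.Close"
all.equal(a, b)   # TRUE
```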
However, all explanation aside, you are doing the subsetting in a very roundabout way. [ works just fine as a subsetting function:
> dat2[, c("Date", "Close", "Adj.Close")]
        Date  Close Adj.Close
1 2011-12-30 125.50    125.50
2 2011-12-29 126.12    126.12
3 2011-12-28 124.83    124.83
4 2011-12-27 126.49    126.49
5 2011-12-23 126.39    126.39
6 2011-12-22 125.27    125.27
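A related detail for the update in the question (whose errors arise because spyPr2 had already been overwritten by the cbind() result, whose columns are X1 to X3): with [ the number of selected columns matters. A sketch on a hypothetical two-row dat2: selecting a single column with [, "Date"] simplifies to a bare vector (which still has class "Date"), while drop = FALSE or list-style dat2["Date"] keeps a data frame.

```r
dat2 <- data.frame(Date      = as.Date(c("2011-12-30", "2011-12-29")),
                   Close     = c(125.50, 126.12),
                   Adj.Close = c(125.50, 126.12))

one_vec <- dat2[, "Date"]                # single column: simplified to a vector
one_df  <- dat2[, "Date", drop = FALSE]  # drop = FALSE keeps a data frame
one_df2 <- dat2["Date"]                  # list-style indexing never simplifies

class(one_vec)   # "Date"
class(one_df)    # "data.frame"
class(one_df2)   # "data.frame"

# Several columns: the result is a data frame either way
class(dat2[, c("Date", "Close")])   # "data.frame"
```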
subset() is also an option, but it is not needed here:
> subset(dat2, select = c("Date", "Close", "Adj.Close"))
        Date  Close Adj.Close
1 2011-12-30 125.50    125.50
2 2011-12-29 126.12    126.12
3 2011-12-28 124.83    124.83
4 2011-12-27 126.49    126.49
5 2011-12-23 126.39    126.39
6 2011-12-22 125.27    125.27
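Where subset() can be handy is when rows are filtered as well. Because the subset keeps the "Date" class, the date column can be compared with a date directly. A sketch on a hypothetical three-row dat2; the cut-off date is an arbitrary example, and the [ version alongside it selects the same rows and columns.

```r
dat2 <- data.frame(Date      = as.Date(c("2011-12-30", "2011-12-29", "2011-12-23")),
                   Close     = c(125.50, 126.12, 126.39),
                   Adj.Close = c(125.50, 126.12, 126.39))

# Rows on or after an example cut-off date, selected columns only
recent <- subset(dat2, Date >= as.Date("2011-12-29"),
                 select = c(Date, Close))

# The same with [ : logical row index, character column index
recent2 <- dat2[dat2$Date >= as.Date("2011-12-29"), c("Date", "Close")]

nrow(recent)   # 2
```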