
R: Subsetting a data frame using a list of dates as the filter


I have a data frame with a date column and some other value columns. I would like to extract from the data frame those rows in which the date column matches any of the elements in a pre-existing list of dates. For example, using a list of one element, the date '2012-01-01' would pull the row with a date of '2012-01-01' from the data frame.

For numbers I think I know how to match the values. This code:

testdf <- data.frame(mydate = seq(as.Date('2012-01-01'),
                                  as.Date('2012-01-10'), by = 'day'),
                     col1 = 1:10,
                     col2 = 11:20,
                     col3 = 21:30)

...produces this data frame:

       mydate col1 col2 col3
1  2012-01-01    1   11   21
2  2012-01-02    2   12   22
3  2012-01-03    3   13   23
4  2012-01-04    4   14   24
5  2012-01-05    5   15   25
6  2012-01-06    6   16   26
7  2012-01-07    7   17   27
8  2012-01-08    8   18   28
9  2012-01-09    9   19   29
10 2012-01-10   10   20   30

I can do this:

testdf[which(testdf$col3 %in% c('25','29')),] 

which produces this:

      mydate col1 col2 col3
5 2012-01-05    5   15   25
9 2012-01-09    9   19   29

I can generalise this to a vector of values like this:

myvalues <- c('25','29')
testdf[which(testdf$col3 %in% myvalues),]

And I get the same output. So I had thought I would be able to use the same approach for dates, but it appears that I was wrong. Doing this:

testdf[which(testdf$mydate %in% c('2012-01-05','2012-01-09')),] 

Gets me this:

[1] mydate col1   col2   col3
<0 rows> (or 0-length row.names)

Putting the dates in their own list, which is the ultimate aim, doesn't help either. I can think of ways round this with loops or an apply function, but it seems to me that there must be a simpler way for what is probably a fairly common requirement. Have I again overlooked something simple?

Q: How can I subset those rows of a data frame that have a date column the values of which match one of a list of dates?

SlowLearner asked Jul 13 '12 05:07





2 Answers

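The mydate column has class Date, but the values you are matching against are character strings. You have to convert the date strings into Date values using as.Date (try ?as.Date at the console). Bonus: you can drop which, because the logical vector returned by %in% can index the rows directly: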

> testdf[testdf$mydate %in% as.Date(c('2012-01-05', '2012-01-09')),]
      mydate col1 col2 col3
5 2012-01-05    5   15   25
9 2012-01-09    9   19   29

Ryogi answered Nov 15 '22 13:11
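Since the stated aim is to keep the dates in their own vector, here is a minimal sketch of that case. The names mydates, wanted and result are made up for illustration, and it assumes the lookup dates arrive as character strings in the default yyyy-mm-dd format.

```r
# Same sample data as in the question
testdf <- data.frame(mydate = seq(as.Date("2012-01-01"),
                                  as.Date("2012-01-10"), by = "day"),
                     col1 = 1:10,
                     col2 = 11:20,
                     col3 = 21:30)

# Hypothetical lookup vector, held as character strings
mydates <- c("2012-01-05", "2012-01-09")

# Convert the strings to Date once, then filter with a logical index
wanted <- as.Date(mydates)
result <- testdf[testdf$mydate %in% wanted, ]

# Alternative: compare on the character side instead of converting the lookup
result_chr <- testdf[format(testdf$mydate) %in% mydates, ]
```

Either way, both sides of %in% end up with the same type, which is what the failing attempt in the question was missing.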



Both suggestions so far are definitely good, but if you are going to be doing a lot of work with dates, you may want to invest some time with the xts package:

# Some sample data for 90 consecutive days
set.seed(1)
testdf <- data.frame(mydate = seq(as.Date('2012-01-01'),
                                  length.out=90, by = 'day'),
                     col1 = rnorm(90), col2 = rnorm(90),
                     col3 = rnorm(90))

# Convert the data to an xts object
# (the date column is dropped from the data and used as the index,
# so the remaining columns stay numeric)
require(xts)
testdfx = xts(testdf[, -1], order.by=testdf$mydate)

# Take a random sample of dates
testdfx[sample(index(testdfx), 5)]
#                   col1        col2        col3
# 2012-01-17 -0.01619026  0.71670748  1.44115771
# 2012-01-29 -0.47815006  0.49418833 -0.01339952
# 2012-02-05 -0.41499456  0.71266631  1.51974503
# 2012-02-27 -1.04413463  0.01739562 -1.18645864
# 2012-03-26  0.33295037 -0.03472603  0.27005490

# Get specific dates
testdfx[c('2012-01-05', '2012-01-09')]
#                 col1      col2       col3
# 2012-01-05 0.3295078  1.586833  0.5210227
# 2012-01-09 0.5757814 -1.224613 -0.4302118

You can also get dates from another vector.

# Get dates from another vector
lookup = c("2012-01-12", "2012-01-31", "2012-03-05", "2012-03-19")
testdfx[lookup]
#                   col1        col2       col3
# 2012-01-12  0.38984324  0.04211587  0.4020118
# 2012-01-31  1.35867955 -0.50595746 -0.1643758
# 2012-03-05 -0.74327321 -1.48746031  1.1629646
# 2012-03-19  0.07434132 -0.14439960  0.3747244

The xts package will give you intelligent subsetting options. For instance, testdfx["2012-03"] will return all the data from March; testdfx["2012"] will return for the year; testdfx["/2012-02-15"] will return the data from the start of the dataset to February 15; and testdfx["2012-02-15/"] will go from February 15 to the end of the dataset.
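As a rough illustration of those range forms, the sketch below uses a small deterministic series rather than the random data above, so the row counts are easy to check by hand. It assumes the xts package is installed; the names days, x, march, to_feb15 and from_feb15 are made up for the example.

```r
library(xts)

# One value per day for 90 days starting 2012-01-01 (2012 is a leap year,
# so the series runs through 2012-03-30)
days <- seq(as.Date("2012-01-01"), length.out = 90, by = "day")
x <- xts(data.frame(col1 = 1:90), order.by = days)

march      <- x["2012-03"]       # every row dated in March 2012
whole_year <- x["2012"]          # every row dated in 2012
to_feb15   <- x["/2012-02-15"]   # start of the data through 15 February
from_feb15 <- x["2012-02-15/"]   # 15 February through the end of the data

nrow(march)       # 30 rows: 1 March to 30 March
nrow(to_feb15)    # 46 rows: 31 days of January plus 15 of February
nrow(from_feb15)  # 45 rows: 15 February to 30 March
```

Both endpoints of a range are inclusive, which is why 15 February appears in both to_feb15 and from_feb15.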

A5C1D2H2I1M1N2O1R2T1 answered Nov 15 '22 13:11
