My second question of the day and my last attempt to use R to clean up this data. Here's the sitrep:
I have a data frame that has a column which is a POSIXlt date type. I want to extract the day, month and year from that column and create 3 new columns called (cleverly) day, month and year.
The data frame looks like this:
order_id dd_mmm_yy
1 2005-07-28
2 2007-03-04
I want to end up with this:
order_id dd_mmm_yy day month year
1 2005-07-28 28 7 2005
2 2007-03-04 4 3 2007
I've created a function to extract the day, month and year and return them in a list (or data frame, I've tried both).
extractdate = function (date) {
day = format(date, format="%d")
month = format(date, format="%m")
year = format(date, format="%Y")
list(day=day, month=month, year=year)
}
Here's what I've tried based on an earlier problem and question:
cbind(orders, t(sapply(orders$dd_mmm_yy, extractdate)))
which gives me this:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 5, 9
The t(sapply... by itself gives me this for some crazy reason:
day month year
sec Character,5 Character,5 Character,5
min Character,5 Character,5 Character,5
hour Character,5 Character,5 Character,5
mday Character,5 Character,5 Character,5
mon Character,5 Character,5 Character,5
year Character,5 Character,5 Character,5
wday Character,5 Character,5 Character,5
yday Character,5 Character,5 Character,5
isdst Character,5 Character,5 Character,5
What on earth is going on? Am I better off using something like Python or Java to do all the data manipulation I need to do on this data before bringing it into R for analysis?
Method 2: Extract Year from a Column in a Dataframe To extract the year from the column, we will create a dataframe with date columns and then separate the year from DateTime using format() methods and extract the year and convert to a numeric format.
To get the year from a date in R you can use the functions as. POSIXct() and format() . For example, here's how to extract the year from a date: 1) date <- as. POSIXct("02/03/2014 10:41:00", format = "%m/%d/%Y %H:%M:%S) , and 2) format(date, format="%Y") .
POSIXlt
objects are a list of 9 components (see the Details section of ?POSIXlt
for more information). Because the dd_mmm_yy
column is POSIXlt
, you don't need a function to extract the components. You can just extract the components by their names:
orders$day <- orders$dd_mmm_yy$mday # day of month
orders$month <- orders$dd_mmm_yy$mon+1 # month of year (zero-indexed)
orders$year <- orders$dd_mmm_yy$year+1900 # years since 1900
orders
# order_id dd_mmm_yy day month year
# 1 1 2005-07-28 28 7 2005
# 2 2 2007-03-04 4 3 2007
One liner using lubridate
require(plyr); require(lubridate)
mutate(mydf, date = ymd(dd_mmm_yy), day = day(date),
month = month(date), year = year(date))
order_id dd_mmm_yy date day month year
1 1 2005-07-28 2005-07-28 28 7 2005
2 2 2007-03-04 2007-03-04 4 3 2007
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With