The data I'm trying to convert is supposed to be a date, however it is formatted as mmddyyyy with no separation by dashes or slashes. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.
I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.
The as.Date() function in R is used to convert a string into a Date object.
Using strptime(), a date and time in string format can be converted to a date-time type. The first parameter is the string and the second is the date-time format specifier. One advantage of converting to a date format is that one can select the month, day, or time individually.
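As a rough illustration of selecting components individually, the sketch below parses one mmddyyyy string and pulls out the pieces with format(). The sample value and the tz = "UTC" argument are assumptions added here so the result does not depend on the local time zone.

```r
# Parse an mmddyyyy string; tz = "UTC" is an assumption for reproducibility
x  <- "09272015"
lt <- strptime(x, format = "%m%d%Y", tz = "UTC")

# Extract individual components as character strings
month <- format(lt, "%m")
day   <- format(lt, "%d")
year  <- format(lt, "%Y")
```

Each of month, day, and year is a character string here; wrap them in as.integer() if numbers are needed.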
To convert a data frame column of type string into date/time in R, call the as.POSIXct() function and pass the column as an argument. We may optionally pass a time zone, a date/time format, etc., to this function. The POSIXct class represents calendar dates and times.
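A minimal sketch of that call on a data frame column might look like the following. The data frame df and its contents are made up for illustration (they mirror the sample file used further down), and tz = "UTC" is an assumption to keep the result independent of the local time zone.

```r
# Hypothetical data frame with dates stored as mmddyyyy strings
df <- data.frame(Series   = c("Quantico", "Muppets"),
                 FirstAir = c("09272015", "09222015"),
                 stringsAsFactors = FALSE)

# Convert the character column to POSIXct using the matching format string
df$FirstAir <- as.POSIXct(df$FirstAir, format = "%m%d%Y", tz = "UTC")
```

Since no time of day is present in the input, the time part defaults to midnight; as.Date() is usually the simpler choice when only the calendar date matters.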
We can convert a character string to a timestamp with strptime(), which parses the given representation of a date and time according to the given template.
Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions.
Here are two similar methods that worked for me, going from a csv containing mmddyyyy format dates to getting them recognized by R as date objects.
Starting first with a simple file tv.csv:
Series,FirstAir
Quantico,09272015
Muppets,09222015
Once within R,
> t = read.csv('tv.csv', colClasses = 'character')
This reads tv.csv as a data frame named t.
The colClasses = 'character' option causes all the data to be read as the character data type (instead of Factor, int, or other types).
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : chr "Quantico" "Muppets"
$ FirstAir: chr "09272015" "09222015"
The chr (string of characters) values are then easily converted into a date:
> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
as.Date() performs the string-to-date conversion.
%m%d%Y specifies how to interpret the input in t$FirstAir.
These format codes, at least on Linux, can be found by running $ man date, which brings up the manual for the date program, where there is a list of formatting codes. For example, it says %m month (01..12). Within R, the same codes are documented under ?strptime.
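Since the question asked for mm-dd-yyyy or mm/dd/yyyy, here is a small sketch of how the same format codes could be used in the other direction with format(). The variable names d, slash, and dash are just illustrative.

```r
# Parse mmddyyyy strings into Date objects
d <- as.Date(c("09272015", "09222015"), format = "%m%d%Y")

# Render the dates back out with separators; the results are character strings
slash <- format(d, "%m/%d/%Y")
dash  <- format(d, "%m-%d-%Y")
```

Note that format() returns character strings, not Date objects, so it is probably best kept for display or export while the Date column is used for any date arithmetic.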
If for some reason you don't want a blanket conversion of everything to character on import (for example, the file has many variables and you wish to leave R's automatic type recognition in use and merely "fix" the one date variable), follow this second method.
Once within R,
> t = read.csv('tv.csv')
This reads tv.csv as a data frame named t.
Examine its initial structure:
> str(t)
'data.frame': 2 obs. of 2 variables:
$ Series : Factor w/ 2 levels "Muppets","Quantico": 2 1
$ FirstAir: int 9272015 9222015
>
For the FirstAir variable, R has imported 09272015 as int, meaning integer, and dropped the leading zero padding. The 0 in 09 is important later for date conversion, yet R has imported the value without it. So we need to fix this.
This can be done in a single command, but for clarity I have broken it into two steps. First,
> t$FirstAir = sprintf("%08d", t$FirstAir)
sprintf is a formatting function.
0 means pad with zeroes.
8 means ensure 8 characters, because mmddyyyy is 8 characters in total.
d is used when the input is a number, which it currently is; recall that the str() output reported t$FirstAir as an int, meaning integer.
t$FirstAir is the variable we are both setting and using as input.
Check the result:
> str(t$FirstAir)
chr [1:2] "09272015" "09222015"
The variable has been converted from an int to a chr type; for example, 9272015 became "09272015".
Now that it is a string (chr) type, we can convert it the same way as in method 1.
> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
We do a final check:
> str(t$FirstAir)
Date[1:2], format: "2015-09-27" "2015-09-22"
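The single-command version mentioned earlier could look roughly like this. The vector first_air is a stand-in for the integer column that read.csv() produced, so the snippet runs without the csv file.

```r
# Stand-in for t$FirstAir as imported without colClasses (leading zero lost)
first_air <- c(9272015L, 9222015L)

# Step 1 on its own: zero-pad back to 8 characters
padded <- sprintf("%08d", first_air)

# Both steps collapsed into one call: pad, then parse as mmddyyyy
dates <- as.Date(sprintf("%08d", first_air), format = "%m%d%Y")
```

Applied to the data frame, the equivalent would be t$FirstAir = as.Date(sprintf("%08d", t$FirstAir), "%m%d%Y").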
In both cases, what were originally values in a text file have now been successfully converted into R date objects.
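A possible middle ground between the two methods, not part of the original answer, is to pass colClasses as a named vector so that only the date column is forced to character while the rest keep R's automatic type detection. The sketch below uses read.csv(text = ...) with the sample file's contents inlined so it runs on its own; csv_text and tv are illustrative names.

```r
# Contents of the sample tv.csv, inlined so the example is self-contained
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Force only FirstAir to character; other columns are detected automatically
tv <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))

# The leading zero survives, so the string can be parsed directly
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")
```

With a real file, the first argument would be the file name (for example 'tv.csv') instead of text = csv_text.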