Converting a character string into a date in R

Tags: date, r

The data I'm trying to convert is supposed to be a date, but it is formatted as mmddyyyy with no dashes or slashes as separators. In order to work with dates in R, I would like to have this formatted as mm-dd-yyyy or mm/dd/yyyy.

I think I might need to use grep(), but I'm not sure how to use it to reformat all of the dates that are in the mmddyyyy format.

Patrick Sajovec asked Sep 29 '15 22:09



1 Answer

Updated: Improved with @Richard Scriven's colClasses and simpler as.Date() suggestions

Here are two similar methods that worked for me for taking a csv containing dates in mmddyyyy format and getting R to recognize them as date objects.

Starting first with a simple file tv.csv:

Series,FirstAir
Quantico,09272015
Muppets,09222015
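
To follow along without creating the file by hand, the sample can be written from within R. This is only a convenience sketch: the temporary location and the name csv_path are my own choices, not part of the original answer.

```r
# Write the two-row sample to a temporary file (the path is illustrative).
csv_path <- file.path(tempdir(), "tv.csv")
writeLines(c("Series,FirstAir",
             "Quantico,09272015",
             "Muppets,09222015"),
           csv_path)

# The raw text still carries the leading zeros at this point.
readLines(csv_path)
```

Passing csv_path to read.csv() in place of 'tv.csv' should then behave the same as the commands shown in the two methods.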

Method 1: All as string

Once within R,

> t = read.csv('tv.csv', colClasses = 'character')
  • imports tv.csv as a data frame named t
  • the colClasses = 'character' option causes all the data to be read as the character data type (instead of Factor or int types)

Examine its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : chr  "Quantico" "Muppets"
 $ FirstAir: chr  "09272015" "09222015"
  • R has imported all as strings of characters, indicated here as type chr

The chr values (strings of characters) are then easily converted into dates:

> t$FirstAir = as.Date(t$FirstAir, "%m%d%Y")
  • as.Date() performs string to date conversion
  • %m%d%Y specifies how to interpret the input in t$FirstAir: %m is the two-digit month, %d the two-digit day and %Y the four-digit year. The codes are documented within R under ?strptime. On Linux they can also be found by running $ man date, which brings up the manual for the date program with its list of formatting codes. For example it says %m month (01..12)
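
Method 1 can be condensed into a self-contained sketch. To keep it runnable without a file on disk, I assume the csv content is supplied inline through read.csv()'s text argument. The names csv_text, tv and bad are illustrative; tv is used instead of t only to avoid masking base R's t() function.

```r
# Inline copy of tv.csv so the sketch runs without a file on disk.
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Read every column as character, then parse the mmddyyyy strings.
tv <- read.csv(text = csv_text, colClasses = "character")
tv$FirstAir <- as.Date(tv$FirstAir, format = "%m%d%Y")

class(tv$FirstAir)   # "Date"
format(tv$FirstAir)  # "2015-09-27" "2015-09-22"

# A string that does not fit the format becomes NA rather than an error.
bad <- as.Date("13272015", format = "%m%d%Y")
is.na(bad)           # TRUE: there is no month 13
```

Checking for NA after the conversion is a cheap way to spot malformed dates in a larger file.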

Method 2: Import then fix only the date

If for some reason you don't want a blanket conversion of every column to character, for example because the file has many variables and you want to keep R's automatic type recognition and merely "fix" the one date variable, follow this method.

Once within R,

> t = read.csv('tv.csv')
  • imports tv.csv as a data frame named t

Examine its initial structure:

> str(t)
'data.frame':   2 obs. of  2 variables:
 $ Series  : Factor w/ 2 levels "Muppets","Quantico": 2 1
 $ FirstAir: int  9272015 9222015
  • R tries its best to guess the type of each variable (in R 4.0.0 and later, Series is read as chr rather than Factor by default)
  • The immediate problem is that R has imported 09272015 in the FirstAir variable as an int, meaning integer, and dropped the leading zero. The 0 in 09 matters later for the date conversion, so we need to restore it.

This can be done in a single command but for clarity I have broken this into two steps. First,

> t$FirstAir = sprintf("%08d", t$FirstAir)
  • sprintf is a formatting function
  • 0 means pad with zeroes
  • 8 means a minimum width of 8 characters, because mmddyyyy is 8 characters in total
  • d is used when the input is an integer, which it currently is: recall that the str() output reported t$FirstAir as an int
  • t$FirstAir is the variable we are both setting and using as input
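
A small sketch of the padding step on its own, using made-up integer values (the third one is my addition, to show a date whose month already has two digits):

```r
# Integers as read.csv() would produce them once the leading zero is lost.
first_air <- c(9272015L, 9222015L, 12252015L)

# Pad to a minimum width of 8 with zeros; 8-digit values pass through unchanged.
padded <- sprintf("%08d", first_air)

padded         # "09272015" "09222015" "12252015"
nchar(padded)  # 8 8 8
```

Because the width is a minimum rather than a truncation, the same call is safe for October to December dates.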

Check the result:

> str(t$FirstAir)
 chr [1:2] "09272015" "09222015"
  • it successfully converted from an int to a chr type, for example 9272015 became "09272015"

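Now that it is a string (chr type), we can convert it to a date much as in method 1. The strptime() call is optional here: as.Date(t$FirstAir, "%m%d%Y") gives the same result.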

> t$FirstAir = as.Date(strptime(t$FirstAir, "%m%d%Y"))
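
The single-command form mentioned earlier might look like Variant A below. Variant B is an alternative that is not in the original answer: it uses a named colClasses vector so that only FirstAir is read as character and the leading zero is never lost. Both are sketches on inline data, and the names csv_text, tv_a and tv_b are illustrative.

```r
csv_text <- "Series,FirstAir
Quantico,09272015
Muppets,09222015"

# Variant A: the two steps of Method 2 collapsed into one expression.
tv_a <- read.csv(text = csv_text)
tv_a$FirstAir <- as.Date(sprintf("%08d", tv_a$FirstAir), format = "%m%d%Y")

# Variant B: read only FirstAir as character; other columns keep R's type guessing.
tv_b <- read.csv(text = csv_text, colClasses = c(FirstAir = "character"))
tv_b$FirstAir <- as.Date(tv_b$FirstAir, format = "%m%d%Y")

# Both routes should arrive at the same dates.
all(tv_a$FirstAir == tv_b$FirstAir)
```

Variant B avoids the integer round trip altogether, which may be preferable when the date column name is known in advance.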

Result

We do a final check:

> str(t$FirstAir)
 Date[1:2], format: "2015-09-27" "2015-09-22"

In both cases, what were originally values in a text file have now been successfully converted into R date objects.
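
Since the question asked for mm-dd-yyyy or mm/dd/yyyy, here is a hedged sketch of how the Date objects could be displayed in those layouts with format(). The vector first_air is an illustrative stand-in for the converted column.

```r
first_air <- as.Date(c("09272015", "09222015"), format = "%m%d%Y")

# format() returns character strings in whatever layout is requested.
format(first_air, "%m/%d/%Y")  # "09/27/2015" "09/22/2015"
format(first_air, "%m-%d-%Y")  # "09-27-2015" "09-22-2015"

# Sorting and date arithmetic operate on the Date values themselves.
sort(first_air)
diff(first_air)
```

Keeping the column as Date and formatting only for display preserves correct sorting and arithmetic, which the formatted strings would not.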

clarity123 answered Sep 18 '22 22:09