
How to handle blank items when converting dates in R

I have a CSV download of data from a management information system. Some of the variables are dates, written in the CSV as strings of the format "2012/11/16 00:00:00".

After reading in the csv file, I convert the date variables into a date using the function as.Date(). This works fine for all variables that do not contain any blank items.

For those that do contain blank items, I get the following error message: "character string is not in a standard unambiguous format".

How can I get R to replace blank items with something like "0000/00/00 00:00:00" so that the as.Date() function does not break? Are there other approaches you might recommend?

Tyler Durden asked Nov 29 '12 13:11





1 Answer

If they're strings, does something as simple as

mystr <- c("2012/11/16 00:00:00","   ","")
mystr[grepl("^ *$",mystr)] <- NA
as.Date(mystr)

work? (The regular expression "^ *$" matches strings that contain nothing but spaces: ^ anchors the start of the string, " *" matches zero or more spaces, and $ anchors the end. More generally, I think you could use "^[[:space:]]*$" to capture other kinds of whitespace (tabs etc.).)
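For what it's worth, here is a minimal sketch of the more general "^[[:space:]]*$" variant. The vector raw and its contents are made-up sample data, not the asker's file. It also passes the format to as.Date() explicitly, on the assumption that every non-blank value follows the "2012/11/16 00:00:00" layout from the question; as.Date() ignores any trailing characters after the part matched by the format, so the time portion is dropped.

```r
# Hypothetical sample column: one real timestamp, a tab, some spaces, an empty string
raw <- c("2012/11/16 00:00:00", "\t", "   ", "")

# Mark anything that is empty or whitespace-only as missing
raw[grepl("^[[:space:]]*$", raw)] <- NA

# Give the format explicitly so as.Date() does not have to guess it;
# the trailing " 00:00:00" is ignored
dates <- as.Date(raw, format = "%Y/%m/%d")

print(dates)
```

With an explicit format, strings that do not match simply come back as NA, so the blanks no longer trigger the error. If the blanks are truly empty fields in the file, the na.strings argument of read.csv() may be another way to turn them into NA at read time.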

Ben Bolker answered Sep 25 '22 18:09
