
Find dates that fail to parse in R Lubridate

Tags:

date

r

lubridate

As an R novice I'm pulling my hair out trying to debug cryptic R errors. I have a CSV containing 150k lines that I load into a data frame named 'dates'. I then use lubridate to convert its character column to datetimes in the hope of finding the min/max date.

  library(lubridate)
  dates <- csv[c('datetime')]
  dates$datetime <- ymd_hms(dates$datetime)

Running this code I receive the following warning message:

Warning message:
3 failed to parse. 

I accept this as the CSV could have some janky dates in there and next run:

min(dates$datetime) 
max(dates$datetime)

Both of these return NA, which I assume is from the few broken dates still stored in the data frame. I've searched around for a quick fix, and have even tried to build a foreach loop to identify the problem dates, but no luck. What would be a simple way to identify the 3 broken dates?

example date format: 2015-06-17 17:10:16 +0000
Korben Dallas asked Feb 12 '16 18:02




1 Answer

Credit to LawyeR and Stibu from the comments above:

  1. I first sorted the raw CSV column and used head() and tail() to find which 3 dates were causing trouble.
  2. Alternatively, which(is.na(dates$datetime)) is a simple one-liner that returns the row indices of the dates that failed to parse.
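
For anyone who wants to see the offending strings themselves rather than just their positions, here is a minimal sketch of the second approach. The `raw` vector is a made-up stand-in for the real CSV column, and the names `parsed` and `bad` are only illustrative. It assumes you keep the original character column around, so the NA positions can be mapped back to the raw text.

```r
library(lubridate)

# Hypothetical sample standing in for the real CSV column:
# two well-formed strings and one that cannot be parsed.
raw <- c("2015-06-17 17:10:16 +0000",
         "not a date",
         "2015-06-18 09:00:00 +0000")

# quiet = TRUE suppresses the "failed to parse" warning;
# strings that cannot be parsed come back as NA.
parsed <- ymd_hms(raw, quiet = TRUE)

# Positions that had a value in the input but did not parse.
bad <- which(is.na(parsed) & !is.na(raw))

# The raw strings at those positions.
raw[bad]

# Skip the NAs when looking for the earliest and latest date.
min(parsed, na.rm = TRUE)
max(parsed, na.rm = TRUE)
```

The `na.rm = TRUE` argument is what stops min() and max() from returning NA while the broken rows are still in the data frame.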
Korben Dallas answered Oct 22 '22 00:10