One of my columns in my data frame looks like so:
> head(df$col2,n = 50)
[1] "NA, 2015" "November 13, 2014" "September 27, 2014" "October 8, 2014" "December 16, 2013"
[6] "February 8, 2015" "November 2, 2014" "November 30, 2014" "February 18, 2015" "August 22, 2014"
[11] "October 26, 2014" "January 3, 2014" "May 5, 2015" "February 3, 2014" "October 15, 2014"
[16] "September 12, 2014" "April 2, 2014" "April 23, 2015" "November 4, 2014" "January 16, 2014"
[21] "September 28, 2014" "January 14, 2014" "February 13, 2014" "January 17, 2014" "January 4, 2014"
[26] "February 1, 2015" "January 14, 2014" "April 18, 2014" "October 14, 2014" "August 20, 2014"
[31] "January 20, 2014" "April 11, 2015" "July 5, 2014" "November 29, 2013" "March 22, 2014"
[36] "December 29, 2014" "February 18, 2015" "January 13, 2014" "January 5, 2015" "April 19, 2014"
[41] "November 28, 2014" "13 August, 2014" "14 December, 2014" "10 January, 2014" "3 February, 2014"
[46] "17 March, 2014" "3 July, 2014" "17 October, 2014" "28 January, 2014" "10 October, 2014"
As you can see, aside from the first row (which I know to be NA, that's no problem) there's two different date formats between m-d-y and d-m-y. Is there a recommended way to standardize all the dates to m-d-y?
They're all listed as character format in my column of this data frame. I've tried
> datestest <- as.Date(df$col2)
,
but I get
Error in charToDate(x) : character string is not in a standard unambiguous format
as a result.
The parse_date_time
function in lubridate
allows you to parse vectors with heterogeneous formats using the "orders" argument:
require(lubridate)
x <- c("November 2, 2014", "13 August, 2014")
parse_date_time(x, orders = c("mdy", "dmy"))
[1] "2014-11-02 UTC" "2014-08-13 UTC"
Here is a solution with lubridate
:
library(lubridate)
x <- c("November 2, 2014", "13 August, 2014" )
It consists of selecting with grep
the different ways dates are displayed (say, first the ones starting with a number and then use the -
to select the other ones) and then use the different appropriate function of lubridate
.
ind <- grep("^\\d", x)
dmy(x[ind])
[1] "2014-08-13 UTC"
mdy(x[-ind])
[1] "2014-11-02 UTC"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With