I have a character vector with dates in French, and I would like to convert them to a date format in R. It mostly works, but there are some mysterious failures. For instance, R recognizes "30 juin 2012" but not "30 juillet 2012":
> as.Date("30 juin 2012", format = "%d %B %Y")
[1] "2012-06-30"
> as.Date("28 février 2012", format = "%d %B %Y")
[1] "2012-02-28"
> as.Date("30 juillet 2012", format = "%d %B %Y")
[1] NA
Do you have any explanation?
PS: my locale setting is French UTF-8
> Sys.getlocale()
[1] "fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8"
I don't really have an explanation, but I do have a solution. I have had similar problems with German numbers that use "," instead of "." for decimals, and with some different ways of writing dates too. Here's what I usually do with data that is not in the correct format:
a <- "30 juillet 2012"
b <- gsub(pattern = "juillet", replacement = "july", x = a)
as.Date(b, format = "%d %B %Y")
[1] "2012-07-30"
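Hope this helps you out. If "july" doesn't work on your system, you could always replace it with a 7, like so:
a <- "30 juillet 2012"
# Replace the month name with its number, surrounded by slashes
b <- gsub(pattern = "juillet", replacement = "/ 7 /", x = a)
# Strip the remaining spaces to get "30/7/2012"
b <- gsub(pattern = " ", replacement = "", x = b, fixed = TRUE)
as.Date(b, format = "%d/%m/%Y")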
Greetings, Ben
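The same replace-the-month idea can be generalized so it no longer depends on the locale at all. This is only a minimal sketch: the names `french_months` and `parse_french_date` are made up for illustration, and it assumes every input is written as "day month year" with a lowercase, fully spelled-out French month name.

```r
# Lookup table: position in the vector is the month number.
french_months <- c("janvier", "février", "mars", "avril", "mai", "juin",
                   "juillet", "août", "septembre", "octobre", "novembre",
                   "décembre")

# Hypothetical helper: parse "30 juillet 2012" style strings without %B.
# Assumes exactly three space-separated fields per element.
parse_french_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, " ", fixed = TRUE))
  month <- match(parts[, 2], french_months)  # NA for unknown month names
  iso <- sprintf("%s-%02d-%s", parts[, 3], month, parts[, 1])
  as.Date(iso, format = "%Y-%m-%d")          # unknown months come out as NA
}

parse_french_date(c("30 juin 2012", "30 juillet 2012"))
```

Because only numeric fields reach `as.Date`, the result should not depend on how the platform's `strptime` handles month names.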
As GSee said, it's a locale issue. Set your locale to French using Sys.setlocale and your code example runs OK.
Under Linux (I think OS X too, but not tested):
Sys.setlocale(locale="fr_FR")
Under Windows:
Sys.setlocale(locale="French_France")
The UTF-8 in GSee's comment is the character encoding, and is optional. See ?iconvlist for more info.
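If you would rather not change the locale for the whole session, one option is to switch only LC_TIME for the duration of a single call and then restore it. The following is a sketch under that assumption; `with_time_locale` is a hypothetical helper name, not something from base R.

```r
# Hypothetical helper: evaluate `expr` with LC_TIME temporarily set to `locale`.
with_time_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous LC_TIME when the function exits, even on error.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable.
  if (identical(suppressWarnings(Sys.setlocale("LC_TIME", locale)), "")) {
    stop("locale not available on this system: ", locale)
  }
  force(expr)  # evaluate the lazily passed expression under the new locale
}

# "C" is available everywhere and uses English month names.
with_time_locale("C", as.Date("30 July 2012", format = "%d %B %Y"))

# French names need a French locale; the name differs by platform
# ("fr_FR" or "fr_FR.UTF-8" on Linux / OS X, "French_France" on Windows).
tryCatch(
  with_time_locale("fr_FR.UTF-8", as.Date("30 juin 2012", format = "%d %B %Y")),
  error = function(e) NA
)
```

The `tryCatch` is only there so the example does not stop on machines without a French locale installed.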
Some googling on "OSX strptime juillet" produces this comment from Peter Dalgaard http://grokbase.com/t/r/r-sig-mac/12696r26eh/as-date-does-not-work-with-format-b :
Looks like this is http://lists.freebsd.org/pipermail/freebsd-bugs/2009-December/037796.html which was fixed in May 2010, but apparently hasn't percolated down to the OSX updates yet. (Still there in local build on Lion, so not just CRAN binaries. Insert appropriate rant about Open Source and commercial vendors here...)
Summary of the bug: strptime with %B goes through the months and, for each one, checks for the full name and then the abbreviation. The problem is that the "jui" of "juillet" matches the abbreviation for "juin", so the leftover "llet" fails to match %Y and we get the NA.
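The matching order described in that summary can be imitated in a few lines of R. This is a toy simulation, not the actual BSD source: the function names are invented, the abbreviation table is my assumption about what the BSD-derived fr_FR locale uses ("jui" for juin, "jul" for juillet), and the "fixed" variant is just one plausible ordering rather than the real patch.

```r
full <- c("janvier", "février", "mars", "avril", "mai", "juin",
          "juillet", "août", "septembre", "octobre", "novembre", "décembre")
# Assumed abbreviations; note that "jui" is also a prefix of "juillet".
abbr <- c("jan", "fév", "mar", "avr", "mai", "jui",
          "jul", "aoû", "sep", "oct", "nov", "déc")

# Month by month: try the full name, then the abbreviation (the buggy order).
match_month_buggy <- function(x) {
  for (i in seq_along(full)) {
    for (candidate in c(full[i], abbr[i])) {
      if (startsWith(x, candidate)) {
        return(list(month = i, rest = substring(x, nchar(candidate) + 1)))
      }
    }
  }
  NULL
}

# Alternative order: try every full name before any abbreviation.
match_month_fixed <- function(x) {
  for (candidate_set in list(full, abbr)) {
    for (i in seq_along(candidate_set)) {
      if (startsWith(x, candidate_set[i])) {
        return(list(month = i,
                    rest = substring(x, nchar(candidate_set[i]) + 1)))
      }
    }
  }
  NULL
}

str(match_month_buggy("juillet 2012"))  # month 6, rest "llet 2012"
str(match_month_fixed("juillet 2012"))  # month 7, rest " 2012"
```

In the buggy order, June's abbreviation is tried before July's full name is ever reached, which is why "juin" parses and "juillet" does not.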
So it's a BSD bug that persists in OSX.
Looks like you're going to have to use something like @Ben K's solution to work around this. (Sorry.)
(An answer to "why?" and a "how" answer were already posted, so I will leave this as what seems to be the "deep" explanation, even if it isn't a patch for OSX. And it's a bug in OSX, not R.)
Despite setting my locale (also on a Mac) to "fr_FR", the LC_TIME setting remains 'en_US'
> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
> Sys.setlocale(locale="fr_FR") # Should have category="LC_ALL"
[1] "fr_FR/fr_FR/fr_FR/C/fr_FR/en_US.UTF-8"
> month.abb
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> month.name
[1] "January" "February" "March" "April" "May" "June" "July"
[8] "August" "September" "October" "November" "December"
After reading the GitHub bug for lubridate, it appears I'm not reporting anything new. Macs have this bug, I guess. The help pages say the 'month.abb' and 'month.name' values should be used as references, but changing them is also ineffective. (Perhaps they are read at startup?) It has been reported on R-SIG-Mac: http://markmail.org/search/?q=+list%3Aorg.r-project.r-sig-mac+french+locale#query:%20list%3Aorg.r-project.r-sig-mac%20french%20locale+page:1+mid:oie7r5qksadmzjia+state:results
And then reading further along we see that the bug is in OSX and has been there for some time: http://lists.freebsd.org/pipermail/freebsd-bugs/2009-December/037796.html
I'm only on Lion, but will be updating to Mavericks "real soon now."
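On the month.abb / month.name point above: as far as I can tell from ?Constants, those two vectors are built-in English constants, so they would not be expected to change with the locale. To see which month names the current LC_TIME setting actually produces, something like the sketch below may be more telling. The helper name `locale_months` is made up, and querying the single LC_TIME category avoids having to pick it out of the combined string.

```r
# Ask for the time category on its own rather than the combined string.
Sys.getlocale("LC_TIME")

# Hypothetical helper: full month names as formatted under the current LC_TIME.
locale_months <- function() {
  first_of_month <- as.Date(sprintf("2012-%02d-01", 1:12))
  format(first_of_month, "%B")
}

locale_months()
```

Replacing "%B" with "%b" in the helper should show the abbreviations the platform uses, which is where the "jui" clash comes from.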