
Mysterious error by parsing French dates on OSX

Tags: date, r

I have a character vector with dates in French that I would like to convert to Date objects in R. It mostly works, but there are some mysterious failures. For instance, R recognizes "30 juin 2012" but not "30 juillet 2012":

> as.Date("30 juin 2012", format = "%d %B %Y")
[1] "2012-06-30"
> as.Date("28 février 2012", format = "%d %B %Y")
[1] "2012-02-28"
> as.Date("30 juillet 2012", format = "%d %B %Y")
[1] NA

Do you have any explanation?

PS: my locale is set to French UTF-8

> Sys.getlocale()
[1] "fr_FR.UTF-8/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8"
PAC asked Jul 25 '13 11:07


4 Answers

I don't really have an explanation, but I do have a solution. I have had similar problems with German numbers, which use "," instead of "." for decimals, and with some different ways of writing dates too. Here's what I usually do with data that is not in the expected format:

a<-"30 juillet 2012"

b<-gsub(pattern="juillet", replacement="july", x=a)

as.Date(b, format="%d %B %Y")
[1] "2012-07-30"

Hope this helps you out. Note that "july" is only recognized by %B when the time locale is English. If "july" doesn't work on your system, you can always replace the month name with the number 7, like so:

a<-"30 juillet 2012"
b<-gsub(pattern="juillet", replacement="/ 7 /", x=a)
# strip the spaces so the string becomes "30/7/2012"
b<-gsub(pattern=" ", replacement="", x=b, fixed=TRUE)
as.Date(b, format= "%d/%m/%Y")

Greetings, Ben

Ben K answered Oct 22 '22 03:10


As GSee said, it's a locale issue. Set your locale to French using Sys.setlocale and your code example runs OK.

Under Linux (I think OS X too, but not tested):

Sys.setlocale(locale="fr_FR")

Under Windows:

Sys.setlocale(locale="French_France")

The UTF-8 in GSee's comment is the character encoding, and is optional. See ?iconvlist for more info.
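
If only date parsing needs the French names, changing just the LC_TIME category and restoring it afterwards is less intrusive than switching every category. Here is a minimal sketch of that idea; `with_time_locale` is a hypothetical helper name, not a base R function, and the locale strings to pass are the platform-specific ones shown above.

```r
# Hypothetical helper: evaluate `expr` with LC_TIME temporarily set to `locale`.
with_time_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the previous time locale when the function exits, even on error.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  res <- suppressWarnings(Sys.setlocale("LC_TIME", locale))
  # Sys.setlocale() returns "" when the requested locale cannot be set.
  if (identical(res, "")) stop("locale not available: ", locale)
  force(expr)  # expr is lazy, so it is evaluated here, under the new locale
}

# Usage: month name of a date under the "C" locale (English names).
with_time_locale("C", format(as.Date("2012-07-30"), "%B"))
```

Replace "C" with "fr_FR" (or "French_France" on Windows) to parse French month names. This only helps when the platform's strptime handles the locale's names correctly.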

Richie Cotton answered Oct 22 '22 03:10


Some googling on "OSX strptime juillet" produces this comment from Peter Dalgaard http://grokbase.com/t/r/r-sig-mac/12696r26eh/as-date-does-not-work-with-format-b :

Looks like this is

http://lists.freebsd.org/pipermail/freebsd-bugs/2009-December/037796.html

which was fixed in May 2010, but apparently hasn't percolated down to the OSX updates yet. (Still there in local build on Lion, so not just CRAN binaries. Insert appropriate rant about Open Source and commercial vendors here...)

Summary of bug: strptime with %B goes through the months and checks for full name, then abbreviation. Problem is that "jui" of "juillet" matches abbr. for "juin"! but "llet" mismatches %Y and we get the NA.

So it's a BSD bug that persists in OSX.
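
To make the summary above concrete, here is a rough simulation in R of the matching order it describes. This is not the actual BSD code. The abbreviation list is my assumption of what the fr_FR locale uses on OS X, and `match_full_first` is just one possible ordering that avoids the problem, not necessarily the patch that was applied upstream.

```r
# French month names; the abbreviations are assumed, not read from the system.
full <- c("janvier", "f\u00e9vrier", "mars", "avril", "mai", "juin",
          "juillet", "ao\u00fbt", "septembre", "octobre", "novembre",
          "d\u00e9cembre")
abbr <- c("jan", "f\u00e9v", "mar", "avr", "mai", "jui",
          "jul", "ao\u00fb", "sep", "oct", "nov", "d\u00e9c")

# Order described in the bug summary: for each month in turn, try the full
# name and then the abbreviation, and stop at the first prefix match.
match_buggy <- function(x) {
  for (i in seq_along(full)) {
    for (name in c(full[i], abbr[i])) {
      if (startsWith(x, name)) {
        return(list(month = i, rest = substring(x, nchar(name) + 1)))
      }
    }
  }
  NULL
}

# Alternative order: try every full name before any abbreviation.
match_full_first <- function(x) {
  for (name_set in list(full, abbr)) {
    i <- which(startsWith(x, name_set))[1]
    if (!is.na(i)) {
      return(list(month = i, rest = substring(x, nchar(name_set[i]) + 1)))
    }
  }
  NULL
}

match_buggy("juillet 2012")       # month 6, rest "llet 2012"
match_full_first("juillet 2012")  # month 7, rest " 2012"
```

With the buggy order, the leftover "llet 2012" is what then fails to match %Y.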

Looks like you're going to have to use something like @Ben K's solution to work around this. (Sorry.)
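
A generalised version of that workaround could map every French month name to its number, so that %B (and the system's strptime month matching) is never involved. This is a sketch under the assumption that the input always looks like "day month year" separated by whitespace; `parse_french_date` is a made-up name.

```r
# Hypothetical helper: parse "30 juillet 2012" style dates without using %B.
parse_french_date <- function(x) {
  months_fr <- c("janvier", "f\u00e9vrier", "mars", "avril", "mai", "juin",
                 "juillet", "ao\u00fbt", "septembre", "octobre", "novembre",
                 "d\u00e9cembre")
  parts <- strsplit(trimws(x), "[[:space:]]+")
  ok <- lengths(parts) == 3              # anything else stays NA
  out <- rep(as.Date(NA), length(x))
  day   <- vapply(parts[ok], `[`, character(1), 1)
  month <- match(tolower(vapply(parts[ok], `[`, character(1), 2)), months_fr)
  year  <- vapply(parts[ok], `[`, character(1), 3)
  # An unknown month gives "2012-NA-30", which as.Date() turns into NA.
  out[ok] <- as.Date(paste(year, month, day, sep = "-"), format = "%Y-%m-%d")
  out
}

parse_french_date(c("30 juin 2012", "30 juillet 2012"))
# [1] "2012-06-30" "2012-07-30"
```

Because only numeric fields reach as.Date, the result does not depend on the LC_TIME setting.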

Ben Bolker answered Oct 22 '22 05:10


(Answers to both the "why?" and the "how?" have already been posted, so I will leave this here as background on what seems to be the "deep" explanation, even though it is not a patch for OSX. And it's a bug in OSX, not R.)

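Despite setting my locale (also on a Mac) to "fr_FR", one component of the locale string remains 'en_US'. (As far as I can tell from the field order on OSX, that trailing field is LC_MESSAGES, and the fifth field, LC_TIME, did switch to fr_FR.)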

> Sys.getlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
> Sys.setlocale(locale="fr_FR") # Should have category="LC_ALL"
[1] "fr_FR/fr_FR/fr_FR/C/fr_FR/en_US.UTF-8"
> month.abb
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> month.name
 [1] "January"   "February"  "March"     "April"     "May"       "June"      "July"     
 [8] "August"    "September" "October"   "November"  "December" 

After reading the GitHub bug report for lubridate, it appears I'm not reporting anything new. Macs have this bug, I guess. The help pages say the 'month.abb' and 'month.name' values should be used as references, but changing them is also ineffective. (They are fixed English constants in base R, so they neither reflect nor influence the locale.) It has been reported on R-SIG-Mac: http://markmail.org/search/?q=+list%3Aorg.r-project.r-sig-mac+french+locale#query:%20list%3Aorg.r-project.r-sig-mac%20french%20locale+page:1+mid:oie7r5qksadmzjia+state:results
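
To see which month names the current LC_TIME setting actually produces, as opposed to the English constants, you can format one date per month. A small sketch; `locale_months` is a hypothetical helper name.

```r
# Month names as rendered by the current LC_TIME locale, next to the
# English constants that ship with base R.
locale_months <- function() {
  d <- as.Date(sprintf("2000-%02d-01", 1:12))
  data.frame(
    locale_full = format(d, "%B"),
    locale_abbr = format(d, "%b"),
    constant    = month.name,
    stringsAsFactors = FALSE
  )
}

locale_months()
```

If the first two columns show French names, the locale switch took effect for formatting, and any remaining NA from as.Date comes from the parsing side.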

And then reading further along we see that the bug is in OSX and has been there for some time: http://lists.freebsd.org/pipermail/freebsd-bugs/2009-December/037796.html

I'm only on Lion, but will be updating to Mavericks "real soon now."

IRTFM answered Oct 22 '22 04:10