
dec argument in data.table::fread

Tags: r, csv, data.table

I am using fread from data.table to load csv files. However, my csv files use "," as the decimal separator (1.23 is written as 1,23). Unlike read.csv, fread does not seem to accept a dec parameter.

R) args(fread)
function (input = "test.csv", sep = "auto", sep2 = "auto", nrows = -1,
    header = "auto", na.strings = "NA", stringsAsFactors = FALSE,
    verbose = FALSE, autostart = 30)

Do you see a workaround (perhaps an R option to set) that would let me use fread? It is so much faster that it saves me a lot of time.

PS: colClasses is not yet implemented, so setAs cannot be used as in this post.

statquant asked Jan 21 '13 14:01



1 Answer

Update Oct 2014 : Now in v1.9.5

fread now accepts dec=',' (and other non-'.' decimal separators), #917. A new paragraph has been added to ?fread. If you are located in a country that uses dec=',' then it should just work. If not, you will need to read the paragraph for an extra step. In case it somehow breaks dec='.', this new feature can be turned off with options(datatable.fread.dec.experiment=FALSE).
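
As a minimal sketch of the update above, assuming an installed data.table whose fread has the dec argument (v1.9.5 or later, per the note), the separator can be passed explicitly. The inline sample data is made up for illustration:

```r
library(data.table)

# Semicolon-separated sample data with "," as the decimal mark (illustrative only).
csv_text <- "A;B\n3,14;123\n4,22;456\n"

# Passing dec explicitly avoids relying on the session locale.
DT <- fread(csv_text, sep = ";", dec = ",")

# Column A is expected to come back as numeric rather than character.
str(DT)
```

Setting both sep and dec makes the call independent of where the script is run, which matters when the same file is read on machines with different locales.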



Previous answer ...

Matt Dowle found a nice workaround using locales. First, my sessionInfo:

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=C
...

Trying the following shows the culprit:

Sys.localeconv()["decimal_point"]
decimal_point 
          "." 

Setting LC_NUMERIC worked on Ubuntu (Matthew) and WinXP (me):

Sys.setlocale("LC_NUMERIC", "French_France.1252")
[1] "French_France.1252"
Message d'avis :
In Sys.setlocale("LC_NUMERIC", "French_France.1252") :
  changer 'LC_NUMERIC' peut résulter en un fonctionnement étrange de R

With that locale set, fread reads comma decimals as numeric:

DT = fread("A,B\n3,14;123\n4,22;456\n",sep=";")
str(DT)
Classes ‘data.table’ and 'data.frame':  2 obs. of  2 variables:
 $ V1: num  3.14 4.22
 $ V2: int  123 456

Values that use "." as the decimal separator are now loaded as strings (as they should be); previously it was the other way round.

DT = fread("A,B\n3.14;123\n4.22;456\n",sep=";")
str(DT)
Classes ‘data.table’ and 'data.frame':  2 obs. of  2 variables:
 $ V1: chr  "3.14" "4.22"
 $ V2: int  123 456
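
Since R warns that changing LC_NUMERIC may make it behave strangely, it may be safer to switch the locale only for the duration of the read and restore it afterwards. The following is a sketch of that idea. `with_numeric_locale` is a hypothetical helper, not part of base R or data.table, and the locale name to pass depends on the platform:

```r
# Hypothetical helper: evaluate `code` with LC_NUMERIC temporarily set to `locale`,
# then restore the previous value even if `code` fails.
with_numeric_locale <- function(locale, code) {
  old <- Sys.getlocale("LC_NUMERIC")
  on.exit(suppressWarnings(Sys.setlocale("LC_NUMERIC", old)), add = TRUE)

  # Sys.setlocale() returns "" when the requested locale cannot be set.
  res <- suppressWarnings(Sys.setlocale("LC_NUMERIC", locale))
  if (identical(res, "")) stop("locale not available: ", locale)

  # `code` is a lazily evaluated argument, so it runs here, under the new locale.
  force(code)
}

# Usage on the Windows setup shown above (assumption: that locale name exists there):
# DT <- with_numeric_locale("French_France.1252",
#                           fread("A,B\n3,14;123\n4,22;456\n", sep = ";"))
```

This keeps the locale change scoped to the single fread call, so the rest of the session continues to print and parse numbers with ".".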
statquant answered Nov 03 '22 10:11
