Most elegant way to load csv with point as thousands separator in R

NB: To the best of my knowledge this question is not a duplicate! All the questions/answers I found cover either how to remove points from data that are already in R or how to change the decimal point to a comma when loading.

I have a csv with numbers like: 4.123,98. The problem is that, because of the . used as thousands separator, the affected columns come out as character strings instead of numeric when loading with read.table, read.csv or read.csv2. Setting dec to , doesn't help.
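
For illustration, a minimal sketch that reproduces the behaviour described above. The file name demo.csv and its contents are made up for this example; only the y column, which has no thousands separator, is parsed as numeric.

```r
# Hypothetical semicolon-separated file with one problematic column (x)
writeLines(c("x;y", "4.123,98;1,5"), "demo.csv")

# read.csv2 already uses sep = ";" and dec = ","
d <- read.csv2("demo.csv", stringsAsFactors = FALSE)

# Inspect the column classes: x stays character, y becomes numeric
sapply(d, class)
```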

My question
What is the most elegant way to load this csv so that the numbers become e.g. 4123.98 as numeric?

vonjd asked May 13 '15 14:05


2 Answers

Adapted from this post: Specify custom Date format for colClasses argument in read.table/read.csv

#some sample data
write.csv(data.frame(a=c("1.234,56", "1.234,56"),
                     b=c("1.234,56", "1.234,56")),
          "test.csv", row.names=FALSE, quote=TRUE)

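#define your own (virtual) class to use as a colClasses label
setClass('myNum')
#define the conversion: drop the thousands points, then turn the decimal comma into a point
setAs("character", "myNum",
      function(from) as.numeric(gsub(",", ".", gsub(".", "", from, fixed=TRUE), fixed=TRUE)))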

#read data with custom colClasses
read_data = read.csv("test.csv",
                     stringsAsFactors=FALSE,
                     colClasses=c("myNum", "myNum"))
#check that the result really is numeric
read_data[1, 1] * 2

#[1] 2469.12
cryo111 answered Nov 16 '22 16:11


Rather than try to fix it all at loading time, I would load the data into R as a string, then process it to numeric.

So after loading, it's a column of strings like "4.123,98"

Then do something like:

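# number.string is the character vector (column) that was loaded
number.string <- gsub(".", "", number.string, fixed = TRUE)   # drop thousands points
number.string <- gsub(",", ".", number.string, fixed = TRUE)  # decimal comma to point
number <- as.numeric(number.string)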
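
As a rough end-to-end sketch of this approach, assuming every column in the file uses the 4.123,98 format: the sample file and the helper name parse_european are made up for this example and are not part of base R.

```r
# Hypothetical sample file in which every column uses the "4.123,98" format
write.csv(data.frame(a = c("1.234,56", "4.123,98"),
                     b = c("12,5", "1.000.000,00")),
          "test.csv", row.names = FALSE, quote = TRUE)

# Hypothetical helper: drop thousands points, turn the decimal comma into a point
parse_european <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)
  x <- gsub(",", ".", x, fixed = TRUE)
  as.numeric(x)
}

# Load everything as character first, then convert column by column
raw <- read.csv("test.csv", colClasses = "character")
raw[] <- lapply(raw, parse_european)
str(raw)
```

If only some columns use this format, the same helper can be applied to just those columns, e.g. raw$a <- parse_european(raw$a).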
Transcriptase answered Nov 16 '22 18:11