
Remove thousand's separator [duplicate]

I imported an Excel file and got a data frame like this:

structure(list(A = structure(1:3, .Label = c("1.100", "2.300", 
"5.400"), class = "factor"), B = structure(c(3L, 2L, 1L), .Label = c("1.000.000", 
"500", "7.800"), class = "factor"), C = structure(1:3, .Label = c("200", 
"3.100", "4.500"), class = "factor")), .Names = c("A", "B", "C"
), row.names = c(NA, -3L), class = "data.frame")

I would now like to convert these values to numeric or even integer. However, the dot character (.) is not a decimal mark but a thousands separator (the data uses German number formatting).

How would I convert the data frame properly?

I tried this:

df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"))

df3 <- as.data.frame(data.matrix(df2))

However, apply seems to convert each column to a list of factors. Can I prevent apply from doing so?

asked Apr 05 '13 12:04 by speendo


2 Answers

You can use this:

sapply(df, function(v) {as.numeric(gsub("\\.","", as.character(v)))})

Which gives:

        A       B    C
[1,] 1100    7800  200
[2,] 2300     500 3100
[3,] 5400 1000000 4500

This will give you a matrix object, but you can wrap it in data.frame() if you wish.

Note that the columns in your original data are not characters but factors.
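To illustrate the two remarks above, here is a minimal sketch. It assumes a cut-down copy of the question's data (the object names `df`, `m` and `res` are only for illustration). It shows why the as.character() step matters for factor columns and how the matrix can be wrapped back into a data frame:

```r
# Cut-down version of the question's data, stored as factors on purpose.
df <- data.frame(
  A = factor(c("1.100", "2.300", "5.400")),
  B = factor(c("7.800", "500", "1.000.000"))
)

# as.numeric() on a factor returns the internal level codes, not the labels.
codes <- as.numeric(df$A)

# Going through as.character() first converts the labels themselves.
m <- sapply(df, function(v) as.numeric(gsub("\\.", "", as.character(v))))

# sapply() returned a matrix; wrap it to get a data frame again.
res <- data.frame(m)
str(res)
```

Here `codes` holds the level positions rather than 1100, 2300, 5400, which is the usual pitfall when converting factors directly.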


Edit: Alternatively, instead of wrapping it with data.frame(), you can do this to get the result directly as a data.frame:

# the as.character(.) is just in case it's loaded as a factor
df[] <- lapply(df, function(x) as.numeric(gsub("\\.", "", as.character(x))))
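Since the question also asks about integers, the same idea can be sketched with as.integer(). The helper name `strip_thousands` is hypothetical, and `fixed = TRUE` is used here only as an alternative to escaping the dot. Note that as.integer() yields NA for values beyond R's 32-bit integer range, so as.numeric() is the safer choice for very large numbers.

```r
# The question's data, rebuilt by hand with factor columns.
df1 <- data.frame(
  A = c("1.100", "2.300", "5.400"),
  B = c("7.800", "500", "1.000.000"),
  C = c("200", "3.100", "4.500"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: drop every literal dot, then convert to integer.
strip_thousands <- function(x) {
  as.integer(gsub(".", "", as.character(x), fixed = TRUE))
}

# Assigning into df1[] keeps the data.frame structure and column names.
df1[] <- lapply(df1, strip_thousands)
str(df1)
```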
answered Oct 21 '22 16:10 by juba


I think I just found another solution:

It's necessary to use stringsAsFactors = FALSE.

Like this:

df2 <- as.data.frame(apply(df1, 2, gsub, pattern = "([0-9])\\.([0-9])", replacement= "\\1\\2"), stringsAsFactors = FALSE)

df3 <- as.data.frame(data.matrix(df2))
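One caveat, offered tentatively: the current documentation for data.matrix() says character columns are first converted to factors and then to integers, so depending on the R version the last step may return level codes instead of the parsed numbers. The sketch below keeps the same regular expression but converts each column explicitly with as.numeric(). The data is rebuilt by hand from the question, and since R 4.0.0 `stringsAsFactors = FALSE` is already the default.

```r
# The question's data, rebuilt by hand with factor columns.
df1 <- data.frame(
  A = c("1.100", "2.300", "5.400"),
  B = c("7.800", "500", "1.000.000"),
  C = c("200", "3.100", "4.500"),
  stringsAsFactors = TRUE
)

# apply() works on a character matrix and returns one with the dots removed.
chars <- apply(df1, 2, gsub,
               pattern = "([0-9])\\.([0-9])", replacement = "\\1\\2")

# Keep the columns as character vectors, then convert them explicitly.
df2 <- as.data.frame(chars, stringsAsFactors = FALSE)
df3 <- df2
df3[] <- lapply(df2, as.numeric)
str(df3)
```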
answered Oct 21 '22 16:10 by speendo