
In R: remove commas from a field AND have the modified field remain part of the dataframe

Tags: string, r, comma

I need to remove commas from a field in an R dataframe. I have technically managed to do this, but the result seems to be neither a vector nor a matrix, and I cannot get it back into the dataframe in a usable format. Is there a way to remove the commas from a field AND have that field remain part of the dataframe?

Here is a sample of the field that needs commas removed, and the results generated by my code:

> print(x['TOT_EMP'])
         TOT_EMP
1    132,588,810
2      6,542,950
3      2,278,260
4        248,760

> y
[1] "c(\"132588810\" \"6542950\" \"2278260\" \"248760\...)"

The desired result is a numeric field:

       TOT_EMP
1    132588810
2      6542950
3      2278260
4       248760

x<-read.csv("/home/mark/Desktop/national_M2013_dl.csv",header=TRUE,colClasses="character")
y=(gsub(",","",x['TOT_EMP']))
print(y)
asked Jan 24 '15 19:01 by mark stevenson


1 Answer

gsub() returns a character vector, not a numeric vector (which it sounds like you want). Wrapping the call in as.numeric() converts the cleaned character vector into a numeric vector, and assigning the result to df$numbers keeps it in the dataframe:

> df <- data.frame(numbers = c("123,456,789", "1,234,567", "1,234", "1"))
> df
      numbers
1 123,456,789
2   1,234,567
3       1,234
4           1
> df$numbers <- as.numeric(gsub(",","",df$numbers))
> df
    numbers
1 123456789
2   1234567
3      1234
4         1

The result is still a data.frame:

> class(df)
[1] "data.frame"
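
The single odd-looking string in the question appears because x['TOT_EMP'] is a one-column data frame rather than a character vector: gsub() coerces it with as.character(), which deparses the whole column into one string. The following is a minimal sketch of the difference, assuming a small hand-built data frame in place of the CSV from the question (the values are copied from the question's sample; the name `bad` is only illustrative):

```r
# Hypothetical stand-in for the CSV column; values copied from the question's sample.
x <- data.frame(TOT_EMP = c("132,588,810", "6,542,950", "2,278,260", "248,760"),
                stringsAsFactors = FALSE)

# Single brackets return a one-column data.frame, so gsub() collapses
# the whole column into a single deparsed string.
bad <- gsub(",", "", x["TOT_EMP"])
length(bad)

# Double brackets (or $) extract the character vector itself, so gsub()
# works element by element and the result can be assigned back to the column.
x$TOT_EMP <- as.numeric(gsub(",", "", x[["TOT_EMP"]], fixed = TRUE))
str(x)
```

Using `fixed = TRUE` treats the comma as a literal string rather than a regular expression; it is optional here, since a comma has no special meaning in a regex.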
answered Sep 22 '22 12:09 by Richard Border