I need to remove commas from a field in an R dataframe. Technically I have managed to do this, but the result seems to be neither a vector nor a matrix, and I cannot get it back into the dataframe in a usable format. Is there a way to remove the commas from a field and have that field remain part of the dataframe?
Here is a sample of the field that needs commas removed, and the results generated by my code:
> print(x['TOT_EMP'])
TOT_EMP
1 132,588,810
2 6,542,950
3 2,278,260
4 248,760
> y
[1] "c(\"132588810\" \"6542950\" \"2278260\" \"248760\...)"
The desired result is a numeric field:
TOT_EMP
1 132588810
2 6542950
3 2278260
4 248760
x<-read.csv("/home/mark/Desktop/national_M2013_dl.csv",header=TRUE,colClasses="character")
y=(gsub(",","",x['TOT_EMP']))
print(y)
In JavaScript, to remove all commas from a string, call the replace() method, passing it a regular expression that matches all commas (for example /,/g) as the first parameter and an empty string as the second parameter. The replace() method returns a new string with all of the commas removed.
In Python, use the re.sub() function to remove commas from a string. re.sub() replaces every match of a pattern with its replacement argument, in this case the empty string, which eliminates all commas from the string.
gsub() will return a character vector, not a numeric vector (which is what it sounds like you want). as.numeric() will convert the character vector back into a numeric vector:
> df <- data.frame(numbers = c("123,456,789", "1,234,567", "1,234", "1"))
> df
numbers
1 123,456,789
2 1,234,567
3 1,234
4 1
> df$numbers <- as.numeric(gsub(",","",df$numbers))
> df
numbers
1 123456789
2 1234567
3 1234
4 1
The result is still a data.frame:
> class(df)
[1] "data.frame"
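Applied to the column from the question, the same approach might look like the sketch below. The small data frame is a hypothetical stand-in for the CSV, built from the four sample values shown in the question. As far as I can tell, the original attempt produced one long string because x['TOT_EMP'] (single brackets) is a one-column data.frame, and gsub() coerces it with as.character(), which collapses the whole column into a single deparsed string. Using x$TOT_EMP (or x[['TOT_EMP']]) hands gsub() the underlying character vector instead.

```r
# Hypothetical stand-in for the CSV in the question, with the column
# read as character (as colClasses = "character" would do)
x <- data.frame(
  TOT_EMP = c("132,588,810", "6,542,950", "2,278,260", "248,760"),
  stringsAsFactors = FALSE
)

# Single brackets keep the data.frame wrapper, which is what trips up gsub()
print(class(x['TOT_EMP']))

# The $ accessor returns the character vector itself, so gsub() works
# element by element; fixed = TRUE treats "," as a literal, not a regex
x$TOT_EMP <- as.numeric(gsub(",", "", x$TOT_EMP, fixed = TRUE))

# The column is now numeric and still part of the data frame
str(x)
```

Assigning the result back to x$TOT_EMP replaces the column in place, so the rest of the data frame is left untouched.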