I have a large vector of strings of the form:
Input = c("1,223", "12,232", "23,0")
etc. That's to say, decimals separated by commas instead of periods. I want to convert this vector into a numeric vector. Unfortunately, as.numeric(Input) just outputs NA.
My first instinct would be to go to strsplit, but it seems to me that this will likely be very slow. Does anyone have any idea of a faster option?
There's an existing question that suggests read.csv2, but the strings in question are not directly read in that way.
In countries where a point is used as the decimal separator, a comma is usually used to separate thousands. For example, twelve thousand five hundred point five zero is written as 12 500.50 or 12,500.50 in the USA, Mexico, or the UK.
The character used as the decimal separator depends on the locale. In the United States, this character is a period (.). In Germany, it is a comma (,). Thus one thousand twenty-five and seven tenths is displayed as 1,025.7 in the United States and 1.025,7 in Germany.
In English, commas separate the digits of numbers greater than 999 into groups of three, counting from the right. For example: "More than 50,000 people turned up to protest."
as.numeric(sub(",", ".", Input, fixed = TRUE))
should work.
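To illustrate the suggestion above, here is a minimal sketch using the Input vector from the question. The helper parse_decimal_comma is a hypothetical name, not part of the original answer; it covers the case where the strings also use a period as a thousands separator (for example "1.025,7"), which the question does not state is present.

```r
Input <- c("1,223", "12,232", "23,0")

# Replace the first comma in each string with a period, then convert.
# fixed = TRUE matches "," literally instead of treating it as a regex.
as.numeric(sub(",", ".", Input, fixed = TRUE))
# values: 1.223, 12.232, 23

# Hypothetical helper (an assumption, not from the original answer) for
# strings that also use "." to group thousands, e.g. "1.025,7".
parse_decimal_comma <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)          # drop every grouping mark
  as.numeric(sub(",", ".", x, fixed = TRUE))   # comma becomes the decimal point
}

parse_decimal_comma(c("1.025,7", "23,0"))
# values: 1025.7, 23
```

Because sub and as.numeric are both vectorised, this works on the whole vector at once and avoids the per-element splitting that strsplit would need.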
The readr package has a function to parse numbers from strings. You can set many options via the locale argument.
For comma as decimal separator you can write:
readr::parse_number(Input, locale = readr::locale(decimal_mark = ","))
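As a sketch of how the locale options fit together, the example below applies the call above to the Input vector from the question and also sets grouping_mark explicitly. It assumes readr is installed; the requireNamespace guard and the variable names de and parsed are illustrative additions, not part of the original answer.

```r
Input <- c("1,223", "12,232", "23,0")

# Only run if readr is available (guard added for illustration).
if (requireNamespace("readr", quietly = TRUE)) {
  # Comma as decimal mark, period as thousands (grouping) mark.
  de <- readr::locale(decimal_mark = ",", grouping_mark = ".")

  parsed <- readr::parse_number(Input, locale = de)
  print(parsed)
  # values: 1.223, 12.232, 23

  # Grouping marks are dropped before the number is read.
  print(readr::parse_number("1.025,7", locale = de))
  # value: 1025.7
}
```

Building the locale once and reusing it keeps the separator convention in one place if several vectors need the same treatment.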