Edit: Thanks to R Yoda, I was finally able to create a reproducible example of the issue I am facing:
x = rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
trimws(x)
=> Question: How can I trim x?
Old text of the question:
Please see the attached screenshot. Unfortunately, I am not able to create a reproducible example, as dput affects the result...
Does anyone have an idea how to investigate what's going wrong with x? The leading whitespace doesn't seem to be a standard one!
charToRaw(x)
gives a0 31 31 2e 31 33 32 35 39 32
dput(charToRaw(x))
gives as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
Encoding(x)
gives "unknown"
(same as Encoding(" 11.132592"))
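One way to investigate a string like this is to look for bytes outside the 7-bit ASCII range. The following is a minimal sketch, not a general encoding detector; the variable names bytes and non_ascii are made up for illustration.

```r
# Rebuild the string from the question
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Inspect the underlying bytes rather than the printed string
bytes <- charToRaw(x)

# Positions of bytes above 0x7f, i.e. outside plain ASCII
non_ascii <- which(as.integer(bytes) > 127L)

non_ascii          # position of the suspicious byte
bytes[non_ascii]   # its value
```

Here the only non-ASCII byte is the first one, a0, which is what trimws fails to remove.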
Python's lstrip() method removes leading whitespace, including newline and tab characters, from the beginning of a string.
str_trim() removes whitespace from start and end of string; str_squish() also reduces repeated whitespace inside a string.
Method 1: Using gsub(). The gsub() function, applied to each row of the data frame, replaces all matches of a pattern in a string. Here it is used to find whitespace (\s) and replace it with "", which removes the whitespace.
0xa0 encodes another type of space, the non-breaking space (in Latin-1 and similar single-byte encodings), while 0x20 is the ordinary space. trimws searches for spaces, tabs, line breaks, or carriage returns (represented by [ \t\r\n]+) but not for non-breaking spaces, so it leaves the leading 0xa0 in place.
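Since the default pattern is what excludes the non-breaking space, another option is to widen it. The sketch below assumes R 3.6.0 or later (where trimws accepts a whitespace argument) and assumes the 0xa0 byte really is a Latin-1 non-breaking space; the names x_utf8 and trimmed are made up for illustration.

```r
# Rebuild the string from the question
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Assumption: the input is Latin-1, so 0xa0 becomes U+00A0 in UTF-8
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# "[\\h\\v]" matches horizontal and vertical whitespace, which covers
# the non-breaking space as well as ordinary spaces and tabs
trimmed <- trimws(x_utf8, whitespace = "[\\h\\v]")

trimmed
as.numeric(trimmed)
```

Converting to UTF-8 first keeps the result independent of the session locale, which a bare 0xa0 byte in a string of "unknown" encoding is not.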
You can use sub (to remove either leading or trailing spaces) or gsub (to remove both leading and trailing spaces) to strip any kind of leading or trailing space, including the one represented by 0xa0:
sub("^\\s+", "", x)
[1] "11.132592"
And for removing leading and trailing spaces:
gsub("(^\\s+)|(\\s+$)", "", x)
A possible solution is to replace the wrongly encoded spaces with the right ones:
trimws(rawToChar(replace(x1, x1 == as.raw(0xa0), as.raw(0x20))))
which gives:
[1] "11.132592"
For conversion to numeric, just wrap the above code in as.numeric.
Used data:
x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
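The byte replacement above can be wrapped so that it works on a whole character vector. This is only a sketch: nbsp_to_numeric is a hypothetical helper name, and it assumes single-byte (for example Latin-1) input, because in UTF-8 the byte 0xa0 can also occur inside multibyte characters.

```r
# Hypothetical helper: swap 0xa0 bytes for ordinary spaces, trim, convert.
# Assumption: input strings are single-byte encoded (e.g. Latin-1).
nbsp_to_numeric <- function(x) {
  vapply(x, function(s) {
    b <- charToRaw(s)
    b[b == as.raw(0xa0)] <- as.raw(0x20)  # non-breaking space -> space
    as.numeric(trimws(rawToChar(b)))      # now the default trimws suffices
  }, numeric(1), USE.NAMES = FALSE)
}

x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
result <- nbsp_to_numeric(c(x, " 2.5 "))
result
```

Working on raw bytes avoids any dependence on how the regex engine treats 0xa0 in the current locale.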