trimws bug? leading whitespace not removed

Tags: r, trim

Edit: Thanks to R Yoda, I was finally able to create a reproducible example of the issue I am facing:

x = rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
trimws(x)

=> Question: How can I trim x?

Old text of the question:
Please see the attached screenshot. Unfortunately I am not able to create a reproducible example, because dput changes the result.

Does anyone have an idea how to investigate what is going wrong with x? The leading whitespace doesn't seem to be a standard one!

[screenshot not preserved]

charToRaw(x) gives a0 31 31 2e 31 33 32 35 39 32
dput(charToRaw(x)) gives as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
Encoding(x) gives "unknown" (same as Encoding(" 11.132592"))

RockScience asked Jul 12 '17 07:07


2 Answers

0xa0 is the byte for a different kind of space, the non-breaking space, in Latin-1 (and Windows-1252) encoded text, while 0x20 is the ordinary space.
By default trimws only strips spaces, tabs, line feeds and carriage returns (the character class [ \t\r\n]), not non-breaking spaces, hence x comes back unchanged.
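
To check this on the string from the question, one can compare its first byte with the four bytes in the default class. This is only a minimal sketch; the names first_byte and default_ws are illustrative and not part of any API.

```r
# Rebuild the string from the question (its first byte is 0xa0).
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

first_byte <- charToRaw(x)[1]

# Bytes covered by the default class [ \t\r\n]:
# space, tab, carriage return, line feed.
default_ws <- as.raw(c(0x20, 0x09, 0x0d, 0x0a))

# Is the first byte one of the characters trimws strips by default?
is_default_ws <- any(first_byte == default_ws)

# Is it the Latin-1 non-breaking space byte?
is_nbsp_byte <- first_byte == as.raw(0xa0)
```

Here is_default_ws comes out FALSE and is_nbsp_byte TRUE, which matches the charToRaw output shown in the question.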
You can use sub (to strip either the leading or the trailing whitespace) or gsub (to strip both) with \\s, which covers any kind of whitespace, including the one represented by 0xa0:

sub("^\\s+", "", x)
[1] "11.132592"

And for removing leading and trailing spaces:

gsub("(^\\s+)|(\\s+$)", "", x)
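
Whether \\s matches a bare 0xa0 byte can depend on the session locale; in a UTF-8 locale a lone 0xa0 byte is not valid text, so the calls above may fail there. A sketch that should not depend on the locale, under the assumption (not stated in the question) that the bytes are Latin-1: convert to UTF-8 first, then trim with a class that includes U+00A0. The whitespace argument of trimws is only available in newer R versions (3.6.0 onwards, as far as I know).

```r
# Rebuild the string from the question (its first byte is 0xa0).
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Assumption: the bytes are Latin-1, where 0xa0 is the non-breaking space.
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Option 1: add U+00A0 explicitly to the usual whitespace class.
trimmed_gsub <- gsub("^[ \t\r\n\u00a0]+|[ \t\r\n\u00a0]+$", "", x_utf8)

# Option 2: let trimws use \h and \v, which cover Unicode horizontal
# and vertical whitespace (requires a trimws with `whitespace`).
trimmed_trimws <- trimws(x_utf8, whitespace = "[\\h\\v]")
```

Both variants leave "11.132592", which can then be passed to as.numeric.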
Cath answered Oct 12 '22 23:10


A possible solution is to replace the non-breaking space bytes (0xa0) with ordinary spaces (0x20) before trimming:

trimws(rawToChar(replace(x1, x1 == as.raw(0xa0), as.raw(0x20))))

which gives:

[1] "11.132592"

For conversion to numeric, just wrap the above code in as.numeric.


Used data:

x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
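
When starting from the character string rather than the raw vector, the same byte replacement can be wrapped in a small helper. This is a sketch only: nbsp_to_numeric is a hypothetical name, it handles a single string, and it assumes single-byte (Latin-1 style) input where 0xa0 is the non-breaking space. In UTF-8 text the non-breaking space is the two bytes c2 a0, so this byte swap would not be appropriate there.

```r
# Hypothetical helper: swap every 0xa0 byte for an ordinary space (0x20),
# trim the result, and convert it to numeric.
# Assumes a single string of single-byte, Latin-1 style input.
nbsp_to_numeric <- function(s) {
  bytes <- charToRaw(s)
  bytes[bytes == as.raw(0xa0)] <- as.raw(0x20)
  as.numeric(trimws(rawToChar(bytes)))
}

# The string from the question.
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
value <- nbsp_to_numeric(x)
```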
Jaap answered Oct 12 '22 23:10