I have a string like:
q <- "<U+00A6> 1000-66329"
I want to remove <U+00A6> and get only 1000 66329.
I tried using:
gsub("\u00a6", " ", q, perl = TRUE)
But it does not remove anything. How should I use gsub to get only 1000 66329?
I just want to remove the Unicode escape <U+00A6>, which is at the beginning of the string.
Then you do not need gsub; you can use sub with the "^\\s*<U\\+\\w+>\\s*" pattern:
q <- "<U+00A6> 1000-66329"
sub("^\\s*<U\\+\\w+>\\s*", "", q)
Pattern details:
^ - start of string
\\s* - zero or more whitespaces
<U\\+ - a literal char sequence <U+
\\w+ - 1 or more letters, digits or underscores
> - a literal >
\\s* - zero or more whitespaces.
If you also need to replace the - with a space, add a |- alternative and use gsub (since several replacements are now expected and the replacement must be a space, same as in akrun's answer):
trimws(gsub("^\\s*<U\\+\\w+>|-", " ", q))
See the R online demo
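To apply the same replacement to a whole character vector, the gsub call above can be wrapped in a small function. This is only a sketch: the name strip_escape is made up for illustration, and it assumes the prefix is always the literal text <U+XXXX> at the start of each element.

```r
# Hypothetical helper (the name is an assumption, not from the answer above):
# removes a leading literal "<U+XXXX>" and turns every "-" into a space.
strip_escape <- function(x) {
  trimws(gsub("^\\s*<U\\+\\w+>|-", " ", x))
}

q <- c("<U+00A6> 1000-66329", "1000-66329")
strip_escape(q)
# [1] "1000 66329" "1000 66329"
```

Elements without the prefix still get the hyphen replaced, so the function is safe to run on mixed input.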
If it is always the first character, you can try:
substring("\U00A6 1000-66329", 2)
If R prints the string as <U+00A6> 1000-66329 instead of ¦ 1000-66329, then <U+00A6> is the literal text "<U+00A6>" rather than the Unicode character. In that case you can do:
substring("<U+00A6> 1000-66329", 9)
Either way, the result is:
[1] " 1000-66329"
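If it is not clear which of the two cases applies, counting characters with nchar tells them apart, and a pair of sub calls can handle both. This is a rough sketch, assuming a UTF-8 session; the names q1, q2 and drop_prefix are made up for illustration.

```r
q1 <- "\u00A6 1000-66329"     # starts with the real broken-bar character
q2 <- "<U+00A6> 1000-66329"   # starts with the literal text "<U+00A6>"

nchar(q1)   # 12: the prefix is a single character
nchar(q2)   # 19: the prefix is eight ordinary characters

# Hypothetical helper: drop whichever form of the prefix is present.
drop_prefix <- function(x) {
  x <- sub("^<U\\+[0-9A-Fa-f]+>", "", x)   # literal "<U+XXXX>" text
  sub("^\u00A6", "", x)                    # the actual U+00A6 character
}

drop_prefix(q1)
drop_prefix(q2)
# both return " 1000-66329"
```

This also shows why the original gsub("\u00a6", ...) call changed nothing: the pattern looks for the character itself, while the string holds only its printed escape.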