
Removing certain characters from a string in R

Tags:

string

r

I have a string in R that contains a large number of words. When I print the string, the output includes text like the following:

>docs  ....  \u009cYes yes for ever for ever the boys cried in their ringing voices with softened faces  .... 

So I'm wondering how to remove these \u009c-like characters from the string (all of them; some have slightly different codes). I've tried gsub(), but it didn't remove them.

asked Mar 02 '13 by Ryan Warnick


People also ask

How do I remove a specific character from a string in R?

You can use base R's gsub() or str_replace_all() from the stringr package to remove characters from a string: match the unwanted character(s) and replace them with an empty string. gsub() removes every match, while sub() removes only the first; likewise, stringr's str_replace() replaces only the first match and str_replace_all() replaces all of them.
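A minimal base-R sketch of the gsub()/sub() approach; the example strings are made up for illustration. stringr's str_remove_all() expresses the same idea, but it is not used here so the snippet needs no extra package.

```r
x <- "a.b.c"

# gsub() replaces every match; fixed = TRUE makes "." a literal dot
# (as a regex, "." would match any character and empty the string).
all_dots <- gsub(".", "", x, fixed = TRUE)   # "abc"

# sub() replaces only the first match.
first_dot <- sub(".", "", x, fixed = TRUE)   # "ab.c"

# A character class removes several different characters in one pass.
y <- "1,234.56 USD!"
no_punct <- gsub("[,!]", "", y)              # "1234.56 USD"

print(c(all_dots, first_dot, no_punct))
```

Using fixed = TRUE is the simplest choice when the character to remove is a regex metacharacter and you do not need pattern matching.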

How do I remove a specific character from a string?

In Python, str.replace() replaces a specific character; to remove that character, replace it with an empty string. By default, str.replace() replaces every occurrence of the given character.

How do I remove the first 3 characters from a string in R?

To remove the first three characters, you can use the built-in substring() function. substring(text, first, last) takes the string, a start position, and an optional end position (which defaults to 1000000L, effectively the end of the string), so substring(x, 4) drops the first three characters.
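A short sketch of substring() and its two-position sibling substr(); the input strings are illustrative only.

```r
x <- "abcdef"

# Omitting `last` keeps everything from position 4 to the end.
drop_first3 <- substring(x, 4)                  # "def"

# substr() requires both positions; nchar() supplies the end.
drop_first3_substr <- substr(x, 4, nchar(x))    # "def"

# substring() is vectorised; strings shorter than 4 characters become "".
words <- c("prefix", "abc", "ab")
shortened <- substring(words, 4)                # c("fix", "", "")

print(drop_first3)
print(shortened)
```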

How do I remove the first two characters of a string in R?

Use sub() with a pattern that matches the characters to drop and replace them with '' (. matches any single character). To remove the first two characters, match ^.. at the start of the string; for only the first character, use ^. ; for the last character, use .$ ; and to strip both the first and the last character, use gsub() with ^.|.$ .
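A minimal sketch of these anchored patterns; the string "[hello]" is just an example.

```r
x <- "[hello]"

# "." matches any single character; "^" anchors at the start, "$" at the end.
drop_first  <- sub("^.", "", x)     # "hello]"
drop_first2 <- sub("^..", "", x)    # "ello]"
drop_last   <- sub(".$", "", x)     # "[hello"

# gsub() lets both alternatives match, removing one character from each end.
drop_both <- gsub("^.|.$", "", x)   # "hello"

print(c(drop_first, drop_first2, drop_last, drop_both))
```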


2 Answers

This should work

gsub('\u009c', '', '\u009cYes yes for ever for ever the boys ')
## [1] "Yes yes for ever for ever the boys "

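Here 009c is the character's Unicode code point in hexadecimal. Writing all four hexadecimal digits is safest: R's \u escape reads up to four hex digits, so a shorter code followed by a digit or a letter a-f would be misread. If you have several such characters, one solution is to separate them with a pipe (regex alternation):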

gsub('\u009c|\u00F0', '', '\u009cYes yes \u00F0for ever for ever the boys and the girls')
## [1] "Yes yes for ever for ever the boys and the girls"
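Because the question mentions several similar codes, a bracketed character range may be shorter than a long alternation. The sketch below assumes the unwanted characters all fall in U+0080 to U+009F (the C1 control range, which contains U+009C); the helper name strip_c1 is hypothetical, and the range should be adjusted to the codes that actually appear in your text.

```r
# Hypothetical helper: remove every character from U+0080 to U+009F.
# The range is an assumption; widen or narrow it to match your data.
strip_c1 <- function(x) gsub("[\u0080-\u009f]", "", x)

docs <- "\u009cYes yes for ever\u0093 for ever the boys"
cleaned <- strip_c1(docs)
print(cleaned)

# U+00F0 lies outside the range, so it is kept.
kept <- strip_c1("\u00F0 stays")
```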
answered Sep 26 '22 by agstudy


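Try escaping the character with a double backslash. For example, to remove every dollar sign, gsub('\\$', '', '$5.00$') returns "5.00".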
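The backslashes are needed because $ is a regular-expression metacharacter meaning "end of string". A small base-R sketch comparing the escaped pattern, the fixed = TRUE alternative, and the unescaped pattern:

```r
price <- "$5.00$"

# "\\$" in R source is the regex \$, i.e. a literal dollar sign.
escaped <- gsub("\\$", "", price)              # "5.00"

# fixed = TRUE turns off regex interpretation entirely.
literal <- gsub("$", "", price, fixed = TRUE)  # "5.00"

# Unescaped, "$" matches only the empty end-of-string position,
# so nothing is removed.
unchanged <- gsub("$", "", price)              # "$5.00$"

print(c(escaped, literal, unchanged))
```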

answered Sep 23 '22 by Nic