How can I remove non word characters from a text?

I want 'This Is A 101 Test' to be 'This Is A Test', but I can't get the syntax right.

src = 'This Is A 101 Test'
puts "A) " + src                       # base => "This Is A 101 Test"
puts "B) " + src[/([a-z]+)/]           # only does first word => "his"
puts "C) " + src.gsub!(/\D/, "")       # Does digits, I want alphabetic => "101"
puts "D) " + src.gsub!(/\W///g)        # Nothing. => ""
puts "E) " + src.gsub(/(\W|\d)/, "")   # Nothing. => ""

First off, you need to be careful with the difference between gsub and gsub!. The latter is "dangerous" (hence the !): it modifies src in place instead of returning a new string, and it returns nil when it makes no substitutions. If you're executing these statements in order, be aware that a.gsub!(/a/, "b") and a = a.gsub(/a/, "b") both leave a holding the substituted value. Part of the issue with your code is that src is being modified along the way.
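
A minimal sketch of that difference, using a made-up string ("banana") and hypothetical helper names that are not part of the original question:

```ruby
# gsub returns a new string and leaves the receiver untouched.
def demo_gsub
  a = "banana".dup
  b = a.gsub(/a/, "o")
  [a, b]
end

# gsub! edits the receiver in place and returns that same string.
def demo_gsub_bang
  a = "banana".dup
  result = a.gsub!(/a/, "o")
  [a, result]
end

# gsub! returns nil when the pattern never matches.
def demo_no_match
  a = "banana".dup
  a.gsub!(/z/, "o")
end

p demo_gsub       # => ["banana", "bonono"]
p demo_gsub_bang  # => ["bonono", "bonono"]
p demo_no_match   # => nil
```

The nil return is the part that bites when the result is fed straight into string concatenation, as in the puts lines from the question.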

The B method returns "his" but makes no changes to src:

src[/([a-z]+)/]     # => "his"
src                 # => "This Is A 101 Test"

The C method removes all characters that aren't numbers:

src.gsub!(/\D/, "") # => "101"
src                 # => "101"

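The D method doesn't work because the syntax is wrong: /\W///g is not valid Ruby. The gsub method takes a regular expression (or string) to search for and then a string to use as the replacement. Ruby regular expressions also have no g flag, since gsub already replaces every match. If you try it in IRB, it will act as though you need another / somewhere.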
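
For reference, here is a sketch of what D might look like with valid syntax, assuming the intent was to strip every non-word character. The helper names are hypothetical. Note that \W does not match digits, so this still would not remove the 101:

```ruby
# gsub takes the pattern and the replacement as two separate arguments
# and replaces every match.
def strip_non_word(text)
  text.gsub(/\W/, "")
end

# sub, by contrast, replaces only the first match.
def strip_first_non_word(text)
  text.sub(/\W/, "")
end

p strip_non_word('This Is A 101 Test')        # => "ThisIsA101Test"
p strip_first_non_word('This Is A 101 Test')  # => "ThisIs A 101 Test"
```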

The E method replaces all non-word characters and all numbers:

src.gsub(/(\W|\d)/, "") # => "This Is A  Test" (note the two spaces)
src                     # => "This Is A 101 Test"

You point out that it's returning "". Well, what's actually happening is that C and D as listed (with syntax issues fixed) are destructive changes. (Also, if run on "101", D will actually return nil as no substitutions were performed.) So E is just being run on "101", and since you're replacing all non-word characters and all digits with "", it becomes "".
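
The sequence can be traced step by step. This is only a sketch: it assumes D was meant to be gsub!(/\W/, ""), and the trace method is a hypothetical helper:

```ruby
def trace
  src = 'This Is A 101 Test'.dup
  c = src.gsub!(/\D/, "")      # removes every non-digit; src is now "101"
  d = src.gsub!(/\W/, "")      # nothing to replace in "101", so nil
  e = src.gsub(/(\W|\d)/, "")  # removes the remaining digits
  [c, d, e, src]
end

p trace  # => ["101", nil, "", "101"]
```

With d being nil, the expression "D) " + d would raise a TypeError instead of printing an empty string.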

The answer you're looking for would be something like:

src.gsub!(/\d\s?/, "") # => "This Is A Test"
src                    # => "This Is A Test"

And my favorite for dealing with all scenarios of double spaces (because squeeze is quite efficient at collapsing runs of the same character, strip removes leading and trailing whitespace, and the ! versions of these methods return nil if they make no changes, so they don't chain safely):

src = src.gsub(/\d+/, "").squeeze(" ").strip
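
To illustrate why the non-bang methods chain safely and the bang versions do not, here is a small sketch. The method names and the second input string are made up for the example:

```ruby
# Non-destructive chain: every step returns a string, so it always works.
def clean(text)
  text.gsub(/\d+/, "").squeeze(" ").strip
end

# Bang chain: any step that changes nothing returns nil.
def clean_with_bangs(text)
  copy = text.dup
  copy.gsub!(/\d+/, "").squeeze!(" ").strip!
end

p clean('This Is A 101 Test')             # => "This Is A Test"
p clean(' 42  This Is A 101 Test ')       # => "This Is A Test"
p clean_with_bangs('This Is A 101 Test')  # => nil (strip! had nothing to strip)

begin
  clean_with_bangs('No digits here')      # gsub! returns nil, so squeeze! fails
rescue NoMethodError => e
  puts "chain broke: #{e.class}"
end
```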

To remove all "non word characters" you can turn the problem around: keep only the characters you want (here, letters and spaces) and remove everything else.

src = 'This Is A 101 Test'
src.gsub(/[^a-zA-Z ]/,'').gsub(/ +/,' ')
=> "This Is A Test"
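
The same approach wrapped in a method, as a rough sketch. The method names and the second input are hypothetical. When a removed character sits at the end of the string, a trailing space can be left behind, so adding strip may be worthwhile:

```ruby
# Keep only ASCII letters and spaces, then collapse runs of spaces.
def letters_and_spaces(text)
  text.gsub(/[^a-zA-Z ]/, '').gsub(/ +/, ' ')
end

# Same, but also trims leading and trailing whitespace.
def letters_and_spaces_trimmed(text)
  letters_and_spaces(text).strip
end

p letters_and_spaces('This Is A 101 Test')        # => "This Is A Test"
p letters_and_spaces('Hello, World! 42')          # => "Hello World "
p letters_and_spaces_trimmed('Hello, World! 42')  # => "Hello World"
```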

I recommend Rubular for trying out Ruby regular expressions.

Tags:

regex

ruby

Michael Durrant

2 Answers

brymck

Jonas Elfström
