Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to count words in MySQL / regular expression replacer?

How can I, in a MySQL query, have the same behaviour as the Regex.Replace function (for instance in .NET/C#)?

I need that because, as many people, I would like to count the number of words in a field. However, I'm not satisfied with the following answer (given several times on that site):

SELECT LENGTH(name) - LENGTH(REPLACE(name, ' ', '') +1 FROM table

Because it doesn't give good results when there are more that one space between two words.

By the way, I think the Regex.Replace function may be interesting so all the good ideas are welcome !

like image 263
pierroz Avatar asked Nov 18 '09 11:11

pierroz


People also ask

How do I count words in MySQL?

To identify the number of words, we need to count the number of characters in a string and count the characters without the spaces and new lines. When we subtract the two, we get the word count. Let's look at an example. The query will return the number of words in each content row as wordcount .

What is regex replace in SQL?

The Oracle/PLSQL REGEXP_REPLACE function is an extension of the REPLACE function. This function, introduced in Oracle 10g, will allow you to replace a sequence of characters in a string with another set of characters using regular expression pattern matching.

How does regex replace work?

The REGEXREPLACE( ) function uses a regular expression to find matching patterns in data, and replaces any matching values with a new string. standardizes spacing in character data by replacing one or more spaces between text characters with a single space.

Why * is used in regex?

* - means "0 or more instances of the preceding regex token"


2 Answers

There's REGEXP_REPLACE available as MySQL user-defined functions.

Word counting: If you can control the data going into the database, you can remove double whitespace before insert. Also if you have to access the word count often, you can compute it once in your code and store the count in the database.

like image 195
laalto Avatar answered Oct 04 '22 23:10

laalto


UPDATE: Have now added a separate answer for MySQL 8.0+, which should be used in preference. (Retained this answer in case of being constrainted to using an earlier version.)

Almost a duplicate of this question but this answer will address the use case of counting words based on the advanced version of the custom regular expression replacer from this blog post.

Demo

Rextester online demo

For the sample text, this gives a count of 61 - the same as all online word counters I've tried (e.g. https://wordcounter.net/).

SQL (excluding function code for brevity):

SELECT txt,
       -- Count the number of gaps between words
       CHAR_LENGTH(txt) -
       CHAR_LENGTH(reg_replace(txt,
                               '[[:space:]]+', -- Look for a chunk of whitespace
                               '^.', -- Replace the first character from the chunk
                               '',   -- Replace with nothing (i.e. remove the character)
                               TRUE, -- Greedy matching
                               1,  -- Minimum match length
                               0,  -- No maximum match length
                               1,  -- Minimum sub-match length
                               0   -- No maximum sub-match length
                               ))
       + 1 -- The word count is 1 more than the number of gaps between words
       - IF (txt REGEXP '^[[:space:]]', 1, 0) -- Exclude whitespace at the start from count
       - IF (txt REGEXP '[[:space:]]$', 1, 0) -- Exclude whitespace at the end from count
       AS `word count`
FROM tbl;
like image 42
Steve Chambers Avatar answered Oct 04 '22 23:10

Steve Chambers