I am trying to update a large table (about 1M rows) on PostgreSQL with the count of words in a field. This query works, and sets the token_count field to the number of words (tokens) in longtext in table my_table:
UPDATE my_table mt SET token_count =
  (select count(token)
   from (select unnest(regexp_matches(t.longtext, E'\\w+', 'g')) as token
         from my_table as t
         where mt.myid = t.myid) as tokens);
myid is the primary key of the table. The \\w+ pattern is necessary because I want to count words, ignoring special characters.
For example, A test . ; ) would return 5 with a space-based count, while 2 is the correct value.
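To illustrate the difference in psql (a sketch, using the example string above):

```sql
-- \w+ matches only the word tokens 'A' and 'test':
select count(*)
from (select unnest(regexp_matches('A test . ; )', E'\\w+', 'g'))) t;
-- returns 2

-- splitting on single spaces counts every chunk, including '.', ';' and ')':
select array_length(regexp_split_to_array('A test . ; )', ' '), 1);
-- returns 5
```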
The issue is that it's horribly slow: 2 days were not enough to complete it on 1M rows. What would you do to optimise it? Are there ways to avoid the join? How can I split the batch into blocks, using for example limit and offset?
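One sketch of batching (the range bounds and step size here are illustrative assumptions): UPDATE has no LIMIT/OFFSET clause in PostgreSQL, so batches are usually carved out with a WHERE clause on the primary key instead.

```sql
-- Process one slice of the table per statement, keyed on the primary key.
-- Re-run with the next range (100001-200000, ...) until the table is covered.
UPDATE my_table mt
SET token_count =
  (select count(*)
   from (select unnest(regexp_matches(mt.longtext, E'\\w+', 'g'))) as tokens)
WHERE mt.myid BETWEEN 1 AND 100000;
```

Each batch can be committed separately, which keeps transactions short and lets you resume after an interruption.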
Thanks for any tips,
Mulone
UPDATE: I measured the performance of the array_split approach, and the update is going to be slow anyway. So maybe a solution would consist of parallelising it. If I run different queries from psql, only one query runs and the others wait for it to finish. How can I parallelise an update?
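A sketch of manual parallelisation (assuming myid is split into disjoint ranges, with illustrative bounds): concurrent UPDATEs block on row locks only when they touch the same rows, so sessions that each take a disjoint key range can proceed in parallel.

```sql
-- Session 1 (first psql connection):
UPDATE my_table mt
SET token_count =
  (select count(*)
   from (select unnest(regexp_matches(mt.longtext, E'\\w+', 'g'))) as tokens)
WHERE mt.myid BETWEEN 1 AND 500000;

-- Session 2 (second psql connection), disjoint range, so no row-lock conflict:
UPDATE my_table mt
SET token_count =
  (select count(*)
   from (select unnest(regexp_matches(mt.longtext, E'\\w+', 'g'))) as tokens)
WHERE mt.myid BETWEEN 500001 AND 1000000;
```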
Have you tried using array_length?
UPDATE my_table mt
SET token_count = array_length(regexp_split_to_array(trim(longtext), E'\\W+'), 1);
(regexp_split_to_array always splits globally, so it does not accept the 'g' flag.)
http://www.postgresql.org/docs/current/static/functions-array.html
# select array_length(regexp_split_to_array(trim(' some long text '), E'\\W+'), 1);
array_length
--------------
3
(1 row)
UPDATE my_table
SET token_count = array_length(regexp_split_to_array(longtext, E'\\s+'), 1)
Or your original query without a correlation:
UPDATE my_table
SET token_count = (
  select count(*)
  from (select unnest(regexp_matches(longtext, E'\\w+', 'g'))) s
);