
Is it possible to tokenize text in PL/PGSQL using regular expressions?

I want to tokenize text in my database with RegEx and store the resulting tokens in a table. First I want to split the words by spaces, and then each token by punctuation.

I'm doing this in my application, but executing it in the database might speed it up.

Is it possible to do this?

Renato Dinhani asked Dec 30 '25 08:12



1 Answer

There are a number of functions for tasks like that.
For instance, to retrieve the 2nd word of a text:

SELECT split_part('split this up', ' ', 2);

Split the whole text and return one word per row:

SELECT regexp_split_to_table('split this up', E'\\s+');

(Actually, the last example splits on any stretch of whitespace, not just on single spaces.)
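
To cover the two steps from the question (split on spaces, then on punctuation) and store the result, something along these lines might work. It is only a sketch: the tokens table and the inline sample row are made up for illustration, it assumes punctuation should be discarded rather than kept as separate tokens, and LATERAL needs PostgreSQL 9.3 or later.

```sql
-- Hypothetical target table for the tokens (name and columns are assumptions).
CREATE TEMP TABLE tokens (
    doc_id integer,
    token  text
);

-- Split on any run of whitespace and/or punctuation in a single pass.
-- The VALUES list stands in for your real source table.
INSERT INTO tokens (doc_id, token)
SELECT d.id, t.token
FROM  (VALUES (1, 'Hello, world! Split this up.')) AS d(id, body)
CROSS JOIN LATERAL
      regexp_split_to_table(d.body, '[[:space:][:punct:]]+') AS t(token)
WHERE  t.token <> '';  -- leading/trailing separators produce empty strings

SELECT doc_id, token FROM tokens;
```

Splitting on both character classes at once gives the same tokens as splitting by whitespace first and by punctuation second, as long as the punctuation itself is not needed. If you do need to keep it, a different pattern (or regexp_matches with the 'g' flag) would be required.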

Erwin Brandstetter answered Jan 01 '26 23:01



