SQL Server 2012 : extract Regex groups

Tags:

I have text in my database in Markdown format. I'd like to extract links and count the number of matching links I have. I can get a listing of text blocks that contain links using a query similar to this:

SELECT post_text
FROM posts p
WHERE p.body like '%\[%](http%)%' ESCAPE '\'

How do I go to the next step though, and just extract the link portion of the text (the part that is in the parenthesis)? If I can get this, I can count the number of times this specific link is in my dataset.

Some sample data:

"Visit [Google](http://google.com)"    -> Should return "http://google.com"
"Get an [iPhone](http://www.apple.com) (I like it better than Android)"   -> Should return "http://www.apple.com"
"[Example](http://example.com)"    -> Should return "http://example.com"
"This is a message"    -> Nothing to return on this one, no link
"I like cookies (chocolate chip)"  -> Nothing to return on this one, no link
"[Frank] says 'Hello'" -> Nothing to return on this one, no link

I am using SQL Server 2012 (if there are differences between versions in this regard).

625

asked Sep 30 '14 19:09

Andy

1 Answers

Assuming the actual data is no more complex than the stated examples, this should work without resorting to RegEx:

DECLARE @posts TABLE
(
   post_id INT NOT NULL IDENTITY(1, 1),
   post_text NVARCHAR(4000) NOT NULL,
   body NVARCHAR(2048) NULL
);
INSERT INTO @posts (post_text, body) VALUES (N'first',
                                           N'Visit [Google](http://google.com)');
INSERT INTO @posts (post_text, body) VALUES (N'second',
                                           N'Get an [iPhone](http://www.apple.com)');
INSERT INTO @posts (post_text, body) VALUES (N'third',
                                           N'[Example](http://example.com)');
INSERT INTO @posts (post_text, body) VALUES (N'fourth',
                                           N'This is a message');
INSERT INTO @posts (post_text, body) VALUES (N'fifth',
                                           N'I like cookies (chocolate chip)');
INSERT INTO @posts (post_text, body) VALUES (N'sixth',
                                           N'[Frankie] says ''Relax''');
INSERT INTO @posts (post_text, body) VALUES (N'seventh',
                                           NULL);


SELECT p.post_text,
       SUBSTRING(
                  p.body,
                  CHARINDEX(N'](', p.body) + 2,
                  CHARINDEX(N')', p.body) - (CHARINDEX(N'](', p.body) + 2)
                ) AS [URL]
FROM   @posts p
WHERE  p.body like '%\[%](http%)%' ESCAPE '\';

Output:

post_text  URL
first      http://google.com
second     http://www.apple.com
third      http://example.com

PS:
If you really want to use Regular Expressions, they can only be done via SQLCLR. You can write your own or download pre-done libraries. I wrote one such library, SQL#, that has a Free version that includes the RegEx functions. But those should only be used if a T-SQL solution cannot be found, which so far is not the case here.

162

answered Oct 27 '22 00:10

Solomon Rutzky

Related questions
                            
                                Oracle SQL Developer String variable binding
                            
                                Order by highest average position across multiple columns
                            
                                SQL IN clause with AND logic in it
                            
                                Django Aggregation, sum of counts
                            
                                JDBC Select batching/fetch-size with MySQL
                            
                                Execute sql prepared statement with slice
                            
                                Best way of storing GPS coordinates in database
                            
                                MySQL: Alternate solution of SQL Server's HierarchyId datatype
                            
                                What's the right way to calculate table size in postgres?
                            
                                Python (pandas): store a data frame in hdf5 with a multi index
                            
                                Postgresql, add (years, months or days) to date based on another column
                            
                                Select columns, but casting all columns of given type
                            
                                Managing very large SQL queries
                            
                                Robust approach for building SQL queries programmatically
                            
                                Oracle default Timestamp format
                            
                                Compound primary key in HeidiSQL
                            
                                Remove Special Characters from an Oracle String
                            
                                dynamic view management query
                            
                                Firebird External Tables
                            
                                Multiple Update Statement in SQL Server MERGE [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

SQL Server 2012 : extract Regex groups

Tags:

regex

sql

sql-server

sql-server-2012

Andy

People also ask

1 Answers

Solomon Rutzky

Recent Activity

Donate For Us