
Regex: matching multiple patterns and getting middle of string

Tags: regex, r, stringr

I am working on code that takes a set of SQL queries and reduces each one to just its table name.

For example I have the following queries:

delete from pear.admin where jdjdj
delete from pear.admin_user where blah
delete from ss_pear.admin_user where blah 

I am trying to write a regex that matches all of these patterns. Would that require creating a list of multiple patterns first and then passing it to str_extract?

I tried a regex, but it gives me the following output:

delete from pear.admin 

How do I get rid of the words before the table name? I tried (.*) but nothing seems to work.

sql_data$table_name <- 
str_extract(sql_data$Full.Sql, "[^_]+\\.[\\w]+\\_[\\w]+")
asked Mar 05 '23 00:03 by aa710


1 Answer

I am only familiar with the base R regex functions, so here is an option using sub:

queries <- c("delete from pear.admin where jdjdj",
             "delete from pear.admin_user where blah",
             "delete from ss_pear.admin_user where blah")

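# sub() is vectorized over its input, so no sapply() loop is needed.
# The capture group (\\S+) keeps the first run of non-space characters after "from".
table_names <- sub(".*\\bfrom\\s+(\\S+).*", "\\1", queries)
table_names

[1] "pear.admin"         "pear.admin_user"    "ss_pear.admin_user"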

This should perform at least somewhat reliably for simple statements like these, since, as far as I know, whatever immediately follows the keyword FROM must be a table name.
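One caveat with sub: when the pattern does not match, it returns the input string unchanged, so a query without a FROM clause would come back whole rather than as a missing value. Below is a minimal base R sketch of a variant that returns NA in that case and ignores the case of the keyword. The function name extract_table is hypothetical, chosen only for illustration, and the sketch assumes that the table name is the first run of non-space characters after FROM.

```r
# Hypothetical helper: returns the token following FROM, or NA when there is none.
extract_table <- function(sql) {
  # regexec() records the full match and each capture group for every element.
  m <- regmatches(sql, regexec("\\bfrom\\s+(\\S+)", sql, ignore.case = TRUE))
  # Element 1 is the full match, element 2 the captured table name.
  vapply(
    m,
    function(x) if (length(x) >= 2) x[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

queries <- c("delete from pear.admin where jdjdj",
             "DELETE FROM ss_pear.admin_user WHERE blah",
             "select 1")
extract_table(queries)
# The third query has no FROM clause, so its result is NA.
```

The result could be assigned to a data frame column in the same way as in the question, for example sql_data$table_name <- extract_table(sql_data$Full.Sql).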

answered Mar 06 '23 23:03 by Tim Biegeleisen