I am trying the following pattern, which should let me get everything between productUrl:// and the next ?:
(?<=\"productUrl\"\:\"\/\/)(.*?)(?=\?)
The above works on https://regexr.com/
I then tried to escape the backslashes to fit that string into the grep
function, but with no luck. What is the proper way of doing it?
I actually need to extract the substrings that match my pattern so grep
may be used in conjunction with another function.
Note that you do not need to escape / in R regex patterns, as they are defined with string literals and / is not a special regex metacharacter. If you want to write a " inside a "..." string literal, you should escape it with a single \, as you are already doing.
You may avoid overescaping here if you use single quotes to define the string literal and if you turn .*?(?=\?) into a negated character class:
grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)
The [^?]* negated character class matches any 0 or more chars other than ?.
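As a rough illustration of the two points above (quoting style and the negated character class), here is a minimal sketch. The two input strings and all variable names are made up for the example. It suggests that the single-quoted and double-quoted literals describe the same pattern, and that the lazy-dot version only matches when a ? actually follows, while [^?]* does not need one:

```r
# Hypothetical one-line inputs, made up for illustration.
with_query    <- '"productUrl":"//example.com/item.html?search=1"'
without_query <- '"productUrl":"//example.com/item.html"'

# The same pattern written with double quotes (inner quotes escaped)
# and with single quotes (no escaping needed).
p_double <- "(?<=\"productUrl\":\"//)[^?]*"
p_single <- '(?<="productUrl":"//)[^?]*'
same_pattern <- identical(p_double, p_single)

# Lazy dot plus lookahead: needs a literal ? somewhere after the prefix.
p_lazy <- '(?<="productUrl":"//).*?(?=\\?)'

inputs <- c(with_query, without_query)
lazy_hits    <- grepl(p_lazy,   inputs, perl = TRUE)
negated_hits <- grepl(p_single, inputs, perl = TRUE)

print(same_pattern)   # TRUE
print(lazy_hits)      # TRUE FALSE
print(negated_hits)   # TRUE TRUE
```

Whether the stricter or the more permissive behaviour is preferable depends on whether every URL in your data is guaranteed to carry a query string.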
If the string you are checking against has no double quotes, remove them from the lookbehind:
grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)
Instead of the lookbehind, you may also use \K to drop the text matched so far from the final match:
grep('productUrl://\\K[^?]*', x, perl=TRUE)
                   ^^^
Actually, you do not even need the capturing group in your pattern.
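To check that the lookbehind and \K variants behave the same way, a small sketch along these lines may help. The input string is hypothetical (no double quotes around the key), and regmatches with regexpr is used only to make the matched text visible:

```r
# Hypothetical input without double quotes, made up for illustration.
x <- "productUrl://www.example.com/item.html?search=1"

# Variant 1: fixed-width lookbehind, no capturing group.
with_lookbehind <- regmatches(x, regexpr("(?<=productUrl://)[^?]*", x, perl = TRUE))

# Variant 2: \K discards everything matched before it from the reported match.
# In an R string literal the backslash itself must be doubled: "\\K".
with_K <- regmatches(x, regexpr("productUrl://\\K[^?]*", x, perl = TRUE))

print(with_lookbehind)                    # "www.example.com/item.html"
print(identical(with_lookbehind, with_K)) # TRUE
```

Both variants need perl = TRUE, since lookbehinds and \K are PCRE features rather than part of R's default regex engine.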
Solving the actual task
You cannot extract substrings with grep in R; you can only find/identify elements to fetch from a character vector using grep. To extract substrings, you need to use base R regmatches or stringr str_extract/str_extract_all or similar match functions.
Example with base R:
> x <- '":"ppath","value":[],"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
With stringr:
> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
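When the input is a character vector with several elements rather than one long string, the difference between grep and regmatches becomes easier to see. The following is a minimal sketch with a made-up three-element vector (the URLs and variable names are assumptions for the example), using base R only:

```r
# Hypothetical vector: elements 1 and 3 contain a productUrl, element 2 does not.
x <- c('"productUrl":"//example.com/a.html?search=1"',
       '"image":"https://example.com/a.jpg"',
       '"productUrl":"//example.com/b.html?search=2"')
pattern <- '"productUrl":"\\K[^?"]*'

# grep() only tells you which elements match.
idx <- grep(pattern, x, perl = TRUE)

# regexpr() + regmatches(): first match per element, non-matching elements dropped.
first_hits <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# gregexpr() + regmatches(): all matches, one list entry per input element.
all_hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

print(idx)              # 1 3
print(first_hits)       # "//example.com/a.html" "//example.com/b.html"
print(lengths(all_hits)) # 1 0 1
```

The regexpr form is convenient when each element holds at most one URL; the gregexpr form keeps the positions aligned with the input, at the cost of returning a list.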