I am trying the following pattern, which should let me get everything between productUrl:// and the first ? that follows it:
(?<=\"productUrl\"\:\"\/\/)(.*?)(?=\?)
The above works on https://regexr.com/
I am then trying to escape the backslashes so that the pattern fits into a string literal for the grep function, but with no luck. What is the proper way of doing it?
I actually need to extract the substrings that match my pattern, so grep may need to be used in conjunction with another function.
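Note that you do not need to escape / in R regex patterns. They are ordinary string literals (there are no /.../ regex delimiters as in some other languages), and / is not a special regex metacharacter. In fact, \/ and \: are not valid escapes in an R string literal and cause a parse error. If you want to write a " inside a "..." string literal, escape it with a single \, as you are already doing.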
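A quick way to check the escaping is to build the same pattern both ways and compare the results. The sample string x below is made up for illustration; it only mimics the "productUrl":"//... shape of the real data.

```r
# Same regex, written as two different R string literals.
# In a double-quoted literal every " must be escaped as \"; / needs no escape.
p_double <- "(?<=\"productUrl\":\"//)([^?]*)"
# In a single-quoted literal " needs no escaping at all.
p_single <- '(?<="productUrl":"//)([^?]*)'

# Both literals yield exactly the same pattern string.
print(identical(p_double, p_single))  # [1] TRUE

# Hypothetical sample input in the same shape as the real data.
x <- '"productUrl":"//www.example.com/p.html?search=1'
# Lookbehind is a PCRE feature, hence perl = TRUE.
print(grepl(p_single, x, perl = TRUE))  # [1] TRUE
```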
You can avoid over-escaping here if you use single quotes to define the string literal and turn .*?(?=\?) into a negated character class:
grep('(?<="productUrl":"//)([^?]*)', x, perl=TRUE)
The [^?]* negated character class matches any 0 or more chars other than ?.
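The two forms agree when a ? follows the URL, but they differ when it does not: the lazy-dot-plus-lookahead version then finds nothing, while the negated class simply runs to the end of the string. A sketch with two made-up inputs (y is a hypothetical URL without any query string):

```r
x <- '"productUrl":"//www.example.com/p.html?search=1'  # URL followed by ?
y <- '"productUrl":"//www.example.com/p.html'           # hypothetical: no ? at all

lazy_pat    <- '(?<="productUrl":"//).*?(?=\\?)'  # \\? in the literal is \? in the regex
negated_pat <- '(?<="productUrl":"//)[^?]*'       # ? is literal inside [...]

# regmatches() + regexpr() returns the matched substring, or character(0) if none.
lazy_x    <- regmatches(x, regexpr(lazy_pat, x, perl = TRUE))
negated_x <- regmatches(x, regexpr(negated_pat, x, perl = TRUE))
lazy_y    <- regmatches(y, regexpr(lazy_pat, y, perl = TRUE))
negated_y <- regmatches(y, regexpr(negated_pat, y, perl = TRUE))

print(lazy_x)     # [1] "www.example.com/p.html"
print(negated_x)  # [1] "www.example.com/p.html"
print(lazy_y)     # character(0)
print(negated_y)  # [1] "www.example.com/p.html"
```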
If the string you are checking has no double quotes, remove them from the lookbehind:
grep('(?<=productUrl://)([^?]*)', x, perl=TRUE)
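Which variant to use depends on what the input actually looks like. A small sketch with two made-up strings, one without quotes and one in the JSON-like form from the question, shows that each lookbehind only fires on its own shape:

```r
s <- c("productUrl://www.example.com/a.html?x=1",    # hypothetical, no quotes
       '"productUrl":"//www.example.com/b.html?x=1')  # JSON-like, with quotes

with_quotes    <- grepl('(?<="productUrl":"//)[^?]*', s, perl = TRUE)
without_quotes <- grepl('(?<=productUrl://)[^?]*', s, perl = TRUE)

print(with_quotes)     # [1] FALSE  TRUE
print(without_quotes)  # [1]  TRUE FALSE
```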
Instead of the lookbehind, you may also use the \K match reset operator, which drops the text matched so far from the returned match:
grep('productUrl://\\K[^?]*', x, perl=TRUE)
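The lookbehind and \K versions should return the same substrings; the difference is only in how the prefix is kept out of the result. Checked here on a made-up string holding two URLs, using gregexpr() to collect every match:

```r
x <- "productUrl://www.example.com/a.html?x=1 productUrl://www.example.com/b.html?y=2"

# Lookbehind: the prefix is asserted but never part of the match.
lb <- regmatches(x, gregexpr('(?<=productUrl://)[^?]*', x, perl = TRUE))[[1]]
# \K: the prefix is matched, then dropped from the reported match ("\\K" in the literal).
k  <- regmatches(x, gregexpr('productUrl://\\K[^?]*', x, perl = TRUE))[[1]]

print(lb)                # [1] "www.example.com/a.html" "www.example.com/b.html"
print(identical(lb, k))  # [1] TRUE
```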
Note the double backslash: inside an R string literal, \K has to be written as \\K.
Actually, you do not even need the capturing group in your pattern.
Solving the actual task
You cannot extract substrings with grep in R: it only identifies the elements of a character vector that match, returning their indices (or, with value = TRUE, the whole matching elements). To extract substrings, use base R regmatches (together with regexpr or gregexpr), stringr str_extract/str_extract_all, or a similar match function.
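To make the difference concrete, here is a sketch on a made-up three-element vector: grep() reports which elements match (or returns those whole elements with value = TRUE), while regmatches() pulls out only the matched text.

```r
v <- c('"productUrl":"//www.example.com/a.html?x=1',
       'no url here',
       '"productUrl":"//www.example.com/b.html?y=2')
pat <- '(?<="productUrl":"//)[^?]*'

idx   <- grep(pat, v, perl = TRUE)                    # indices of matching elements
whole <- grep(pat, v, perl = TRUE, value = TRUE)      # matching elements, unchanged
found <- regmatches(v, regexpr(pat, v, perl = TRUE))  # only the matched substrings

print(idx)    # [1] 1 3
print(found)  # [1] "www.example.com/a.html" "www.example.com/b.html"
```

regexpr() keeps only the first match in each element; gregexpr() is the one to use when a single string can contain several URLs.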
Example with base R:
> x <- '":"ppath","value":[],"hidden":false,"locked":false}],"bizData":"","pos":0},"listItems":[{"name":"BRAND\'S® Lutein Essence 6 Bottles x 60ml","nid":"66765568","icons":[{"domClass":"lazMall","text":"LazMall","alias":"LazMallAlias","type":"img","group":"1","showType":"0","order":0}],\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","image":"https://sg-test-11.slatic.net/p/5337f879236ece2f14158c055adcdef7.jpg",\n"productUrl":"//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html?search=1","sku":"BR924HBAB3R0N4SGAMZ","skuId":"167303363"}],"restrictedAge":0,"categories":[1438,1565,4776,7305'
> regmatches(x, gregexpr('"productUrl":"\\K[^?"]*', x, perl=TRUE))
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
With stringr:
> library(stringr)
> str_extract_all(x, '(?<="productUrl":")[^?"]*')
[[1]]
[1] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"
[2] "//www.lazada.sg/products/brands-lutein-essence-6-bottles-x-60ml-i138897006-s167303363.html"