Regex with Chinese characters

Question

I'm searching text_ which is: 本周（3月25日-3月31日），国内油厂开机率继续下降，全国各地油厂大豆压榨总量1456000吨（出粕1157520吨，出油262080吨)，较上周的...[continued]

  crush <- str_extract(string = text_, pattern = perl("(?<=量).*(?=吨（出粕)"))
  meal <- str_extract(string = text_, pattern = perl("(?<=粕).*(?=吨，出)"))
  oil <-  str_extract(string = text_, pattern = perl("(?<=出油).*(?=吨）)"))

prints

[1] "1456000"   ## correct
[1] "1157520"   ## correct
[1] NA          ## looking for 262080 here

Why do the first two match but not the last one? I'm using the stringr library.

Wiktor Stribiżew · Accepted Answer

Note that the current version of the stringr package is based on the ICU regex library, and using perl() is deprecated.

Note also that lookbehind patterns must be fixed-width, and ICU seems to have a problem parsing the first character of your lookbehind pattern (for some unknown reason, it cannot calculate its width).

Since you are using stringr, you may just rely on capturing that can be achieved with str_match, to extract a part of the pattern:

> match <- str_match(s, "出油(\\d+)吨")
> match[,2]
[1] "262080"

This way, you avoid potential lookbehind issues in the future. These regexps also run faster, because there is no unanchored lookbehind that has to be checked at every position in the searched string.
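As a minimal sketch of the same capturing approach applied to all three figures at once (the sample string `s` is shortened from the question, and the combined pattern is my own assumption, not something from the original code): one capture group per number, with `\\D*` skipping the non-digit text in between. Since it anchors only on the labels and on 吨, it also does not depend on whether the brackets in the text are full-width or ASCII.

```r
library(stringr)

# Sample string shortened from the question (assumed to be UTF-8)
s <- "全国各地油厂大豆压榨总量1456000吨（出粕1157520吨，出油262080吨)，较上周的"

# One capture group per figure; \\D* skips the non-digit text between them
m <- str_match(s, "量(\\d+)吨\\D*出粕(\\d+)吨\\D*出油(\\d+)吨")

# Column 1 holds the full match, columns 2-4 hold the captured groups
crush <- m[, 2]
meal  <- m[, 3]
oil   <- m[, 4]
```

If the pattern does not match, `str_match` returns a row of `NA` values, so each variable ends up `NA` rather than raising an error.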

Also, you may just use your PCRE regex with base R:

> regmatches(s, regexpr("(?<=出油)\\d+(?=吨)", s, perl=TRUE))
[1] "262080"
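If you prefer to stay in base R but still want capture groups instead of lookarounds, a rough sketch could look like the following. `extract_tons` is a hypothetical helper name introduced only for illustration, and `s` is a shortened copy of the string from the question.

```r
# Sample string shortened from the question (assumed to be UTF-8)
s <- "全国各地油厂大豆压榨总量1456000吨（出粕1157520吨，出油262080吨)，较上周的"

# Hypothetical helper: return the digits that follow `label` and precede 吨
extract_tons <- function(text, label) {
  pattern <- paste0(label, "(\\d+)吨")
  m <- regmatches(text, regexec(pattern, text, perl = TRUE))
  # m[[1]] is c(full match, group 1); indexing an empty result yields NA
  m[[1]][2]
}

crush <- extract_tons(s, "量")
meal  <- extract_tons(s, "出粕")
oil   <- extract_tons(s, "出油")
```

The results are character strings; wrap them in `as.numeric()` if you need numbers.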

Rafael · Answer

For some reason I still don't understand, I wasn't able to use @WiktorStribiżew's suggested solution, but this ended up working:

oil <-  str_extract(string = text_, pattern = perl("(?<=吨).*(?=吨)"))
# [1] "（出粕1157520吨，出油262080吨），较
oil <- str_extract(string = oil, pattern = perl("(?<=油)\\d+(?=吨)"))
# [1] "262080"

Tags: regex, r, stringr