I'm looking to extract prices from a string of scraped data.
I'm using this at the moment:
re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['£1.01']
This works fine 99% of the time. However, I occasionally see this:
re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['£1,444']
Ideally, I'd like to see ['1444.01'].
This is an example of the string I'm extracting the prices from.
'\n £1,000.73 \n\n\n + £1.26\nUK delivery\n\n\n'
I'd like some help putting together a regex that returns ['1000.73', '1.26'] from the string above.
To match a character that has a special meaning in regex, escape it with a backslash (\). For example, \. matches ".", \+ matches "+", and \( matches "(". Use \\ to match a literal backslash ("\").
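As an illustration using the pattern from the question: the unescaped "." in "\d+.\d+" matches any character, including the comma in "£1,444.01", which is why the match stops at "£1,444". A minimal sketch comparing the unescaped and escaped versions (note that findall returns the whole match, "£" included, because the pattern only has a non-capturing group):

```python
import re

text = '£1,444.01'

# Unescaped '.': matches ANY character, so it happily matches the comma.
loose = re.findall(r'£(?:\d+\.)?\d+.\d+', text)

# Escaped '\.': matches only a literal period, so the comma is no longer
# accepted and this pattern finds nothing in the input.
strict = re.findall(r'£(?:\d+\.)?\d+\.\d+', text)

print(loose)   # ['£1,444']
print(strict)  # []
```

Escaping the dot removes the accidental wildcard, but on its own it does not teach the pattern about thousands separators, which is what the answer below handles.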
You may grab all the values with £(\d[\d.,]*)\b and then remove all the commas:
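import re
s = '\n £1,000.73 \n\n\n + £1.26\nUK delivery\n\n\n'
# Capture each amount after '£', including its commas and decimal point
r = re.compile(r'£(\d[\d.,]*)\b')
# Strip the thousands separators from every captured amount
print([x.replace(',', '') for x in r.findall(s)])
# => ['1000.73', '1.26']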
The £(\d[\d.,]*)\b pattern finds £, then captures a digit followed by zero or more digits, commas, or periods, as many as possible, but backtracks if needed to a position where there is a word boundary.
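A small check of that backtracking, using a made-up input (hypothetical, not from the question) where the price ends a sentence. The pattern without \b is included only for comparison:

```python
import re

# Pattern from the answer: '£', a digit, greedy digits/commas/periods, then \b.
with_boundary = re.compile(r'£(\d[\d.,]*)\b')
# Same pattern without \b, for comparison only.
without_boundary = re.compile(r'£(\d[\d.,]*)')

text = 'Total: £1,444.01.'  # hypothetical input: price followed by a full stop

# [\d.,]* first grabs '1,444.01.', but there is no word boundary after the
# final '.', so the engine gives back one character and ends the match after '01'.
print(with_boundary.findall(text))     # ['1,444.01']
# Without \b nothing forces the trailing '.' to be given back.
print(without_boundary.findall(text))  # ['1,444.01.']

# Commas are then removed exactly as in the answer.
print([p.replace(',', '') for p in with_boundary.findall(text)])  # ['1444.01']
```

The \b is what keeps sentence punctuation out of the captured amount; it does not validate the grouping of commas, so input like "£1,2,3" would still be captured as-is.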