Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting Prices with Regex

I'm look to extract prices from a string of scraped data.

I'm using this at the moment:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['1.01']

Which works fine 99% of the time. However, I occasionally see this:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['1,444']

I'd like to see ['1444.01'] ideally.

This is an example of the string I'm extracting the prices from.

'\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'

I'm after some help putting together the regex to get ['1000.73', '1.26'] from that above string

like image 329
Leon Kyriacou Avatar asked Sep 15 '17 11:09

Leon Kyriacou


People also ask

What is $/ in regex?

The /^$/ component is a regular expression that matches the empty string. More specifically, it looks for the beginning of a line ( ^ ) followed directly by the end of a line ( $ ), which is to say an empty line.

What is extract regex?

Extracts the first matching substrings according to a regular expression.

How do I match a pattern in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).


1 Answers

You may grab all the values with '£(\d[\d.,]*)\b' and then remove all the commas with

import re
s = '\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'
r = re.compile(r'£(\d[\d.,]*)\b')
print([x.replace(',', '') for x in re.findall(r, s)])
# => ['1000.73', '1.26']

See the Python demo

The £(\d[\d.,]*)\b pattern finds £ and then captures a digit and then any 0+ digits/,/., as many as possible, but will backtrack to a position where a word boundary is.

like image 172
Wiktor Stribiżew Avatar answered Oct 09 '22 06:10

Wiktor Stribiżew