Extracting Prices with Regex

Tags:

regex

python-3.x

I'm looking to extract prices from a string of scraped data.

I'm using this at the moment:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1.01')
['1.01']

Which works fine 99% of the time. However, I occasionally see this:

re.findall(r'£(?:\d+\.)?\d+.\d+', '£1,444.01')
['1,444']

I'd like to see ['1444.01'] ideally.

This is an example of the string I'm extracting the prices from.

'\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'

I'm after some help putting together a regex that returns ['1000.73', '1.26'] from the string above.


asked Sep 15 '17 11:09

Leon Kyriacou

1 Answer

You can grab all the values with '£(\d[\d.,]*)\b' and then strip the commas from each match:

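import re

s = '\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'
# Capture the number after each £: a digit, then any run of digits, dots, or commas
r = re.compile(r'£(\d[\d.,]*)\b')
# Drop the thousands separators from each captured match
print([x.replace(',', '') for x in r.findall(s)])
# => ['1000.73', '1.26']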

See the Python demo

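The £(\d[\d.,]*)\b pattern matches £ and then captures a digit followed by zero or more digits, commas, or dots, as many as possible. The trailing \b then forces the engine to backtrack to the nearest word boundary, so the capture always ends on a digit and a trailing . or , (for example at the end of a sentence) is left out of the match.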
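To make the word-boundary behaviour concrete, here is a minimal sketch that reuses the same pattern. The helper name extract_prices and the second sample string are assumptions made up for illustration, and converting to Decimal is an optional extra step that the question did not ask for.

```python
import re
from decimal import Decimal

# Same pattern as in the answer above.
PRICE_RE = re.compile(r'£(\d[\d.,]*)\b')

def extract_prices(text):
    """Hypothetical helper: return each £-prefixed amount in text as a Decimal."""
    return [Decimal(m.replace(',', '')) for m in PRICE_RE.findall(text)]

s = '\n                £1,000.73                \n\n\n                + £1.26\nUK delivery\n\n\n'
print(extract_prices(s))  # [Decimal('1000.73'), Decimal('1.26')]

# Trailing punctuation is not captured: \b makes the match end right after a digit.
print(PRICE_RE.findall('Only £5. Or £1,444.01, delivered.'))  # ['5', '1,444.01']
```

Note that the character class is deliberately loose, so a malformed value such as £1.2.3 would still be captured as 1.2.3 and Decimal would raise an error on it. If the scraped data can contain such values, a stricter pattern or a try/except around the conversion may be needed.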


answered Oct 09 '22 06:10

Wiktor Stribiżew
