Extracting numeric data in Python

Question

If I have a few lines which read:

1,000 barrels
5 Megawatts hours (MWh)
80 Megawatt hours (MWh) (5 MW per peak hour).

What is the best way to capture the numeric element (namely just the first instance) and the contents of the first parentheses, if they exist?

My current approach is to split the string on every ' ' and use str.isalpha to find the non-alphabetic elements. But I'm not sure how to obtain the first entry in the parentheses.

zmo · Accepted Answer

here's an approach using regexps:

import re

text = """1,000 barrels
5 Megawatts hours (MWh)
80 Megawatt hours (MWh) (...)"""

r_unit = re.compile(r"\((\w+)\)")
r_value = re.compile(r"([\d,]+)")

for line in text.splitlines():
    unit = r_unit.search(line)
    if unit:
        unit = unit.groups()[0]
    else:
        unit = ""
    value = r_value.search(line)
    if value:
        value = value.groups()[0]
    else:
        value = ""
    print(value, unit)

or another simpler approach would be using a regexp like this:

r = re.compile(r"(([\d,]+)[^(\n]*\(?(\w+)?\)?)")
for line, value, unit in r.findall(text):
    print(value, unit)

(I thought about that one just after writing the previous one :-p)

full explanation of last regexp:

(          <- LINE GROUP
 (         <- VALUE GROUP
  [        <- character class (i.e. the char read is one of the following characters)
   \d      <- any digit
   ,       <- a comma
  ]
  +        <- one or more of the previous expression
 )
 [^(\n]    <- any character except an opening parenthesis or a newline
 *         <- zero or more of the previous expression
 \(        <- a real opening parenthesis
 ?         <- zero or one of the previous expression
 (         <- UNIT GROUP
  \w       <- any alphanumeric/in-word character
  +        <- one or more of the previous expression
 )
 ?         <- zero or one of the previous expression
 \)        <- a real closing parenthesis
 ?         <- zero or one of the previous expression
)
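
If the character-by-character breakdown is useful, it can also live inside the pattern itself via re.VERBOSE. The following is only a sketch along the same lines, not the exact regexp from this answer: the name VALUE_UNIT is made up, the pattern is anchored at the start of each line so that only the first number per line is picked up, and the filler between value and unit is written as [^(\n]* because a greedy .* would run to the end of the line and leave the optional unit group empty.

```python
import re

# Hypothetical name; a sketch of a commented pattern, not the answer's exact regexp.
VALUE_UNIT = re.compile(r"""
    ^\s*            # start of a line, optional indentation
    ([\d,]+)        # VALUE: digits and thousands separators
    [^(\n]*         # filler up to the first opening parenthesis or end of line
    (?:\((\w+)\))?  # UNIT: optional single word inside the first parentheses
""", re.VERBOSE | re.MULTILINE)

text = """1,000 barrels
5 Megawatts hours (MWh)
80 Megawatt hours (MWh) (5 MW per peak hour)."""

# group(2) is None when a line has no parenthesized unit, so fall back to "".
pairs = [(m.group(1), m.group(2) or "") for m in VALUE_UNIT.finditer(text)]
for value, unit in pairs:
    print(value, unit)
```

Because of the ^ anchor, the "5" inside the second pair of parentheses on the last line is not reported as a separate value.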

Ayaz Ahmad · Answer

To extract numerical values you can use re:

import re
value = """1,000 barrels
           5 Megawatts hours (MWh)
           80 Megawatt hours (MWh) (5 MW per peak hour)"""
print(re.findall(r"[0-9]+,?[0-9]*", value))  # ['1,000', '5', '80', '5']
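
Note that findall returns every number in the text, including the 5 inside the second parentheses. If only the first number on each line is wanted, one possible sketch is to call re.search once per line and convert the result. first_number is a made-up helper name, and treating the comma as a thousands separator is an assumption about the data.

```python
import re

def first_number(line):
    """Return the first number in `line` as an int, or None if there is none.

    Hypothetical helper; assumes commas are thousands separators.
    """
    match = re.search(r"\d[\d,]*", line)
    if match is None:
        return None
    # Drop the separators before converting, e.g. "1,000" -> 1000.
    return int(match.group().replace(",", ""))

value = """1,000 barrels
5 Megawatts hours (MWh)
80 Megawatt hours (MWh) (5 MW per peak hour)"""

numbers = [first_number(line) for line in value.splitlines()]
print(numbers)  # [1000, 5, 80]
```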

Tags:

python

Max Kim