Build a dictionary from successful regex matches in Python

I'm pretty new to Python, and I'm trying to parse a file. Only certain lines in the file contain data of interest, and I want to end up with a dictionary of the stuff parsed from valid matching lines in the file.

The code below works, but it's a bit ugly and I'm trying to learn how it should be done, perhaps with a comprehension, or else with a multiline regex. I'm using Python 3.2.

file_data = open('x:\\path\\to\\file','r').readlines()
my_list = []
for line in file_data:
    # discard lines which don't match at all
    if re.search(pattern, line):
        # icky, repeating search!!
        one_tuple = re.search(pattern, line).group(3,2)
        my_list.append(one_tuple)
my_dict = dict(my_list)

Can you suggest a better implementation?

asked Jun 19 '12 06:06 by WiringHarness



3 Answers

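Thanks for the replies. After putting them together, I got:

with open('x:\\path\\to\\file', 'r') as f:
    file_data = f.read()
# re.MULTILINE makes ^ and $ (if the pattern uses them) anchor at every line
my_list = re.findall(pattern, file_data, re.MULTILINE)
# pattern has three groups; group 3 is the key and group 2 the value
my_dict = {c: b for a, b, c in my_list}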

but I don't think I could have gotten there today without the help.
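
To see the whole-file approach run end to end, here is a minimal sketch. The sample text and the three-group pattern are made up for illustration, since the real pattern was never posted; only the key/value order (group 3, then group 2) is taken from the question.

```python
import re

# Hypothetical sample data and pattern (the original pattern is not shown).
# Groups: 1 = record type, 2 = value, 3 = key.
sample_text = """\
# comment line, ignored
SET 42 AS answer
noise that does not match
SET 7 AS lucky
"""
pattern = r"^(SET) (\d+) AS (\w+)$"

# With re.MULTILINE, ^ and $ anchor at every line, so a single findall
# call scans the whole text and returns one tuple of groups per match.
matches = re.findall(pattern, sample_text, re.MULTILINE)

# Key on group 3, value from group 2, mirroring group(3, 2) in the question.
my_dict = {key: value for _kind, value, key in matches}
print(my_dict)
```

As with the original loop, a key that appears more than once keeps only its last value.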

answered Nov 10 '22 07:11 by WiringHarness


Here are some quick-and-dirty optimisations to your code:

my_dict = {}

with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        # search once and reuse the match object
        match = re.search(pattern, line)
        if match:
            key, value = match.group(3, 2)
            my_dict[key] = value
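
If you can use Python 3.8 or later (so not the 3.2 mentioned in the question), the same search-once idea also fits into a dict comprehension via an assignment expression. This is only a sketch: the pattern and the in-memory stand-in for the file are hypothetical.

```python
import io
import re

# Hypothetical pattern and in-memory "file"; io.StringIO stands in for open().
pattern = re.compile(r"(\w+): (\d+) -> (\w+)")
data = io.StringIO("x: 1 -> alpha\nskip me\ny: 2 -> beta\n")

# The walrus operator (Python 3.8+) stores the match object, so each line
# is searched only once; lines that do not match give None and are skipped.
my_dict = {
    m.group(3): m.group(2)
    for line in data
    if (m := pattern.search(line))
}
print(my_dict)
```

Compiling the pattern once with re.compile is optional here; it mainly keeps the comprehension short.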

answered Nov 10 '22 07:11 by srgerg


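In the spirit of EAFP (easier to ask forgiveness than permission), I'd suggest:

my_dict = {}
with open(r'x:\path\to\file', 'r') as data:
    for line in data:
        try:
            m = re.search(pattern, line)
            # group 3 is the key and group 2 the value, as in the question
            my_dict[m.group(3)] = m.group(2)
        except AttributeError:
            # re.search returned None: the line did not match
            pass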

Another way is to keep using lists, but redesign the pattern so that it contains only two groups (key, value). Then you could simply do:

# data is the open file object, as in the loop above
matches = [re.findall(pattern, line) for line in data]
my_dict = dict(x[0] for x in matches if x)
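
A possible shortcut along the same lines: when the pattern has exactly two groups, re.findall already returns (key, value) tuples, so the result can be passed straight to dict(). The sketch below assumes a made-up two-group pattern and sample text, and reads the text as one string instead of line by line.

```python
import re

# Hypothetical two-group pattern: group 1 is the key, group 2 the value.
pattern = re.compile(r"^(\w+)\s*=\s*(\S+)$", re.MULTILINE)
text = "host = example.org\n; a comment\nport = 8080\n"

# With exactly two groups, findall returns a list of (key, value) tuples,
# which dict() accepts directly.
my_dict = dict(pattern.findall(text))
print(my_dict)
```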

answered Nov 10 '22 07:11 by georg