split a line into a dictionary with multiple layers of key value pairs

I have a file that contains lines in this type of format.

Example 1:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"

Example 2:
nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

I first split the line on ':', which gives me a list with 2 entries. I'd like to turn the second entry into a dictionary of keys and values, where the Score key holds its own sub-dictionary of player keys and values.

Hole 1
Par 4
Index 2
Distance 459
Score 
    Player1 4
    Player2 6
    Player3 4

So I am using something like this...

split_line_by_semicolon = nextline.split(":")
# the record is the last piece after splitting on ':'
dictionary_of_line = dict((k.strip(), v.strip()) for k, v in (item.split('=')
    for item in split_line_by_semicolon[-1].split(';')))
for keys, values in dictionary_of_line.items():
    print("{0} {1}".format(keys, values))

However I get an error on the score element of the line:

ValueError: too many values to unpack (expected 2)

I can adjust the split on '=' to this, so it stops after the first '='

# split on the first '=' only, and skip the empty piece after the trailing ';'
dictionary_of_line = dict((k.strip(), v.strip()) for k, v in (item.split('=', 1)
    for item in split_line_by_semicolon[-1].split(';') if item.strip()))
for keys, values in dictionary_of_line.items():
    print("{0} {1}".format(keys, values))

However I lose the sub values within the curly brackets. Does anybody know how I can achieve this multi layer dictionary?
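
The loss can be seen by looking at what the split on ';' produces for Example 2: the brace block is cut apart before the split on '=' ever runs. A minimal illustration (the names `record`, `pieces` and `flat` exist only for this sketch, and empty pieces are skipped so the trailing ';' does not get in the way):

```python
nextline = ("DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; "
            "Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };")

# Everything after the last ':' is the record itself.
record = nextline.split(":")[-1]

# Splitting on ';' also splits inside the braces.
pieces = [piece.strip() for piece in record.split(";") if piece.strip()]
print(pieces)

# Splitting each piece on the first '=' then yields a flat dict of strings,
# with the brace characters left dangling in two of the values.
flat = dict((k.strip(), v.strip()) for k, v in (piece.split("=", 1) for piece in pieces))
print(flat)
```

The players end up as top-level keys next to Hole and Par, and Score only keeps the opening fragment of its block.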

John asked Oct 19 '22 00:10


2 Answers

A simpler way to do it (but I don't know if it is acceptable in your situation) would be:

import re

nextline = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

# compiles the regular expression to get the info you want
my_regex = re.compile(r'\w+ \= \w+')

# builds the structure of the dict you expect to get 
final_dict = {'Hole':0, 'Par':0, 'Index':0, 'Distance':0, 'Score':{}}

# uses the compiled regular expression to filter out the info you want from the string
filtered_items = my_regex.findall(nextline)

for item in filtered_items:
    # for each filtered item (string in the form key = value)
    # splits out the 'key' and handles it to fill your final dictionary
    key = item.split(' = ')[0]
    if key.startswith('Player'):
        final_dict['Score'][key] = int(item.split(' = ')[1])
    else:
        final_dict[key] = int(item.split(' = ')[1])
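
To print the result in the layout shown in the question, a small recursive helper along these lines should do. This is only a sketch: `format_nested` is a name invented here, and `final_dict` is written out by hand as the value the loop above builds for the Example 2 line.

```python
def format_nested(data, indent=0):
    """Return one output line per key, indenting the keys of nested dicts."""
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            # A nested dict: print the key alone, then its entries indented.
            lines.append("{0}{1}".format(" " * indent, key))
            lines.extend(format_nested(value, indent + 4))
        else:
            lines.append("{0}{1} {2}".format(" " * indent, key, value))
    return lines


# The dictionary produced for the Example 2 line.
final_dict = {'Hole': 1, 'Par': 4, 'Index': 2, 'Distance': 459,
              'Score': {'Player1': 4, 'Player2': 6, 'Player3': 4}}

print("\n".join(format_nested(final_dict)))
```

The order of the printed lines follows the insertion order of the dictionary, which Python guarantees from version 3.7 on.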

Lucas Infante answered Oct 22 '22 11:10


I would use regular expressions in the same manner as maccinza did (I like his answer), with one minor difference. A value that contains an inner dictionary can be processed recursively:

import re

# example strings:
nextline1 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
nextline2 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

# this regexp captures the WeeklyMedal record (everything after 'WeeklyMedal:)
lineRegexp = re.compile(r'.+\'WeeklyMedal:(.+)\'?')
# this regexp parses the record: "key = value", where value is a word or a "{ ... }" block
weeklyMedalRegexp = re.compile(r'(\w+) = (\{.+\}|\w+)')

# helper recursive function to process a WeeklyMedal record. returns a dictionary
def parseWeeklyMedal(r, info):
    return {k: (int(v) if v.isdigit() else parseWeeklyMedal(r, v))
            for (k, v) in r.findall(info)}

parsedLines = []
for line in [nextline1, nextline2]:
    info = lineRegexp.search(line)
    if info:
        # process the captured record (group 1), not the whole match (group 0)
        parsedLines.append(parseWeeklyMedal(weeklyMedalRegexp, info.group(1)))
        # or do something with the parsed dictionary in place

# do something here with the entire result, print for example
print(parsedLines)
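
If regular expressions are not wanted, the split-based approach from the question can probably be kept by setting the brace block aside before splitting on ';'. The following is only a sketch under stated assumptions: `parse_record` is a made-up name, each record has at most one `{ ... }` block, that block comes last, and every value is an integer.

```python
def parse_record(record):
    """Parse 'k = v; ...; k = { a = 1; b = 2 };' into a nested dict."""
    result = {}
    # Set the "{ ... }" block aside so its ';' separators are not split yet.
    head, brace, rest = record.partition("{")
    inner = rest.partition("}")[0] if brace else ""
    for item in head.split(";"):
        if "=" not in item:
            continue  # skips empty pieces, e.g. after a trailing ';'
        key, value = (part.strip() for part in item.split("=", 1))
        if value:
            result[key] = int(value)
        else:
            # An empty value means the brace block belonged to this key.
            result[key] = parse_record(inner)
    return result


nextline1 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
nextline2 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

for line in [nextline1, nextline2]:
    # Keep only the part after the 'WeeklyMedal:' marker.
    record = line.split("WeeklyMedal:", 1)[1]
    print(parse_record(record))
```

Unlike the recursive regexp, this version does not handle braces nested inside braces, so it only fits lines shaped like the two examples.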

merletta answered Oct 22 '22 10:10