I have a generated file with thousands of lines like the following:
CODE,XXX,DATE,20101201,TIME,070400,CONDITION_CODES,LTXT,PRICE,999.0000,QUANTITY,100,TSN,1510000001
Some lines have more fields and others have fewer, but all follow the same pattern of key-value pairs and each line has a TSN field.
When doing some analysis on the file, I wrote a loop like the following to read the file into a dictionary:
#!/usr/bin/env python
from sys import argv

records = {}
for line in open(argv[1]):
    fields = line.strip().split(',')
    record = dict(zip(fields[::2], fields[1::2]))
    records[record['TSN']] = record

print 'Found %d records in the file.' % len(records)
...which is fine and does exactly what I want it to (the print is just a trivial example).
However, it doesn't feel particularly "pythonic" to me, and the line
dict(zip(fields[::2], fields[1::2]))
just feels "clunky" (how many times does it iterate over the fields?).
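If I understand the slicing correctly, that one-liner builds two throwaway lists before zip() even starts (the field list here is a made-up fragment of one of my lines):
fields = ['CODE', 'XXX', 'DATE', '20101201', 'TSN', '1510000001']
keys = fields[::2]     # one pass over fields, builds ['CODE', 'DATE', 'TSN']
values = fields[1::2]  # second pass, builds ['XXX', '20101201', '1510000001']
print(dict(zip(keys, values)))  # zip() then walks both temporary lists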
Is there a better way of doing this in Python 2.6 with just the standard modules to hand?
In Python 2 you could use izip in the itertools module and the magic of generator objects to write your own function to simplify the creation of pairs of values for the dict records. I got the idea for pairwise() from a similarly named (although functionally different) recipe in the Python 2 itertools docs.
To use the approach in Python 3, you can just use plain zip(), since it does what izip() did in Python 2 (which is why the latter was removed from itertools). The example below accounts for this and should work in both versions.
try:
    from itertools import izip
except ImportError:  # Python 3
    izip = zip

def pairwise(iterable):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(iterable)
    return izip(a, a)
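A quick sanity check (using a made-up fragment of one of your lines) shows the trick: both arguments to izip() draw from the same iterator, so consecutive items get paired off:
sample = ['CODE', 'XXX', 'DATE', '20101201', 'TSN', '1510000001']
print(list(pairwise(sample)))
# [('CODE', 'XXX'), ('DATE', '20101201'), ('TSN', '1510000001')]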
pairwise() can then be used like this in your file-reading for loop:
from sys import argv

records = {}
for line in open(argv[1]):
    fields = (field.strip() for field in line.split(','))  # generator expr
    record = dict(pairwise(fields))
    records[record['TSN']] = record

print('Found %d records in the file.' % len(records))
But wait, there's more!
It's possible to create a generalized version I'll call grouper(), which again corresponds to a similarly named itertools recipe (which is listed right below pairwise()):
def grouper(n, iterable):
    "s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ..."
    return izip(*[iter(iterable)]*n)
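The trick here is that *[iter(iterable)]*n passes the same iterator object n times, so izip() pulls n consecutive items into each tuple. For example (reusing the izip name defined above):
print(list(grouper(3, 'ABCDEFGHI')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'H', 'I')]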
grouper() could then be used like this in your for loop:
record = dict(grouper(2, fields))
Of course, for specific cases like this, it's easy to use functools.partial() and create a similar pairwise() function with it (which will work in both Python 2 and 3):
import functools
pairwise = functools.partial(grouper, 2)
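As far as I can tell, this partial behaves exactly like the hand-written pairwise() above (again checked with made-up fields):
print(list(pairwise(['TSN', '1510000001', 'PRICE', '999.0000'])))
# [('TSN', '1510000001'), ('PRICE', '999.0000')]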
Postscript
Unless there's a really huge number of fields, you could instead create an actual sequence out of the pairs of line items (rather than using a generator expression, which has no len()):
fields = tuple(field.strip() for field in line.split(','))
The advantage is that it allows the grouping to be done using simple slicing:
try:
    xrange
except NameError:  # Python 3
    xrange = range

def grouper(n, sequence):
    for i in xrange(0, len(sequence), n):
        yield sequence[i:i+n]

pairwise = functools.partial(grouper, 2)
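A quick check with a hard-coded line: each group is now a slice of the tuple rather than an izip() tuple, but dict() accepts either.
line = 'CODE,XXX,TSN,1510000001'
fields = tuple(field.strip() for field in line.split(','))
print(dict(pairwise(fields)))
# {'CODE': 'XXX', 'TSN': '1510000001'}  (key order may vary in older Pythons)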