Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Traversing a list of lists by index within a loop, to reformat strings

Tags:

python

loops

list

I have a list of lists that looks like this, that was pulled in from a poorly formatted csv file:

DF = [['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]

I would like to end up with a new structure like this:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

after which I can further split, strip, etc.

So, I used the facts that:

  • the customer number always starts with Customer Number
  • the Notes are always longer
  • the number of Notes never exceeds 5

to code up what is clearly an absurd solution, even though it works.

DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []

for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])

Would anyone mind recommending a more stable and intelligent solution to this kind of problem?

like image 790
tumultous_rooster Avatar asked Mar 04 '15 02:03

tumultous_rooster


1 Answers

Just keep track of when we find a new customer:

from pprint import pprint as pp

out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)

Output:

['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

If the customer can repeat again later and you want them grouped together use a dict:

from collections import defaultdict
d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        d[cust].append(cust + sub[0])
print(d)

Output:

pp(d)

{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
                           'stuff and was easy to deal with'],
 'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
                           'James Bond',
                           'Customer Number: 007 Notes: came in with a '
                           'martini'],
 'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
                           'stuff got a free keychain',
                           'Customer Number: 103 Notes: gave us a referral '
                           'to his uncles cousins hairdresser',
                           'Customer Number: 103 Notes: name address '
                           'birthday social security number on file'],
 'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
                           'like Chris Farley on that hidden decaf skit '
                           'from SNL']}

Based on your comment and error you seem to have lines coming before an actual customer so we can add them to the first customer in the list:

# added ["foo"] before we see any customer

DF = [["foo"],['Customer Number: 001 '],
 ['Notes: Bought a ton of stuff and was easy to deal with'],
 ['Customer Number: 666 '],
 ['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
 ['Customer Number: 103 '],
 ['Notes: bought a ton of stuff got a free keychain'],
 ['Notes: gave us a referral to his uncles cousins hairdresser'],
 ['Notes: name address birthday social security number on file'],
 ['Customer Number: 007 '],
 ['Notes: looked a lot like James Bond'],
 ['Notes: came in with a martini']]


from pprint import pprint as pp

from itertools import takewhile, islice

# find lines up to first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))

out = []
ln = len(start)
# if we had data before we actually found a customer this will be True
if start: 
    # so set cust to first customer in list and start adding to out
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln will either be 0 if start is empty else we start at first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])

Which outputs:

 ['Customer Number: 001 foo',
 'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
 'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
 'hidden decaf skit from SNL',
 'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
 'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
 'hairdresser',
 'Customer Number: 103 Notes: name address birthday social security number '
 'on file',
 'Customer Number: 007 Notes: looked a lot like James Bond',
 'Customer Number: 007 Notes: came in with a martini']

I presumed you would consider lines that come before any customer to actually belong to that first customer.

like image 88
Padraic Cunningham Avatar answered Sep 28 '22 23:09

Padraic Cunningham