I have a list of lists that looks like this, that was pulled in from a poorly formatted csv file:
DF = [['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
I would like to end up with a new structure like this:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address birthday social security number on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
after which I can further split, strip, etc.
So, I used the facts that Notes lines are always longer than Customer Number lines, and that the number of Notes per customer never exceeds 5, to code up what is clearly an absurd solution, even though it works:
DF = [item for sublist in DF for item in sublist]
DF = DF + ['stophere']
DF2 = []
for record in DF:
    if (record[0:17]=="Customer Number: ") & (record !="stophere"):
        DF2.append(record + DF[DF.index(record)+1])
        if len(DF[DF.index(record)+2]) >21:
            DF2.append(record + DF[DF.index(record)+2])
            if len(DF[DF.index(record)+3]) >21:
                DF2.append(record + DF[DF.index(record)+3])
                if len(DF[DF.index(record)+4]) >21:
                    DF2.append(record + DF[DF.index(record)+4])
                    if len(DF[DF.index(record)+5]) >21:
                        DF2.append(record + DF[DF.index(record)+5])
Would anyone mind recommending a more stable and intelligent solution to this kind of problem?
Just keep track of when we find a new customer:
from pprint import pprint as pp

out = []
for sub in DF:
    if sub[0].startswith("Customer Number"):
        # remember the most recent customer line
        cust = sub[0]
    else:
        # every other line is a note for that customer
        out.append(cust + sub[0])
pp(out)
Output:
['Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
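Since the question mentions splitting and stripping afterwards, the same "remember the current customer" idea can also yield already-separated values. This is only a sketch: the function name customer_notes and the shortened sample rows are made up for illustration, and it assumes every row is a one-item list starting with either "Customer Number:" or "Notes:".

```python
def customer_notes(rows):
    """Yield (customer_number, note) pairs from single-item rows."""
    number = None
    for row in rows:
        text = row[0].strip()
        if text.startswith("Customer Number:"):
            # keep only the part after the first colon, e.g. "001"
            number = text.partition(":")[2].strip()
        elif text.startswith("Notes:"):
            # pair the note text with the most recent customer number
            yield number, text.partition(":")[2].strip()


# shortened sample in the same shape as DF above
rows = [['Customer Number: 001 '],
        ['Notes: Bought a ton of stuff and was easy to deal with'],
        ['Customer Number: 103 '],
        ['Notes: bought a ton of stuff got a free keychain'],
        ['Notes: gave us a referral to his uncles cousins hairdresser']]

pairs = list(customer_notes(rows))
```

Here pairs is a list of tuples such as ('001', 'Bought a ton of stuff and was easy to deal with'), so no second pass over the joined strings is needed. A note that appears before any customer would be paired with None.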
If the same customer can appear again later and you want their notes grouped together, use a dict:
from collections import defaultdict

d = defaultdict(list)
for sub in DF:
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        # notes for a repeated customer land in the same list
        d[cust].append(cust + sub[0])
# convert to a plain dict so pprint shows only the mapping
pp(dict(d))
Output:
{'Customer Number: 001 ': ['Customer Number: 001 Notes: Bought a ton of '
'stuff and was easy to deal with'],
'Customer Number: 007 ': ['Customer Number: 007 Notes: looked a lot like '
'James Bond',
'Customer Number: 007 Notes: came in with a '
'martini'],
'Customer Number: 103 ': ['Customer Number: 103 Notes: bought a ton of '
'stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral '
'to his uncles cousins hairdresser',
'Customer Number: 103 Notes: name address '
'birthday social security number on file'],
'Customer Number: 666 ': ['Customer Number: 666 Notes: acted and looked '
'like Chris Farley on that hidden decaf skit '
'from SNL']}
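To see the grouping actually merge a repeated customer, here is a small sketch with invented rows in which customer 001 shows up twice. The names rows and grouped and the note texts "first visit" and "second visit" are made up for the example, and the key is stripped so that trailing spaces do not create separate groups.

```python
from collections import defaultdict

# invented sample: customer 001 appears twice, separated by 007
rows = [['Customer Number: 001 '],
        ['Notes: first visit'],
        ['Customer Number: 007 '],
        ['Notes: came in with a martini'],
        ['Customer Number: 001 '],
        ['Notes: second visit']]

grouped = defaultdict(list)
cust = None
for sub in rows:
    if sub[0].startswith("Customer Number"):
        # strip so 'Customer Number: 001 ' and 'Customer Number: 001' match
        cust = sub[0].strip()
    elif cust is not None:
        # store only the note; the customer is already the key
        grouped[cust].append(sub[0])
```

Both notes for 001 end up under the single key 'Customer Number: 001', even though the rows are not adjacent in the input.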
Based on your comment and the error, you seem to have lines that come before the first actual customer, so we can attach them to the first customer in the list:
# added ["foo"] before we see any customer
DF = [["foo"],['Customer Number: 001 '],
['Notes: Bought a ton of stuff and was easy to deal with'],
['Customer Number: 666 '],
['Notes: acted and looked like Chris Farley on that hidden decaf skit from SNL'],
['Customer Number: 103 '],
['Notes: bought a ton of stuff got a free keychain'],
['Notes: gave us a referral to his uncles cousins hairdresser'],
['Notes: name address birthday social security number on file'],
['Customer Number: 007 '],
['Notes: looked a lot like James Bond'],
['Notes: came in with a martini']]
from pprint import pprint as pp
from itertools import takewhile, islice

# find lines up to first customer
start = list(takewhile(lambda x: "Customer Number:" not in x[0], DF))
out = []
ln = len(start)
# if we had data before we actually found a customer this will be True
if start:
    # so set cust to first customer in list and start adding to out
    cust = DF[ln][0]
    for sub in start:
        out.append(cust + sub[0])
# ln will either be 0 if start is empty else we start at first customer
for sub in islice(DF, ln, None):
    if sub[0].startswith("Customer Number"):
        cust = sub[0]
    else:
        out.append(cust + sub[0])
pp(out)
Which outputs:
['Customer Number: 001 foo',
'Customer Number: 001 Notes: Bought a ton of stuff and was easy to deal with',
'Customer Number: 666 Notes: acted and looked like Chris Farley on that '
'hidden decaf skit from SNL',
'Customer Number: 103 Notes: bought a ton of stuff got a free keychain',
'Customer Number: 103 Notes: gave us a referral to his uncles cousins '
'hairdresser',
'Customer Number: 103 Notes: name address birthday social security number '
'on file',
'Customer Number: 007 Notes: looked a lot like James Bond',
'Customer Number: 007 Notes: came in with a martini']
I presumed you would consider lines that come before any customer to actually belong to that first customer.
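If that presumption holds, the same result can also be reached in a single pass by buffering the leading lines until the first customer shows up. This is just a sketch of that alternative, not a replacement for the code above; the function name attach_notes and the shortened sample rows are hypothetical.

```python
def attach_notes(rows):
    """Join each line to its customer; leading lines go to the first customer."""
    out = []
    pending = []  # lines seen before the first customer line
    cust = None
    for sub in rows:
        line = sub[0]
        if line.startswith("Customer Number"):
            if cust is None:
                # first customer found: flush the buffered leading lines
                out.extend(line + p for p in pending)
                pending.clear()
            cust = line
        elif cust is None:
            pending.append(line)
        else:
            out.append(cust + line)
    return out


sample = [["foo"],
          ['Customer Number: 001 '],
          ['Notes: Bought a ton of stuff and was easy to deal with'],
          ['Customer Number: 007 '],
          ['Notes: came in with a martini']]

result = attach_notes(sample)
```

One difference in behaviour: if the input contains no customer line at all, this version returns an empty list, whereas indexing DF[ln] in the takewhile version would raise an IndexError.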