Given the following list of lists:
a = [[2,3],[1,2,3],[1]]
I need each list within a to have the same number of elements. First, I need to get the longest length of any list in a. Then, I need to ensure all lists are at least that long. If not, I want to add a zero (0) to the end until that is true. The desired result is:
b = [[2,3,0],[1,2,3],[1,0,0]]
Thanks in advance!
P.S. I also need to apply this to a pandas DataFrame like this one:
import pandas as pd
b = [[2,3,0],[1,2,3],[1,0,0]]
f=pd.DataFrame({'column':b})
How about:
pd.DataFrame(a).fillna(0)
Or, to get exactly what you asked for (a single column holding equal-length lists of integers):
pd.Series(pd.DataFrame(a).fillna(0).astype(int).values.tolist()).to_frame('column')
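To make the chained one-liner above easier to follow, here is a minimal sketch that splits it into steps. The intermediate names `wide` and `filled` are only illustrative and are not part of the original answer.

```python
import pandas as pd

a = [[2, 3], [1, 2, 3], [1]]

# Building a frame from ragged lists pads the short rows with NaN,
# which turns the affected columns into floats.
wide = pd.DataFrame(a)

# Replace NaN with 0 and cast back to int so the values match the input.
filled = wide.fillna(0).astype(int)

# Collapse each row back into a list and store the lists in one column.
f = pd.Series(filled.values.tolist()).to_frame('column')
print(f['column'].tolist())  # [[2, 3, 0], [1, 2, 3], [1, 0, 0]]
```

The `astype(int)` step matters only because the NaN padding would otherwise leave floats such as `3.0` in the result.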
This is also related to this question, where you can get much better performance with:
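import numpy as np

def box(v):
    # length of each sublist
    lens = np.array([len(item) for item in v])
    # True where a slot holds real data, False where padding is needed
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, 0, dtype=int)
    # fill the True slots, row by row, with the flattened values
    out[mask] = np.concatenate(v)
    return out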
pd.DataFrame(dict(column=box(a).tolist()))
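As a rough illustration of why the mask approach works, the sketch below runs the same steps on the sample input outside of a function and prints the intermediate boolean mask. It assumes only NumPy and the list `a` from the question.

```python
import numpy as np

a = [[2, 3], [1, 2, 3], [1]]

# Length of each sublist: [2, 3, 1]
lens = np.array([len(item) for item in a])

# Broadcasting compares every length against the column positions 0, 1, 2.
# A slot is True when the sublist is long enough to have a value there.
mask = lens[:, None] > np.arange(lens.max())
print(mask.tolist())
# [[True, True, False], [True, True, True], [True, False, False]]

# Start from all zeros, then drop the flattened values into the True slots.
# Boolean assignment fills in row-major order, matching the concatenation.
out = np.zeros(mask.shape, dtype=int)
out[mask] = np.concatenate(a)
print(out.tolist())  # [[2, 3, 0], [1, 2, 3], [1, 0, 0]]
```

Because the zeros are already in place, only the real values are ever written, so no per-row Python loop is needed after the lengths are collected.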
First, compute the maximum length of your elements:
maxlen=len(max(a,key=len)) # max element using sublist len criterion
or, as Patrick suggested, use a generator expression over the sublist lengths, which is probably a tad faster:
maxlen=max(len(sublist) for sublist in a) # max of all sublist lengths
then create a new list with 0 padding:
b = [sl+[0]*(maxlen-len(sl)) for sl in a] # list comp for padding
Result with a = [[2,3],[1,2,3],[1]]:
[[2, 3, 0], [1, 2, 3], [1, 0, 0]]
Note: this could be done in one line, but it would not be very performant because maxlen is recomputed for every sublist. One-liners are not always the best solution.
b = [sl+[0]*(len(max(a,key=len))-len(sl)) for sl in a] # not very performant
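The two-step version can be wrapped in a small helper so that the maximum length is computed only once. This is a minimal sketch: the name `pad_lists` and its `fill` parameter are hypothetical and do not come from the answer.

```python
def pad_lists(rows, fill=0):
    """Return new lists, each right-padded with `fill` to the longest length."""
    # Compute the target length once; default=0 avoids an error on empty input.
    maxlen = max((len(row) for row in rows), default=0)
    # row + [...] builds a new list, so the input sublists are left untouched.
    return [row + [fill] * (maxlen - len(row)) for row in rows]


a = [[2, 3], [1, 2, 3], [1]]
b = pad_lists(a)
print(b)  # [[2, 3, 0], [1, 2, 3], [1, 0, 0]]
```

For the DataFrame case in the question, one option (assuming the column holds plain Python lists) would be to pass `f['column'].tolist()` through the helper and wrap the result in a `pd.Series` with the same index.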