Python Add Elements to Lists within List if Missing

Given the following list of lists:

a = [[2,3],[1,2,3],[1]]

I need each list within a to have the same number of elements. First, I need to get the longest length of any list in a. Then, I need to ensure all lists are at least that long. If not, I want to add a zero (0) to the end until that is true. The desired result is:

b = [[2,3,0],[1,2,3],[1,0,0]]

Thanks in advance!

P.S. I also need to apply this to a Pandas Data Frame like this one:

import pandas as pd
b = [[2,3,0],[1,2,3],[1,0,0]]
f=pd.DataFrame({'column':b})

Dance Party2 asked Nov 16 '16 20:11


2 Answers

How about letting pandas pad the short rows with NaN and then filling the gaps with 0:

pd.DataFrame(a).fillna(0)



To get exactly what you asked for, with the padded lists stored in a single column:

pd.Series(pd.DataFrame(a).fillna(0).astype(int).values.tolist()).to_frame('column')
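
The one-liner above packs several steps together. A minimal sketch that spells them out, assuming the same `a` as in the question (the intermediate names `wide` and `padded` are only illustrative):

```python
import pandas as pd

a = [[2, 3], [1, 2, 3], [1]]

# Ragged rows are padded with NaN by the constructor, so the
# padded columns are float at this point.
wide = pd.DataFrame(a)

# Replace the NaN padding with 0 and cast everything back to integers.
padded = wide.fillna(0).astype(int)

# Collapse each row back into a list and store the lists in one column.
f = pd.Series(padded.values.tolist()).to_frame('column')
print(f['column'].tolist())
```

The `astype(int)` step matters only if you want integers back; without it the padded values stay as floats.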



This is also related to this question, where you can get much better performance with a NumPy-based helper:

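import numpy as np

def box(v):
    lens = np.array([len(item) for item in v])  # length of each sublist
    mask = lens[:,None] > np.arange(lens.max())  # True where a real value belongs
    out = np.full(mask.shape, 0, dtype=int)  # start from an all-zero grid
    out[mask] = np.concatenate(v)  # drop the values into the True cells
    return out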

pd.DataFrame(dict(column=box(a).tolist()))
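
To see why the mask trick works, here is a small walk-through of the same steps on the example input. It is only a sketch for illustration and uses the `a` from the question:

```python
import numpy as np

a = [[2, 3], [1, 2, 3], [1]]

lens = np.array([len(item) for item in a])  # one length per sublist

# Broadcasting a (3, 1) column of lengths against a (3,) row of column
# indices gives a (3, 3) boolean grid: True where the column index is
# smaller than that row's length, i.e. where a real value exists.
mask = lens[:, None] > np.arange(lens.max())

out = np.zeros(mask.shape, dtype=int)

# Boolean assignment fills the True cells in row-major order, which is
# the same order in which np.concatenate flattens the sublists.
out[mask] = np.concatenate(a)

print(mask)
print(out)
```

Cells that stay False in `mask` are never written, so they keep the fill value 0.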



timing (figure not recovered)
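
A rough way to run a timing comparison yourself is sketched below with `timeit`. The test data, the helper names `fillna_pad` and `listcomp_pad`, and the repeat count are assumptions made for illustration; the numbers will depend on your machine and on the shape of your data.

```python
import random
import timeit

import numpy as np
import pandas as pd


def box(v):
    # NumPy mask approach, same logic as the helper above.
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, 0, dtype=int)
    out[mask] = np.concatenate(v)
    return out


def fillna_pad(v):
    # pandas approach: the constructor pads with NaN, then fill with 0.
    return pd.DataFrame(v).fillna(0).astype(int).values


def listcomp_pad(v):
    # Pure-Python padding with a list comprehension.
    maxlen = max(len(sl) for sl in v)
    return [sl + [0] * (maxlen - len(sl)) for sl in v]


# Hypothetical test data: 500 sublists of random length between 1 and 50.
rnd = random.Random(0)
data = [list(range(rnd.randint(1, 50))) for _ in range(500)]

timings = {}
for func in (box, fillna_pad, listcomp_pad):
    timings[func.__name__] = timeit.timeit(lambda: func(data), number=5)

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>12}: {seconds:.4f} s for 5 runs")
```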

piRSquared answered Sep 28 '22 10:09


First, compute the maximum length of your elements:

maxlen=len(max(a,key=len))  # max element using sublist len criterion

or, as Patrick suggested, use a generator expression over the sublist lengths, which is probably a tad faster:

maxlen=max(len(sublist) for sublist in a)  # max of all sublist lengths

Then create a new list, padding each sublist with zeros:

b = [sl+[0]*(maxlen-len(sl)) for sl in a]  # list comp for padding

result with a = [[2,3],[1,2,3],[1]]:

[[2, 3, 0], [1, 2, 3], [1, 0, 0]]

Note: this could be done in one line, but it would not be very performant because maxlen is recomputed for every sublist. One-liners are not always the best solution.

b = [sl+[0]*(len(max(a,key=len))-len(sl)) for sl in a]  # not very performant
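
For the DataFrame case from the question's P.S., the same padding can be applied to a column of lists. This is a minimal sketch, assuming the column is named 'column' as in the question and that it holds the unpadded lists:

```python
import pandas as pd

# Assumed starting point: a column of ragged lists.
f = pd.DataFrame({'column': [[2, 3], [1, 2, 3], [1]]})

# Compute the longest length once, outside the per-row work.
maxlen = max(len(sl) for sl in f['column'])

# Pad each list with zeros up to maxlen.
f['column'] = f['column'].map(lambda sl: sl + [0] * (maxlen - len(sl)))

print(f)
```

Because `maxlen` is computed once up front, this avoids the recomputation issue of the one-liner.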

Jean-François Fabre answered Sep 28 '22 11:09