How to make a list of lists from a pandas DataFrame, skipping NaN values

I have a pandas DataFrame that looks roughly like this:

    foo   foo2   foo3  foo4
a   NY    WA     AZ    NaN
b   DC    NaN    NaN   NaN
c   MA    CA     NaN   NaN

I'd like to make a nested list of the rows of this DataFrame, omitting the NaN values, so that I get something like [['NY','WA','AZ'],['DC'],['MA','CA']].

There is a pattern in this DataFrame, if that makes a difference: if fooX is empty, every subsequent column fooY is also empty.

I originally had something like the code below, but I'm sure there's a nicer way to do this.

A = [[i] for i in subset_label['label'].tolist()]
B = [i for i in subset_label['label2'].tolist()]
C = [i for i in subset_label['label3'].tolist()]
D = [i for i in subset_label['label4'].tolist()]
# Note: A, B, C and D are never used by the loop below.
out_list = []
for index, row in subset_label.iterrows():
    # Collects all four cells of each row, NaN included.
    out_list.append([row.label, row.label2, row.label3, row.label4])
out_list
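For comparison, a minimal sketch of the same row loop with a NaN check added. It assumes the example frame from the table above (columns foo to foo4, rebuilt by hand as df) rather than the asker's subset_label, whose data isn't shown.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [["NY", "WA", "AZ", np.nan],
     ["DC", np.nan, np.nan, np.nan],
     ["MA", "CA", np.nan, np.nan]],
    index=["a", "b", "c"],
    columns=["foo", "foo2", "foo3", "foo4"],
)

out_list = []
for _, row in df.iterrows():
    # Keep only the cells of this row that are not NaN.
    out_list.append([value for value in row if pd.notna(value)])

print(out_list)  # [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
```

iterrows builds a new Series for every row, so this loop gets slow on large frames; the answers below avoid the explicit loop.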
Erin asked Dec 14 '22 21:12



2 Answers

Try this:

In [77]: df.T.apply(lambda x: x.dropna().tolist()).tolist()
Out[77]: [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
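A variant without the transpose, as a sketch rather than part of the original answer: apply the same function row-wise with axis=1. As far as I know, in recent pandas versions a function that returns lists under axis=1 gives back a Series of lists. With the default axis=0 (as used on df.T), pandas instead tries to build a DataFrame from the returned lists, so if every row happened to have the same number of non-NaN values, the apply could return a DataFrame and the final .tolist() would fail.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [["NY", "WA", "AZ", np.nan],
     ["DC", np.nan, np.nan, np.nan],
     ["MA", "CA", np.nan, np.nan]],
    index=["a", "b", "c"],
    columns=["foo", "foo2", "foo3", "foo4"],
)

# axis=1 passes each row to the function; returning a list per row gives
# a Series of lists, which .tolist() turns into a plain list of lists.
result = df.apply(lambda row: row.dropna().tolist(), axis=1).tolist()
print(result)  # [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
```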
MaxU - stop WAR against UA answered May 02 '23 03:05



Option 1
pd.DataFrame.stack drops NaN values by default, so grouping the stacked Series by the original row label (level=0) gives one list per row.

df.stack().groupby(level=0).apply(list).tolist()

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
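A hedged variant of Option 1 that doesn't depend on that default. pandas 2.1 added a new stack implementation (opt-in via future_stack=True) that does not drop NaN, and as far as I know it becomes the default in pandas 3.0, so an explicit .dropna() keeps the result the same either way. groupby(..., sort=False) keeps the rows in their original order instead of sorting by index label. The helper name non_nan_lists is hypothetical, just for illustration. Note that a row with no non-NaN values disappears entirely with this approach.

```python
import numpy as np
import pandas as pd


def non_nan_lists(frame):
    """Hypothetical helper: one list of non-NaN values per row of `frame`."""
    # stack() turns the frame into a Series indexed by (row label, column);
    # the explicit dropna() removes NaN even if stack() keeps it.
    stacked = frame.stack().dropna()
    # sort=False keeps the groups in order of first appearance (row order).
    return stacked.groupby(level=0, sort=False).apply(list).tolist()


df = pd.DataFrame(
    [["NY", "WA", "AZ", np.nan],
     ["DC", np.nan, np.nan, np.nan],
     ["MA", "CA", np.nan, np.nan]],
    index=["a", "b", "c"],
    columns=["foo", "foo2", "foo3", "foo4"],
)

print(non_nan_lists(df))  # [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
```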


Option 2
Fun alternative, because I think summing lists within pandas objects is fun. (In pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map.)

df.applymap(lambda x: [x] if pd.notnull(x) else []).sum(1).tolist()

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
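The trick in Option 2 is that adding lists concatenates them: wrapping each non-NaN cell in a one-element list and each NaN in an empty list, then summing along the row, leaves only the non-NaN values in order. A sketch of the same idea in plain Python, which also sidesteps the applymap/DataFrame.map naming difference between pandas versions:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [["NY", "WA", "AZ", np.nan],
     ["DC", np.nan, np.nan, np.nan],
     ["MA", "CA", np.nan, np.nan]],
    index=["a", "b", "c"],
    columns=["foo", "foo2", "foo3", "foo4"],
)

# [x] + [y] + [] + ... concatenates, so summing the wrapped cells with an
# empty-list start value collects the non-NaN entries of each row in order.
result = [
    sum(([x] if pd.notna(x) else [] for x in row), [])
    for row in df.itertuples(index=False)
]
print(result)  # [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
```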

Option 3
A NumPy experiment:

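import numpy as np

nn = df.notnull().values                # boolean mask: True where a cell is not NaN
sliced = df.values.ravel()[nn.ravel()]  # all non-NaN values, row by row
splits = nn.sum(1)[:-1].cumsum()        # split points between consecutive rows
[s.tolist() for s in np.split(sliced, splits)]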

[['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
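The question notes that once a column is empty, every later column is empty too. Under that assumption (the code does not check it), each row's non-NaN values form a prefix of the row, so counting them per row and slicing is enough. This is a sketch of that shortcut, not part of the answer above; if a NaN ever sits between two values, it returns wrong results, whereas the mask-based Option 3 does not depend on the pattern.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    [["NY", "WA", "AZ", np.nan],
     ["DC", np.nan, np.nan, np.nan],
     ["MA", "CA", np.nan, np.nan]],
    index=["a", "b", "c"],
    columns=["foo", "foo2", "foo3", "foo4"],
)

# Number of non-NaN cells per row; assuming NaNs only trail, these are
# exactly the first n cells of each row.
counts = df.notna().sum(axis=1)
result = [row[:n] for row, n in zip(df.values.tolist(), counts)]
print(result)  # [['NY', 'WA', 'AZ'], ['DC'], ['MA', 'CA']]
```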
piRSquared answered May 02 '23 03:05
