
Remove rows with empty lists from pandas data frame

I have a data frame in which some rows contain empty lists and others contain lists of strings:

       donation_orgs                              donation_context
0            []                                           []
1   [the research of Dr. ...]   [In lieu of flowers , memorial donations ...]

I'm trying to return a data set without any of the rows where there are empty lists.

I've tried just checking for null values:

dfnotnull = df[df.donation_orgs != []]
dfnotnull

and

dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull

And I've tried looping through and checking for values that exist, but I think the lists aren't returning Null or None like I thought they would:

dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0,len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]

All three of the above methods simply return every row in the original data frame.

Ben Price asked Dec 08 '15 17:12



4 Answers

To avoid converting to str and actually use the lists, you can do this:

df[df['donation_orgs'].map(lambda d: len(d)) > 0]

It maps the donation_orgs column to the length of the lists of each row and keeps only the ones that have at least one element, filtering out empty lists.

It returns

Out[1]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]

as expected.
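
The same idea can be extended to several list columns at once. The following is only a sketch, assuming the two columns from the question and assuming a row should be dropped when any of its lists is empty; the names lengths and non_empty are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data modelled on the question (strings shortened).
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']],
})

# Length of every list, computed column by column.
lengths = df[['donation_orgs', 'donation_context']].apply(lambda col: col.map(len))

# Keep only rows in which every list column has at least one element.
non_empty = df[(lengths > 0).all(axis=1)]
print(non_empty)
```

Replacing all(axis=1) with any(axis=1) would instead keep rows where at least one of the lists is non-empty.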

Victor answered Oct 19 '22 19:10


You could try filtering as though the column held strings instead of lists:

import pandas as pd
df = pd.DataFrame({
'donation_orgs' : [[], ['the research of Dr.']],
'donation_context': [[], ['In lieu of flowers , memorial donations']]})

df[df.astype(str)['donation_orgs'] != '[]']

Out[9]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]

Woody Pride answered Oct 19 '22 17:10


You can use the following one-liner, which keeps a row when at least one of the two columns holds a non-empty list:

df[(df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)]
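
To make the difference between | and & visible, here is a small sketch with made-up sample values (the three rows and the variable names are hypothetical, not taken from the question). It relies on .str.len() returning the length of each list in the column.

```python
import pandas as pd

# Hypothetical sample: row 0 is empty in both columns, row 1 in one, row 2 in none.
df = pd.DataFrame({
    'donation_orgs': [[], ['org A'], ['org B']],
    'donation_context': [[], [], ['context B']],
})

# .str.len() gives the number of elements in each list.
orgs_filled = df['donation_orgs'].str.len() != 0
context_filled = df['donation_context'].str.len() != 0

# "|" drops a row only when both lists are empty.
either_filled = df[orgs_filled | context_filled]

# "&" drops a row as soon as one of the lists is empty.
both_filled = df[orgs_filled & context_filled]
```

Which operator is appropriate depends on whether a row with a single empty list should be kept.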

Amir Imani answered Oct 19 '22 17:10


Assuming that you read the data from a CSV, another possible solution is to treat the string '[]' as a missing value while loading and then drop those rows:

import pandas as pd

df = pd.read_csv('data.csv', na_filter=True, na_values='[]')
df.dropna()

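na_values defines additional strings to recognize as NaN; na_filter (True by default) enables that detection. Note that df.dropna() returns a new data frame rather than modifying df. I tested this on pandas-0.24.2.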
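
A self-contained way to try this without a file on disk is sketched below. The CSV text is a hypothetical stand-in for data.csv, built from the values in the question, and io.StringIO is used only so the example runs on its own.

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for data.csv.
csv_text = (
    "donation_orgs,donation_context\n"
    "[],[]\n"
    "\"['the research of Dr.']\",\"['In lieu of flowers , memorial donations']\"\n"
)

# Cells that are exactly "[]" are read as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values='[]')

# dropna() returns a new frame without the rows that contain NaN.
cleaned = df.dropna()
print(cleaned)
```

The surviving cells are strings such as "['the research of Dr.']", not Python lists, because read_csv does not parse list literals.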

Mark answered Oct 19 '22 17:10