I have a data frame with some columns with empty lists and others with lists of strings:
donation_orgs donation_context
0 [] []
1 [the research of Dr. ...] [In lieu of flowers , memorial donations ...]
I'm trying to return a data set without any of the rows where there are empty lists.
I've tried just checking for null values:
dfnotnull = df[df.donation_orgs != []]
dfnotnull
and
dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull
And I've tried looping through and checking for values that exist, but I think the lists aren't returning Null or None like I thought they would:
dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0, len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]
All three of the above methods simply return every row in the original data frame.
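For context on why these attempts all return every row: an empty list is a real Python object, not a missing value, so `notnull()` treats it as non-null; and `.iloc(i)` with parentheses (instead of brackets) returns an indexer object rather than a cell value, and that object is always truthy. A minimal sketch demonstrating both, using a sample DataFrame like the one in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

# Empty lists are objects, not NaN, so every cell counts as non-null:
print(df.notnull().all().all())

# iloc(0) (parentheses, not brackets) returns an indexer object,
# which is always truthy, so the if-branch in the loop always runs:
print(bool(df['donation_orgs'].iloc(0)))
```

Both expressions print `True`, which is why the filters keep every row.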
To avoid converting to str and actually use the lists, you can do this:
df[df['donation_orgs'].map(lambda d: len(d)) > 0]
It maps the donation_orgs column to the length of the list in each row and keeps only the rows with at least one element, filtering out the empty lists.
It returns
Out[1]:
donation_context donation_orgs
1 [In lieu of flowers , memorial donations] [the research of Dr.]
as expected.
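A self-contained version of this approach (the sample DataFrame mirrors the one in the question; `map(len)` is equivalent to the lambda above):

```python
import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

# Keep only rows whose donation_orgs list is non-empty
dfnotnull = df[df['donation_orgs'].map(len) > 0]
print(dfnotnull)
```

This filters on one column; if a row could have an empty list in one column but not the other, combine masks for both columns.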
You could try slicing as though the data frame were strings instead of lists:
import pandas as pd
df = pd.DataFrame({
'donation_orgs' : [[], ['the research of Dr.']],
'donation_context': [[], ['In lieu of flowers , memorial donations']]})
df[df.astype(str)['donation_orgs'] != '[]']
Out[9]:
donation_context donation_orgs
1 [In lieu of flowers , memorial donations] [the research of Dr.]
You can use the following one-liner:
df[(df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)]
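This works because pandas' `.str.len()` also counts the elements of list values, not just the characters of strings. A runnable sketch with a sample DataFrame matching the question:

```python
import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']]})

# .str.len() returns the length of each list; keep rows where
# either column has a non-empty list
mask = (df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)
print(df[mask])
```

Note the `|` (or): a row is kept if *either* column has entries; use `&` instead if both must be non-empty.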
Assuming that you read data from a CSV, the other possible solution could be this:
import pandas as pd
df = pd.read_csv('data.csv', na_filter=True, na_values='[]')
df.dropna()
na_values defines additional strings to recognize as NaN (na_filter, which is on by default, enables that detection). I tested this on pandas 0.24.2.
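A self-contained sketch of this approach, using io.StringIO in place of a real data.csv (the file contents here are an assumption matching the question's layout):

```python
import io
import pandas as pd

# Stand-in for data.csv; rows with '[]' represent empty lists
csv_data = io.StringIO(
    "donation_orgs,donation_context\n"
    "[],[]\n"
    "\"['the research of Dr.']\",\"['In lieu of flowers']\"\n")

# Cells that are exactly '[]' are read as NaN, then dropped
df = pd.read_csv(csv_data, na_values='[]')
print(df.dropna())
```

One caveat: the surviving cells are strings (e.g. `"['the research of Dr.']"`), not Python lists, so this suits filtering but not further list manipulation.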