Remove Dataframes from List of Dataframes based on condition

Tags: python, pandas

I have a script that creates a list of dataframes to concatenate. Before concatenating, I check a certain column in each dataframe for the presence of a '1' binary flag. If the flag is absent, I want to delete that dataframe from the list. I am having trouble because I am not sure how to properly index the list to remove the dataframe. I recreated the problem with this code.

import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Tom', 'Tom'], 'Age': [20, 21, 19, 18]}
data2 = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}

# Create DataFrames
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data)
df4 = pd.DataFrame(data2)

dflist = [df, df2, df3, df4]

# Try to drop every frame that does not contain 'krish'
for frame in dflist:
    vals = frame["Name"].values
    if 'krish' not in vals:
        dflist.remove(frame)

But this raises:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I also tried enumerating the list and deleting based off dflist[i], but that changes the index if something is deleted so subsequently the wrong frames will be removed.

What is the proper way to remove dataframes from a list of df's based on condition? Thank you!

asked Jan 26 '23 03:01 by johnny1995


2 Answers

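Instead of removing items from a list while iterating over it, which makes the loop skip elements and is generally bad practice, use a list comprehension to build a new list containing only the dataframes of interest. The example below keeps the frames whose Name column does not contain 'krish'; use `in` instead of `not in` to keep the frames that do, which is what the loop in the question is after: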

[i for i in dflist if 'krish' not in i['Name'].values]

[  Name  Age
 0  Tom   20
 1  Tom   21
 2  Tom   19
 3  Tom   18,   Name  Age
 0  Tom   20
 1  Tom   21
 2  Tom   19
 3  Tom   18]
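
As a self-contained sketch of the same idea, here is the comprehension with the condition the question's loop intends (keep only the frames that contain 'krish'), run on the question's sample data. The helper name `has_name` is made up for illustration and is not part of pandas. Assigning to `dflist[:]` replaces the contents of the existing list object, which only matters if other code holds a reference to that same list.

```python
import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Tom', 'Tom'], 'Age': [20, 21, 19, 18]}
data2 = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}
dflist = [pd.DataFrame(data), pd.DataFrame(data2),
          pd.DataFrame(data), pd.DataFrame(data2)]


def has_name(frame, name):
    # Hypothetical helper: True if `name` appears in the frame's Name column.
    return frame['Name'].eq(name).any()


# Build a new list holding only the frames that contain 'krish'.
kept = [frame for frame in dflist if has_name(frame, 'krish')]

# Optionally update the original list object in place via slice assignment.
dflist[:] = kept

print(len(dflist))  # 2
```

The comprehension only collects references to the existing frames, so no dataframe data is copied.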

If you would rather modify the original list in place (for example, because the dataframes are very large), here's a safe way to remove the unwanted dataframes:

ix = []
for i, frame in enumerate(dflist):
    vals = frame["Name"]
    if not vals.isin(['krish']).any():
        ix.append(i)

# Delete from the highest index to the lowest: removing an item only
# shifts the items after it, so the remaining indices in ix stay valid.
for i in sorted(ix, reverse=True):
    del dflist[i]
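
To see why the deletion order matters, here is a small sketch with a plain list of strings standing in for the dataframes (the names `items` and `to_drop` are illustrative only). Deleting in ascending order shifts the later positions, so the second deletion hits the wrong element; deleting in descending order does not.

```python
items = ['a', 'b', 'c', 'd']
to_drop = [0, 2]  # positions chosen before any deletion: 'a' and 'c'

# Ascending order: deleting index 0 shifts everything left,
# so index 2 now points at 'd' instead of 'c'.
ascending = list(items)
for i in to_drop:
    del ascending[i]
print(ascending)  # ['b', 'c']

# Descending order: positions below the deleted one are unaffected.
descending = list(items)
for i in sorted(to_drop, reverse=True):
    del descending[i]
print(descending)  # ['b', 'd']
```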

answered Jan 28 '23 18:01 by yatu


You should delete by index with del rather than using remove, which locates the item by comparing with ==, and that comparison is ambiguous for dataframes:

to_drop = []
for index, frame in enumerate(dflist):
    vals = frame["Name"].values
    if 'krish' not in vals:
        to_drop.append(index)
for x in sorted(to_drop, reverse=True):
    del dflist[x]
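
For reference, a minimal sketch of where the ValueError in the question comes from, using two small made-up frames `a` and `b`. `list.remove` walks the list and tests each item against its argument with `==`; between two dataframes that yields a dataframe of booleans, and Python cannot reduce it to a single True or False. Filtering by identity (`is`) sidesteps the comparison.

```python
import pandas as pd

a = pd.DataFrame({'Name': ['Tom', 'nick'], 'Age': [20, 21]})
b = pd.DataFrame({'Name': ['Tom', 'Tom'], 'Age': [20, 21]})
frames = [a, b]

# remove(b) first tests a == b; the result is an element-wise
# DataFrame whose truth value is ambiguous, hence the ValueError.
try:
    frames.remove(b)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# Comparing by identity avoids the element-wise == entirely.
frames = [f for f in frames if f is not b]
print(len(frames))  # 1
```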

answered Jan 28 '23 18:01 by BENY