
Filter pandas dataframe that has arrays in cells

I have a pandas DataFrame with a column 'htgt'. Each cell in this column holds an array of numbers, and the arrays vary in length. An example of the data:

11                  [16, 69]
12                  [61, 79]
13                  [10, 69]
14                      [81]
15          [12, 30, 45, 68]
16                  [10, 76]
17                   [9, 39]
18              [67, 69, 77]

How can I select all the rows whose array contains a given number, for example 10?

Borut Flis asked Jan 27 '23 11:01


2 Answers

You could do this by first creating a boolean mask with a list comprehension:

mask = [(10 in x) for x in df['htgt']]
df[mask]

Or in one line, if you prefer:

df.loc[[(10 in x) for x in df['htgt']]]

[output]

        htgt
13  [10, 69]
16  [10, 76]
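
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's data (the index labels 11 to 18 are copied from the printout), and the helper name `rows_containing` is hypothetical, introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; index labels 11..18 copied from the printout.
df = pd.DataFrame(
    {"htgt": [[16, 69], [61, 79], [10, 69], [81],
              [12, 30, 45, 68], [10, 76], [9, 39], [67, 69, 77]]},
    index=range(11, 19),
)

def rows_containing(frame, column, value):
    """Return the rows of `frame` whose list in `column` contains `value`."""
    # One boolean per row: True when the cell's list holds the value.
    mask = [value in cell for cell in frame[column]]
    return frame[mask]

result = rows_containing(df, "htgt", 10)
print(result)
```

Because the mask is a plain Python list, it is matched to rows by position, so it works regardless of what the index labels are.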
Chris Adams answered Jan 31 '23 07:01


Don't store lists in pandas columns: it's inefficient, and it makes your data harder to work with. Instead, expand your lists into columns:

out = pd.DataFrame(df.htgt.values.tolist())

    0     1     2     3
0  16  69.0   NaN   NaN
1  61  79.0   NaN   NaN
2  10  69.0   NaN   NaN
3  81   NaN   NaN   NaN
4  12  30.0  45.0  68.0
5  10  76.0   NaN   NaN
6   9  39.0   NaN   NaN
7  67  69.0  77.0   NaN

Now you can use efficient pandas operations to find rows with 10:

out.loc[out.eq(10).any(axis=1)]

    0     1   2   3
2  10  69.0 NaN NaN
5  10  76.0 NaN NaN
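
One thing to watch: building `out` from `tolist()` gives it a fresh 0..7 index, so its row labels no longer match the original frame. A minimal sketch, assuming the question's sample data with index labels 11 to 18, that passes `index=df.index` so the resulting mask can be applied straight back to `df`:

```python
import pandas as pd

# Sample data rebuilt from the question; index labels 11..18 copied from the printout.
df = pd.DataFrame(
    {"htgt": [[16, 69], [61, 79], [10, 69], [81],
              [12, 30, 45, 68], [10, 76], [9, 39], [67, 69, 77]]},
    index=range(11, 19),
)

# Expand the lists into columns while keeping the original row labels.
out = pd.DataFrame(df["htgt"].tolist(), index=df.index)

# True for every row where at least one expanded column equals 10.
mask = out.eq(10).any(axis=1)

# The mask shares df's index, so it selects the original rows directly.
filtered = df[mask]
print(filtered)
```

Note the keyword form `any(axis=1)`; recent pandas versions no longer accept the axis as a positional argument.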

If you insist on the result being in list form, you can use stack and agg:

out.loc[out.eq(10).any(axis=1)].stack().groupby(level=0).agg(list)

2    [10.0, 69.0]
5    [10.0, 76.0]
dtype: object
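
The round trip through NaN-padded columns turns the integers into floats, as the output above shows. If the original lists are wanted unchanged, one alternative (a different technique from the one in this answer) is to build the mask with `Series.explode` and apply it to the original frame. This is only a sketch, assuming the question's sample data with index labels 11 to 18:

```python
import pandas as pd

# Sample data rebuilt from the question; index labels 11..18 copied from the printout.
df = pd.DataFrame(
    {"htgt": [[16, 69], [61, 79], [10, 69], [81],
              [12, 30, 45, 68], [10, 76], [9, 39], [67, 69, 77]]},
    index=range(11, 19),
)

# One row per list element; each element keeps the label of the row it came from.
exploded = df["htgt"].explode()

# Collapse back to one boolean per original row: True if any element equals 10.
mask = exploded.eq(10).groupby(level=0).any()

# Select from the original frame, so the lists keep their integer values.
matches = df[mask]
print(matches)
```

Grouping on `level=0` assumes the index labels are unique; with duplicate labels, rows sharing a label would be merged into a single mask entry.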
user3483203 answered Jan 31 '23 08:01