I have a pandas DataFrame with a column 'htgt'.
Each entry in this column is an array of numbers, and the arrays are not all the same size. An example of the data:
11 [16, 69]
12 [61, 79]
13 [10, 69]
14 [81]
15 [12, 30, 45, 68]
16 [10, 76]
17 [9, 39]
18 [67, 69, 77]
How can I filter for all the rows that contain a given number, for example 10?
You could do this by first creating a boolean mask with a list comprehension:
mask = [(10 in x) for x in df['htgt']]
df[mask]
Or in one line, if you prefer:
df.loc[[(10 in x) for x in df['htgt']]]
[output]
htgt
13 [10, 69]
16 [10, 76]
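For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and wraps the list-comprehension mask in a small function. The helper name rows_containing is hypothetical (it is not part of pandas), and the index 11..18 is assumed from the row labels shown in the question.

```python
import pandas as pd

# Sample data rebuilt from the question; the index 11..18 is assumed
# from the row labels shown there.
df = pd.DataFrame(
    {"htgt": [[16, 69], [61, 79], [10, 69], [81],
              [12, 30, 45, 68], [10, 76], [9, 39], [67, 69, 77]]},
    index=range(11, 19),
)


def rows_containing(frame, column, value):
    """Return the rows of `frame` whose list in `column` contains `value`.

    Hypothetical helper for illustration; not a pandas function.
    """
    # One boolean per row: True when the row's list holds the value.
    mask = [value in items for items in frame[column]]
    return frame[mask]


result = rows_containing(df, "htgt", 10)
print(result)
```

Because the mask is an ordinary Python list of booleans, this works for lists of any length, but it loops over the rows in Python rather than using vectorized operations.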
Don't store lists in pandas columns. It's not efficient, and it will make your data harder to interact with. Just expand your lists to columns:
out = pd.DataFrame(df.htgt.values.tolist())
0 1 2 3
0 16 69.0 NaN NaN
1 61 79.0 NaN NaN
2 10 69.0 NaN NaN
3 81 NaN NaN NaN
4 12 30.0 45.0 68.0
5 10 76.0 NaN NaN
6 9 39.0 NaN NaN
7 67 69.0 77.0 NaN
Now you can use efficient pandas operations to find rows with 10:
out.loc[out.eq(10).any(axis=1)]
0 1 2 3
2 10 69.0 NaN NaN
5 10 76.0 NaN NaN
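Note that the expanded frame above gets a fresh 0..7 index, so its row labels no longer match the 11..18 labels in the question. A minimal sketch of one way around that, assuming the sample data from the question: pass the original index when expanding, then use the resulting boolean Series to select rows from the original frame. The names expanded, mask, and filtered are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (index 11..18 assumed from its row labels).
df = pd.DataFrame(
    {"htgt": [[16, 69], [61, 79], [10, 69], [81],
              [12, 30, 45, 68], [10, 76], [9, 39], [67, 69, 77]]},
    index=range(11, 19),
)

# Expand each list into its own columns; shorter lists are padded with NaN.
# Passing index=df.index keeps the original row labels.
expanded = pd.DataFrame(df["htgt"].tolist(), index=df.index)

# True for every row where any expanded column equals 10 (NaN never matches).
mask = expanded.eq(10).any(axis=1)

# The mask shares df's index, so it can filter the original frame directly.
filtered = df[mask]
print(filtered)
```

This keeps the vectorized comparison while still returning the rows in their original list form and with their original labels.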
If you insist on the result being in list form, you can use stack and agg:
out.loc[out.eq(10).any(axis=1)].stack().groupby(level=0).agg(list)
2 [10.0, 69.0]
5 [10.0, 76.0]
dtype: object