
How to index out values in a pandas DataFrame (KeyError: False)

I am trying to index out values after filtering so that I can append them to a list. Here is the code so far:

import pandas as pd
import numpy as np
arr_1 = np.array([7, 1, 6, 9, 2, 4])
arr_2 = np.array([5, 8, 9, 10, 2, 3])
arr_3 = np.array([1, 9, 3, 4, 5, 1])

dict_of_arrs = {
    'arr' : [arr_1, arr_2, arr_3]
}
df = pd.DataFrame(dict_of_arrs)

true_list = []
false_list = []
filt = df.arr.apply(lambda x: np.diff(x)>0)
for i in filt:
    for n in i:
        if n==True:
            true_list.append(df.arr[n])
        else:
            false_list.append(df.arr[n])

Though I get the error:

KeyError: False

I have also tried indexing with df.arr[i][n] instead, but as expected that gives me the error:

IndexError: Boolean index has wrong length: 5 instead of 3

What I would like to do is keep the True/False filter I already have, then append the original numbers for all the True values to true_list, and likewise the False values to false_list. So when I do print(true_list) the output is a list of lists, with each list containing only the values where filt==True, and the same for false_list. Thank you.

EDIT: The expected output should look something like:

print(true_list)

then the output being:

[ 6, 9, 4]
[ 8, 9, 10, 3]
[ 9, 4, 5]

This is because, within each list, filt checks whether each value is greater than the one before it. The values for which that is True are added to true_list. For false_list the output would look like:

[ 1, 2]
[2]
[3, 1]

Thank you

benito.cano asked Dec 11 '22 00:12


2 Answers

This gives the same result as @Scott Boston's answer, but without using groupby and explode.

It uses np.diff and boolean indexing:

import numpy as np

df.arr.map(lambda x:np.array(x)[1:][np.diff(x)>=0])
0        [6, 9, 4]
1    [8, 9, 10, 3]
2        [9, 4, 5]
Name: arr, dtype: object

df.arr.map(lambda x:np.array(x)[1:][np.diff(x)<0])
0    [1, 2]
1       [2]
2    [3, 1]
Name: arr, dtype: object
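
For readers who want the one-liner spelled out, here is a minimal sketch of the same idea as an explicit loop that fills true_list and false_list as the question asks. The helper name split_by_step is hypothetical and not part of the original answer. The loop in the question fails because it passes the boolean n itself as the key to df.arr[...], which pandas treats as a label lookup; the sketch instead uses the whole boolean mask to index the array. It follows this answer in using >= 0, so equal neighbours count as increasing, whereas the question's filter used > 0.

```python
import numpy as np
import pandas as pd

def split_by_step(values):
    """Hypothetical helper: split values[1:] by whether each step went up or down."""
    values = np.asarray(values)
    rising = np.diff(values) >= 0   # one flag per adjacent pair: len(values) - 1 flags
    tail = values[1:]               # drop the first element so tail lines up with the flags
    return tail[rising].tolist(), tail[~rising].tolist()

df = pd.DataFrame({'arr': [np.array([7, 1, 6, 9, 2, 4]),
                           np.array([5, 8, 9, 10, 2, 3]),
                           np.array([1, 9, 3, 4, 5, 1])]})

true_list, false_list = [], []
for row in df['arr']:
    up, down = split_by_step(row)
    true_list.append(up)
    false_list.append(down)

print(true_list)
print(false_list)
```

The mask has one entry fewer than the array, which is why the first element is dropped before indexing. Indexing the full array with the mask is what would produce a "Boolean index has wrong length" error.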

timeit results:

In [63]: %%timeit
    ...: dfe = df['arr'].explode()
    ...: grp = dfe.groupby(level=0).diff()
    ...: df_g = dfe[grp >= 0]
    ...: df_increasing = df_g.groupby(level=0).agg(list)
    ...: 
    ...: df_l = dfe[grp < 0]
    ...: df_decreasing = df_l.groupby(level=0).agg(list)
    ...:
    ...:
7.16 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [65]: %%timeit
    ...: df_x = df.arr.map(lambda x:np.array(x)[1:][np.diff(x)>=0])
    ...: df_y =df.arr.map(lambda x:np.array(x)[1:][np.diff(x)<0])
    ...:
    ...:
384 µs ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Ch3steR answered May 18 '23 13:05


Let's see if this helps any:

dfe = df['arr'].explode()
grp = dfe.groupby(level=0).diff()
df_g = dfe[grp >= 0]
df_increasing = df_g.groupby(level=0).agg(list)

df_l = dfe[grp < 0]
df_decreasing = df_l.groupby(level=0).agg(list)

print(df_increasing)

# 0        [6, 9, 4]
# 1    [8, 9, 10, 3]
# 2        [9, 4, 5]
# Name: arr, dtype: object

print(df_decreasing)

# 0    [1, 2]
# 1       [2]
# 2    [3, 1]
# Name: arr, dtype: object
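
To make the intermediate steps easier to follow, here is a self-contained sketch of the same explode/groupby approach that also shows what happens to the first element of each row. The astype(int) cast and the names first_of_row, increasing and decreasing are additions for illustration, not part of the original answer; the cast is there on the assumption that explode() returns an object-dtype Series, which diff may not handle consistently.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'arr': [np.array([7, 1, 6, 9, 2, 4]),
                           np.array([5, 8, 9, 10, 2, 3]),
                           np.array([1, 9, 3, 4, 5, 1])]})

# One row per element; the original row label is repeated in the index.
dfe = df['arr'].explode().astype(int)

# Difference from the previous element within each original row.
# The first element of every row has no predecessor, so its diff is NaN.
grp = dfe.groupby(level=0).diff()
first_of_row = int(grp.isna().sum())

# NaN >= 0 and NaN < 0 are both False, so first elements land in neither result.
increasing = dfe[grp >= 0].groupby(level=0).agg(list).tolist()
decreasing = dfe[grp < 0].groupby(level=0).agg(list).tolist()

print(increasing)
print(decreasing)
```

One thing to be aware of with this approach: a row with no decreasing (or no increasing) steps has nothing left to group, so it would be absent from the corresponding result rather than appearing as an empty list.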
Scott Boston answered May 18 '23 14:05