So I am trying to index a value after it has been filtered, so that I can append it to a list. Here is the code so far:
import pandas as pd
import numpy as np
arr_1 = np.array([7, 1, 6, 9, 2, 4])
arr_2 = np.array([5, 8, 9, 10, 2, 3])
arr_3 = np.array([1, 9, 3, 4, 5, 1])
dict_of_arrs = {
'arr' : [arr_1, arr_2, arr_3]
}
df = pd.DataFrame(dict_of_arrs)
true_list = []
false_list = []
filt = df.arr.apply(lambda x: np.diff(x)>0)
for i in filt:
    for n in i:
        if n==True:
            true_list.append(df.arr[n])
        else:
            false_list.append(df.arr[n])
However, I get the error:
KeyError: False
I have also tried indexing with df.arr[i][n]
instead, but as expected that gives me the error:
IndexError: Boolean index has wrong length: 5 instead of 3
What I would like to do is filter on True or False as I already have, then append the original numbers of all the True values to true_list,
and do the same with the False values. So when I do print(true_list)
the output should be a list of lists, with each inner list containing only the values where filt is True, and the same for false_list. Thank you.
EDIT: The expected output should look something like:
print(true_list)
then the output being:
[ 6, 9, 4]
[ 8, 9, 10, 3]
[ 9, 4, 5]
Because in each list, filt checks whether the following value is greater than the previous value. Those that are True have their int value added to true_list. For false_list it would look like:
[ 1, 2]
[2]
[3, 1]
Thank you
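For reference, the rule applied to a single array with a plain loop would be something like this sketch (value and increased are just illustrative names; int() is only there to print plain Python ints):
arr = np.array([7, 1, 6, 9, 2, 4])
true_vals, false_vals = [], []
for value, increased in zip(arr[1:], np.diff(arr) > 0):
    # np.diff(arr)[i] compares arr[i+1] with arr[i], so the mask lines up with arr[1:]
    if increased:
        true_vals.append(int(value))
    else:
        false_vals.append(int(value))
print(true_vals)   # [6, 9, 4]
print(false_vals)  # [1, 2]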
This is the same as @Scott Boston's answer, but without using groupby and explode. It uses np.diff and boolean indexing: np.diff(x) is one element shorter than x, and its i-th entry compares x[i+1] with x[i], so the mask lines up with x[1:].
import numpy as np
df.arr.map(lambda x:np.array(x)[1:][np.diff(x)>=0])
0 [6, 9, 4]
1 [8, 9, 10, 3]
2 [9, 4, 5]
Name: arr, dtype: object
df.arr.map(lambda x:np.array(x)[1:][np.diff(x)<0])
0 [1, 2]
1 [2]
2 [3, 1]
Name: arr, dtype: object
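If you want plain Python lists of lists, as asked in the question, a possible follow-up sketch reusing the same expressions (the names true_list and false_list are just for illustration):
true_list = df.arr.map(lambda x: np.array(x)[1:][np.diff(x) >= 0].tolist()).tolist()
false_list = df.arr.map(lambda x: np.array(x)[1:][np.diff(x) < 0].tolist()).tolist()
print(true_list)   # [[6, 9, 4], [8, 9, 10, 3], [9, 4, 5]]
print(false_list)  # [[1, 2], [2], [3, 1]]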
timeit results:
In [63]: %%timeit
...: dfe = df['arr'].explode()
...: grp = dfe.groupby(level=0).diff()
...: df_g = dfe[grp >= 0]
...: df_increasing = df_g.groupby(level=0).agg(list)
...:
...: df_l = dfe[grp < 0]
...: df_decreasing = df_l.groupby(level=0).agg(list)
...:
...:
7.16 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [65]: %%timeit
...: df_x = df.arr.map(lambda x:np.array(x)[1:][np.diff(x)>=0])
...: df_y =df.arr.map(lambda x:np.array(x)[1:][np.diff(x)<0])
...:
...:
384 µs ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Let's see if this helps any. The idea is to explode the arrays into one long Series, take the difference within each original row (groupby on index level 0), and then split the exploded values on the sign of that difference; the first value of each row has a NaN diff, so it falls into neither group:
dfe = df['arr'].explode()
grp = dfe.groupby(level=0).diff()
df_g = dfe[grp >= 0]
df_increasing = df_g.groupby(level=0).agg(list)
df_l = dfe[grp < 0]
df_decreasing = df_l.groupby(level=0).agg(list)
print(df_increasing)
# 0 [6, 9, 4]
# 1 [8, 9, 10, 3]
# 2 [9, 4, 5]
# Name: arr, dtype: object
print(df_decreasing)
# 0 [1, 2]
# 1 [2]
# 2 [3, 1]
# Name: arr, dtype: object
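If you then want the plain list-of-lists output from the question, a possible final step (assuming the names above):
true_list = df_increasing.tolist()    # one list per row: the values that increased
false_list = df_decreasing.tolist()   # one list per row: the values that decreased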