Vectorized indexing numpy arrays in pandas Series with Boolean numpy arrays in pandas Series

Question

The following reproducible code produces an example data set that mimics my data on a much smaller scale.

import numpy as np 
import pandas as pd

np.random.seed(142536)

df = pd.DataFrame({
        "vals": list(np.arange(12).reshape(3,4)),
        "idx" : list(np.random.choice([True, False], 12).reshape(3,4))})
df

                           idx            vals
0   [False, True, True, False]    [0, 1, 2, 3]
1    [True, True, False, True]    [4, 5, 6, 7] 
2  [False, True, False, False]  [8, 9, 10, 11]

The following reproducible code returns the results I want, but is very inefficient for large data sets.
How would I do this more efficiently?

sel = []
for i in range(len(df.vals)):
    sel.append(df.vals[i][df.idx[i]])

df['sel'] = sel
df

                           idx            vals        sel
0   [False, True, True, False]    [0, 1, 2, 3]     [1, 2]
1    [True, True, False, True]    [4, 5, 6, 7]  [4, 5, 7]
2  [False, True, False, False]  [8, 9, 10, 11]        [9]

I have tried np.apply_along_axis(), np.where(), df.apply(), and df.transform(), but can't get any of them to work for this case without errors.

cs95 · Accepted Answer

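The premise is flawed: storing NumPy arrays inside DataFrame cells makes the columns object dtype, so pandas has no truly vectorized way to index them. You can at least speed this up by flattening both columns with itertools.chain, applying the Boolean mask once, and then splitting the result back into rows with np.array_split.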

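from itertools import chain

# Flatten a Series of arrays into a single 1-D array.
fn = lambda x: np.array(list(chain.from_iterable(x)))
# Mask the flattened values once, then split at the cumulative per-row True counts.
df['sel'] = np.array_split(
    fn(df.vals)[fn(df.idx)], np.cumsum([sum(x) for x in df.idx][:-1]))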

Output from the answerer's run (its idx values differ from those in the question):

                           idx            vals      sel
0   [True, False, True, False]    [0, 1, 2, 3]   [0, 2]
1  [False, False, False, True]    [4, 5, 6, 7]      [7]
2   [False, True, True, False]  [8, 9, 10, 11]  [9, 10]
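
For comparison, here is a minimal sketch of the same flatten, mask, and split idea using np.concatenate and np.split instead of itertools.chain. It is not part of the original answer. The mask values are hard-coded from the table in the question so the result can be checked against the expected sel column, and the names flat_vals, flat_mask, counts, and split_points are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the question's table, with the mask written out explicitly
# instead of drawn at random.
df = pd.DataFrame({
    "vals": list(np.arange(12).reshape(3, 4)),
    "idx": list(np.array([
        [False, True, True, False],
        [True, True, False, True],
        [False, True, False, False],
    ])),
})

# Flatten each column of per-row arrays into one 1-D array.
flat_vals = np.concatenate(df["vals"].to_numpy())
flat_mask = np.concatenate(df["idx"].to_numpy())

# The number of True entries per row determines where to cut the masked result.
counts = np.array([mask.sum() for mask in df["idx"]])
split_points = np.cumsum(counts)[:-1]

# One Boolean indexing operation over all rows, then split back into rows.
df["sel"] = np.split(flat_vals[flat_mask], split_points)

# Sanity check against the row-by-row approach from the question.
expected = [v[m] for v, m in zip(df["vals"], df["idx"])]
assert all(np.array_equal(a, b) for a, b in zip(df["sel"], expected))
```

The per-row count is still a Python-level loop, so this is only partly vectorized. If every row has the same length, stacking the masks into a 2-D array and calling .sum(axis=1) would likely remove that loop as well.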

Tags: python, pandas, numpy

Clay
