I have a simple pandas question about extracting a single column value:
import pandas as pd
df = pd.DataFrame({'A' : [15,56,23,84], 'B' : [10,20,33,25]})
df
A B
0 15 10
1 56 20
2 23 33
3 84 25
x = df[df['A'] == 23]
x
outputs
A B
2 23 33
However, I only want to get the value in column B i.e. 33. How do I get that?
My preferred way is Jeff's using loc (it's generally good practice to avoid working on copies, especially if you might later do assignment).
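For reference, here is a minimal sketch of the loc approach on the frame from the question. The names matches and value are only illustrative, and the .iloc[0] step assumes at least one row matches (it raises an IndexError otherwise):

```python
import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

# Select rows and column in a single indexing step.
# The result is a Series indexed by the labels of the matching rows.
matches = df.loc[df['A'] == 23, 'B']

# Pull the scalar out of the Series (assumes at least one match).
value = matches.iloc[0]

# Because loc indexes the frame in one step, assignment goes to df itself
# rather than to an intermediate copy.
df.loc[df['A'] == 23, 'B'] = 99
```

If you expect exactly one match, matches.item() is an alternative that raises when the Series does not hold exactly one element.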
You can eke out some more performance by not creating a Series for the boolean mask and using a plain NumPy array instead:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1, 100, 2000).reshape(-1, 2),
columns=list('AB'))
In [21]: %timeit df.loc[df.A == 23, 'B']
1000 loops, best of 3: 532 µs per loop
In [22]: %timeit df['B'][df.A == 23]
1000 loops, best of 3: 432 µs per loop
In [23]: %timeit df.loc[df.A.values == 23, 'B'] # preferred
1000 loops, best of 3: 294 µs per loop
In [24]: %timeit df['B'].loc[df.A.values == 23]
1000 loops, best of 3: 197 µs per loop
I'm not sure why this is so slow, to be honest; maybe this use case could be improved (I'm not sure where the extra 100 µs is spent).
However, if you are only interested in the values of B and not their corresponding index (or the subframe), it's much faster to use the NumPy arrays directly:
In [25]: %timeit df.B.values[df.A.values == 23]
10000 loops, best of 3: 60.3 µs per loop
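Applied to the small frame from the question, the array-only version might look like the sketch below. The names mask, b_values and first are just for illustration, and the None fallback for "no match" is an assumption about what you want in that case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

# Build the boolean mask as a plain NumPy array instead of a Series.
mask = df['A'].values == 23

# Index the underlying array of B directly; the row labels are discarded.
b_values = df['B'].values[mask]

# Take the first match as a scalar, or None when nothing matched.
first = b_values[0] if b_values.size else None
```

Keep in mind that this drops the index, so it only fits when you need the raw values and not which rows they came from.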
Simply: df['B'][df['A'] == 23]
Thanks @Jeff.
And the speed comparisons:
In [30]: %timeit df['B'][df['A'] == 23].values
1000 loops, best of 3: 813 µs per loop
In [31]: %timeit df.loc[df['A'] == 23, 'B']
1000 loops, best of 3: 976 µs per loop