
Extracting single value from column in pandas

Tags:

python

pandas

I have a simple pandas question about extracting a single value from a column:

import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})
df

     A    B
0    15   10
1    56   20
2    23   33
3    84   25

x = df[df['A'] == 23]
x

outputs

    A    B
2  23    33

However, I only want to get the value in column B, i.e., 33. How do I get that?

asked Feb 21 '14 at 01:02 by user308827



2 Answers

My preferred way is Jeff's, using loc (it's generally good practice to avoid working on copies, especially if you might later assign to the result).
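A minimal sketch of the loc approach on the question's own frame, assuming only that pandas is installed. loc with a boolean mask and a column label returns a Series (here holding the single index label 2), so .iloc[0] is one way to pull out the bare scalar. The same expression also works on the left-hand side of an assignment, which is the case the advice about avoiding copies is aimed at:

```python
import pandas as pd

# The question's example frame
df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

# Boolean mask on A plus a column label: a Series of the matching B values
matches = df.loc[df['A'] == 23, 'B']
print(matches)

# Take the first (here the only) matching value as a scalar
value = matches.iloc[0]
print(value)  # 33

# The same loc expression used for assignment modifies df itself
df.loc[df['A'] == 23, 'B'] = 99
print(df.loc[2, 'B'])  # 99
```

If no row matches, matches is empty and .iloc[0] raises IndexError, so check len(matches) first when a match is not guaranteed.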

You can eke out some more performance by building the boolean mask as a plain NumPy array rather than a Series:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 100, 2000).reshape(-1, 2),
                  columns=list('AB'))

In [21]: %timeit df.loc[df.A == 23, 'B']
1000 loops, best of 3: 532 µs per loop

In [22]: %timeit df['B'][df.A == 23]
1000 loops, best of 3: 432 µs per loop

In [23]: %timeit df.loc[df.A.values == 23, 'B']  # preferred
1000 loops, best of 3: 294 µs per loop

In [24]: %timeit df['B'].loc[df.A.values == 23]
1000 loops, best of 3: 197 µs per loop

I'm not sure why this is so slow, to be honest; maybe this use case could be improved? (I'm not sure where the extra 100 µs is spent.)
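The only difference between the timed loc variants is the type of the mask. A small sketch on the question's four-row frame (rather than the random frame used for timing) showing that df.A == 23 is a boolean Series while df.A.values == 23 is a plain NumPy boolean array, and that both select the same row through loc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

series_mask = df.A == 23        # pandas Series of bools, carries df's index
array_mask = df.A.values == 23  # NumPy ndarray of bools, no index attached

print(type(series_mask).__name__)  # Series
print(type(array_mask).__name__)   # ndarray

# Both masks select the same row; the ndarray mask is applied by
# position, so there is no index to align it against.
by_series = df.loc[series_mask, 'B']
by_array = df.loc[array_mask, 'B']
print(by_series.equals(by_array))  # True
```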

However, if you are only interested in the values of B and not their corresponding index (or the sub-frame), it's much faster to use the NumPy arrays directly:

In [25]: %timeit df.B.values[df.A.values == 23]
10000 loops, best of 3: 60.3 µs per loop
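Applied to the question's frame, the NumPy route returns an array rather than a Series, so one more step is needed to get the bare 33. A sketch, assuming exactly one row matches (ndarray.item() raises ValueError otherwise); on pandas 0.24 and later, .to_numpy() is the documented alternative to .values:

```python
import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

# Index B's underlying array with a boolean array built from A's array
b_values = df.B.values[df.A.values == 23]
print(b_values)  # [33]

# .item() converts a size-1 array to a plain Python scalar
value = b_values.item()
print(value)  # 33

# The same lookup written with .to_numpy()
value2 = df['B'].to_numpy()[df['A'].to_numpy() == 23].item()
print(value2)  # 33
```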
answered Oct 03 '22 at 00:10 by Andy Hayden



Simply: df['B'][df['A'] == 23]

Thanks @Jeff.
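The one-liner above still returns a one-element Series, not the scalar the question asks for. A sketch of getting the scalar with Series.item(), which raises ValueError unless exactly one element is present, plus a hypothetical helper lookup_single (the name, its loc-based body and the None fallback are illustrative choices, not part of pandas) for when a key may match zero or several rows:

```python
import pandas as pd

df = pd.DataFrame({'A': [15, 56, 23, 84], 'B': [10, 20, 33, 25]})

# Chained indexing from the answer: select column B, then filter rows
subset = df['B'][df['A'] == 23]
print(subset.item())  # 33

def lookup_single(frame, key_col, key, value_col):
    """Hypothetical helper: return value_col where key_col == key,
    or None if the key does not match exactly one row."""
    matches = frame.loc[frame[key_col] == key, value_col]
    if len(matches) != 1:
        return None
    return matches.item()

print(lookup_single(df, 'A', 23, 'B'))  # 33
print(lookup_single(df, 'A', 99, 'B'))  # None
```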

And the speed comparisons:

In [30]: %timeit df['B'][df['A'] == 23].values
1000 loops, best of 3: 813 µs per loop

In [31]: %timeit df.loc[df['A'] == 23, 'B']
1000 loops, best of 3: 976 µs per loop
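The %timeit numbers above come from IPython. Outside IPython, a comparable comparison can be sketched with the standard-library timeit module. The frame below mirrors the random 1000-row frame from the first answer; the variant labels are illustrative, and absolute times will differ by machine and pandas version, so none are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

# 1000 rows of random integers in [1, 100), as in the first answer
df = pd.DataFrame(np.random.randint(1, 100, 2000).reshape(-1, 2),
                  columns=list('AB'))

variants = {
    "chained, .values": lambda: df['B'][df['A'] == 23].values,
    "loc, Series mask": lambda: df.loc[df['A'] == 23, 'B'],
    "loc, ndarray mask": lambda: df.loc[df.A.values == 23, 'B'],
    "pure NumPy": lambda: df.B.values[df.A.values == 23],
}

def time_variants(number=200):
    """Return the best per-call time in microseconds for each variant."""
    results = {}
    for name, fn in variants.items():
        # repeat() returns the total seconds for `number` calls, 3 times over
        best = min(timeit.repeat(fn, number=number, repeat=3))
        results[name] = best / number * 1e6
    return results

for name, usec in time_variants().items():
    print(f"{name:20s} {usec:8.1f} µs per loop")
```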
answered Oct 03 '22 at 00:10 by CT Zhu
