
How to compute a new column based on the values of other columns in pandas - python

Tags:

python

pandas

dataframe

Let's say my data frame contains these data:

>>> df = pd.DataFrame({'a':['l1','l2','l1','l2','l1','l2'],
                       'b':['1','2','2','1','2','2']})
>>> df
    a  b
0  l1  1
1  l2  2
2  l1  2
3  l2  1
4  l1  2
5  l2  2

l1 should correspond to 1 whereas l2 should correspond to 2. I'd like to create a new column 'c' such that, for each row, c = 1 if a = l1 and b = 1 (or a = l2 and b = 2). If a = l1 and b = 2 (or a = l2 and b = 1) then c = 0.

The resulting data frame should look like this:

    a  b  c
0  l1  1  1
1  l2  2  1
2  l1  2  0
3  l2  1  0
4  l1  2  0
5  l2  2  1

My data frame is very large, so I'm looking for the most efficient way to do this using pandas.

876

asked Aug 27 '13 18:08

HappyPy

2 Answers

import numpy
import pandas as pd
df = pd.DataFrame({'a': numpy.random.choice(['l1', 'l2'], 1000000),
                   'b': numpy.random.choice(['1', '2'], 1000000)})

A fast solution, assuming each column holds only two distinct values ('l1'/'l2' in a and '1'/'2' in b):

%timeit df['c'] = ((df.a == 'l1') == (df.b == '1')).astype(int)

10 loops, best of 3: 178 ms per loop
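Why the equality of two boolean masks works: with only two values per column, "a is l1" and "b is 1" are either both true (l1 with 1) or both false (l2 with 2) exactly when the row matches. A minimal sketch on the six-row frame from the question; the intermediate names is_l1 and is_1 are only there for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l1', 'l2', 'l1', 'l2'],
                   'b': ['1', '2', '2', '1', '2', '2']})

# One boolean mask per column.
is_l1 = df['a'] == 'l1'
is_1 = df['b'] == '1'

# Element-wise equality of the masks is True when both are True
# (l1 with 1) or both are False (l2 with 2).
df['c'] = (is_l1 == is_1).astype(int)
print(df)
```

This relies on the two-value assumption: any third label in a would be treated as if it were l2.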

@Viktor Kerkez:

%timeit df['c'] = (df.a.str[-1] == df.b).astype(int)

1 loops, best of 3: 412 ms per loop

@user1470788:

%timeit df['c'] = (((df['a'] == 'l1') & (df['b'] == '1')) | ((df['a'] == 'l2') & (df['b'] == '2'))).astype(int)

1 loops, best of 3: 363 ms per loop

@herrfz:

%timeit df['c'] = (df.a.apply(lambda x: x[1:]) == df.b).astype(int)

1 loops, best of 3: 387 ms per loop
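If column a can hold more than two labels, the two-value shortcut no longer applies. One possible generalization, not timed here, is to translate each label through a lookup dict and compare the result with b. The dict name expected and the extra label l3 are assumptions made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l2', 'l3', 'l1', 'l3'],
                   'b': ['1', '2', '3', '3', '1']})

# Hypothetical lookup: the value of b that each label in a should pair with.
expected = {'l1': '1', 'l2': '2', 'l3': '3'}

# Series.map translates each label; labels missing from the dict become NaN,
# which never compares equal, so those rows get 0.
df['c'] = (df['a'].map(expected) == df['b']).astype(int)
print(df)
```

The mapping keeps the correspondence in one place, so it does not depend on how the labels are spelled.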

163

answered Sep 19 '22 00:09

chlunde

You can also use the vectorized string methods (the .str accessor) to compare the last character of a with b:

df['c'] = (df.a.str[-1] == df.b).astype(int)
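Note that .str[-1] only looks at the final character, so it fits the question's data, where every label ends in a single digit. A small sketch of the difference, assuming labels always have the form 'l' followed by the number; the label 'l12' and the names last_char and suffix are invented for the example:

```python
import pandas as pd

df = pd.DataFrame({'a': ['l1', 'l12', 'l2'],
                   'b': ['1', '12', '12']})

# Last character only: 'l12' is reduced to '2', so it fails to match '12'.
last_char = (df['a'].str[-1] == df['b']).astype(int)

# Everything after the leading 'l': 'l12' becomes '12' and matches.
suffix = (df['a'].str[1:] == df['b']).astype(int)

print(last_char.tolist())
print(suffix.tolist())
```

Slicing with .str[1:] is the vectorized counterpart of the apply(lambda x: x[1:]) variant and handles multi-digit suffixes.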

answered Sep 22 '22 00:09

Viktor Kerkez
