 

How to swap the 0 and 1 values in a pandas DataFrame?

I am working with a pandas DataFrame that has a column containing only 0s and 1s, and I want to swap the values (i.e. every 0 becomes 1 and every 1 becomes 0). Is there an easy way to do this?

asked Jul 14 '17 04:07 by jharkins




2 Answers

Use replace:

df = df.replace({0:1, 1:0})
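The question asks about a single column, while the line above rewrites every column of df. A minimal sketch restricting the swap to one column, assuming a hypothetical column named flag (the extra column other is only there to show it is left alone):

```python
import pandas as pd

# Hypothetical frame: 'flag' holds the 0/1 values, 'other' should stay unchanged
df = pd.DataFrame({"flag": [0, 1, 1, 0], "other": [5, 0, 1, 7]})

# Swap only within 'flag'; the dict maps each old value to its new value,
# and both replacements are applied together, so 0 -> 1 and 1 -> 0 do not chain
df["flag"] = df["flag"].replace({0: 1, 1: 0})

print(df)
```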

Or, faster, numpy.logical_xor:

df = np.logical_xor(df,1).astype(int)
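np.logical_xor returns booleans rather than integers, which is why the result is cast back with .astype(int). A small NumPy-only check of that behaviour on a hypothetical 1-D array:

```python
import numpy as np

x = np.array([0, 1, 1, 0])

# XOR with 1 flips each value: 0 xor 1 -> True, 1 xor 1 -> False
flipped = np.logical_xor(x, 1)
print(flipped.dtype)  # bool, hence the .astype(int)
print(flipped.astype(int))
```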

Or, faster still, call numpy.logical_xor on the underlying array and rebuild the DataFrame:

df = pd.DataFrame(np.logical_xor(df.values,1).astype(int),columns=df.columns, index=df.index)

Sample:

import numpy as np
import pandas as pd

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10,3]))
print(df)
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
5  1  0  1
6  0  0  0
7  1  0  0
8  1  0  1
9  1  0  0

df = df.replace({0:1, 1:0})
print(df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1

Another solution, shown here applied to the original sample df (not the frame already swapped above):

df = (~df.astype(bool)).astype(int)
print(df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
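The astype(bool) step matters: applied directly to an integer column, ~ is a bitwise NOT, not a logical one. A short illustration on a hypothetical Series:

```python
import pandas as pd

s = pd.Series([0, 1, 1, 0])

# On an integer Series, ~ is bitwise NOT: ~0 == -1 and ~1 == -2
print((~s).tolist())  # [-1, -2, -2, -1]

# Casting to bool first turns ~ into a logical NOT; then cast back to int
print((~s.astype(bool)).astype(int).tolist())  # [1, 0, 0, 1]
```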

Timings:

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
print(df)

In [69]: %timeit (np.logical_xor(df,1).astype(int))
1 loop, best of 3: 1.42 s per loop

In [70]: %timeit (df ^ 1)
1 loop, best of 3: 2.53 s per loop

In [71]: %timeit ((~df.astype(bool)).astype(int))
1 loop, best of 3: 1.81 s per loop

In [72]: %timeit (df.replace({0:1, 1:0}))
1 loop, best of 3: 5.08 s per loop

In [73]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
1 loop, best of 3: 350 ms per loop
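Besides speed, it is worth confirming that the timed expressions agree. A sketch on a smaller random frame (100 x 5, an arbitrary choice) that checks each variant against 1 - df; the dictionary keys are just illustrative labels:

```python
import numpy as np
import pandas as pd

np.random.seed(12)
df = pd.DataFrame(np.random.choice([0, 1], size=[100, 5]))

# The variants from the timings; all should produce the same 0/1 frame
variants = {
    "logical_xor": np.logical_xor(df, 1).astype(int),
    "xor_operator": df ^ 1,
    "invert_bool": (~df.astype(bool)).astype(int),
    "replace": df.replace({0: 1, 1: 0}),
    "logical_xor_values": pd.DataFrame(
        np.logical_xor(df.values, 1).astype(int), columns=df.columns, index=df.index
    ),
}

expected = 1 - df
for name, result in variants.items():
    # check_dtype=False: astype(int) is int32 rather than int64 on some platforms
    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
    print(name, "ok")
```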

Edit: using numexpr should be faster still (this variant is not part of the timings above):

import numexpr as ne
arr = df.values
df = pd.DataFrame(ne.evaluate('1 - arr'),columns=df.columns, index=df.index)
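numexpr is an optional package, so a script may want to fall back to plain NumPy when it is not installed. A sketch of that pattern; the function name swap_0_1_numexpr is hypothetical:

```python
import numpy as np
import pandas as pd

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None


def swap_0_1_numexpr(df):
    """Return a new frame with 0 and 1 swapped, using numexpr when available."""
    arr = df.values
    if ne is not None:
        # Pass 'arr' explicitly instead of relying on frame-local lookup
        out = ne.evaluate("1 - arr", local_dict={"arr": arr})
    else:
        # Plain NumPy fallback computing the same expression
        out = 1 - arr
    return pd.DataFrame(out, columns=df.columns, index=df.index)


df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 0, 0]})
print(swap_0_1_numexpr(df))
```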
answered Oct 26 '22 07:10 by jezrael


One easy way:

df[:] = 1-df.values

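For performance, we can instead modify the underlying array data in place. This relies on df.values returning a writable view of the frame's data, which holds for a single-dtype frame like this one; with pandas Copy-on-Write enabled, the returned array is read-only and the assignment fails: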

a = df.values
a[:] = 1-a
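If a new frame is acceptable rather than an in-place update, plain pandas arithmetic does the same swap and keeps the index and columns. A minimal sketch on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 1], "b": [1, 0, 0]})

# Build a new frame; df itself is not modified
flipped = 1 - df
print(flipped)
```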

Sample run:

In [43]: df
Out[43]: 
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0

In [44]: df[:] = 1-df.values

In [45]: df
Out[45]: 
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1

Using @jezrael's timing setup, comparing the fastest solution from that answer against the one proposed in this post:

In [46]: np.random.seed(12)
    ...: df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
    ...: 

# Proposed in this post
In [47]: def swap_0_1(df):
    ...:     a = df.values
    ...:     a[:] = 1-a
    ...:     

In [48]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
10 loops, best of 3: 218 ms per loop

In [49]: %timeit swap_0_1(df)
10 loops, best of 3: 198 ms per loop

Or, better still, assign the negation of the boolean version of the array:

In [60]: def swap_0_1_bool(df):
    ...:     a = df.values
    ...:     a[:] = ~a.astype(bool)
    ...:     

In [63]: %timeit swap_0_1_bool(df)
10 loops, best of 3: 179 ms per loop
answered Oct 26 '22 08:10 by Divakar