I am working with a pandas DataFrame that has a column of all 0's and 1's, and I am trying to swap the values (i.e., all of the 0's become 1's and all of the 1's become 0's). Is there an easy way to do this?
Replacing multiple values in a pandas column: the DataFrame.replace() method can map several values to new ones in a single call. Called on a whole DataFrame, it searches every column and replaces each specified value; called on a single column, it affects only that column.
Replacing NaN values with zero: use DataFrame.fillna(0) to replace NaN/None values with 0. By default it does not modify the original object but returns a new DataFrame.
Use replace:
df = df.replace({0:1, 1:0})
Or, faster, use numpy.logical_xor:
df = np.logical_xor(df,1).astype(int)
Or, faster still, apply it to the underlying NumPy array and rebuild the DataFrame:
df = pd.DataFrame(np.logical_xor(df.values,1).astype(int),columns=df.columns, index=df.index)
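As a quick sanity check, the three variants above can be run side by side on a tiny frame. This is only a sketch: the column names and data are made up for illustration, and it assumes the frame holds nothing but 0s and 1s (logical_xor treats any nonzero value as True, so other values would not survive the round trip).

```python
import numpy as np
import pandas as pd

# Small illustrative frame containing only 0s and 1s (made-up data).
df = pd.DataFrame({"a": [0, 1, 1, 0], "b": [1, 1, 0, 0]})

# 1) Dictionary-based replace: 0 -> 1 and 1 -> 0 are applied together.
by_replace = df.replace({0: 1, 1: 0})

# 2) logical_xor applied to the DataFrame itself, cast back to integers.
by_xor = np.logical_xor(df, 1).astype(int)

# 3) logical_xor on the raw array, rewrapped with the original labels.
by_xor_array = pd.DataFrame(
    np.logical_xor(df.to_numpy(), 1).astype(int),
    columns=df.columns,
    index=df.index,
)

# All three should produce the same flipped values.
expected = np.array([[1, 0], [0, 0], [0, 1], [1, 1]])
for result in (by_replace, by_xor, by_xor_array):
    assert (np.asarray(result) == expected).all()
```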
Sample:
np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10,3]))
print (df)
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
5  1  0  1
6  0  0  0
7  1  0  0
8  1  0  1
9  1  0  0
df = df.replace({0:1, 1:0})
print (df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
Another solution:
df = (~df.astype(bool)).astype(int)
print (df)
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
5  0  1  0
6  1  1  1
7  0  1  1
8  0  1  0
9  0  1  1
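The cast to bool in that last solution matters. On integer data, ~ is a bitwise NOT, so applying it directly gives -1 and -2 rather than 1 and 0. A small sketch with made-up data shows the difference:

```python
import pandas as pd

# Made-up 0/1 data for illustration.
df = pd.DataFrame({"a": [0, 1], "b": [1, 0]})

# Bitwise NOT on integers: 0 -> -1 and 1 -> -2, which is not a 0/1 swap.
wrong = ~df

# Logical NOT: cast to bool first, negate, then cast back to integers.
right = (~df.astype(bool)).astype(int)

print(wrong)
print(right)
```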
Timings:
np.random.seed(12)
df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
print (df)
In [69]: %timeit (np.logical_xor(df,1).astype(int))
1 loop, best of 3: 1.42 s per loop
In [70]: %timeit (df ^ 1)
1 loop, best of 3: 2.53 s per loop
In [71]: %timeit ((~df.astype(bool)).astype(int))
1 loop, best of 3: 1.81 s per loop
In [72]: %timeit (df.replace({0:1, 1:0}))
1 loop, best of 3: 5.08 s per loop
In [73]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
1 loop, best of 3: 350 ms per loop
Edit: This should be faster:
import numexpr as ne
arr = df.values
df = pd.DataFrame(ne.evaluate('1 - arr'),columns=df.columns, index=df.index)
One easy way would be -
df[:] = 1-df.values
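For performance, we might want to work on the underlying array data directly, with a modified version like so (note that this changes df itself only if df.values returns a writable view of the data rather than a copy) -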
a = df.values
a[:] = 1-a
Sample run -
In [43]: df
Out[43]:
   0  1  2
0  0  0  1
1  0  0  1
2  0  0  1
3  1  1  0
4  1  0  0
In [44]: df[:] = 1-df.values
In [45]: df
Out[45]:
   0  1  2
0  1  1  0
1  1  1  0
2  1  1  0
3  0  0  1
4  0  1  1
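One caveat with the array-based variant (a = df.values followed by a[:] = 1-a): it only updates the frame when the array is a writable view of the frame's data. A mixed-dtype frame returns a copy, and with pandas' Copy-on-Write behaviour the array can be read-only. A defensive sketch could check for that and fall back to assigning through pandas. The helper name swap_0_1_inplace is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def swap_0_1_inplace(df):
    """Flip 0 <-> 1 in df in place (hypothetical helper, not from the thread)."""
    a = df.to_numpy()
    # Write through the array only if it is writable and really backs df.
    if a.flags.writeable and np.shares_memory(a, df.to_numpy()):
        a[:] = 1 - a
    else:
        # Fallback: assign through pandas so the frame itself is updated.
        df[:] = 1 - a

df = pd.DataFrame(np.array([[0, 0, 1], [1, 0, 0]]))
swap_0_1_inplace(df)
print(df)
```

The fast path is the same trick as above; the fallback is the slower but safer df[:] = 1-df.values form.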
Using @jezrael's timings setup, with the best solution from that setup for comparison against the one proposed in this post -
In [46]: np.random.seed(12)
    ...: df = pd.DataFrame(np.random.choice([0,1], size=[10000,10000]))
    ...:
# Proposed in this post
In [47]: def swap_0_1(df):
    ...:     a = df.values
    ...:     a[:] = 1-a
    ...:
In [48]: %timeit pd.DataFrame(np.logical_xor(df.values,1).astype(int), columns=df.columns, index=df.index)
10 loops, best of 3: 218 ms per loop
In [49]: %timeit swap_0_1(df)
10 loops, best of 3: 198 ms per loop
Or, better still, negate the boolean version of the input array data -
In [60]: def swap_0_1_bool(df):
    ...:     a = df.values
    ...:     a[:] = ~a.astype(bool)
    ...:
In [63]: %timeit swap_0_1_bool(df)
10 loops, best of 3: 179 ms per loop
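The numbers in this thread come from IPython's %timeit on a 10000 x 10000 frame and will differ by machine and library version. Outside IPython, a rough way to repeat the comparison is the standard timeit module. The sketch below uses a much smaller frame and a different random generator than the posts above, so its results are not comparable to the figures quoted; the labels in the candidates dictionary are just illustrative.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the thread so the script finishes quickly.
rng = np.random.default_rng(12)
df = pd.DataFrame(rng.integers(0, 2, size=(500, 500)))

# Non-mutating variants only, so every call sees the same input.
candidates = {
    "replace": lambda: df.replace({0: 1, 1: 0}),
    "logical_xor on values": lambda: pd.DataFrame(
        np.logical_xor(df.to_numpy(), 1).astype(int),
        columns=df.columns,
        index=df.index,
    ),
    "1 - values": lambda: pd.DataFrame(
        1 - df.to_numpy(), columns=df.columns, index=df.index
    ),
}

# Best of 3 repeats, 3 calls each, reported as seconds per call.
timings = {
    name: min(timeit.repeat(func, number=3, repeat=3)) / 3
    for name, func in candidates.items()
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```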