Is there a simple way to change a column of yes/no to 1/0 in a Pandas dataframe?

Tags: python, pandas, dataframe, series

I read a CSV file into a pandas DataFrame and would like to convert the columns with binary answers from yes/no strings to 1/0 integers. Below is one such column ("sampleDF" is the pandas DataFrame):

In [13]: sampleDF.housing[0:10]
Out[13]:
0     no
1     no
2    yes
3     no
4     no
5     no
6     no
7     no
8    yes
9    yes
Name: housing, dtype: object

Help is much appreciated!


asked Dec 01 '16 02:12

Mushu909

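1 Answer

All snippets below assume these imports and refer to the DataFrame as sample (the question calls it sampleDF):

import numpy as np
import pandas as pd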

method 1

sample.housing.eq('yes').mul(1)
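A small sketch of what method 1 does step by step, on made-up data (the capitalised 'Yes' is included only to show an edge case): eq('yes') produces a boolean Series, and multiplying by 1 turns True/False into 1/0. astype(int) is another common way to do that second step. Anything that is not exactly 'yes' ends up as 0.

```python
import pandas as pd

# Made-up example data; 'Yes' (capitalised) is included on purpose.
s = pd.Series(['no', 'yes', 'no', 'Yes'], name='housing')

mask = s.eq('yes')             # boolean Series, compared element by element
via_mul = mask.mul(1)          # True -> 1, False -> 0
via_astype = mask.astype(int)  # same values, different spelling

print(mask.tolist())        # [False, True, False, False]
print(via_mul.tolist())     # [0, 1, 0, 0]
print(via_astype.tolist())  # [0, 1, 0, 0]
```

Note that 'Yes' became 0: the comparison is exact and case-sensitive.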

method 2

pd.Series(np.where(sample.housing.values == 'yes', 1, 0),
          sample.index)
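Method 2 wraps the result in pd.Series(..., sample.index) because np.where works on the raw NumPy array and returns a plain ndarray with no row labels. A minimal sketch, assuming a made-up DataFrame with a non-default index, showing the labels being reattached:

```python
import numpy as np
import pandas as pd

# Made-up data with a non-default index, to show why the index is passed.
sample = pd.DataFrame({'housing': ['no', 'yes', 'yes']}, index=[10, 20, 30])

raw = np.where(sample.housing.values == 'yes', 1, 0)  # plain ndarray, no index
print(type(raw).__name__, raw)                        # ndarray [0 1 1]

wrapped = pd.Series(raw, sample.index)  # reattach the original row labels
print(wrapped.index.tolist())           # [10, 20, 30]

# Because the labels match, assigning it back lines up row by row.
sample['housing_int'] = wrapped
```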

method 3

sample.housing.map(dict(yes=1, no=0))
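The question mentions several yes/no columns. Below is a minimal sketch that applies the dict idea from method 3 to a list of columns at once. The helper name yes_no_to_int, the example column names, and the decision to strip whitespace and lowercase before mapping are assumptions for illustration, not part of the answer. Since map() turns any value missing from the dict into NaN, the sketch raises an error rather than silently producing NaN.

```python
import pandas as pd

YES_NO = {'yes': 1, 'no': 0}


def yes_no_to_int(df, columns):
    """Hypothetical helper: return a copy of df with the given yes/no
    columns converted to 1/0. Raises ValueError on any other value."""
    out = df.copy()
    for col in columns:
        # Assumption: ' Yes', 'YES' etc. should count as 'yes'.
        # Missing values (NaN) become the string 'nan' and are reported below.
        cleaned = out[col].astype(str).str.strip().str.lower()
        converted = cleaned.map(YES_NO)  # values not in YES_NO become NaN
        bad = cleaned[converted.isna()].unique()
        if len(bad):
            raise ValueError(f"column {col!r} has unexpected values: {list(bad)}")
        out[col] = converted.astype(int)
    return out


# Made-up example data.
df = pd.DataFrame({'housing': ['no', 'Yes', 'yes '],
                   'loan': ['yes', 'no', 'no'],
                   'age': [30, 41, 25]})
result = yes_no_to_int(df, ['housing', 'loan'])
print(result)
```

The original DataFrame is left unchanged; drop the copy() if in-place conversion is preferred.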

method 4

pd.Series(map(lambda x: dict(yes=1, no=0)[x],
              sample.housing.values.tolist()), sample.index)

method 5

pd.Series(np.searchsorted(['no', 'yes'], sample.housing.values), sample.index)
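Method 5 relies on the fact that np.searchsorted returns, for each value, the position at which it would be inserted into the sorted array ['no', 'yes']. That gives 0 for 'no' and 1 for 'yes', but any other string also gets a position, so the trick is only safe when every value is exactly 'no' or 'yes'. A small sketch; 'maybe' and 'zzz' are made-up values used only to show this:

```python
import numpy as np
import pandas as pd

# 'maybe' and 'zzz' are made-up values, not from the question's data.
s = pd.Series(['no', 'yes', 'maybe', 'zzz'])

# Insertion positions into the sorted array ['no', 'yes'] (side='left').
# 'maybe' sorts before 'no' -> 0; 'zzz' sorts after 'yes' -> 2.
positions = np.searchsorted(['no', 'yes'], s.to_numpy())
print(positions)  # [0 1 0 2]

# A cheap guard before relying on method 5:
print(s.isin(['no', 'yes']).all())  # False
```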

All yield:

0    0
1    0
2    1
3    0
4    0
5    0
6    0
7    0
8    1
9    1
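A self-contained check that the five methods agree, rebuilding the ten values shown in the question as a DataFrame named sample. The only change to the answer's code is wrapping method 4's map(...) in list(...) so the iterator is consumed explicitly. The comparison is on values only, since the results differ in Series name (methods 1 and 3 keep 'housing', the others are unnamed).

```python
import numpy as np
import pandas as pd

# The ten values shown in the question.
sample = pd.DataFrame({'housing': ['no', 'no', 'yes', 'no', 'no',
                                   'no', 'no', 'no', 'yes', 'yes']})

results = {
    'method 1': sample.housing.eq('yes').mul(1),
    'method 2': pd.Series(np.where(sample.housing.values == 'yes', 1, 0),
                          sample.index),
    'method 3': sample.housing.map(dict(yes=1, no=0)),
    'method 4': pd.Series(list(map(lambda x: dict(yes=1, no=0)[x],
                                   sample.housing.values.tolist())),
                          sample.index),
    'method 5': pd.Series(np.searchsorted(['no', 'yes'], sample.housing.values),
                          sample.index),
}

# Compare values only; names (and possibly integer dtypes) can differ.
expected = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1]
for label, series in results.items():
    print(label, series.tolist() == expected)  # each line prints True
```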

Timing, given sample:

(Timing chart image: https://i.stack.imgur.com/NkimJ.png)

Timing, long sample:

sample = pd.DataFrame(dict(housing=np.random.choice(('yes', 'no'), size=100000)))

(Timing chart image: https://i.stack.imgur.com/Ztp87.png)
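The timing results above are images, and their numbers are not reproduced here. A sketch of how one might rerun the comparison locally with the standard-library timeit module, using the long-sample definition from the answer; absolute numbers will depend on hardware and on the pandas/NumPy versions, and the repeat counts are arbitrary choices. Method 4's map(...) is wrapped in list(...) so the iterator is consumed explicitly.

```python
import timeit

import numpy as np
import pandas as pd

# Long sample, built as in the answer (random, so it differs on every run).
sample = pd.DataFrame(dict(housing=np.random.choice(('yes', 'no'), size=100000)))

candidates = {
    'method 1': lambda: sample.housing.eq('yes').mul(1),
    'method 2': lambda: pd.Series(np.where(sample.housing.values == 'yes', 1, 0),
                                  sample.index),
    'method 3': lambda: sample.housing.map(dict(yes=1, no=0)),
    'method 4': lambda: pd.Series(list(map(lambda x: dict(yes=1, no=0)[x],
                                           sample.housing.values.tolist())),
                                  sample.index),
    'method 5': lambda: pd.Series(np.searchsorted(['no', 'yes'],
                                                  sample.housing.values),
                                  sample.index),
}

timings_ms = {}
for label, func in candidates.items():
    # Best of 3 repeats of 5 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(func, repeat=3, number=5)) / 5
    timings_ms[label] = best * 1000
    print(f'{label}: {timings_ms[label]:.2f} ms per call')
```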


answered Sep 20 '22 18:09

piRSquared
