<p>I have a dataframe with three columns, and I want to randomly fill one of the columns with "Yes" or "No" using pandas. The ratio of Yes to No should be 7:3.</p> <p>Has anyone tried this?</p>


How to randomly append "Yes/No" (ratio of 7:3) to a column in pandas dataframe?

4 Answers

With numpy's random.choice:

df["new_column"] = np.random.choice(["Yes", "No"], len(df), p=[0.7, 0.3])

Note: np.random.choice draws with replacement by default, so each row is an independent trial with a 0.7 probability of "Yes". You might therefore not end up with exactly a 70% ratio. However, with 2480500 rows the binomial distribution of the "Yes" count is well approximated by a normal distribution with mean 2480500 * 0.7 and standard deviation sqrt(2480500 * 0.7 * 0.3). Within +/-3 standard deviations (99.73% probability) the ratio falls between 0.69913 and 0.70087. If you want exactly 70%, you can use pandas' sample as @EdChum suggested: with frac=0.7 it draws a fixed number of rows without replacement, so the "Yes" count is exact up to rounding.
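The interval quoted in the note can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the variable names are illustrative and not part of the original answer.

```python
import math

n_rows = 2480500  # row count quoted in the note
p_yes = 0.7       # probability of "Yes" in each independent trial

# Mean and standard deviation of the binomial count of "Yes" rows
mean = n_rows * p_yes
std = math.sqrt(n_rows * p_yes * (1 - p_yes))

# Convert the +/-3 standard deviation band on the count into a ratio
low = (mean - 3 * std) / n_rows
high = (mean + 3 * std) / n_rows
print(f"{low:.5f} to {high:.5f}")
```

The band shrinks with the square root of the row count, so a small dataframe can land noticeably further from 70% than a large one.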

165

answered Oct 14 '22 07:10

ayhan

You can use sample to achieve this:

In [11]:
df = pd.DataFrame(np.random.randn(20,3), columns=list('abc'))
df

Out[11]:
           a         b         c
0  -0.267704  1.030417 -0.494542
1  -0.830801  0.421847  1.296952
2  -1.165387 -0.381976 -0.178988
3  -0.800799 -0.240998 -0.900573
4   0.855965  0.765313 -0.125862
5   1.153730  1.323783 -0.113135
6   0.242592 -2.137141 -0.230177
7  -0.451582  0.267415  1.006564
8   0.071916  0.476523  1.326859
9  -1.168084  0.250367 -1.235262
10  0.238183  0.391661 -1.177926
11 -1.153294 -0.304811 -0.955384
12 -0.984470 -0.351073 -1.155049
13 -2.068388  1.294905  0.892136
14 -0.196381 -1.083988  0.203369
15 -1.430208  0.859933  1.152462
16 -0.250452  0.824815  0.425096
17  1.051399 -1.199689  0.487980
18  0.688910 -0.664028 -0.097302
19 -0.355774  0.064857  0.003731

In [12]:    
df.loc[df.index.to_series().sample(frac=0.7).index, 'new_col'] = 'Yes'
df['new_col'] = df['new_col'].fillna('No')
df

Out[12]:
           a         b         c new_col
0  -0.267704  1.030417 -0.494542     Yes
1  -0.830801  0.421847  1.296952     Yes
2  -1.165387 -0.381976 -0.178988      No
3  -0.800799 -0.240998 -0.900573      No
4   0.855965  0.765313 -0.125862      No
5   1.153730  1.323783 -0.113135     Yes
6   0.242592 -2.137141 -0.230177     Yes
7  -0.451582  0.267415  1.006564     Yes
8   0.071916  0.476523  1.326859      No
9  -1.168084  0.250367 -1.235262     Yes
10  0.238183  0.391661 -1.177926     Yes
11 -1.153294 -0.304811 -0.955384     Yes
12 -0.984470 -0.351073 -1.155049     Yes
13 -2.068388  1.294905  0.892136     Yes
14 -0.196381 -1.083988  0.203369      No
15 -1.430208  0.859933  1.152462     Yes
16 -0.250452  0.824815  0.425096     Yes
17  1.051399 -1.199689  0.487980     Yes
18  0.688910 -0.664028 -0.097302     Yes
19 -0.355774  0.064857  0.003731      No

Basically, you call sample with frac=0.7, use the index of the sampled rows to assign 'Yes' to those rows, and then call fillna to assign 'No' to the remaining rows.
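A compact variant of the same idea, shown here as a sketch rather than as the answer's own code, samples the rows once and labels everything in a single step with np.where. It assumes the index is unique; the random_state value is arbitrary and only there to make the run repeatable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 3), columns=list("abc"))

# sample(frac=0.7) picks 70% of the rows without replacement
yes_index = df.sample(frac=0.7, random_state=0).index

# Rows whose index was sampled get "Yes", all others get "No"
df["new_col"] = np.where(df.index.isin(yes_index), "Yes", "No")

counts = df["new_col"].value_counts()
print(counts)
```

With 20 rows this gives 14 "Yes" and 6 "No", whichever rows happen to be sampled.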

answered Oct 14 '22 06:10

EdChum

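import pandas as pd
import random

# df is your existing DataFrame
number_of_rows = len(df)

# Repeat a 7:3 pattern; this assumes number_of_rows is a multiple of 10,
# otherwise the list is shorter than the dataframe and the assignment fails
arr = ['Yes'] * 7 + ['No'] * 3
arr *= number_of_rows // 10

random.shuffle(arr)

df['column_name'] = arr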
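The shuffle approach only produces a list of the right length when the row count is a multiple of 10. A possible generalisation, sketched under the assumption that rounding the "Yes" count to the nearest integer is acceptable, is shown below. The helper name exact_ratio_labels and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def exact_ratio_labels(n_rows, p_yes=0.7, seed=None):
    """Return n_rows shuffled labels with round(n_rows * p_yes) 'Yes' entries."""
    rng = np.random.default_rng(seed)
    n_yes = round(n_rows * p_yes)
    labels = np.array(["Yes"] * n_yes + ["No"] * (n_rows - n_yes))
    # permutation returns a shuffled copy of the label array
    return rng.permutation(labels)


# Example dataframe with a row count that is not a multiple of 10
df = pd.DataFrame({"a": range(23)})
df["column_name"] = exact_ratio_labels(len(df), seed=0)
print(df["column_name"].value_counts())
```

For 23 rows this assigns 16 "Yes" and 7 "No", the closest whole-number split to 7:3.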

answered Oct 14 '22 07:10

Vedang Mehta

Quick and Dirty

pd.Series(np.random.rand(100)).apply(lambda x: 'Yes' if x < .7 else 'No')
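The same thresholding can be done without a per-element lambda by comparing the whole array at once. This is a sketch of a vectorized equivalent; the seed and row count are arbitrary. As with np.random.choice, each row is an independent draw, so the share of "Yes" is only approximately 70%.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
n_rows = 10_000

# Uniform draws below 0.7 become "Yes", the rest become "No"
labels = pd.Series(np.where(rng.random(n_rows) < 0.7, "Yes", "No"))

share_yes = (labels == "Yes").mean()
print(share_yes)
```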

answered Oct 14 '22 06:10

piRSquared


Tags:

python

pandas

dataframe

Chandu codes
