add a different random number to every cell in a pandas dataframe

Tags:

python

pandas

dataframe

I need to add some 'noise' to my data, so I would like to add a different random number to every cell in my pandas dataframe. This code works, but seems unpythonic. Is there a better way?

import pandas as pd
import numpy as np
df = pd.DataFrame(0.0, index=[1,2,3,4,5], columns=list('ABC'))
print(df)
for x, line in df.iterrows():
    for col in df:
        line[col] = line[col] + (np.random.rand()-0.5)/1000.0
print(df)

asked May 04 '17 15:05

TPM

2 Answers

df + np.random.rand(*df.shape) / 10000.0
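As a rough sketch of how the one-liner above could be matched to the question's loop: `np.random.rand(*df.shape)` draws values in [0, 1), so the noise above is always positive. Subtracting 0.5 before scaling centres it on zero. The variable names `noise` and `noisy` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(0.0, index=[1, 2, 3, 4, 5], columns=list("ABC"))

# One independent draw per cell: rand(*df.shape) returns a (5, 3) array in [0, 1).
# Subtracting 0.5 centres the noise on zero, as in the question's loop.
noise = (np.random.rand(*df.shape) - 0.5) / 1000.0

# Element-wise addition; the index and column labels of df are preserved.
noisy = df + noise
print(noisy)
```

Because the array is generated in a single call, no Python-level loop over rows or columns is needed.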

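Alternatively, use applymap, which calls a function on every cell (in pandas 2.1 and later it is deprecated in favour of DataFrame.map):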

df = pd.DataFrame(1.0, index=[1,2,3,4,5], columns=list('ABC') )

df.applymap(lambda x: x + np.random.rand()/10000.0)

output:

                                                   A  \
1  [[1.00006953418, 1.00009164785, 1.00003177706]...   
2  [[1.00007291245, 1.00004186046, 1.00006935173]...   
3  [[1.00000490127, 1.0000633115, 1.00004117181],...   
4  [[1.00007159622, 1.0000559506, 1.00007038891],...   
5  [[1.00000980335, 1.00004760836, 1.00004214422]...   

                                                   B  \
1  [[1.00000320322, 1.00006981682, 1.00008912557]...   
2  [[1.00007443802, 1.00009270815, 1.00007225764]...   
3  [[1.00001371778, 1.00001512412, 1.00007986851]...   
4  [[1.00005883343, 1.00007936509, 1.00009523334]...   
5  [[1.00009329606, 1.00003174878, 1.00006187704]...   

                                                   C  
1  [[1.00005894836, 1.00006592776, 1.0000171843],...  
2  [[1.00009085391, 1.00006606979, 1.00001755092]...  
3  [[1.00009736701, 1.00007240762, 1.00004558753]...  
4  [[1.00003981393, 1.00007505714, 1.00007209959]...  
5  [[1.0000031608, 1.00009372917, 1.00001960112],...

answered Sep 26 '22 01:09

Scott Boston

For nonzero data:

df + (np.random.rand(*df.shape)-0.5)*0.001

OR

df + np.random.uniform(-0.01,0.01,df.shape)

For cases where your data frame contains zeros that you wish to keep as zero:

df * (1 + (np.random.rand(*df.shape)-0.5)*0.001)

OR

df * (1 + np.random.uniform(-0.01,0.01,df.shape))

I think either of these should work. It's a case of generating an array of random values with the same shape as your existing df and adding it to df (or multiplying df by 1 + random for cases where you wish zeros to remain zero). With the uniform function you can set the scale of your noise by altering the 0.01 bounds.
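A minimal sketch of both variants wrapped in one helper, in case it is useful. The function name `add_uniform_noise` and its parameters are hypothetical, not part of pandas or NumPy, and it assumes NumPy 1.17 or later for `np.random.default_rng`.

```python
import numpy as np
import pandas as pd


def add_uniform_noise(df, scale=0.01, multiplicative=False, seed=None):
    """Return a copy of df with uniform noise in [-scale, scale) applied per cell.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # One independent draw per cell, same shape as df.
    noise = rng.uniform(-scale, scale, size=df.shape)
    if multiplicative:
        # Relative noise: cells that are exactly zero stay zero.
        return df * (1 + noise)
    # Absolute noise: every cell is shifted, including zeros.
    return df + noise


df = pd.DataFrame({"A": [0.0, 1.0, 2.0], "B": [3.0, 0.0, 5.0]})
additive = add_uniform_noise(df, scale=0.01, seed=0)
relative = add_uniform_noise(df, scale=0.01, multiplicative=True, seed=0)
```

Passing a `seed` makes the noise reproducible, which may help when comparing runs. Note that the multiplicative form scales the noise with each value, so large cells receive larger absolute perturbations.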

answered Sep 24 '22 01:09

tfcoe
