How to assign values randomly between dataframes

Tags: python, pandas

I am trying to randomly assign values from a column in one dataframe to another dataframe, within 12 different categories (defined by agerange and gender). For example, I have two dataframes; let's call one d1 and the other d2.

d1:
index agerange gender income
 0     2        1      56700
 1     2        0      25600
 2     4        0      3000
 3     4        0      106000
 4     3        0      200
 5     3        0      43000
 6     4        0      10000000

d2:
index agerange gender 
 0     3        0      
 1     2        0      
 2     4        0      
 3     4        0

I want to group both dataframes by agerange and gender (i.e. gender 0 with ageranges 1-6 and gender 1 with ageranges 1-6), then, for each row of d2, randomly choose one of the incomes from the matching group in d1 and assign it to that row.

For example:

d1:
index agerange gender income
 0     2        1      56700
 1     2        0      25600
 2     4        0      3000
 3     4        0      106000
 4     3        0      200
 5     3        0      43000
 6     4        0      10000000

d2:
index agerange gender  income
 0     3        0      200  
 1     2        0      25600 
 2     4        0      10000000
 3     4        0      3000

asked Jul 31 '17 16:07

stav

1 Answer

Option 1
An approach with np.random.choice and pd.DataFrame.query.
I'm implicitly assuming that values are drawn with replacement, independently for every row, so the same income can be assigned more than once.

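def take_one(x):
    # Build a query string that matches this row's category in d1
    q = 'agerange == {agerange} and gender == {gender}'.format(**x)
    # Draw one income at random from the matching d1 rows
    return np.random.choice(d1.query(q).income)

# Apply row by row (axis=1) and attach the draws as a new column
d2.assign(income=d2.apply(take_one, axis=1))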

       agerange  gender  income
index                          
0             3       0     200
1             2       0   25600
2             4       0  106000
3             4       0  106000
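One case Option 1 does not cover is a row in d2 whose category never appears in d1: the query then matches nothing, and np.random.choice raises a ValueError on an empty input. The following is a minimal sketch of one way to guard against that, not part of the original answer. The names take_one_safe and rng, the extra fifth row in d2, and the choice of NaN as the fallback are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame({
    'agerange': [2, 2, 4, 4, 3, 3, 4],
    'gender':   [1, 0, 0, 0, 0, 0, 0],
    'income':   [56700, 25600, 3000, 106000, 200, 43000, 10000000],
})
# Hypothetical d2: the last row (agerange 5, gender 1) has no match in d1.
d2 = pd.DataFrame({'agerange': [3, 2, 4, 4, 5], 'gender': [0, 0, 0, 0, 1]})

rng = np.random.default_rng(0)  # seeded so that reruns give the same draws

def take_one_safe(row):
    # Boolean mask instead of query(); selects d1 rows in the same category.
    same_group = (d1['agerange'] == row['agerange']) & (d1['gender'] == row['gender'])
    pool = d1.loc[same_group, 'income'].to_numpy()
    # Return NaN rather than raising when the category is absent from d1.
    return rng.choice(pool) if len(pool) else np.nan

result = d2.assign(income=d2.apply(take_one_safe, axis=1))
print(result)
```

Because of the NaN fallback the income column comes back as float; whether a missing category should be NaN, 0, or an error depends on the data and is a design choice left open here.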

Option 2
A more efficient attempt that calls np.random.choice once per group instead of once per row.

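# Collect the candidate incomes for each (agerange, gender) group in d1
g = d1.groupby(['agerange', 'gender']).income.apply(list)
# One draw per row of the d2 group; groups missing from d1 fall back to 0
f = lambda x: pd.Series(np.random.choice(g.get(x.name, [0] * len(x)), len(x)), x.index)
d2.assign(income=d2.groupby(['agerange', 'gender'], group_keys=False).apply(f))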

       agerange  gender    income
index                            
0             3       0       200
1             2       0     25600
2             4       0  10000000
3             4       0    106000
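Both options draw with replacement, so two rows of d2 in the same category can receive the same income. If distinct values are preferred whenever the d1 group is large enough (as in the question's expected output, where the two agerange 4 rows get different incomes), a per-group draw without replacement is one possibility. This is a sketch under that assumption rather than the answer's own method; pools, rng, and the fallback to replacement for oversized groups are illustrative choices.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame({
    'agerange': [2, 2, 4, 4, 3, 3, 4],
    'gender':   [1, 0, 0, 0, 0, 0, 0],
    'income':   [56700, 25600, 3000, 106000, 200, 43000, 10000000],
})
d2 = pd.DataFrame({'agerange': [3, 2, 4, 4], 'gender': [0, 0, 0, 0]})

rng = np.random.default_rng(0)  # seeded for reproducible draws
keys = ['agerange', 'gender']

# Map each (agerange, gender) key to the array of incomes available in d1.
pools = {key: grp.to_numpy() for key, grp in d1.groupby(keys)['income']}

income = pd.Series(np.nan, index=d2.index)
for key, idx in d2.groupby(keys).groups.items():
    pool = pools.get(key)
    if pool is None:
        continue  # category absent from d1: leave NaN
    # Draw without replacement when the pool is large enough,
    # otherwise fall back to drawing with replacement.
    replace = len(idx) > len(pool)
    income.loc[idx] = rng.choice(pool, size=len(idx), replace=replace)

result = d2.assign(income=income)
print(result)
```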

Debugging and Setup

import pandas as pd
import numpy as np

d1 = pd.DataFrame({
        'agerange': [2, 2, 4, 4, 3, 3, 4],
        'gender': [1, 0, 0, 0, 0, 0, 0],
        'income': [56700, 25600, 3000, 106000, 200, 43000, 10000000]
    }, pd.Index([0, 1, 2, 3, 4, 5, 6], name='index')
)

d2 = pd.DataFrame(
    {'agerange': [3, 2, 4, 4], 'gender': [0, 0, 0, 0]},
    pd.Index([0, 1, 2, 3], name='index')
)

g = d1.groupby(['agerange', 'gender']).income.apply(list)
f = lambda x: pd.Series(np.random.choice(g.loc[x.name], len(x)), x.index)
d2.assign(income=d2.groupby(['agerange', 'gender'], group_keys=False).apply(f))

       agerange  gender  income
index                          
0             3       0     200
1             2       0   25600
2             4       0  106000
3             4       0    3000
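Since the output is random, eyeballing it is not a reliable check. One way to confirm that every assigned income really comes from the matching category in d1 is a left merge with indicator=True. The helper name check_assignment is hypothetical, and the two test frames below are built by hand (the first copies the sample output above) so the check does not depend on any particular random draw.

```python
import pandas as pd

d1 = pd.DataFrame({
    'agerange': [2, 2, 4, 4, 3, 3, 4],
    'gender':   [1, 0, 0, 0, 0, 0, 0],
    'income':   [56700, 25600, 3000, 106000, 200, 43000, 10000000],
})

def check_assignment(d1, assigned, keys=('agerange', 'gender'), value='income'):
    """Return a boolean array: True where the row's value exists in d1 for its keys."""
    cols = list(keys) + [value]
    # Deduplicate so the left merge cannot multiply rows of `assigned`.
    valid = d1[cols].drop_duplicates()
    merged = assigned[cols].merge(valid, on=cols, how='left', indicator=True)
    return (merged['_merge'] == 'both').to_numpy()

# Same values as the sample output above: every income belongs to its category.
good = pd.DataFrame({'agerange': [3, 2, 4, 4], 'gender': [0, 0, 0, 0],
                     'income': [200, 25600, 106000, 3000]})
# 43000 belongs to agerange 3, not agerange 4, so the third row is invalid.
bad = good.assign(income=[200, 25600, 43000, 3000])

good_mask = check_assignment(d1, good)
bad_mask = check_assignment(d1, bad)
print(good_mask, bad_mask)
```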

answered Oct 21 '22 11:10

piRSquared
