
Pandas: how to create random dummy data

Tags:

python

pandas

I often find myself wanting to test a function on a sample dataframe.

It's super easy to create a random dataframe with numbers, like this:

pd.DataFrame(np.random.randn(5, 3), columns=list('ABC')) or pd.DataFrame(np.random.randint(2,10,(5,3)), columns=list('ABC')) if you want some more control over the values in your dummy data.
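For test data that comes out the same on every run, the same two frames can be built from a seeded generator. This is only a minimal sketch, assuming NumPy's `np.random.default_rng` Generator API is available; the seed value 42 is an arbitrary choice and not part of the original snippet.

```python
import numpy as np
import pandas as pd

# Seeded generator so repeated runs produce identical data (seed is arbitrary)
rng = np.random.default_rng(42)

# Standard-normal floats, 5 rows by 3 columns
floats = pd.DataFrame(rng.standard_normal((5, 3)), columns=list("ABC"))

# Integers from 2 up to and including 9 (the upper bound 10 is exclusive)
ints = pd.DataFrame(rng.integers(2, 10, size=(5, 3)), columns=list("ABC"))
```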

I am wondering whether there is a more general library out there that helps you create dummy data of various types (e.g. datetime, categorical, ...)?

Fabian Bosler asked Mar 15 '26 14:03


1 Answer

looketh and you shall find

I changed it ever so slightly to get rid of the numpy warning:

import pandas as pd
import numpy as np
import datetime

# 24-row sample frame with string, datetime, float, and integer columns
dft = pd.DataFrame({
    'A': ['spam', 'eggs', 'spam', 'eggs'] * 6,
    'B': ['alpha', 'beta', 'gamma'] * 8,
    # random dates drawn from 2013-01-01 through 2013-01-03
    'C': [np.random.choice(pd.date_range(datetime.datetime(2013, 1, 1), datetime.datetime(2013, 1, 3))) for _ in range(24)],
    'D': np.random.randn(24),  # standard-normal floats
    'E': np.random.randint(2, 10, 24),  # integers from 2 to 9
    'F': [np.random.choice(['rand_1', 'rand_2', 'rand_4', 'rand_6']) for _ in range(24)],
})

dft
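
If you need this more than once, the same idea can be wrapped in a small helper. The following is just a sketch under a few assumptions: `make_dummy_frame`, its parameters, and the column names are hypothetical names I made up for illustration, and it uses NumPy's seeded `default_rng` generator instead of the global `np.random` functions so the output is repeatable. It also stores the label column as a pandas categorical, since the question asks about categorical data.

```python
import numpy as np
import pandas as pd


def make_dummy_frame(n_rows=24, seed=0):
    """Return a DataFrame with mixed column types (hypothetical helper)."""
    rng = np.random.default_rng(seed)  # seeded, so the same seed gives the same frame
    dates = pd.date_range("2013-01-01", "2013-01-03").to_numpy()
    labels = ["rand_1", "rand_2", "rand_4", "rand_6"]
    return pd.DataFrame({
        "date": rng.choice(dates, size=n_rows),               # datetime column
        "category": pd.Categorical(rng.choice(labels, size=n_rows),
                                   categories=labels),        # categorical column
        "float": rng.standard_normal(n_rows),                 # standard-normal floats
        "int": rng.integers(2, 10, size=n_rows),              # integers from 2 to 9
        "flag": rng.random(n_rows) < 0.5,                     # booleans
    })


df = make_dummy_frame()
```

Calling `make_dummy_frame(n_rows=100, seed=1)` would give a larger frame with a different but still repeatable draw.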
Fabian Bosler answered Mar 17 '26 04:03


