I am trying to generate random data(integers) with dates so that I can practice pandas data analytics commands on it and plot time series graphs.
temp depth acceleration
2019-01-1 -0.218062 -1.215978 -1.674843
2019-02-1 -0.465085 -0.188715 0.241956
2019-03-1 -1.464794 -1.354594 0.635196
2019-04-1 0.103813 0.194349 -0.450041
2019-05-1 0.437921 0.073829 1.346550
Is there any random dataframe generator that can generate something like this with each date having a gap of one month?
Time series data is quickly generated in Pandas with the ‘date_range’ function. Below is an example of generating a dataframe with one random value each day for the year 2019. While the data here is usable for time series models, no patterns are visible.
Again, for generating sample time-series data, you'd be hard-pressed to find an easier method! In this example, we generate 12 timestamps an hour apart, a random value representing CPU usage, and then a second series of four values that represent IDs for fake devices. This should produce 48 rows (eg. 12 timestamps x 4 device IDs = 48 rows).
The TEXT function is a handy function to generate random time and display. We combined it with the RAND function to get the random time only. After entering the formula you will notice that cell B5 now has a random time. After this, drag the Fill Handle icon in the corner of cell B5 and drag it to cell B10.
When creating and testing time-series models, it is beneficial to test your models on random data as a baseline. Random walks can simulate trends for stocks, capacity utilization rate, and even particle motion. Through the adjustments of each step probability, behavior is added to the random walks.
You can either use pandas.util.testing
import pandas.util.testing as testing
import numpy as np
np.random.seed(1)
testing.N, testing.K = 5, 3 # Setting the rows and columns of the desired data
print testing.makeTimeDataFrame(freq='MS')
>>>
A B C
2000-01-01 -0.488392 0.429949 -0.723245
2000-02-01 1.247192 -0.513568 -0.512677
2000-03-01 0.293828 0.284909 1.190453
2000-04-01 -0.326079 -1.274735 -0.008266
2000-05-01 -0.001980 0.745803 1.519243
Or, if you need more control over the random values being generated, you can use something like
import numpy as np
import pandas as pd
np.random.seed(1)
rows,cols = 5,3
data = np.random.rand(rows,cols) # You can use other random functions to generate values with constraints
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS') # freq='MS'set the frequency of date in months and start from day 1. You can use 'T' for minutes and so on
data_frame = pd.DataFrame(data, columns=['a','b','c'], index=tidx)
print data_frame
>>>
a b c
2019-01-01 0.992856 0.217750 0.538663
2019-02-01 0.189226 0.847022 0.156730
2019-03-01 0.572417 0.722094 0.868219
2019-04-01 0.023791 0.653147 0.857148
2019-05-01 0.729236 0.076817 0.743955
Use numpy.random.rand
or numpy.random.randint
functions with DataFrame
constructor:
np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.rand(N, 3), columns=['temp','depth','acceleration'], index=rng)
print (df)
temp depth acceleration
2019-01-01 0.903482 0.393081 0.623970
2019-02-01 0.637877 0.880499 0.299172
2019-03-01 0.702198 0.903206 0.881382
2019-04-01 0.405750 0.452447 0.267070
2019-05-01 0.162865 0.889215 0.148476
2019-06-01 0.984723 0.032361 0.515351
2019-07-01 0.201129 0.886011 0.513620
2019-08-01 0.578302 0.299283 0.837197
2019-09-01 0.526650 0.104844 0.278129
2019-10-01 0.046595 0.509076 0.472426
If need integers:
np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.randint(20, size=(10, 3)),
columns=['temp','depth','acceleration'],
index=rng)
print (df)
temp depth acceleration
2019-01-01 8 18 5
2019-02-01 15 12 10
2019-03-01 16 16 7
2019-04-01 5 19 12
2019-05-01 16 18 5
2019-06-01 16 15 1
2019-07-01 14 12 10
2019-08-01 0 11 18
2019-09-01 15 19 1
2019-10-01 3 16 18
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With