
Generate random timeseries data with dates

I am trying to generate random data (integers) indexed by date so that I can practice pandas data analysis commands on it and plot time series graphs.

             temp     depth   acceleration
2019-01-1 -0.218062 -1.215978 -1.674843
2019-02-1 -0.465085 -0.188715  0.241956
2019-03-1 -1.464794 -1.354594  0.635196
2019-04-1  0.103813  0.194349 -0.450041
2019-05-1  0.437921  0.073829  1.346550

Is there any random dataframe generator that can produce something like this, with consecutive dates one month apart?

NodeSlayer asked May 26 '19 05:05



2 Answers

You can either use pandas.util.testing (note that this module is deprecated since pandas 1.0, so it may be unavailable in newer releases):

import pandas.util.testing as testing
import numpy as np
np.random.seed(1)

testing.N, testing.K = 5, 3  # Number of rows and columns in the generated frame

print(testing.makeTimeDataFrame(freq='MS'))
>>>
                   A         B         C
2000-01-01 -0.488392  0.429949 -0.723245
2000-02-01  1.247192 -0.513568 -0.512677
2000-03-01  0.293828  0.284909  1.190453
2000-04-01 -0.326079 -1.274735 -0.008266
2000-05-01 -0.001980  0.745803  1.519243

Or, if you need more control over the random values being generated, you can build the frame yourself:

import numpy as np
import pandas as pd
np.random.seed(1)

rows, cols = 5, 3
# Uniform floats in [0, 1); swap in another np.random function to generate values with constraints
data = np.random.rand(rows, cols)
# freq='MS' means "month start", so each date is the 1st of a month.
# Other aliases give other spacings, e.g. 'D' for days or 'min' for minutes ('T' in older pandas).
tidx = pd.date_range('2019-01-01', periods=rows, freq='MS')
data_frame = pd.DataFrame(data, columns=['a', 'b', 'c'], index=tidx)
print(data_frame)
>>>
                   a         b         c
2019-01-01  0.992856  0.217750  0.538663
2019-02-01  0.189226  0.847022  0.156730
2019-03-01  0.572417  0.722094  0.868219
2019-04-01  0.023791  0.653147  0.857148
2019-05-01  0.729236  0.076817  0.743955
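
As a rough sketch of the "other random functions" idea, here is one way to give each column its own distribution. It assumes NumPy 1.17 or later for np.random.default_rng. The helper name make_monthly_frame, the column names and all of the distribution parameters are made up for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd


def make_monthly_frame(start="2019-01-01", periods=5, seed=0):
    """Hypothetical helper: one random row per month start."""
    rng = np.random.default_rng(seed)  # seeded generator, so results repeat
    index = pd.date_range(start, periods=periods, freq="MS")  # 1st of each month
    return pd.DataFrame(
        {
            # Assumed parameters, chosen only to show different distributions
            "temp": rng.normal(loc=15.0, scale=5.0, size=periods),  # Gaussian
            "depth": rng.uniform(0.0, 100.0, size=periods),  # floats in [0, 100)
            "acceleration": rng.standard_normal(periods),  # mean 0, std 1
        },
        index=index,
    )


df = make_monthly_frame()
print(df)
```

The standard normal column gives values scattered around zero, similar in shape to the sample in the question, and changing seed gives a different but repeatable frame.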
DataCruncher answered Oct 21 '22 14:10


Use the numpy.random.rand or numpy.random.randint functions with the DataFrame constructor:

import numpy as np
import pandas as pd

np.random.seed(2019)
N = 10
# One timestamp per month start, beginning 2019-01-01
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
df = pd.DataFrame(np.random.rand(N, 3), columns=['temp','depth','acceleration'], index=rng)

print(df)
                temp     depth  acceleration
2019-01-01  0.903482  0.393081      0.623970
2019-02-01  0.637877  0.880499      0.299172
2019-03-01  0.702198  0.903206      0.881382
2019-04-01  0.405750  0.452447      0.267070
2019-05-01  0.162865  0.889215      0.148476
2019-06-01  0.984723  0.032361      0.515351
2019-07-01  0.201129  0.886011      0.513620
2019-08-01  0.578302  0.299283      0.837197
2019-09-01  0.526650  0.104844      0.278129
2019-10-01  0.046595  0.509076      0.472426

If you need integers:

np.random.seed(2019)
N = 10
rng = pd.date_range('2019-01-01', freq='MS', periods=N)
# randint(20, ...) draws integers from 0 to 19 (the upper bound is exclusive)
df = pd.DataFrame(np.random.randint(20, size=(N, 3)),
                  columns=['temp','depth','acceleration'],
                  index=rng)

print(df)
            temp  depth  acceleration
2019-01-01     8     18             5
2019-02-01    15     12            10
2019-03-01    16     16             7
2019-04-01     5     19            12
2019-05-01    16     18             5
2019-06-01    16     15             1
2019-07-01    14     12            10
2019-08-01     0     11            18
2019-09-01    15     19             1
2019-10-01     3     16            18
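
If each column should cover a different integer range, one possible variation is to build the columns separately. This is only a sketch: it assumes NumPy 1.17 or later for np.random.default_rng, and the ranges below are invented for illustration.

```python
import numpy as np
import pandas as pd

gen = np.random.default_rng(2019)  # seeded generator for repeatable output
N = 10
idx = pd.date_range("2019-01-01", freq="MS", periods=N)

# Assumed ranges; as with randint, the upper bound is exclusive
int_df = pd.DataFrame(
    {
        "temp": gen.integers(-10, 40, size=N),  # -10 .. 39
        "depth": gen.integers(0, 500, size=N),  # 0 .. 499
        "acceleration": gen.integers(0, 10, size=N),  # 0 .. 9
    },
    index=idx,
)

print(int_df)
print(int_df.dtypes)
```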
jezrael answered Oct 21 '22 13:10