
Generate data by using existing dataset as the base dataset

I have a dataset of 100k unique records. To benchmark my code, I need to test on 5 million unique records, but I don't want to generate purely random data. I would like to use the 100k records I have as the base dataset and generate the remaining records so that they are similar to it, with unique values for certain columns. How can I do that using Python or Scala?

Here's the sample data:

latitude   longitude  step count
25.696395   -80.297496  1   1
25.699544   -80.297055  1   1
25.698612   -80.292015  1   1
25.939942   -80.341607  1   1
25.939221   -80.349899  1   1
25.944992   -80.346589  1   1
27.938951   -82.492018  1   1
27.944691   -82.48961   1   3
28.355484   -81.55574   1   1

Each latitude/longitude pair should be unique across the generated data, and I should be able to set min and max values for these columns as well.

namrutha asked Oct 17 '25 13:10


1 Answer

You can easily generate data that follows a normal distribution using R. The steps are as follows:

# Read the latitude and longitude columns from the CSV file
library(data.table)
data = fread("data.csv", sep=",", select = c("latitude", "longitude"))

#Remove duplicate and null values
df = data.frame("Lat"=data$"latitude", "Lon"=data$"longitude")
df1 = unique(df[1:2])
df2  <- na.omit(df1)

#Determine the mean and standard deviation of latitude and longitude values
meanLat = mean(df2$Lat)
meanLon = mean(df2$Lon)
sdLat = sd(df2$Lat)
sdLon = sd(df2$Lon)

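# Use a normal distribution to generate 1 million new records.
# sum(runif(12)) - 6 approximates a standard normal draw (the sum of 12
# uniforms has mean 6 and variance 1); rnorm() is a direct alternative.
n = 1000000
newData = data.frame(
  Lat = sapply(rep(0, n), function(x) (sum(runif(12))-6) * sdLat + meanLat),
  Lon = sapply(rep(0, n), function(x) (sum(runif(12))-6) * sdLon + meanLon)
)

# Append the new records to the cleaned original records
finalData = rbind(df2, newData)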

finalData now contains both the old records and the new records.

Write the finalData data frame to a CSV file, and you can then read it from Scala or Python.
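
Note that the R steps above do not by themselves enforce the unique-pair or min/max requirements from the question. Below is a minimal Python sketch of the same idea (fit a mean and standard deviation per column, then draw from a normal distribution) that also rejects out-of-range and duplicate pairs. The function name generate_unique_points, its parameters, the rounding to 6 decimals, and the sample bounds are illustrative assumptions and not part of the original answer.

```python
import random
import statistics


def generate_unique_points(base, n_new, lat_range, lon_range, decimals=6, seed=None):
    """Draw n_new (lat, lon) pairs from normal distributions fitted to `base`.

    A pair is rejected and redrawn if it falls outside the given ranges or
    equals an existing pair after rounding to `decimals` places.
    """
    rng = random.Random(seed)
    lats = [lat for lat, _ in base]
    lons = [lon for _, lon in base]
    mean_lat, sd_lat = statistics.mean(lats), statistics.stdev(lats)
    mean_lon, sd_lon = statistics.mean(lons), statistics.stdev(lons)

    # Start from the base pairs so new points never collide with them.
    seen = {(round(lat, decimals), round(lon, decimals)) for lat, lon in base}
    new_points = []
    while len(new_points) < n_new:
        point = (
            round(rng.gauss(mean_lat, sd_lat), decimals),
            round(rng.gauss(mean_lon, sd_lon), decimals),
        )
        in_bounds = (
            lat_range[0] <= point[0] <= lat_range[1]
            and lon_range[0] <= point[1] <= lon_range[1]
        )
        if in_bounds and point not in seen:
            seen.add(point)
            new_points.append(point)
    return new_points


# A few rows from the sample data in the question, used as the base set.
base = [
    (25.696395, -80.297496),
    (25.699544, -80.297055),
    (27.938951, -82.492018),
    (28.355484, -81.55574),
]
# Assumed bounds, chosen only for illustration.
new_points = generate_unique_points(base, 1000, (25.0, 29.0), (-83.0, -80.0), seed=42)
```

The rejection loop only terminates quickly if the bounds cover a reasonable share of the fitted distribution, so very narrow ranges would need a different sampling strategy. For 5 million rows, a vectorized library would likely be faster than this pure-Python loop.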

arjunsv3691 answered Oct 20 '25 03:10