I have used Python Faker to generate fake data, but I need to know the maximum number of distinct fake values (e.g., fake names) that Faker can generate (e.g., with fake.name()).
I generated 100,000 fake names and got fewer than 76,000 distinct names. I need to know the maximum limit so that I know how far we can scale this package for data generation.
I need to generate a huge dataset. I would also like to know: do PHP Faker and Perl Faker behave the same across different environments?
Suggestions for other packages that can generate huge datasets would be highly appreciated.
Use faker.Faker() to create and initialize a faker generator, which can generate data by accessing properties named after the type of data you want:

    from faker import Faker
    fake = Faker()
    fake.name()
    # 'Lucy Cechtelar'
Faker can also generate random digits and integers, and the random_int method lets you specify the bounds.
*Faker* is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.
I had the same issue and looked into it further.
In the en_US provider there are about 1,000 last names and 750 first names, for about 750,000 unique combinations. If you randomly select a first and a last name, there is a chance you'll get duplicates. But that is how the real world works: there are many John Smiths and Robert Doyles out there.
The en profile has 7,203 first names and 473 last names, which helps somewhat. Faker combines a first name with a last name, so there are about 7,203 × 473 = 3,407,019 combinations.
Even so, there is still a chance you'll get duplicates.
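To put a number on that chance, here is a rough sketch using the standard birthday-problem estimate. It assumes every first/last combination is equally likely and that picks are independent, which Faker does not guarantee. With k picks from n combinations, the expected number of distinct names is n(1 - (1 - 1/n)^k):

```python
import math

def expected_distinct(combinations: int, draws: int) -> float:
    """Expected number of distinct values after `draws` independent,
    uniform picks from `combinations` equally likely values."""
    n, k = combinations, draws
    # n * (1 - (1 - 1/n) ** k), written with log1p/expm1 so it stays
    # accurate when n is very large and 1/n is tiny.
    return -n * math.expm1(k * math.log1p(-1 / n))

# Compare the two pool sizes mentioned above for 100,000 generated names.
for n in (750_000, 3_407_019):
    print(n, round(expected_distinct(n, 100_000)))
```

Under those assumptions, 100,000 picks would give roughly 93,600 distinct names from 750,000 combinations and roughly 98,500 from 3,407,019. A measured count well below these figures would suggest the effective pool is smaller, for example because some names are picked far more often than others.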
I solve this problem by adding numbers to names.
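One way to do that, as a minimal sketch: append a running number to whatever name generator you use, so every value is distinct even when the generator repeats itself. The numbered_names helper is hypothetical; with Faker you would pass the bound method, e.g. numbered_names(Faker().name, 1_000_000).

```python
from typing import Callable, List

def numbered_names(generate: Callable[[], str], count: int) -> List[str]:
    """Return `count` names from `generate`, each suffixed with its position.

    The trailing number is unique per item, so the results never repeat,
    even when `generate` returns the same name many times.
    """
    return [f"{generate()} {i}" for i in range(1, count + 1)]

# Stand-in generator that always repeats, to show the suffix at work.
print(numbered_names(lambda: "John Smith", 3))
# ['John Smith 1', 'John Smith 2', 'John Smith 3']
```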
I need to generate a huge dataset.
Keep in mind that in reality, any huge dataset of names will have duplicates. I work with large datasets (> 1 million names) and we see a ton of duplicate first and last names.
If you read the Faker source code, you can probably work out how to modify it so that you get all ~3M distinct names.
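Rather than patching Faker, a rough alternative sketch is to sample without replacement from the first-name × last-name grid, which yields distinct names until the grid is exhausted. The distinct_full_names helper is hypothetical and you supply the name lists yourself. Faker's person providers may expose such lists (for example as first_names and last_names attributes), but that layout is an internal detail that can differ by version and locale, and sampling the grid uniformly ignores any frequency weighting Faker applies.

```python
import random
from typing import List, Optional, Sequence

def distinct_full_names(first_names: Sequence[str], last_names: Sequence[str],
                        k: int, seed: Optional[int] = None) -> List[str]:
    """Return k distinct "First Last" strings, sampled without replacement
    from every pairing of a first name with a last name.

    Assumes the names themselves contain no spaces, so that distinct
    (first, last) pairs always produce distinct strings.
    """
    # Drop duplicate inputs; sort so a given seed gives a repeatable result.
    firsts = sorted(set(first_names))
    lasts = sorted(set(last_names))
    total = len(firsts) * len(lasts)
    if k > total:
        raise ValueError(f"only {total} distinct combinations, asked for {k}")

    rng = random.Random(seed)
    # range() is lazy, so the full grid is never built in memory;
    # sample() picks k distinct flat indices into it.
    picks = rng.sample(range(total), k)

    names = []
    for index in picks:
        # divmod maps a flat index back to (first-name row, last-name column).
        fi, li = divmod(index, len(lasts))
        names.append(f"{firsts[fi]} {lasts[li]}")
    return names

print(distinct_full_names(["Ann", "Bob"], ["Lee", "Kim"], 3, seed=0))
```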