Where can I get, or how can I generate a large formatted collection of fake user data (names, email address, locations, etc.) that can be used for testing an application?
It can be clearly fake, since it will only be used on the development server, and I'm sure anything would be better than what I could come up with myself.
In informatics, dummy data is benign information that does not contain any useful data but serves to reserve space where real data is nominally present. Dummy data can be used as a placeholder for both testing and operational purposes.
You should call this "dummy data" or "test data." The terms "artificial data" and "synthetic data" are actually used for data that is constructed to resemble real data but has been manipulated so that the underlying real-world items cannot be identified.
There are some tools built just for this. I've used http://www.generatedata.com/ before to generate data for MySQL databases. RedGate also has a nice tool for filling a SQL Server database with test data, called SQL Data Generator. It costs about $300, but there is a free trial.
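If you'd rather script it yourself, here is a minimal sketch using only Python's standard library. The word lists, the `make_fake_users` and `load_into_sqlite` helpers, and the `users` table schema are all hypothetical, and SQLite stands in for MySQL or SQL Server so the example runs without a database server.

```python
import random
import sqlite3

# Small, obviously fake word lists (hypothetical; extend them as needed).
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank", "Grace", "Heidi"]
LAST_NAMES = ["Testerson", "Sampleton", "Mockford", "Dummyfield", "Placeholder"]
CITIES = [("Springfield", "US"), ("Riverton", "US"), ("Lakeside", "CA"), ("Hillview", "GB")]


def make_fake_users(n, seed=42):
    """Return n fake user dicts; the same seed always yields the same data."""
    rng = random.Random(seed)  # private RNG so results are reproducible
    users = []
    for i in range(n):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        city, country = rng.choice(CITIES)
        users.append({
            "id": i + 1,
            "name": f"{first} {last}",
            # example.com is reserved for documentation (RFC 2606), so these
            # addresses cannot reach a real inbox; the id keeps each one unique.
            "email": f"{first.lower()}.{last.lower()}{i + 1}@example.com",
            "city": city,
            "country": country,
        })
    return users


def load_into_sqlite(conn, users):
    """Create a users table and bulk-insert the generated rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE, city TEXT, country TEXT)"
    )
    # Named placeholders let executemany take the list of dicts directly.
    conn.executemany(
        "INSERT INTO users (id, name, email, city, country) "
        "VALUES (:id, :name, :email, :city, :country)",
        users,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_into_sqlite(conn, make_fake_users(1000))
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Seeding a dedicated `random.Random` instance means every developer who runs the script gets the same rows, which keeps test failures reproducible.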
UPDATE: Faker.js is now available. It is a project built on Node.js and looks pretty comprehensive.
ANOTHER UPDATE: Mockaroo is great!
If you'd like an HTTP API for fake user data, check out Random User Generator.
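A minimal Python sketch of consuming such an API with the standard library. It assumes the `https://randomuser.me/api/` endpoint, its `results` query parameter, and the response fields named in the docstring; `fetch_random_users` and `flatten_users` are hypothetical helper names, so check the service's own documentation for the current response format.

```python
import json
import urllib.request

# Assumed endpoint; the `results` parameter sets how many users come back.
API_URL = "https://randomuser.me/api/?results={n}"


def fetch_random_users(n=10, timeout=10):
    """Download n fake users and return the parsed JSON document (needs network)."""
    with urllib.request.urlopen(API_URL.format(n=n), timeout=timeout) as resp:
        return json.load(resp)


def flatten_users(payload):
    """Reduce each record to name, email, and location fields.

    Assumes the shape {"results": [{"name": {"first", "last"}, "email",
    "location": {"city", "country"}}, ...]}; any extra fields are ignored.
    """
    rows = []
    for r in payload.get("results", []):
        rows.append({
            "name": f'{r["name"]["first"]} {r["name"]["last"]}',
            "email": r["email"],
            "city": r["location"]["city"],
            "country": r["location"]["country"],
        })
    return rows
```

Calling `flatten_users(fetch_random_users(5))` would give a list of five flat dicts. Keeping the parsing separate from the network call lets you test it offline against a saved response.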