I'm about to write some example applications and accompanying documents comparing ways of accessing information stored in relational databases. To demonstrate real-life requirements, I need to include a realistic dataset of hundreds of thousands of facts.
Is anyone aware of publicly available, free datasets of that magnitude: either datasets of human names with human-level variance, or hierarchical datasets such as large organizational hierarchies or large categorized product catalogues?
Please point me in the right direction, if you are.
Part 1, human names: http://timecenter.cs.aau.dk/software.htm
Part 2, hierarchical data: no answer yet
The IBM Information Management System (IMS) and RDM Mobile are examples of hierarchical database systems with multiple hierarchies over the same data.
Computer data hierarchy: bits, characters, fields, records, files, databases, big data.
The Wikipedia dump is pretty massive: obligatory Wikipedia link.
Your own PC's directory tree is a large hierarchical structure with lots of facts. You probably have a few thousand "facts": file names, modification dates, sizes, extra OS info, and so on.
If that's not large enough, find a server that you can log in to. That will be larger.
Not large enough? Get a web crawler and start crawling a big web site. That can be as large as you have the patience to crawl.
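For the directory-tree suggestion, here is a minimal sketch of how the tree could be loaded into a relational table. It assumes SQLite and a simple adjacency-list layout; the table name `node`, its columns, and the function `load_tree` are made up for illustration and are not from any particular tool.

```python
import os
import sqlite3


def load_tree(root, conn):
    """Store every directory and file under root as one row that points
    at its parent row (adjacency list). Returns the number of rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES node(id),"
        " name TEXT NOT NULL,"
        " is_dir INTEGER NOT NULL,"
        " size INTEGER,"
        " mtime REAL)"
    )
    insert = (
        "INSERT INTO node (parent_id, name, is_dir, size, mtime) "
        "VALUES (?, ?, ?, ?, ?)"
    )
    root = os.path.abspath(root)
    st = os.lstat(root)
    cur = conn.execute(
        insert, (None, os.path.basename(root) or root, 1, None, st.st_mtime)
    )
    ids = {root: cur.lastrowid}  # absolute directory path -> row id
    count = 1

    # os.walk is top-down by default, so a directory's row always exists
    # before its children are visited.
    for dirpath, dirnames, filenames in os.walk(root):
        parent_id = ids[dirpath]
        entries = [(n, 1) for n in dirnames] + [(n, 0) for n in filenames]
        for name, is_dir in entries:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            size = None if is_dir else st.st_size
            cur = conn.execute(
                insert, (parent_id, name, is_dir, size, st.st_mtime)
            )
            if is_dir:
                ids[path] = cur.lastrowid
            count += 1
    conn.commit()
    return count


if __name__ == "__main__":
    connection = sqlite3.connect("tree.db")  # hypothetical output file
    print(load_tree(os.path.expanduser("~"), connection), "rows loaded")
```

Each row carries a few facts (name, size, modification time) plus a parent pointer, so the result can serve as a test bed for hierarchical queries, for example with a recursive common table expression.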