I am looking for large text files for testing the compression and decompression in all sizes from 1kb to 100mb. Can someone please refer me to download it from some link ?
And don't forget the collection of corpora:
The Canterbury Corpus
The Artificial Corpus
The Large Corpus
The Miscellaneous Corpus
The Calgary Corpus
SEE: https://corpus.canterbury.ac.nz/descriptions/
There are download links for the files in each set.
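For example, a minimal sketch for fetching one of the sets from the command line (the archive filename and path here are my assumption; check the descriptions page above for the actual links):
wget https://corpus.canterbury.ac.nz/resources/cantrbry.tar.gz    # Canterbury set (assumed archive path)
mkdir cantrbry && tar -xzf cantrbry.tar.gz -C cantrbry            # extract the individual test files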
You can download enwik8 and enwik9 from here. They are 100,000,000 and 1,000,000,000 bytes of text, respectively, used for compression benchmarks. You can always pull subsets of those for smaller tests, as shown below.
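A minimal sketch for carving smaller test files out of enwik9, assuming it has already been downloaded and unzipped into the current directory:
head -c 1024     enwik9 > test_1kb.txt     # 1 KB sample
head -c 1048576  enwik9 > test_1mb.txt     # 1 MB sample
head -c 10485760 enwik9 > test_10mb.txt    # 10 MB sample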
*** Linux users only ***
Arbitrarily large text files can be generated on Linux with the following command:
tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100 | head -n 100000 > bigfile.txt
This command generates a text file containing 100,000 lines of random text that looks like this:
NsQlhbisDW5JVlLSaZVtCLSUUrkBijbkc5f9gFFscDkoGnN0J6GgIFqdCLyhbdWLHxRVY8IwDCrWF555JeY0yD0GtgH21NotZAEe
iWJR1A4 bxqq9VKKAzMJ0tW7TCOqNtMzVtPB6NrtCIg8NSmhrO7QjNcOzi4N b VGc0HB5HMNXdyEoWroU464ChM5R Lqdsm3iPo
1mz0cPKqobhjDYkvRs5LZO8n92GxEKGeCtt oX53Qu6T7O2E9nJLKoUeJI6Ul7keLsNGI2BC55qs7fhqW8eFDsGsLPaImF7kFJiz
...
On my Ubuntu 18 machine the file size is about 10 MB. Bumping up the number of lines, and thereby the size, is easy: just increase the head -n 100000 part. So, say, this command:
tr -dc "A-Za-z 0-9" < /dev/urandom | fold -w100 | head -n 1000000 > bigfile.txt
will generate a file with 1,000,000 random lines of text that is around 100 MB. On my commodity hardware the latter command takes about 3 seconds to finish.
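One caveat worth noting: /dev/urandom output is essentially incompressible, so files built this way mostly exercise the compressor's worst case. A sketch of a variant that produces more compressible text, assuming a GNU userland and a word list at /usr/share/dict/words (both assumptions, not part of the original command):
shuf -r -n 1000000 /usr/share/dict/words | tr '\n' ' ' | fold -w100 > bigfile_words.txt    # roughly 10 MB of dictionary words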