 

Java text file size (before file is closed)

I am collecting full HTML from a service that provides access to a very large collection of blogs and news websites. I check the HTML as it arrives (in real time) to see whether it contains any of a set of keywords. If it does, I write the HTML to a text file to store it.

I want to run this for a week, so I will be collecting a large amount of data. A 3-minute test run produced a 100 MB text file. I have 4 TB of space and can't use more than that.
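To put those numbers in perspective, here is a back-of-the-envelope estimate. It assumes the 3-minute test rate (100 MB) holds for the whole week, which real traffic probably won't do exactly, and takes 1 TB as 1,000,000 MB. The class and method names are made up for illustration.

```java
// Back-of-the-envelope storage estimate. ASSUMPTION: the observed test rate
// (100 MB per 3 minutes) stays constant for the whole week.
class WeeklyEstimate {
    static final long MINUTES_PER_WEEK = 7L * 24 * 60; // 10,080 minutes

    /** Megabytes produced in one week at the sampled rate (integer arithmetic). */
    static long weeklyMegabytes(long sampleMegabytes, long sampleMinutes) {
        return sampleMegabytes * MINUTES_PER_WEEK / sampleMinutes;
    }

    /** Fraction of the disk budget used (1 TB taken as 1,000,000 MB). */
    static double fractionOfBudget(long megabytes, long budgetTerabytes) {
        return (double) megabytes / (budgetTerabytes * 1_000_000L);
    }
}
```

With the figures above, `weeklyMegabytes(100, 3)` gives 336,000 MB (about 336 GB), roughly 8.4% of 4 TB. At the test rate the week's output would fit in the budget with room to spare, but a running total is still worth keeping in case the rate changes.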

I also don't want individual text files to grow too large, because I assume very large files will become impossible to open.

My plan is to open a text file, write HTML to it, and check its size frequently. If it grows beyond, say, 200 MB, I close it and open another. I also need to keep a running total of how much space I have used, so that I can make sure I don't get close to 4 TB.
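A minimal sketch of that plan, assuming each matching page arrives as a `String` and is stored as UTF-8. The class name `RotatingHtmlStore`, the file naming pattern `html-NNNNN.txt`, and counting bytes via `getBytes` are illustrative assumptions, not an existing API. Because the size check happens before each write, a file can overshoot the limit by at most one document.

```java
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: append HTML to the current file, start a new file once
 * the current one reaches maxFileBytes, and refuse writes that would push the
 * running total past totalBudgetBytes.
 */
class RotatingHtmlStore implements Closeable {
    private final File dir;
    private final long maxFileBytes;
    private final long totalBudgetBytes;
    private OutputStream current;
    private long currentBytes; // bytes written to the current file
    private long totalBytes;   // bytes written across all files
    private int fileIndex;     // number of files opened so far

    RotatingHtmlStore(File dir, long maxFileBytes, long totalBudgetBytes) {
        this.dir = dir;
        this.maxFileBytes = maxFileBytes;
        this.totalBudgetBytes = totalBudgetBytes;
    }

    /** Stores one document; returns false (and writes nothing) if it would exceed the budget. */
    boolean store(String html) throws IOException {
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        if (totalBytes + bytes.length > totalBudgetBytes) {
            return false;
        }
        if (current == null || currentBytes >= maxFileBytes) {
            rotate();
        }
        current.write(bytes);
        currentBytes += bytes.length;
        totalBytes += bytes.length;
        return true;
    }

    /** Closes the current file (if any) and opens the next numbered one. */
    private void rotate() throws IOException {
        if (current != null) {
            current.close();
        }
        File next = new File(dir, String.format("html-%05d.txt", fileIndex++));
        current = new FileOutputStream(next);
        currentBytes = 0;
    }

    long totalBytes() { return totalBytes; }

    int filesOpened() { return fileIndex; }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

For the numbers in the question you might construct it with `new RotatingHtmlStore(dir, 200_000_000L, 4_000_000_000_000L)` and stop collecting once `store` returns false.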

My question is how to check the size of the text file before it has been closed (with FileWriter.close()). Is there a method for this, or should I count the number of characters written and use that to estimate the file size?
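The size can be read while the file is still open, provided buffered data has been flushed first: `FileOutputStream.getChannel().size()` and `File.length()` both report what has actually reached the file. A minimal sketch (the class and method names are hypothetical):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Sketch: reading the size of a file that is still open for writing. */
class OpenFileSize {
    /** Writes text, flushes, and returns the file size while the file is still open. */
    static long writeAndMeasure(File file, String text) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(file);
             Writer out = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
            out.write(text);
            // Push encoded bytes out of the writer's internal buffer; without this,
            // the size reported below could lag behind what was written.
            out.flush();
            // The file is still open here; file.length() would report the same value.
            return fos.getChannel().size();
        }
    }
}
```

Characters still sitting in a `BufferedWriter` or in the `OutputStreamWriter`'s encoder buffer are not counted until `flush()` is called, and flushing on every check gives up some of the benefit of buffering.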

A separate question: are there ways of minimising the amount of space my text files take up? I am working in Java.
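One standard-library option for the space question is to gzip the output with `java.util.zip.GZIPOutputStream`; HTML is repetitive text and usually compresses well. This sketch compresses into memory to stay self-contained (the class name is hypothetical, and `readAllBytes` needs Java 9+); in practice you would wrap a `FileOutputStream` instead of the in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch: gzip-compressing text on its way out, and reading it back. */
class GzipText {
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // In the real program this would wrap a FileOutputStream (e.g. for "html-00000.txt.gz").
        try (Writer out = new OutputStreamWriter(new GZIPOutputStream(buffer), StandardCharsets.UTF_8)) {
            out.write(text);
        } // closing the writer also finishes the gzip stream (writes its trailer)
        return buffer.toByteArray();
    }

    static String decompress(byte[] gz) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The trade-off is that a `.gz` file can't be opened directly in a text editor; it has to be decompressed first (for example with `gunzip`, or with `GZIPInputStream` as above).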

Andrew asked Dec 03 '22 00:12


1 Answer

Create a writer which counts the number of characters written and use that to wrap your OutputStreamWriter.
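A minimal sketch of such a counting writer, built on `java.io.FilterWriter` (the class name `CountingWriter` is hypothetical). It counts `char`s, not bytes: with UTF-8, any non-ASCII character takes more than one byte on disk, so the count is an estimate of the file size that is exact only for pure ASCII text.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Writer;

/** Counts the characters passed through to the wrapped writer. */
class CountingWriter extends FilterWriter {
    private long chars;

    CountingWriter(Writer out) {
        super(out);
    }

    @Override
    public void write(int c) throws IOException {
        super.write(c);
        chars++;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        super.write(cbuf, off, len);
        chars += len;
    }

    // Writer.write(String) and append(...) end up here, so they are counted too.
    @Override
    public void write(String str, int off, int len) throws IOException {
        super.write(str, off, len);
        chars += len;
    }

    long getCharCount() {
        return chars;
    }
}
```

Wrap it around the rest of the chain, e.g. `new CountingWriter(new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8)))`, and read `getCharCount()` whenever you want to check.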

[EDIT] Note: The correct way to save text to a file is:

new BufferedWriter( new OutputStreamWriter( new FileOutputStream( file ), encoding ) );

The encoding is important; it's usually "UTF-8".

This chain gives you two places to inject your wrapper: wrap the Writer to count characters, or wrap the inner OutputStream to count the bytes written.
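For the byte side, here is a sketch of a counting `FilterOutputStream` placed between the `OutputStreamWriter` and the `FileOutputStream` (the names are hypothetical). Bytes are what the 4 TB budget is measured in, but the count only advances when the writers above it pass data down, i.e. when their buffers fill or on `flush()`.

```java
import java.io.BufferedWriter;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/** Counts the bytes written to the wrapped stream, i.e. the bytes that reach the file. */
class CountingOutputStream extends FilterOutputStream {
    private long bytes;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        bytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole array slice instead of FilterOutputStream's byte-at-a-time default.
        out.write(b, off, len);
        bytes += len;
    }

    long getByteCount() {
        return bytes;
    }

    /** Builds the usual writer chain with this counter on the inner (byte) side. */
    static Writer utf8Writer(CountingOutputStream counter) {
        return new BufferedWriter(new OutputStreamWriter(counter, StandardCharsets.UTF_8));
    }
}
```

In a real program the counter would wrap `new FileOutputStream(file)`; comparing `getByteCount()` against 200 MB after a `flush()` tells you when to rotate.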

Aaron Digulla answered Apr 06 '23 15:04