
Is it a good idea to store a lot of text that is acquired periodically to a cache before saving to a file?

So I'm trying to write messages from users on a messaging network to a file. I'm trying to build this program with good Java practices and appropriate file I/O techniques.

Currently my program recognizes that someone has posted a message, takes the message, and immediately writes it to a file: it creates the file object, creates the writer object, appends the message, then closes the file. This seems like good practice if there aren't many messages coming in, but if there is a rapid stream of conversation it seems slow and requires a lot of unnecessary work, because the file is going to be opened again immediately.

Then I thought what if I just left the file open and just wrote the messages as they came to the file, then closed it periodically. Is that good practice? Leaving a file open for extended periods of time? For instance after an hour or after some amount of data has been written?

Now, I'm thinking I should take the messages, store them in a "cache" (like a string array), then save the string array to a file when the "cache" is full. Is this better practice?

So I have two questions:

1) Is it good practice to leave a file open for an extended period of time (a few minutes to a few hours) if you aren't using the file?

2) What is good practice for a "Cache" like I'm talking about? Is a string array good? Is there something better I should use? How would you go about storing this information?

Bnannerz asked Dec 20 '12 00:12


3 Answers

In my opinion, best practice for logs (and similar) in server applications is to decide an acceptable time delay and stick to it. For example, if you set a 5 second delay, write code so that:

  • If you write something to the log, it will 'really' be written within 5 seconds.
  • If something else gets written before 5 seconds, it just gets added to the buffer (to be written when the time is up).

That way, you do at most one disk write per 5 seconds, but it is definitely written. This compares well to the other approaches:

  • If you flush data to disk every time anything gets written, but load increases and there are, say, 10,000 events per second, then you'll be wasting I/O time with 10,000 disk writes per second.
  • If you leave it to Java/the OS to decide when to flush data, but load is very low (e.g. in the middle of the night), the log could even be hours out of date. (If there is one event, not big enough to fill the buffer, then nothing for hours.)

I haven't looked at the APIs recently to see if there is a built-in way to implement this strategy, but it is easy to code. By the way, there is no need to manually cache output; you can just use a BufferedOutputStream, and call its flush() method whenever you want to write it to disk. (That way it'll also write automatically when it hits the buffer limit, but that's probably OK if you pick the limit sensibly.)
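As a rough illustration of the delayed-flush idea above, here is a minimal sketch. The class name `TimedFlushLog` and its parameters are made up for this example, and it uses a `BufferedWriter` (the character-based counterpart of `BufferedOutputStream`) because the messages are text. Note that `flush()` hands the data to the operating system; it is not a guarantee that the bytes have reached the physical disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: buffers messages and flushes them on a fixed schedule. */
final class TimedFlushLog implements AutoCloseable {
    private final BufferedWriter writer;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    TimedFlushLog(Path file, long flushIntervalMillis) throws IOException {
        // Open once in append mode; the file stays open until close().
        writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        // Flush periodically, so a written message waits at most about one interval.
        scheduler.scheduleWithFixedDelay(this::flushQuietly,
                flushIntervalMillis, flushIntervalMillis, TimeUnit.MILLISECONDS);
    }

    /** Adds a message to the buffer; it reaches the file at the next flush. */
    synchronized void write(String message) throws IOException {
        writer.write(message);
        writer.newLine();
    }

    private synchronized void flushQuietly() {
        try {
            writer.flush();
        } catch (IOException e) {
            // Assumption: reporting to stderr is acceptable for this sketch.
            System.err.println("Flush failed: " + e.getMessage());
        }
    }

    /** Stops the timer, then closes the writer (which flushes what is left). */
    @Override
    public void close() throws IOException {
        scheduler.shutdownNow();
        synchronized (this) {
            writer.close();
        }
    }
}
```

With a 5 second delay this would be created as `new TimedFlushLog(path, 5000)`. The buffer can still be written earlier if it fills up, as mentioned above.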

Regarding leaving a file open, you can leave a file open as long as you like (just close it when you are not going to write to it any more). Assuming you don't have thousands of files open, and you don't need to have multiple applications writing to the same file, this doesn't cause any problems.

sam answered Oct 13 '22 01:10


It's absolutely fine to leave a file open for a long time. It's certainly much better than repeatedly opening and closing it. The amount of resource consumed by a single open file is negligible; your only concern would be if you had a lot of open files (hundreds or thousands). I would suggest you open the file when your program starts, and close it when it finishes.

If you use suitable tools to examine the open files held by your program, or other programs on your system, you will find that all of them hold some number of files (a few to dozens) open for their whole lifetimes - any files which contain the program's code (executables, shared libraries, and JAR files for Java programs), as these get opened and then memory-mapped, and often log files too. This is normal and safe.

Now, you will need to flush the stream (or writer, or RandomAccessFile, or whatever you use) during this time. You should do this whenever you need to ensure that all data written up to that point has been safely written to disk; that might be after every message, or after a given number of messages, amount of data, or period of time, as you see fit.
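To make that concrete, here is a small sketch of a writer that is opened once, kept open, and flushed after a chosen number of messages. The class `MessageLog` and the `flushEvery` parameter are invented for illustration; flushing after every message is just `flushEvery = 1`.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: one long-lived writer, flushed every N messages. */
final class MessageLog implements AutoCloseable {
    private final BufferedWriter writer;
    private final int flushEvery;
    private int unflushed = 0;

    MessageLog(Path file, int flushEvery) throws IOException {
        if (flushEvery < 1) {
            throw new IllegalArgumentException("flushEvery must be at least 1");
        }
        this.flushEvery = flushEvery;
        // Opened once (e.g. at program start) and kept open until close().
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Appends one message and flushes once enough messages have accumulated. */
    void append(String message) throws IOException {
        writer.write(message);
        writer.newLine();
        unflushed++;
        if (unflushed >= flushEvery) {
            writer.flush();
            unflushed = 0;
        }
    }

    /** Closing the writer also flushes anything still buffered. */
    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

Used in a try-with-resources block around the program's main loop, the file is closed when the program finishes, even if an exception is thrown. This sketch is not thread-safe; if several threads post messages, `append` would need synchronization.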

Tom Anderson answered Oct 13 '22 01:10


1) Is it good practice to leave a file open for an extended period of time (a few minutes to a few hours) if you aren't using the file?

I think this depends on how many messages your program receives and how large each message is. If your memory can accommodate the amount you calculate, you can consider it. However, I would think about writing to a database as each message arrives (maybe as a blob). Also think about what happens if your program crashes while you are writing to the file: you may lose all the messages held in memory.

2) What is good practice for a "Cache" like I'm talking about? Is a string array good? Is there something better I should use? How would you go about storing this information?

If you are storing the data temporarily in memory, an array is fine when you know the size in advance. Otherwise you could use an ArrayList.
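For what it's worth, an ArrayList-based cache along those lines might look like the sketch below. `MessageCache` and `capacity` are names chosen for this example only. As noted above, anything still in the list is lost if the program crashes before `save()` runs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory cache that is appended to a file when it fills up. */
final class MessageCache {
    private final List<String> pending = new ArrayList<>();
    private final Path file;
    private final int capacity;

    MessageCache(Path file, int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.file = file;
        this.capacity = capacity;
    }

    /** Stores a message; writes the whole cache out once it holds `capacity` messages. */
    void add(String message) throws IOException {
        pending.add(message);
        if (pending.size() >= capacity) {
            save();
        }
    }

    /** Appends all cached messages to the file, one per line, then empties the cache. */
    void save() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        Files.write(file, pending, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        pending.clear();
    }
}
```

Calling `save()` once more at shutdown writes out whatever is left in a partly filled cache.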

someone answered Oct 13 '22 00:10