
Java: read and write a file together

I am trying to read a file and modify it simultaneously in Java. This is what I need to do: my file is of the format:

aaa
bbb
aaa
ccc
ddd
ddd

I need to read through the file, count the number of occurrences of each line, and modify the duplicates to get the following file:

aaa -  2
bbb -  1
ccc -  1
ddd -  2

I tried using the RandomAccessFile to do this, but couldn't do it. Can somebody help me out with the code for this one?

asked Nov 22 '10 by sharath

2 Answers

It's far easier if you don't do two things at the same time. The best way is to run through the entire file, count all the occurrences of each string in a hash and then write out all the results into another file. Then if you need to, move the new file over the old one.

You never want to read from and write to the same file at the same time. Your offsets within the file will shift every time you make a write, and the read cursor will not keep track of that.
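A minimal sketch of this approach (the class and file names are mine for illustration; it assumes the input lines fit in memory and uses a `LinkedHashMap` so results keep first-appearance order):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class LineCounter {

    // Count occurrences of each line, write "line -  count" rows to a
    // temporary file, then move the new file over the old one.
    public static void countAndRewrite(Path input) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : Files.readAllLines(input)) {
            if (!line.isEmpty()) {
                counts.merge(line, 1, Integer::sum);
            }
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.append(e.getKey()).append(" -  ").append(e.getValue())
               .append(System.lineSeparator());
        }
        Path tmp = input.resolveSibling(input.getFileName() + ".tmp");
        Files.write(tmp, out.toString().getBytes());
        // Replace the original only after the new file is fully written
        Files.move(tmp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("lines", ".txt");
        Files.write(p, "aaa\nbbb\naaa\nccc\nddd\nddd\n".getBytes());
        countAndRewrite(p);
        Files.readAllLines(p).forEach(System.out::println);
        Files.delete(p);
    }
}
```

Running this on the sample input prints the four counted lines (`aaa -  2` through `ddd -  2`). Because the write goes to a separate file, the read cursor problem described above never arises.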

answered Nov 13 '22 by J _


I'd do it this way:

- Parse the original file and save all entries into a new file, using fixed-length data blocks. Say your longest string is 10 bytes long; take 10 + x as the block length, where x is room for the extra info you want to save along with the entries. The 10th entry in the file would then be at byte position 10*(10+x). You'd also have to know the number of entries up front to create the file (its size would be noOfEntries*blockLength; use a RandomAccessFile and setLength to set that file length).
- Now use the quicksort algorithm to sort the entries in the file. The idea is to end up with a sorted file, which makes the final step far easier and faster. Hashing would theoretically work too, but you'd then have to rearrange duplicate entries so that all duplicates are grouped together - not really an option here.
- Parse the file with the now-sorted entries. Keep a pointer to the first occurrence of each entry and increment its duplicate count until a new entry appears. Change the first entry, add the additional info you want there, and write it out to a new "final result" file. Continue this way with all remaining entries in the sorted file.

Conclusions: I think this should be reasonably fast and use a reasonable amount of resources. However, it depends on your data. If you have a very large number of duplicates, quicksort performance will degrade. Also, if your longest data entry is much longer than the average, it will waste file space.
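The fixed-length block layout from the first step could be sketched like this (RECORD_LEN = 10, the 4-byte int as the "extra info", and the class name are all my assumptions for illustration; the on-file quicksort pass is omitted):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class FixedRecordFile {
    // Each block: RECORD_LEN bytes for the space-padded string, plus a
    // 4-byte int count. Record i starts at byte position i * BLOCK_LEN.
    public static final int RECORD_LEN = 10;      // assumed longest entry length
    public static final int BLOCK_LEN = RECORD_LEN + 4;

    public static void writeEntry(RandomAccessFile raf, int index,
                                  String s, int count) throws IOException {
        raf.seek((long) index * BLOCK_LEN);
        byte[] padded = String.format("%-" + RECORD_LEN + "s", s)
                .getBytes(StandardCharsets.US_ASCII);
        raf.write(padded, 0, RECORD_LEN);
        raf.writeInt(count);
    }

    public static String readEntry(RandomAccessFile raf, int index)
            throws IOException {
        raf.seek((long) index * BLOCK_LEN);
        byte[] buf = new byte[RECORD_LEN];
        raf.readFully(buf);
        int count = raf.readInt();
        return new String(buf, StandardCharsets.US_ASCII).trim() + " -  " + count;
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("entries.dat", "rw")) {
            raf.setLength(2L * BLOCK_LEN);   // pre-size for a known entry count
            writeEntry(raf, 0, "aaa", 2);
            writeEntry(raf, 1, "bbb", 1);
            System.out.println(readEntry(raf, 0));
            System.out.println(readEntry(raf, 1));
        }
    }
}
```

Because every block has the same length, `seek` can jump straight to any record, which is what makes both the in-place quicksort and the final sequential pass over the sorted file practical.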

answered Nov 13 '22 by Benjamin Stadin