
Java File Replace Lines

Tags:

java

I have a 250 GB .txt file and only 50 GB of free space left on my hard drive. Every line in this .txt file has a long prefix, and I want to delete this prefix to make the file smaller.

My first idea was to read the file line by line, change each line and write it into another file.

// read line out of first file
line = line.replace(prefix, "");
// write line into second file

The problem is that I do not have enough space for a second file.

So how can I delete all the prefixes from my file?

user1882812 asked Jan 15 '14 09:01




1 Answer

Check RandomAccessFile: http://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html

You have to keep track of the position you are reading from and the position you are writing to. Initially both are at the start. Then you read N bytes (one line), shorten it, seek to the write position and write M bytes (the shortened line), which advances the write position by M. Then you seek to the read position, which is N bytes further than before and is where the next line starts. You do this over and over again. In the end, truncate the excess with setLength(long).
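The read/write bookkeeping described above can be sketched roughly as follows. This is only an illustration under assumptions: the class name InPlaceLineStripper and the method strip are made up for this example, the prefix is passed as already-encoded bytes, and the file is assumed to use an encoding in which a line break ends with the single byte '\n' (ASCII, ISO-Latin-1, UTF-8). It reads one byte at a time, so it is far too slow for a 250 GB file and is meant to show the positions, nothing more.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

final class InPlaceLineStripper {

    /** Removes prefix from the start of every line of the file, rewriting it in place. */
    static void strip(String path, byte[] prefix) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            long readPos = 0;   // where the next unread line starts
            long writePos = 0;  // where the next shortened line goes
            long length = raf.length();
            ByteArrayOutputStream line = new ByteArrayOutputStream();

            while (readPos < length) {
                // Read one line, including its trailing '\n' if there is one.
                raf.seek(readPos);
                line.reset();
                int c;
                while ((c = raf.read()) != -1) {
                    line.write(c);
                    if (c == '\n') {
                        break;
                    }
                }
                byte[] bytes = line.toByteArray();
                readPos += bytes.length;

                // Write the line without its prefix at the write position.
                // writePos never exceeds readPos, so unread data is never overwritten.
                int skip = startsWith(bytes, prefix) ? prefix.length : 0;
                raf.seek(writePos);
                raf.write(bytes, skip, bytes.length - skip);
                writePos += bytes.length - skip;
            }
            // Cut off the leftover tail of the original content.
            raf.setLength(writePos);
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A call would look like InPlaceLineStripper.strip("big.txt", "PRE_".getBytes(StandardCharsets.UTF_8)), where both the file name and the prefix are placeholders.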

You can also do it in batches (like read 4kb, process, write, repeat) to make it more efficient.
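A batched variant might look something like the sketch below. It is an illustration, not a tested tool: the class name ChunkedPrefixStripper, the method strip and the chunkSize parameter are invented for this example, the prefix is assumed to be non-empty, already encoded as bytes and free of line breaks, and lines are assumed to end with the single byte '\n'. Because a chunk can end in the middle of a prefix, the sketch remembers how many prefix bytes have matched so far and carries that count over to the next chunk.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class ChunkedPrefixStripper {

    /** Strips prefix from the start of every line, processing the file chunk by chunk in place. */
    static void strip(String path, byte[] prefix, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (prefix.length == 0) {
            return; // nothing to remove
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            byte[] in = new byte[chunkSize];
            // Output of one chunk can include a partial prefix held back from the previous chunk.
            byte[] out = new byte[chunkSize + prefix.length];
            long readPos = 0;
            long writePos = 0;
            // Prefix bytes matched at the current line start, or -1 once inside the line body.
            int matched = 0;

            while (true) {
                raf.seek(readPos);
                int n = raf.read(in, 0, in.length);
                if (n == -1) {
                    break; // end of file
                }
                readPos += n;

                int outLen = 0;
                for (int i = 0; i < n; i++) {
                    byte b = in[i];
                    if (matched >= 0 && b == prefix[matched]) {
                        matched++;
                        if (matched == prefix.length) {
                            matched = -1; // whole prefix seen and dropped
                        }
                        continue;
                    }
                    if (matched > 0) {
                        // A partial match failed, so those bytes were real content.
                        System.arraycopy(prefix, 0, out, outLen, matched);
                        outLen += matched;
                    }
                    out[outLen++] = b;
                    matched = (b == '\n') ? 0 : -1;
                }

                // Everything written so far has already been read, so this is safe.
                raf.seek(writePos);
                raf.write(out, 0, outLen);
                writePos += outLen;
            }

            if (matched > 0) {
                // The file ended in the middle of a partial prefix match: keep those bytes.
                raf.seek(writePos);
                raf.write(prefix, 0, matched);
                writePos += matched;
            }
            raf.setLength(writePos);
        }
    }
}
```

The chunk size is a tuning knob; the 4 KB mentioned above would be strip(path, prefix, 4096), and larger buffers are likely to reduce the number of seeks further.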

The process is identical in all languages. Some make it easier by hiding the seeking back and forth behind an API.

Of course you have to be absolutely sure that your program works flawlessly, since there is no way to undo this process.

Also, RandomAccessFile is a bit limited, since it works on bytes rather than characters: getFilePointer() reports a byte offset, and readLine() does not decode multi-byte encodings. Therefore you have to convert between "decoded strings" and "encoded bytes" as you go. If your file is in UTF-8, a given character in the string can take one or many bytes in the file, so you can't just advance the position by string.length(). You have to use string.getBytes(encoding).length and factor in possible line break conversions (Windows uses two characters for a line break, Unix uses only one). But if you have ASCII, ISO-Latin-1 or a similar single-byte character encoding and know which line break characters the file uses, then the problem should be pretty simple.
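To make the difference between characters and bytes concrete, here is a small sketch. The class name EncodedLength and the helper byteLength are made up for this example; the sample strings are written with Unicode escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLength {

    /** Number of bytes the string occupies in a file written with the given charset. */
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    static void demo() {
        String word = "na\u00efve"; // "naive" with a diaeresis on the i
        System.out.println(word.length());                                // 5 chars
        System.out.println(byteLength(word, StandardCharsets.UTF_8));     // 6 bytes
        System.out.println(byteLength(word, StandardCharsets.ISO_8859_1)); // 5 bytes

        String euro = "\u20ac"; // the euro sign
        System.out.println(euro.length());                            // 1 char
        System.out.println(byteLength(euro, StandardCharsets.UTF_8)); // 3 bytes
    }
}
```

So a file position has to be advanced by byteLength(line, charset) plus the size of the line break, not by line.length().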

As I edit my answer to cover the possible corner cases, I think it would be better to read the file using a BufferedReader with the correct character encoding and to open a RandomAccessFile on the same file for the writing, provided your OS supports having a file opened twice. This way you get complete Unicode support from BufferedReader and you don't have to keep track of read and write positions. Because each rewritten line is no longer than the original, the writer never overtakes the reader. You have to do the writing with RandomAccessFile because opening a Writer on the file (via FileWriter or FileOutputStream) truncates it as soon as it is opened.

Something like this. It works on trivial examples but it has no error checking and I absolutely give no guarantees. Test it on a smaller file first.

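import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class StripPrefix {
    public static void main(String[] args) throws IOException {
        File f = new File(args[0]);
        long totalWritten = 0;

        // The same file is opened twice: once for decoding lines, once for raw writes.
        try (RandomAccessFile writer = new RandomAccessFile(f, "rw")) {
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    new FileInputStream(f), StandardCharsets.UTF_8))) { // Use correct encoding here.
                String line;
                while ((line = reader.readLine()) != null) {
                    // Must not make the line longer, or the writer could overtake the reader.
                    line = line.trim() + "\n"; // Remove your prefix here.

                    byte[] b = line.getBytes(StandardCharsets.UTF_8);
                    writer.write(b);
                    totalWritten += b.length;
                }
            }
            // The reader is closed; cut off the leftover tail of the original content.
            writer.setLength(totalWritten);
        }
    }
}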
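For the "Remove your prefix here" step, one possible helper is sketched below. The class name PrefixRemover and the method removePrefix are invented for this example. It uses startsWith and substring instead of String.replace, since replace would also delete any later occurrence of the prefix text inside the line.

```java
final class PrefixRemover {

    /** Returns the line without the prefix if it starts with it, otherwise the line unchanged. */
    static String removePrefix(String line, String prefix) {
        return line.startsWith(prefix) ? line.substring(prefix.length()) : line;
    }
}
```

In the loop above, the marked line would then become something like line = PrefixRemover.removePrefix(line, prefix) + "\n", with prefix being whatever your lines actually start with.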
Torben answered Oct 17 '22 16:10