
Skip x last lines when reading a text file

Tags: java, io

I read text data from a big file line by line.
But I need to read just n-x lines (that is, skip the last x lines).

How can I do this without reading the whole file more than once?
(I read a line and immediately process it, so I can't go back.)

markiz asked Dec 09 '22 04:12



1 Answer

In this post I'll provide two completely different approaches to your problem; depending on your use case, one will fit better than the other.

Alternative #1

This method is memory efficient, though more complex. It is recommended if you are going to skip a lot of content, since only one line at a time is held in memory during processing.

The implementation in this post might not be super optimized, but the theory behind it should be clear.

You start by reading the file backwards, searching for N line breaks. Once you've located the position in the file where processing should stop, you jump back to the beginning of the file and read forward up to that position.

Alternative #2

This method is easy to comprehend and very straightforward. During execution you will have N lines stored in memory, where N is the number of lines you'd like to skip at the end.

The lines are stored in a FIFO container (First In, First Out). You append each newly read line to the FIFO and, once it holds more than N lines, remove and process the first entry. This way you only ever process lines that are at least N entries away from the end of the file.



Alternative #1

This might sound odd, but it's definitely doable and it's the way I'd recommend: start by reading the file backwards.

  1. Seek to the end of the file
  2. Read (and discard) bytes (towards the beginning of the file) until you've found SKIP_N line breaks, not counting a newline that merely terminates the very last line
  3. Save this position
  4. Seek to the beginning of the file
  5. Read (and process) lines until you've come down to the position you've stored away
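
To make steps 1 to 3 concrete, here is a minimal sketch of the backward scan on an in-memory byte array instead of a file. The class CutoffSketch and the method findCutoff are hypothetical names used only for this illustration, and the sketch assumes "\n" line endings. The returned value is the offset of the first byte of the tail to be skipped, so everything before it should be processed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; names are not part of the original answer.
final class CutoffSketch {
  // Scans backwards and returns the offset where the skipped tail begins.
  // Returns data.length if nothing is skipped, 0 if everything is skipped.
  static int findCutoff (byte[] data, int skipNLines) {
    if (skipNLines <= 0)
      return data.length;

    int found = 0;
    for (int i = data.length - 1; i >= 0; --i) {
      // a newline in the very last byte only terminates the last line
      if (data[i] != '\n' || i == data.length - 1)
        continue;

      if (++found == skipNLines)
        return i + 1; // the tail starts right after this newline
    }
    return 0; // fewer lines than skipNLines: the whole input is tail
  }

  public static void main (String[] args) {
    byte[] data = "one\ntwo\nthree\nfour\n".getBytes (StandardCharsets.US_ASCII);
    int cutoff  = findCutoff (data, 2); // 8, the offset of "three"

    // prints the lines "one" and "two"
    System.out.print (new String (data, 0, cutoff, StandardCharsets.US_ASCII));
  }
}
```

The same arithmetic applies to a file; only the way the bytes are fetched changes.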

Example code:

The code below will strip off the last 42 lines from /tmp/sample_file and print the rest using the method described earlier in this post.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class Example {
  protected static final int SKIP_N = 42;

  public static void main (String[] args)
    throws IOException
  {
    File fileHandle = new File ("/tmp/sample_file");

    // try-with-resources closes the file even if processing fails
    try (RandomAccessFile rafHandle = new RandomAccessFile (fileHandle, "r")) {
      // offset of the first byte that belongs to the skipped tail
      long endOffset = findEndOffset (SKIP_N, rafHandle);
      String s1;

      rafHandle.seek (0);

      // getFilePointer () reports the real byte position, so this works for
      // "\n" as well as "\r\n" line endings.
      // Note: readLine () maps each byte to one char, so non-ASCII text
      // would need separate decoding.
      while (rafHandle.getFilePointer () < endOffset
             && (s1 = rafHandle.readLine ()) != null) {
        System.out.println (s1);
      }
    }
  }

  // Scans the file backwards in blocks and returns the offset at which the
  // last skipNLines lines begin (0 if the file has fewer lines than that).
  protected static long findEndOffset (int skipNLines, RandomAccessFile rafHandle)
    throws IOException
  {
    long fileLength = rafHandle.length ();

    if (skipNLines <= 0 || fileLength == 0)
      return fileLength;

    byte[] buffer   = new byte[(int) Math.min (1024, fileLength)];
    long blockEnd   = fileLength;
    int  foundLines = 0;

    while (blockEnd > 0) {
      long blockStart = Math.max (blockEnd - buffer.length, 0);
      int  blockSize  = (int) (blockEnd - blockStart);

      // read only the bytes that have not been inspected yet
      rafHandle.seek      (blockStart);
      rafHandle.readFully (buffer, 0, blockSize);

      for (int i = blockSize - 1; i >= 0; --i) {
        long position = blockStart + i;

        // a newline in the very last byte only terminates the last line
        if (buffer[i] != '\n' || position == fileLength - 1)
          continue;

        if (++foundLines == skipNLines)
          return position + 1; // the tail starts right after this newline
      }

      blockEnd = blockStart;
    }

    return 0;
  }
}


Alternative #2

  1. Read from your file line by line
  2. On every successfully read line, insert the line at the back of your LinkedList<String>
  3. If your LinkedList<String> contains more lines than you'd like to skip, remove the first entry and process it
  4. Repeat until there are no more lines to be read

Example code

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import java.util.LinkedList;

public class Example {
  protected static final int SKIP_N = 42;

  public static void main (String[] args)
    throws Exception
  {
    String line;

    // holds at most SKIP_N + 1 lines at any time
    LinkedList<String> lli = new LinkedList<String> ();

    // try-with-resources closes the whole reader chain;
    // the text is decoded with the platform default charset
    try (BufferedReader bre = new BufferedReader (
           new InputStreamReader (new FileInputStream ("/tmp/sample_file")))) {
      while ((line = bre.readLine ()) != null) {
        lli.addLast (line);

        if (lli.size () > SKIP_N) {
          System.out.println (lli.removeFirst ());
        }
      }
    }
    // the (at most) SKIP_N lines still in lli are the skipped tail
  }
}
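
If you need this in more than one place, the FIFO idea can be wrapped in a small helper that takes any BufferedReader and a callback. This is only a sketch: the class SkipLastLines and the method forEachLineExceptLast are hypothetical names, and ArrayDeque is swapped in for LinkedList purely as a lighter-weight queue; the logic is otherwise the same as in the steps above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; names are not part of the original answer.
final class SkipLastLines {
  // Passes every line except the last skipN to action, in reading order.
  static void forEachLineExceptLast (BufferedReader reader, int skipN,
                                     Consumer<String> action)
    throws IOException
  {
    ArrayDeque<String> pending = new ArrayDeque<> ();
    String line;

    while ((line = reader.readLine ()) != null) {
      pending.addLast (line);

      // once more than skipN lines are buffered, the oldest one
      // can no longer be part of the tail
      if (pending.size () > skipN)
        action.accept (pending.removeFirst ());
    }
    // whatever remains in pending is the skipped tail
  }

  public static void main (String[] args) throws IOException {
    List<String> seen = new ArrayList<> ();

    try (BufferedReader reader =
           new BufferedReader (new StringReader ("a\nb\nc\nd\ne\n"))) {
      forEachLineExceptLast (reader, 2, seen::add);
    }

    System.out.println (seen); // prints [a, b, c]
  }
}
```

Passing a reader rather than a file name keeps the helper independent of where the text comes from and which charset it uses.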
Filip Roséen - refp answered Dec 11 '22 18:12
