
Reading the last n lines from a huge text file

Tags:

file-io

windows

r

I've tried something like this:

file_in <- file("myfile.log","r")
x <- readLines(file_in, n=-100)

but I'm still waiting...

Any help would be greatly appreciated

George Dontas asked Apr 08 '11 13:04




4 Answers

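I'd use scan for this if you know how many lines the log has, because skip lets you jump over everything before the lines you want (here the first 100 lines are skipped):

scan("foo.txt", sep="\n", what=character(), skip=100)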

If you have no clue how many you need to skip, you have no choice but to move towards either

  • reading in everything and taking the last n lines (in case that's feasible),
  • using scan("foo.txt",sep="\n",what=list(NULL)) to figure out how many records there are, or
  • using some algorithm to go through the file, keeping only the last n lines every time
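
The second option (count first, then skip) could be sketched roughly as below. This is only an illustration, not code from the original answer: count_lines and last_lines_by_count are hypothetical names, the chunk size of 10000 is an arbitrary assumption, and readLines is swapped in for scan so that blank lines are counted consistently in both passes.

```r
# Hypothetical helpers, not part of the original answer.

# Count lines by reading fixed-size chunks, so memory use stays bounded.
count_lines <- function(path, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  total <- 0
  repeat {
    got <- length(readLines(con, n = chunk, warn = FALSE))
    if (got == 0L) break
    total <- total + got
  }
  total
}

# Second pass: discard everything before the last n lines, then read the rest.
last_lines_by_count <- function(path, n, chunk = 10000L) {
  to_skip <- max(count_lines(path, chunk) - n, 0)
  con <- file(path, "r")
  on.exit(close(con))
  while (to_skip > 0) {
    step <- min(chunk, to_skip)
    readLines(con, n = step, warn = FALSE)  # read and throw away
    to_skip <- to_skip - step
  }
  readLines(con, warn = FALSE)
}
```

The file is read twice this way, so it mostly pays off when the line count can be cached or reused.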

The last option could look like :

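ReadLastLines <- function(x, n, ...) {
  con <- file(x, "r")
  on.exit(close(con))  # close the connection even if scan fails
  # Fill the window with the first n lines (after any skip= passed via ...)
  out <- scan(con, what=character(), nmax=n, sep="\n", quiet=TRUE, ...)

  # Slide the window forward one line at a time until the end of the file
  repeat {
    tmp <- scan(con, what=character(), nmax=1, sep="\n", quiet=TRUE)
    if (length(tmp) == 0) break
    out <- c(out[-1], tmp)
  }
  out
}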

allowing :

ReadLastLines("foo.txt",100)

or

ReadLastLines("foo.txt",100,skip=1e+7)

in case you know you have more than 10 million lines. This can save on the reading time when you start having extremely big logs.
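
If calling scan once per line turns out to be slow, the same sliding-window idea can be applied to blocks of lines. The following is a minimal sketch under my own assumptions, not the answer's code: read_last_lines_chunked is a hypothetical name, the default chunk of 10000 lines is arbitrary, and readLines replaces scan, which means blank lines are kept rather than skipped.

```r
# Hypothetical variant, not from the original answer: same sliding-window
# idea, but reading `chunk` lines per call instead of one.
read_last_lines_chunked <- function(path, n, chunk = 10000L) {
  con <- file(path, "r")
  on.exit(close(con))
  out <- character(0)
  repeat {
    block <- readLines(con, n = chunk, warn = FALSE)
    if (length(block) == 0L) break
    # Keep only the newest n lines seen so far
    out <- utils::tail(c(out, block), n)
  }
  out
}
```

At most n + chunk lines are held in memory at any time, so a larger chunk trades memory for fewer calls.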


EDIT: In fact, given the size of your file, I wouldn't even use R for this. On Unix, you can use the tail command. There is a Windows version of it as well, somewhere in a toolkit, though I haven't tried it out yet.
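
If the result is still needed inside R, the external command can be called through system2. This is a sketch that assumes a tail executable is on the PATH (true on most Unix systems, and on Windows only if a port is installed); tail_external is a hypothetical name. On Windows without such a port, PowerShell's Get-Content has a -Tail parameter that serves a similar purpose, which is not shown here.

```r
# Hypothetical wrapper, not from the original answer: call the external
# `tail` program and return its output as a character vector.
tail_external <- function(path, n = 10L) {
  if (!nzchar(Sys.which("tail"))) {
    stop("no 'tail' executable found on the PATH")
  }
  # sprintf("%d") avoids scientific notation such as "1e+05" for large n
  args <- c("-n", sprintf("%d", as.integer(n)), shQuote(path))
  system2("tail", args, stdout = TRUE)
}
```

For example, tail_external("myfile.log", 100) would return the last 100 lines of the log from the question.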

Joris Meys answered Oct 06 '22 01:10


You could do this with read.table by specifying the skip parameter. If the lines should not be parsed into variables, set the separator to '\n', as @Joris Meys pointed out, and also set as.is=TRUE to get character vectors instead of factors.

Small example (skipping the first 2000 lines):

df <- read.table('foo.txt', sep='\n', as.is=TRUE, skip=2000)

daroczig answered Oct 06 '22 03:10


You can read the last n lines as follows.

Step 1 - Read your file into a data frame: df <- read.csv("hw1_data.csv")

Step 2 - Use the tail function to take the last n rows (here n = 2):

tail(df, 2)

Mahesh Saini answered Oct 06 '22 03:10


As @JorisMeys already mentioned, the Unix command tail would be the easiest way to solve this problem. However, I want to propose a seek-based R solution that starts reading from the end of the file:

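tailfile <- function(file, n, bufferSize = 1024L) {
  size <- file.info(file)$size
  pos <- size   # byte offset where the part read so far begins
  text <- ""
  k <- 0L       # number of newlines seen so far

  f <- file(file, "rb")
  on.exit(close(f))

  # Read chunks backwards until more than n newlines were seen (the extra
  # one guarantees the first kept line is complete) or the start is reached.
  # Everything is byte-wise so offsets agree with seek(); this assumes an
  # ASCII-compatible log, non-ASCII text may need its encoding set afterwards.
  while (pos > 0 && k <= n) {
    len <- min(bufferSize, pos)   # the chunk nearest the start may be shorter
    pos <- pos - len
    seek(f, where=pos)
    chars <- readChar(f, nchars=len, useBytes=TRUE)
    k <- k + sum(gregexpr("\n", chars, fixed=TRUE, useBytes=TRUE)[[1L]] > 0L)
    text <- paste0(chars, text)   # prepend: this chunk precedes earlier reads
  }

  # "\r?\n" also strips the carriage return of Windows line endings
  tail(strsplit(text, "\r?\n", useBytes=TRUE)[[1L]], n)
}

tailfile("myfile.log", n=100)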

sgibb answered Oct 06 '22 02:10