 

Get last 10 lines of very large text file > 10GB

What is the most efficient way to display the last 10 lines of a very large text file (this particular file is over 10GB). I was thinking of just writing a simple C# app but I'm not sure how to do this effectively.

Chris Conway Avatar asked Dec 29 '08 19:12



1 Answer

Seek to the end of the file, then step backwards until you have found ten newlines, and then read forward to the end, taking the file's encoding into account. Be sure to handle the case where the file has fewer than ten lines. Below is an implementation (in C#, as you tagged this), generalized to find the last numberOfTokens tokens in the file located at path, encoded in encoding, where the token separator is tokenSeparator. The result is returned as a single string (this could be improved by returning an IEnumerable<string> that enumerates the tokens).

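public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {

    // Width of one character in bytes; this assumes a fixed-width encoding
    // (for example ASCII or UTF-16), not one with variable-width characters.
    int sizeOfChar = encoding.GetByteCount("\n");
    byte[] separator = encoding.GetBytes(tokenSeparator);
    byte[] buffer = new byte[separator.Length];

    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        Int64 tokenCount = 0;

        // Step backwards from the end of the file one character at a time.
        // Positions are byte offsets, so the bound is the file length in bytes.
        // Note: a separator at the very end of the file counts as one token boundary.
        for (Int64 position = separator.Length; position <= fs.Length; position += sizeOfChar) {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            if (encoding.GetString(buffer) == tokenSeparator) {
                tokenCount++;
                if (tokenCount == numberOfTokens) {
                    // The stream is now positioned just past the separator,
                    // so everything from here to the end is the result.
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // handle case where number of tokens in file is less than numberOfTokens
        fs.Seek(0, SeekOrigin.Begin);
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, buffer.Length);
        return encoding.GetString(buffer);
    }
}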
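
As a variation on the same seek-backwards idea, here is a minimal sketch that reads the file in blocks from the end instead of issuing one seek per character, and returns the lines as a list rather than one string. It is not part of the original answer: the names TailSketch, ReadLastLines, FillBuffer and blockSize are made up for illustration, and it assumes the file is ASCII or UTF-8 with "\n" or "\r\n" line endings (in UTF-8 the newline byte never occurs inside a multi-byte character, so scanning raw bytes is safe). It does not strip a byte order mark.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TailSketch
{
    // Hypothetical helper: returns the last `lineCount` lines of an ASCII/UTF-8 file,
    // scanning backwards from the end in blocks of `blockSize` bytes.
    public static List<string> ReadLastLines(string path, int lineCount, int blockSize = 64 * 1024)
    {
        var lines = new List<string>();
        if (lineCount <= 0) return lines;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long end = fs.Length;
            if (end == 0) return lines;

            // Ignore one trailing newline so it is not reported as an empty last line.
            fs.Seek(end - 1, SeekOrigin.Begin);
            if (fs.ReadByte() == '\n') end--;

            long start = 0;      // byte offset where the returned text begins
            int found = 0;       // newlines seen so far, counted from the end
            long blockEnd = end; // exclusive end of the block about to be scanned
            var buffer = new byte[blockSize];

            while (blockEnd > 0 && found < lineCount)
            {
                int toRead = (int)Math.Min(blockSize, blockEnd);
                long blockStart = blockEnd - toRead;
                fs.Seek(blockStart, SeekOrigin.Begin);
                FillBuffer(fs, buffer, toRead);

                // Scan the block from its last byte to its first.
                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (buffer[i] == (byte)'\n' && ++found == lineCount)
                    {
                        start = blockStart + i + 1; // first byte after that newline
                        break;
                    }
                }
                blockEnd = blockStart;
            }

            // If fewer than lineCount newlines were found, start is still 0
            // and the whole file is returned.
            fs.Seek(start, SeekOrigin.Begin);
            var tail = new byte[end - start];
            FillBuffer(fs, tail, tail.Length);

            foreach (string line in Encoding.UTF8.GetString(tail).Split('\n'))
                lines.Add(line.TrimEnd('\r'));
            return lines;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until `count` bytes arrive.
    private static void FillBuffer(Stream stream, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

A call such as TailSketch.ReadLastLines(path, 10) touches only the final blocks of the file, so the cost depends on the length of the last ten lines rather than on the 10GB total.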
jason Avatar answered Sep 27 '22 20:09
