
What's the best way to read and parse a large text file over the network?

I have a problem which requires me to parse several log files from a remote machine. There are a few complications:

1) The file may be in use
2) The files can be quite large (100 MB+)
3) Each entry may be multi-line

To solve the in-use issue, I need to copy the file first. I'm currently copying it directly from the remote machine to the local machine and parsing it there. That leads to issue 2: since the files are quite large, copying them locally can take quite a while.

To speed up parsing, I'd like to make the parser multi-threaded, but that makes dealing with multi-line entries a bit trickier.

The two main issues are:

1) How do I speed up the file transfer? (Compression? Is transferring it locally even necessary? Can I read an in-use file some other way?)
2) How do I deal with multi-line entries when splitting the lines among threads?

UPDATE: The reason I didn't do the obvious thing and parse on the server is that I want to have as little CPU impact as possible. I don't want to affect the performance of the system I'm testing.

asked Sep 26 '08 by midas06


1 Answer

If you are reading a sequential file, you want to read it line by line over the network, which means you need a transfer method capable of streaming. You'll need to review your I/O streaming options to figure out what fits.
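For example, if you have SSH access to the remote machine, you can open the file over SFTP and iterate it line by line without copying the whole thing first. This is only a sketch, assuming the paramiko library; the host, username, log path, and the "ERROR" filter are placeholders, not anything from the question:

    import paramiko

    # Minimal sketch: stream a remote log file line by line over SFTP.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("remote-host", username="loguser")

    sftp = client.open_sftp()
    with sftp.open("/var/log/app.log", "r") as remote_file:
        remote_file.prefetch()       # pipeline reads so the network link stays busy
        for line in remote_file:     # SFTPFile is file-like, so it streams line by line
            print(line, end="")      # placeholder: feed each line into the real parser here

    sftp.close()
    client.close()

Because the copy and the parse overlap, the total time is bounded by the slower of the two rather than their sum.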

A large I/O-bound job like this won't benefit much from multithreading, since you can probably process the entries as fast as you can read them over the network.
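If parsing does turn out to be CPU-heavy enough to parallelize, the multi-line problem is easiest to handle by grouping lines into complete entries before handing them to workers, so each worker only ever sees whole records. A rough sketch, assuming (purely for illustration) that each new entry starts with a timestamp like "2008-09-26 00:15:03":

    import re
    from concurrent.futures import ThreadPoolExecutor

    # Assumed entry format: a new record begins with a "YYYY-MM-DD " timestamp.
    ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")

    def records(lines):
        """Group raw lines into complete multi-line entries."""
        entry = []
        for line in lines:
            if ENTRY_START.match(line) and entry:
                yield "".join(entry)
                entry = []
            entry.append(line)
        if entry:
            yield "".join(entry)

    def parse_entry(entry):
        # placeholder per-entry parser: return the first line of the record
        return entry.splitlines()[0]

    # "lines" can be any iterable of lines, e.g. the streaming remote_file above.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for result in pool.map(parse_entry, records(open("app.log"))):
            print(result)

The single reader thread does the cheap grouping; only the (potentially expensive) per-entry parsing is fanned out.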

Your other great option is to put the log parser on the server, and download the results.
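Even with the CPU-impact concern, it may be worth measuring this: a small parse-and-compress job on the server can shrink the transfer dramatically. A sketch of driving it from the local machine; "loghost" and /opt/tools/parse_log.py are hypothetical placeholders, not anything from the question:

    import gzip
    import subprocess

    # Run a (hypothetical) parser on the server and stream its gzipped output back.
    proc = subprocess.Popen(
        ["ssh", "loghost", "python3 /opt/tools/parse_log.py /var/log/app.log | gzip -c"],
        stdout=subprocess.PIPE,
    )
    with gzip.open(proc.stdout, "rt") as results:
        for line in results:
            print(line, end="")   # or feed into local post-processing
    proc.wait()

If the parser itself must stay off the server, the same pattern works with just "gzip -c /var/log/app.log" remotely, which trades a little server CPU for a much smaller transfer.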

answered Sep 18 '22 by Wesley Tarle