
Read a large file into a list of lines in OCaml

I am trying to read a large file (around 10G) into a list of lines. The file contains a sequence of integers, something like this:

0x123456
0x123123
0x123123
..... 

I use the function below as the default way to read files in my codebase, but it turns out to be quite slow (~12 minutes) in this scenario:

let lines_from_file (filename : string) : string list =
  let lines = ref [] in
  let chan = open_in filename in
  try
    while true do
      lines := input_line chan :: !lines
    done;
    []
  with End_of_file ->
    close_in chan;
    List.rev !lines

I guess I need to read the whole file into memory and then split it into lines (I am using a server with 128G of RAM, so memory should not be a problem). But after searching the documentation, I still could not tell whether OCaml provides such a facility.

So here is my question:

  1. Given my situation, how can I read a file into a string list quickly?

  2. What about using a stream? I would need to adjust the related application code, though, and that could take some time.

asked Aug 18 '15 16:08 by lllllllllllll


2 Answers

First of all, you should consider whether you really need to have all the information in memory at once. Maybe it is better to process the file line by line?
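
As a rough sketch of what line-by-line processing could look like: the helper below folds a function over the lines of a file without ever building the list. The names fold_lines and count_lines are made up for this illustration, and the `match ... with exception` syntax assumes OCaml 4.02 or later.

```ocaml
(* Fold [f] over the lines of [filename], keeping only the accumulator
   in memory. [fold_lines] is a hypothetical helper, not a standard
   library function. *)
let fold_lines (f : 'a -> string -> 'a) (init : 'a) (filename : string) : 'a =
  let chan = open_in filename in
  let rec loop acc =
    match input_line chan with
    | line -> loop (f acc line)          (* tail call: constant stack *)
    | exception End_of_file -> acc
  in
  match loop init with
  | result -> close_in chan; result
  | exception e -> close_in_noerr chan; raise e

(* Example use: count the lines of a file. *)
let count_lines (filename : string) : int =
  fold_lines (fun n _ -> n + 1) 0 filename
```

Whether this fits depends on the application: it only helps if each line can be consumed as it arrives instead of being kept around.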

If you really want to have it all in memory at once, then you can use Bigarray's map_file function to map the file as an array of characters, and then work on that array.
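
In recent OCaml versions the mapping function is Unix.map_file, which needs the unix library. As a simpler stand-in (not memory mapping, just one bulk read), here is a sketch that uses only the standard library: it reads the whole file into a single string and splits it on newlines. The names read_all and lines_of_file are hypothetical, String.split_on_char assumes OCaml 4.04 or later, and a 10G file assumes a 64-bit system.

```ocaml
(* Read the entire file into one string with a single bulk read.
   [read_all] is a hypothetical helper name. *)
let read_all (filename : string) : string =
  let chan = open_in_bin filename in
  match really_input_string chan (in_channel_length chan) with
  | contents -> close_in chan; contents
  | exception e -> close_in_noerr chan; raise e

(* Split the contents on '\n'. A file that ends with a newline produces
   one empty trailing element, which is dropped so the result matches
   what repeated calls to input_line would return. *)
let lines_of_file (filename : string) : string list =
  let parts = String.split_on_char '\n' (read_all filename) in
  match List.rev parts with
  | "" :: rest -> List.rev rest
  | _ -> parts
```

Note that the string and the list of lines are both alive at the same time, so peak memory is roughly twice the file size plus list overhead. Whether this is actually faster than the input_line loop should be measured on the real data.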

Also, as far as I can see, this file contains numbers. Maybe it is better to allocate an array (or, even better, a bigarray), then process each line in order and store the integers in the (big)array.
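
A minimal sketch of that idea, under a few assumptions: every line is a valid integer literal that fits in a native int, the file can be read twice (one pass to count lines, one to parse), and Bigarray is available from the standard library (OCaml 4.07 or later). The name ints_from_file is made up for this example.

```ocaml
(* Two passes over the file: count the lines, then parse each one into a
   preallocated bigarray. [ints_from_file] is a hypothetical helper. *)
let ints_from_file (filename : string)
  : (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t =
  let chan = open_in filename in
  (* First pass: count lines so the array is allocated exactly once. *)
  let n = ref 0 in
  (try
     while true do
       ignore (input_line chan);
       incr n
     done
   with End_of_file -> ());
  let arr = Bigarray.Array1.create Bigarray.int Bigarray.c_layout !n in
  (* Second pass: rewind and parse. int_of_string accepts the 0x prefix
     and raises Failure on a malformed line. *)
  seek_in chan 0;
  for i = 0 to !n - 1 do
    arr.{i} <- int_of_string (input_line chan)
  done;
  close_in chan;
  arr
```

The bigarray stores the integers outside the OCaml heap, so the garbage collector does not have to scan billions of small strings, which is one plausible reason the string list version is slow.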

answered Nov 10 '22 03:11 by ivg


I often use the following two functions to read the lines of a file. Note that the helper inside lines_from_files is tail-recursive.

(* Return the next line, or None at end of file. *)
let read_line i = try Some (input_line i) with End_of_file -> None

let lines_from_files filename =
  let rec lines_from_files_aux i acc = match read_line i with
    | None -> close_in i; List.rev acc  (* close the channel when done *)
    | Some s -> lines_from_files_aux i (s :: acc) in
  lines_from_files_aux (open_in filename) []

let () =
  lines_from_files "foo"
  |> List.iter (Printf.printf "lines = %s\n")
answered Nov 10 '22 01:11 by alifirat