Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read line-by-line for big files

I'm trying to write reader for big files, based on iterations in Clojure. But how I can return line by line strings in Clojure? I want to make something like that:

(println (do_something(readFile (:file opts))) ; process and print first line
(println (do_something(readFile (:file opts))) ; process and print second line

Code:

(ns testapp.core
  (:gen-class)
  (:require [clojure.tools.cli :refer [cli]])
  (:require [clojure.java.io]))


(defn readFile [file, cnt]
  ; Iterate over opened file (read line by line)
  (with-open [rdr (clojure.java.io/reader file)]
    (let [seq (line-seq rdr)]
      ; how return only one line there? and after, when needed, take next line?
    )))

(defn -main [& args]
  ; Main function for project 
  (let [[opts args banner] 
        (cli args
          ["-h" "--help" "Print this help" :default false :flag true]
          ["-f" "--file" "REQUIRED: File with data"]
          ["-c" "--clusters" "Count of clusters" :default 3]
          ["-g" "--hamming" "Use Hamming algorithm"]
          ["-e" "--evklid" "Use Evklid algorithm"]
          )]
    ; Print help, when no typed args
    (when (:help opts)
      (println banner)
      (System/exit 0))
    ; Or process args and start work
    (if (and (:file opts) (or (:hamming opts) (:evklid opts)))
      (do
        ; Use Hamming algorithm
        (if (:hamming opts)
          (do
            (println (readFile (:file opts))
            (println (readFile (:file opts))
          )
          ;(count (readFile (:file opts)))
        ; Use Evklid algorithm
        (println "Evklid")))
      (println "Please, type path for file and algorithm!")))) 
like image 585
Relrin Avatar asked Sep 20 '14 12:09

Relrin


People also ask

How can I read large files efficiently?

BufferedReader is used to read the file line by line. Basically, BufferedReader() is used for the processing of large files. BufferedReader is very efficient for reading. Note: Specify the size of the BufferReader or keep that size as a Default size of BufferReader.

How do I read a 100 GB file in Python?

Reading Large Text Files in Python We can use the file object as an iterator. The iterator will return each line one by one, which can be processed. This will not read the whole file into memory and it's suitable to read large files in Python.

How do I read a file line by line?

Method 1: Read a File Line by Line using readlines() readlines() is used to read all the lines at a single go and then return them as each line a string element in a list. This function can be used for small files, as it reads the whole file content to the memory, then split it into separate lines.


1 Answers

May be i'm not understanding right what do you mean by "return line by line", but i'll suggest you to write function, which accepts file and processing function, then prints result of processing fuction for every line of your big file. Or, evem more general way, let's accept processing function and output function (println by default), so if we want not just print, but send it over network, save someplace, send to another thread, etc:

(defn process-file-by-lines
  "Process file reading it line-by-line"
  ([file]
   (process-file-by-lines file identity))
  ([file process-fn]
   (process-file-by-lines file process-fn println))
  ([file process-fn output-fn]
   (with-open [rdr (clojure.java.io/reader file)]
     (doseq [line (line-seq rdr)]
       (output-fn
         (process-fn line))))))

So

(process-file-by-lines "/tmp/tmp.txt") ;; Will just print file line by ine
(process-file-by-lines "/tmp/tmp.txt"
                       reverse) ;; Will print each line reversed
like image 172
coredump Avatar answered Sep 28 '22 16:09

coredump