Is there a way to get the number of lines in a file without importing it?
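So far, this is what I am doing:

```r
myfiles <- list.files(pattern = "\\.dat$")  # pattern is a regular expression, not a glob
myfilesContent <- lapply(myfiles, read.delim, header = FALSE, quote = "\"")
test <- vector("list", length(myfiles))     # preallocate the result list
for (i in seq_along(myfiles)) {
  test[[i]] <- length(myfilesContent[[i]]$V1)  # number of rows read from file i
}
```

but it is too time-consuming, since each file is quite big.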
The `wc` command reports the number of lines, words, characters, and bytes in a file. With the `-l` option it prints only the number of lines, followed by the name of the file.
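From R, the `wc -l` approach can be called through `system2()`. The sketch below assumes a Unix-like system with `wc` on the PATH, and `count_lines_wc()` is a hypothetical helper name. Because `wc -l` counts newline characters, a last line without a trailing newline is not counted.

```r
# Hypothetical helper: count lines with the external `wc -l` command.
# Assumes a Unix-like system where `wc` is available on the PATH.
count_lines_wc <- function(path) {
  # stdout = TRUE captures wc's output, e.g. "12101000 /path/to/file"
  out <- system2("wc", c("-l", shQuote(path)), stdout = TRUE)
  # Keep only the leading count and convert it to an integer
  as.integer(strsplit(trimws(out), "\\s+")[[1]][1])
}
```

Usage: `count_lines_wc("filename.csv")`. The file is never parsed by R, so this stays fast even for very large files.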
The `readLines()` function in R reads text lines from an input file. It suits plain-text files well: it reads the file line by line and returns a character vector with one element per line. Syntax: `readLines(path)`
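In R, the simplest version of this idea is to count the elements returned by `readLines()`. This skips the column parsing and type conversion that `read.delim` performs, although the whole file still ends up in memory as a character vector. `count_lines_readlines()` is a hypothetical helper name.

```r
# Hypothetical helper: count lines by reading them as plain strings.
count_lines_readlines <- function(path) {
  # warn = FALSE silences the "incomplete final line" warning;
  # an unterminated last line is still returned and counted
  length(readLines(path, warn = FALSE))
}
```

Applied to the question's files: `vapply(myfiles, count_lines_readlines, integer(1))`.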
The `n.readLines()` function of the reader package provides additional functionality for reading lines, such as skipping ahead in a file or ignoring comments and headers. The `readline()` function, by contrast, interactively reads a single line from the terminal.
As its name (short for "word count") suggests, `wc` exists purely to count words, characters, or lines in a file. For example, to count the lines in a text file called distros.txt, run `wc -l distros.txt`.
Although `sed` is primarily a stream editor, it can also count the lines in a file with `sed -n '$=' filename`. Here, `=` prints the current line number to standard output, `$` restricts it to the last line, and `-n` suppresses the normal printing of every input line.
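The same `sed` command can be run from R via `system2()`. This is a sketch that assumes `sed` is on the PATH; `count_lines_sed()` is a hypothetical helper name. `shQuote()` keeps the shell from expanding `$=`.

```r
# Hypothetical helper: count lines with `sed -n '$='`.
# Assumes a Unix-like system where `sed` is available on the PATH.
count_lines_sed <- function(path) {
  # -n: do not echo input lines; '$=': print the line number of the last line
  out <- system2("sed", c("-n", shQuote("$="), shQuote(path)), stdout = TRUE)
  # sed prints nothing at all for an empty file
  if (length(out) == 0L) 0L else as.integer(out)
}
```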
To print both the number of lines and the number of words in a file, a bash script can compute the counts with `wc` and print them with `echo`. For an input file such as demo.txt, `cat demo.txt` displays the file's content, and the script's first line (its shebang) tells the system to use bash as the interpreter.
You can count the number of newline characters (`\n`; this also works for `\r\n` on Windows) in a file. This gives the correct answer only if:

1. the last line ends with a newline character (`read.csv` gives a warning if this doesn't hold), and
2. no field contains an embedded newline (e.g. inside quotes).

It suffices to read the file in chunks. Below, the chunk (temporary buffer) size is set to 65536 bytes:
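```r
f <- file("filename.csv", open = "rb")  # binary mode: bytes are read as-is
nlines <- 0L
# Read 64 KiB at a time and count the "\n" (byte 10) occurrences in each chunk
while (length(chunk <- readBin(f, "raw", 65536)) > 0) {
  nlines <- nlines + sum(chunk == as.raw(10L))
}
print(nlines)
close(f)
```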
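The chunked loop can be wrapped in a reusable function. In the sketch below (the name `count_lines_raw()` is hypothetical), one extra line is added when a non-empty file does not end in `\n`, a case the plain newline count misses. Newlines embedded inside quoted fields are still counted as line breaks. `on.exit()` ensures the connection is closed even if reading fails.

```r
# Hypothetical helper: chunked newline counting with readBin(),
# plus one extra line if the file does not end with "\n".
count_lines_raw <- function(path, chunk_size = 65536L) {
  con <- file(path, open = "rb")
  on.exit(close(con))                  # close the connection on exit, even on error
  newline <- as.raw(10L)               # byte value of "\n"
  nlines <- 0                          # double, so very large counts cannot overflow
  last_byte <- NULL
  while (length(chunk <- readBin(con, "raw", chunk_size)) > 0) {
    nlines <- nlines + sum(chunk == newline)
    last_byte <- chunk[length(chunk)]  # remember the final byte seen so far
  }
  # A non-empty file whose last byte is not "\n" has an unterminated last line
  if (!is.null(last_byte) && last_byte != newline) {
    nlines <- nlines + 1
  }
  nlines
}
```

Usage: `count_lines_raw("filename.csv")`. Memory use stays bounded by `chunk_size` regardless of file size.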
Benchmarks on a ca. 512 MB ASCII text file with 12101000 lines, on Linux:

- `readBin`: ca. 2.4 s
- @luis_js's `wc`-based solution: 0.1 s
- `read.delim`: 39.6 s
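To repeat such a comparison on your own machine, a rough harness like the one below can help. It is only a sketch: `make_test_file()` and `time_count()` are hypothetical helpers, the synthetic tab-separated content is an arbitrary assumption, and timings depend on disk speed and on whether the file is already in the OS cache.

```r
# Hypothetical helper: write a synthetic tab-separated file with n_lines lines.
make_test_file <- function(n_lines, path = tempfile(fileext = ".dat")) {
  writeLines(sprintf("%d\tfoo\tbar\t%f", seq_len(n_lines), runif(n_lines)), path)
  path
}

# Hypothetical helper: run one counting function and record wall-clock time.
time_count <- function(count_fun, path) {
  elapsed <- system.time(n <- count_fun(path))[["elapsed"]]
  list(lines = n, seconds = elapsed)
}

path <- make_test_file(1e5)
# Baseline from the question: parse the whole file, then count rows
time_count(function(p) nrow(read.delim(p, header = FALSE)), path)
# Cheaper alternative: read raw lines without parsing columns
time_count(function(p) length(readLines(p)), path)
```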
EDIT: reading the file with `readLines`, 128 lines at a time: 32.0 s.

```r
f <- file("/tmp/test.txt", open = "r")
nlines <- 0L
# Read up to 128 lines per call until the connection is exhausted
while (length(l <- readLines(f, 128)) > 0) nlines <- nlines + length(l)
close(f)
```