Get the number of lines in a text file using R

Tags: file, r, text-files

Is there a way to get the number of lines in a file without importing it?

So far this is what I am doing:

    myfiles <- list.files(pattern = "\\.dat$")  # pattern is a regex, not a glob
    myfilesContent <- lapply(myfiles, read.delim, header = FALSE, quote = "\"")
    test <- list()  # must exist before it is filled in the loop
    for (i in seq_along(myfiles)) {
      test[[i]] <- length(myfilesContent[[i]]$V1)
    }

but this is too time-consuming, since each file is quite big.

user3036416 asked May 04 '14 12:05


1 Answer

You can count the number of newline characters in a file (\n; this also works for \r\n line endings on Windows). This will give you a correct answer iff:

  1. There is a newline char at the end of last line (BTW, read.csv gives a warning if this doesn't hold)
  2. The table does not contain a newline character in the data (e.g. within quotes)
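The second condition can be illustrated with a small made-up example (the data and variable names below are hypothetical, not from the question): a quoted field containing a newline makes the newline count larger than the number of records a CSV parser would see.

```r
# Hypothetical data: one header line and one record whose quoted field spans two lines
txt <- "id,text\n1,\"hello\nworld\"\n"

# Counting newline bytes (0x0A) treats every newline as a line end
newline_count <- sum(charToRaw(txt) == as.raw(10L))

# A CSV parser treats the quoted newline as part of the field
parsed <- read.csv(text = txt)

print(newline_count)  # number of newline bytes in the text
print(nrow(parsed))   # number of data records (header excluded)
```

Here the two numbers disagree even after accounting for the header, so the newline count should only be trusted for files without embedded newlines.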

It will suffice to read the file in parts. Below I set the chunk (temporary buffer) size to 65536 bytes:

    f <- file("filename.csv", open="rb")
    nlines <- 0L
    while (length(chunk <- readBin(f, "raw", 65536)) > 0) {
       nlines <- nlines + sum(chunk == as.raw(10L))
    }
    print(nlines)
    close(f)
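To apply this to every file from the question, the loop could be wrapped in a function. The following is only a sketch and not part of the original answer: the name `count_lines`, its `chunk_size` argument, and the correction for a missing final newline are assumptions added for illustration.

```r
# Hypothetical helper: count lines by counting newline bytes (0x0A), chunk by chunk
count_lines <- function(path, chunk_size = 65536L) {
  con <- file(path, open = "rb")
  on.exit(close(con))  # close the connection even if an error occurs
  nlines <- 0L
  last_byte <- raw(0)
  while (length(chunk <- readBin(con, "raw", chunk_size)) > 0) {
    nlines <- nlines + sum(chunk == as.raw(10L))
    last_byte <- chunk[length(chunk)]
  }
  # Assumption: a non-empty file whose last byte is not a newline
  # ends with one more, unterminated, line
  if (length(last_byte) == 1L && last_byte != as.raw(10L)) {
    nlines <- nlines + 1L
  }
  nlines
}

# Usage on the .dat files in the working directory, without parsing them
myfiles <- list.files(pattern = "\\.dat$")
line_counts <- vapply(myfiles, count_lines, integer(1))
```

The result is a named integer vector with one count per file. The caveat about newlines embedded in quoted fields still applies.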

Benchmarks on a ca. 512 MB ASCII text file, 12101000 text lines, Linux:

  • readBin: ca. 2.4 s.

  • @luis_js's wc-based solution: 0.1 s.

  • read.delim: 39.6 s.

  • EDIT: reading a file line by line with readLines (f <- file("/tmp/test.txt", open="r"); nlines <- 0L; while (length(l <- readLines(f, 128)) > 0) nlines <- nlines + length(l); close(f)): 32.0 s.
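The wc-based answer referred to in the benchmark is not reproduced here. One way to call `wc -l` from R might look like the sketch below; the function name `count_lines_wc` is hypothetical, and the code assumes a Unix-like system with `wc` available on the `PATH`.

```r
# Hypothetical wrapper around the external `wc -l` command (Unix-like systems only)
count_lines_wc <- function(path) {
  # wc prints "<count> <filename>", possibly with leading spaces
  out <- system2("wc", args = c("-l", shQuote(path)), stdout = TRUE)
  # Keep only the first whitespace-separated field, which is the count
  as.integer(strsplit(trimws(out), "[[:space:]]+")[[1]][1])
}
```

Like the `readBin` loop, `wc -l` counts newline characters, so the same two conditions on the file apply.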

gagolews answered Sep 17 '22 15:09
