
Dealing with readLines() function in R

Tags: r, readlines

I've been having a very hard time with R lately.

I'm not an expert user, but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to work with those lines and make some breaks and changes in the text.

Here is the code I'm using:

fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName,open="r")
line <- readLines(con)
close(con)

It reads the text and the line breaks perfectly. But I don't understand how the created object line works.

The object line created with this code has class character and length 57. If I type line[1], it shows exactly the text of the first line. But if I type

length(line[1])

it returns 1.

I would like to know how I can transform this string of length == 1, which in fact contains 518 characters, into a string of length == 518.

Does anyone know what I'm doing wrong?

I don't necessarily need to use the readLines() function. I did some research and also found the function scan(), but I ended up in the same situation: an immutable string of 518 characters but length == 1.

I hope I've been clear enough about my question. Sorry for the bad English.

asked Apr 11 '14 at 00:04 by user3521631



1 Answer

Suppose txt is the first line of your data, as read in with readLines().
To split it into separate strings, one per word, you can use strsplit(), splitting at the whitespace between words.

> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt   ## character vector of length 1
[1] "aA bB cC dD eE fF gG hH iI jJ"
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s"))  ## split the string at the spaces
> newTxt   ## now the text is a character vector of length 10
[1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
> length(newTxt)
[1] 10
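
The question asks about characters rather than words, so here is a minimal sketch of that case. It assumes the goal is either to count the characters in a line or to get one vector element per character. length() counts the elements of a vector, while nchar() counts the characters inside each string. The names chars, lines and charsPerLine are only illustrative, and the sample strings stand in for what readLines() would return.

```r
# Hypothetical stand-in for one line returned by readLines()
txt <- "aA bB cC dD eE fF gG hH iI jJ"

length(txt)   # number of elements in the vector
nchar(txt)    # number of characters in the string

# Split into single characters. strsplit() returns a list with one
# element per input string, so [[1]] extracts the character vector.
chars <- strsplit(txt, split = "")[[1]]
length(chars) # now equal to nchar(txt)

# The same idea applied to several lines at once
# (hypothetical stand-in for the full readLines() result)
lines <- c("first line", "second one")
nchar(lines)                                # one count per line
charsPerLine <- strsplit(lines, split = "") # list: one character vector per line
sapply(charsPerLine, length)                # matches nchar(lines)
```

If only the count is needed, nchar(line[1]) is enough. The split is useful only when the individual characters have to be handled as separate elements.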
answered Oct 18 '22 at 15:10 by Rich Scriven