I've been having a very hard time with R lately.
I'm not an expert user, but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to work with those lines and make some breaks and changes in the text.
Here is the code I'm using:
fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName, open = "r")
line <- readLines(con)  # one element per line of the file
close(con)
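As a minimal sketch (the temporary file and its two lines are made up for illustration), readLines() also accepts a file name directly, so the explicit file()/close() pair is optional. The result is a character vector with one element per line:

```r
# Write a small hypothetical file so the example is self-contained
fileName <- tempfile(fileext = ".txt")
writeLines(c("first line of the file", "second line of the file"), fileName)

# Given a path, readLines() opens and closes the file itself
line <- readLines(fileName)

class(line)   # "character"
length(line)  # 2: one element per line
line[1]       # "first line of the file"
```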
It reads the text and the line breaks perfectly, but I don't understand how the resulting object, line, works.
The object line created with this code has class "character" and length 57.
If I type line[1], it shows exactly the text of the first line. But if I type length(line[1]), it returns 1.
I would like to know how I can transform this string of length 1, which actually contains 518 characters, into something of length 518.
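The key distinction is that length() counts the elements of a vector, not the characters inside a string; nchar() counts characters. A short sketch, with a made-up string standing in for line[1]:

```r
# A made-up string standing in for line[1]
s <- "Hello world"

length(s)  # 1: s is a character vector with a single element
nchar(s)   # 11: the number of characters in that element

# Splitting on the empty string gives one element per character
chars <- strsplit(s, split = "")[[1]]
length(chars)  # 11, the same as nchar(s)

# paste0(..., collapse = "") joins the pieces back into one string
paste0(chars, collapse = "")
```

Applied to the question's data, strsplit(line[1], "")[[1]] would return a vector whose length equals nchar(line[1]).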
Does anyone know what I'm doing wrong?
I don't necessarily need to use the readLines() function. I did some research and also found the function scan(), but I ended up in the same situation: a string of 518 characters whose length is 1.
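For comparison, here is a sketch of scan() on an inline string, passed through its text argument instead of a file purely for illustration. With the default separator, scan() splits on whitespace, so each word becomes an element; with sep = "\n" it returns whole lines, much like readLines():

```r
txt <- "one two three\nfour five"

# Default separator: any whitespace, so one element per word
words <- scan(text = txt, what = character(), quiet = TRUE)
length(words)  # 5

# Newline as separator: one element per line, like readLines()
lines <- scan(text = txt, what = character(), sep = "\n", quiet = TRUE)
length(lines)  # 2
```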
I hope my question is clear enough. Sorry for my bad English.
The readLines() function in R reads text lines from a connection or a file. It is well suited to text files because it reads the text line by line and returns a character vector with one element per line.
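A small sketch of that behaviour, using textConnection() on a made-up vector instead of a real file; the n argument limits how many lines are read:

```r
# A text connection lets readLines() read from a character vector
con <- textConnection(c("first line", "second line", "third line"))
first_two <- readLines(con, n = 2)  # read only the first two lines
close(con)

length(first_two)  # 2
first_two[2]       # "second line"
```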
Suppose txt is the text from line 1 of your data that you read in with readLines.
If you want to split it into separate strings, each of which is a word, you can use strsplit, splitting at the space between each word.
> txt <- paste0(letters[1:10], LETTERS[1:10], collapse = " ")
> txt  # a character vector of length 1
[1] "aA bB cC dD eE fF gG hH iI jJ"
> length(txt)
[1] 1
> newTxt <- unlist(strsplit(txt, split = "\\s"))  # split the string at the spaces
> newTxt  # now a character vector of length 10
[1] "aA" "bB" "cC" "dD" "eE" "fF" "gG" "hH" "iI" "jJ"
> length(newTxt)
[1] 10
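The same idea extends to every line that readLines() returned. In this sketch a hypothetical vector stands in for line. strsplit() is vectorised, so it returns a list with one element per line, and the pattern "\\s+" treats a run of spaces as a single separator:

```r
# Hypothetical stand-in for the vector returned by readLines()
line <- c("aA bB  cC", "dD eE")

# One list element per line; each element holds that line's words
words_per_line <- strsplit(line, split = "\\s+")
lengths(words_per_line)  # 3 2

# Flatten into a single character vector of words
all_words <- unlist(words_per_line)
length(all_words)  # 5
```

With split = "\\s" instead, the double space in the first line would produce an empty string "" between "bB" and "cC".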