I recently started taking R lectures and I'm currently working on scanning files. On a worksheet, one of the questions reads:
Read the file Table6.txt, check out the file first. Notice that the information is repeated, we only want the first non-repeated ones. Make sure to create only characters not factors this time around. Lastly, we don’t want the comments.
The file is called Table6.Txt
I managed to write code that reads the table properly, but the answer sheet passes an extra argument, flush = TRUE, to read.table.
My code was like:
df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, comment.char = "@", stringsAsFactors = FALSE)
And the answer sheet shows
df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)
What does the flush argument do here? Both versions of the code give the same data frame.
df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)
df
Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F
df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, comment.char = "@", stringsAsFactors = FALSE)
df
Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F
I read the documentation for read.table and scan, and this is what I understood, in simple words: with flush = TRUE, anything left on a line after the last expected field has been read is discarded, so rows with extra trailing content still fit into the data frame.
For example, let's take the same data that you have shared
read.table(text = 'Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F', header = TRUE)
this works as expected and returns
# Age Height Weight Sex
#Alex 25 177 57 F
#Lilly 31 163 69 F
#Mark 23 190 83 M
#Oliver 52 179 75 M
#Martha 76 163 70 F
#Lucas 49 183 83 M
#Caroline 26 164 53 F
Now let's add an extra character at the end.
read.table(text = "Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F A", header = TRUE)
^ #Notice this A
It gives an error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 7 did not have 5 elements
which makes sense, since the last row has an additional field in it.
We can add fill = TRUE
read.table(text = "Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F A", header = TRUE, fill = TRUE)
# Age Height Weight Sex
#Alex 25 177 57 F
#Lilly 31 163 69 F
#Mark 23 190 83 M
#Oliver 52 179 75 M
#Martha 76 163 70 F
#Lucas 49 183 83 M
#Caroline 26 164 53 F
#A NA NA NA
This adds an additional row at the end, filling the missing values with NA or empty strings depending on the type of the column.
Now if we add flush = TRUE
read.table(text = "Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
Lucas 49 183 83 M
Caroline 26 164 53 F A", header = TRUE, flush = TRUE)
# Age Height Weight Sex
#Alex 25 177 57 F
#Lilly 31 163 69 F
#Mark 23 190 83 M
#Oliver 52 179 75 M
#Martha 76 163 70 F
#Lucas 49 183 83 M
#Caroline 26 164 53 F
It ignores the additional "A" at the end, treating it like a trailing comment, and returns a complete data frame.
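The same behaviour can be seen one level down, in scan itself, which read.table relies on. This is only a minimal sketch with made-up input: the text, the field names a, b and c, and the trailing notes are illustrative assumptions, not part of your file.

```r
# Two records; each line carries trailing text after the third field.
txt <- "1 2 3 trailing note
4 5 6 another note"

# `what` asks for three numeric fields per record. With flush = TRUE,
# scan skips to the end of the line once the third field has been read.
res <- scan(text = txt, what = list(a = 0, b = 0, c = 0),
            flush = TRUE, quiet = TRUE)
str(res)
```

Without flush = TRUE the same call should fail, because scan would try to parse the trailing text as the next numeric field.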
In your case, this made no difference to the final output, since your rows were complete and had no extra trailing content. You can consider it one of the safe programming practices to follow when you are reading data whose structure you are not sure of.
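If you would rather know whether flush = TRUE is hiding anything before you rely on it, one possible check is to count the fields on each line first. The sketch below uses count.fields from the utils package on a shortened, made-up version of the data; the variable names are my own.

```r
# Shortened sample: header has 4 fields, data rows have 5 (name + 4 values),
# and the last row carries one extra field.
txt <- "Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Caroline 26 164 53 F A"

con <- textConnection(txt)
n_fields <- count.fields(con)   # number of whitespace-separated fields per line
close(con)

# Data rows (header excluded) whose field count differs from the first data row
odd_rows <- which(n_fields[-1] != n_fields[2])
n_fields
odd_rows
```

Here the last data row is flagged, which tells you that flush = TRUE (or fill = TRUE) would change what gets read for that row.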
Hope this clarified a bit.
As commented by @Christoph, here is an example to demonstrate the difference between comment.char and flush:
read.table(text = 'Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
@ Lucas 49 183 83 M
Caroline 26 164 53 F @', header = TRUE,flush = TRUE)
# Age Height Weight Sex
#Alex 25 177 57 F
#Lilly 31 163 69 F
#Mark 23 190 83 M
#Oliver 52 179 75 M
#Martha 76 163 70 F
#@ Lucas 49 183 83
#Caroline 26 164 53 F
read.table(text = 'Age Height Weight Sex
Alex 25 177 57 F
Lilly 31 163 69 F
Mark 23 190 83 M
Oliver 52 179 75 M
Martha 76 163 70 F
@ Lucas 49 183 83 M
Caroline 26 164 53 F @', header = TRUE,comment.char = '@')
# Age Height Weight Sex
#Alex 25 177 57 F
#Lilly 31 163 69 F
#Mark 23 190 83 M
#Oliver 52 179 75 M
#Martha 76 163 70 F
#Caroline 26 164 53 F
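With flush = TRUE, the @ at the beginning of the second-to-last line is not ignored. It is read as the row name, the remaining values shift one column to the right, and the last field (M) is dropped. With comment.char = '@', everything from the @ to the end of the line is ignored, so the line that starts with @ disappears entirely and the trailing @ on the last line is dropped as well.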
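To see the comment.char behaviour in isolation, here is a small sketch on made-up input (the columns x and y and the comment text are assumptions for illustration only). It has one comment in the middle of a line and one line that is nothing but a comment.

```r
txt <- "x y
1 2 @ note after the data
@ a line that is only a comment
3 4"

# Everything from @ to the end of the line is dropped before parsing,
# and a line left empty by that is skipped like a blank line.
df <- read.table(text = txt, header = TRUE, comment.char = "@")
df
```

Only the two data rows survive, which matches what happened to the Lucas row in the example above.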