What does the flush argument do in read.table in R?

I recently started taking R lectures and am currently working on reading files. One of the questions on a worksheet is:

Read the file Table6.txt, check out the file first. Notice that the information is repeated, we only want the first non-repeated ones. Make sure to create only characters not factors this time around. Lastly, we don’t want the comments.

The file is called Table6.Txt

I managed to write code that reads the table properly, but the answer sheet has an extra argument inside the read.table call: flush = TRUE.

My code was like:

df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, comment.char = "@", stringsAsFactors = FALSE)

And the answer sheet shows

df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
                 nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)

What does the flush argument do here? Both versions of the code give the same data frame.

> df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
+                  nrows = 7, flush = TRUE, comment.char = "@", stringsAsFactors = FALSE)
> df
         Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F
Lucas     49    183     83   M
Caroline  26    164     53   F
> df <- read.table("Table6.txt", skip = 1, header = TRUE, row.names = "Name",
+                  nrows = 7, comment.char = "@", stringsAsFactors = FALSE)
> df
         Age Height Weight Sex
Alex      25    177     57   F
Lilly     31    163     69   F
Mark      23    190     83   M
Oliver    52    179     75   M
Martha    76    163     70   F
Lucas     49    183     83   M
Caroline  26    164     53   F
Asked by Levy, Nov 07 '22 11:11


1 Answer

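I read the documentation for read.table and scan, and this is what I understood, in simple words: with flush = TRUE, once the expected number of fields has been read from a line, the rest of that line is skipped. This keeps the data frame complete by ignoring any extra trailing fields, and it lets you put comments after the last field.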
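To see the effect in isolation, here is a minimal sketch that calls scan directly (read.table passes its flush argument on to scan). The two input lines and the field names a, b and c are made up for illustration; they are not taken from Table6.txt.

```r
# Hypothetical input: three numeric fields per line, plus stray trailing text
# on the first line.
txt <- c("1 2 3 trailing note",
         "4 5 6")

# what = list(...) asks for three numeric fields per record. With flush = TRUE,
# scan skips whatever is left on the line once the third field has been read.
res <- scan(text = txt, what = list(a = 0, b = 0, c = 0),
            flush = TRUE, quiet = TRUE)
str(res)
```

Without flush = TRUE, the same call should stop with an error, because scan would then try to read "trailing" as the next number.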

For example, let's take the same data that you shared:

read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F', header = TRUE)

This works as expected and returns:

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F

Now let's add an extra character at the end.

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE)
                                         ^ #Notice this A

It gives an error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 7 did not have 5 elements

This makes sense, since the last row has an additional field in it.

We can add fill = TRUE

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE, fill = TRUE)

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F
#A         NA     NA     NA    

The extra field spills over into an additional row at the end, and the remaining cells of that row are filled with NA or empty strings, depending on the type of the column.

Now if we add flush = TRUE

read.table(text = "Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
          Lucas     49    183     83   M
          Caroline  26    164     53   F A", header = TRUE, flush = TRUE)

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Lucas     49    183     83   M
#Caroline  26    164     53   F

It ignores the additional "A" at the end, treating it like a comment, and returns a complete data frame.


In your case, it made no difference to the final output, since every line of your data had exactly the expected number of fields. You can consider this a safe programming practice to follow when you are reading data whose structure you are not sure of.
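Since flush = TRUE drops the extra fields silently, it can be worth checking first whether any line actually has them. Here is a rough sketch using count.fields; the sample lines and the variable names (txt, n_fields, expected) are assumptions made up for this illustration.

```r
# Hypothetical data: the header has 3 fields and one data line carries an
# extra trailing field.
txt <- c("Name Age Sex",
         "Alex 25 F",
         "Lilly 31 F",
         "Mark 23 M extra")

# count.fields() reports how many whitespace-separated fields each line has.
con <- textConnection(txt)
n_fields <- count.fields(con)
close(con)

# Treat the most common field count as the expected one, then list the
# lines that have more fields than that.
expected <- as.integer(names(which.max(table(n_fields))))
which(n_fields > expected)
```

For a real file you could pass the file name to count.fields instead of a text connection, with the same skip and comment.char settings you plan to give read.table.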

Hope this clarified a bit.


As suggested in a comment by @Christoph, here is an example that demonstrates the difference between comment.char and flush:

read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
        @ Lucas     49    183     83   M 
          Caroline  26    164     53   F @', header = TRUE, flush = TRUE)

#           Age Height Weight Sex
#Alex        25    177     57   F
#Lilly       31    163     69   F
#Mark        23    190     83   M
#Oliver      52    179     75   M
#Martha      76    163     70   F
#@        Lucas     49    183  83
#Caroline    26    164     53   F


read.table(text = 'Age Height Weight Sex
          Alex      25    177     57   F
          Lilly     31    163     69   F
          Mark      23    190     83   M
          Oliver    52    179     75   M
          Martha    76    163     70   F
        @ Lucas     49    183     83   M 
          Caroline  26    164     53   F @', header = TRUE, comment.char = '@')

#         Age Height Weight Sex
#Alex      25    177     57   F
#Lilly     31    163     69   F
#Mark      23    190     83   M
#Oliver    52    179     75   M
#Martha    76    163     70   F
#Caroline  26    164     53   F

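With flush = TRUE, the @ at the beginning of the second-to-last line is not ignored. It is read as the first field (the row name), so the last field on that line (M) is the one that gets discarded, as is the trailing @ on the last line. With comment.char = '@', everything from the @ to the end of the line is ignored, wherever the @ appears, so the whole Lucas line is dropped along with the trailing @ on the last line.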
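The same contrast can be reproduced with scan alone, which may make it easier to see which fields each option keeps. This is only a sketch: the three lines are a cut-down, made-up version of the data above, and every field is read as character so that the stray @ does not cause a type error.

```r
# Hypothetical lines: one starts with "@", one ends with "@".
lines <- c("Alex 25 F",
           "@ Lucas 49 M",
           "Caroline 26 F @")

# flush = TRUE: keep the first three fields of every line, drop the rest.
# On the second line "@" counts as the first field, so "M" is what gets dropped.
with_flush <- scan(text = lines, what = list("", "", ""),
                   flush = TRUE, quiet = TRUE)

# comment.char = "@": ignore everything from "@" to the end of the line,
# so the second line disappears entirely.
with_comment <- scan(text = lines, what = list("", "", ""),
                     comment.char = "@", quiet = TRUE)

with_flush[[1]]    # first field of each record
with_comment[[1]]
```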

Answered by Ronak Shah, Nov 14 '22 23:11