tbl_df and data.frame difference when using loops

Tags: r, dplyr

I've been looping over values in a dplyr tbl_df, trying to print the unique combinations of two columns. After much trial and error, the only way I've been able to get exactly the desired output is by converting the tbl_df back to a standard data.frame. I'm aware of the main differences between the two structures, but I still can't understand why each gives different output.

For example, using this data

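hospital <- rep(c("Hospital 1", "Hospital 2", "Hospital 3"), 3)
# Recycle the two ward labels explicitly to match the nine hospital rows
ward <- rep(LETTERS[1:2], length.out = length(hospital))
# stringsAsFactors = TRUE reproduces the factor columns on R >= 4.0
hospitals <- data.frame(hospital, ward, stringsAsFactors = TRUE)
hospitals[order(hospitals$hospital, hospitals$ward), ]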

#     hospital ward
# 1 Hospital 1    A
# 7 Hospital 1    A
# 4 Hospital 1    B
# 5 Hospital 2    A
# 2 Hospital 2    B
# 8 Hospital 2    B
# 3 Hospital 3    A
# 9 Hospital 3    A
# 6 Hospital 3    B

and the following loop

for(hosp in unique(hospitals$hospital)){
  for(wa in unique(hospitals[hospitals$hospital==hosp, "ward"])){
    print(paste(hosp, wa, sep=" "))
  }
}

I can get my desired output

#[1] "Hospital 1 A"
#[1] "Hospital 1 B"
#[1] "Hospital 2 B"
#[1] "Hospital 2 A"
#[1] "Hospital 3 A"
#[1] "Hospital 3 B"

But using a tbl_df of the same data I get a different output

library(dplyr)
hospitals2 <- tbl_df(hospitals)

for(hosp in unique(hospitals2$hospital)){
  for(wa in unique(hospitals2[hospitals2$hospital==hosp, "ward"])){
    print(paste(hosp, wa, sep=" "))
  }
}


#[1] "Hospital 1 A" "Hospital 1 B"
#[1] "Hospital 2 B" "Hospital 2 A"
#[1] "Hospital 3 A" "Hospital 3 B"

It's not just a printing difference: the loop produces three two-element vectors instead of six one-element vectors, and my subsequent code only works as expected when I run the loop on a normal data.frame.

Can anyone explain why I'm seeing these differences?


asked Mar 02 '15 13:03

peter_w

1 Answer

You can't loop directly over the result of [ subsetting on a tbl_df. The documentation says it all:

[ Never simplifies (drops), so always returns data.frame.

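So hospitals2[hospitals2$hospital==hosp, "ward"] returns a data frame, not a vector. A data frame is a list of columns, so the for loop runs once over the single ward column (a two-element vector after unique) instead of once per value: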

hospitals2[hospitals2$hospital==hosp, "ward"]
#Source: local data frame [3 x 1]

#  ward
#1    A
#2    B
#3    A

whereas the same subset on the plain data.frame is simplified to a vector:

hospitals[hospitals$hospital==hosp, "ward"]
#[1] A B A
#Levels: A B
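
The effect can be reproduced without dplyr. The following is a minimal sketch in base R only, where drop = FALSE is used as a stand-in for how [ behaves on a tbl_df (an assumption made for illustration; the small df and the counter variables are hypothetical and not part of the original question):

```r
# Base R sketch: drop = FALSE mimics how `[` behaves on a tbl_df
# (the result stays a data frame instead of collapsing to a vector).
df <- data.frame(hospital = c("Hospital 1", "Hospital 1", "Hospital 1"),
                 ward = c("A", "B", "A"),
                 stringsAsFactors = FALSE)

# Simplified to a character vector
as_vector <- unique(df[df$hospital == "Hospital 1", "ward"])
# Kept as a one-column data frame
as_frame <- unique(df[df$hospital == "Hospital 1", "ward", drop = FALSE])

# Looping over a vector: one iteration per unique ward
n_vec <- 0
for (wa in as_vector) n_vec <- n_vec + 1

# Looping over a data frame: one iteration per column
n_df <- 0
for (wa in as_frame) {
  n_df <- n_df + 1
  pasted <- paste("Hospital 1", wa)  # wa is the whole column here
}

n_vec          # 2
n_df           # 1
length(pasted) # 2
```

The single iteration over the data frame receives the whole column, which is why paste returns a two-element vector per hospital.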

Use [[ to extract a column vector, for instance

for(hosp in unique(hospitals2$hospital)){
    # Subset the rows of the tbl_df first, then pull the column out as a vector
    for(wa in unique(hospitals2[hospitals2$hospital==hosp,][["ward"]])){
        print(paste(hosp, wa, sep=" "))
    }
}
#[1] "Hospital 1 A"
#[1] "Hospital 1 B"
#[1] "Hospital 2 B"
#[1] "Hospital 2 A"
#[1] "Hospital 3 A"
#[1] "Hospital 3 B"
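
If the goal is only to list each hospital/ward pair once, the nested loop can be avoided altogether. Below is a rough sketch in base R, assuming the same hospitals data as in the question (rebuilt here so the snippet runs on its own; combos and labels are names introduced for illustration). Because it never extracts a single column with [, it should behave the same way on a tbl_df, though that has not been verified here. Unlike the loop output above, the pairs are sorted by hospital and then ward.

```r
# Rebuild the example data from the question
hospital <- rep(c("Hospital 1", "Hospital 2", "Hospital 3"), 3)
ward <- rep(LETTERS[1:2], length.out = length(hospital))
hospitals <- data.frame(hospital, ward, stringsAsFactors = FALSE)

# Keep one row per distinct hospital/ward pair
combos <- unique(hospitals[, c("hospital", "ward")])

# Sort the pairs so the output order is predictable
combos <- combos[order(combos$hospital, combos$ward), ]

# paste() is vectorised, so no inner loop is needed
labels <- paste(combos$hospital, combos$ward)
for (lab in labels) print(lab)
```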


answered Oct 19 '22 20:10

Khashaa
