
What makes these two R data frames not identical?

I have two small data frames, this_tx and last_tx. They are, in every way that I can tell, completely identical. this_tx == last_tx results in a frame of identical dimensions, all TRUE. this_tx %in% last_tx, two TRUEs. Inspected visually, clearly identical. But when I call

identical(this_tx, last_tx)

I get a FALSE. Hilariously, even

identical(str(this_tx), str(last_tx))

will return a TRUE. If I set this_tx <- last_tx, I'll get a TRUE.

What is going on? I don't have the deepest understanding of R's internal mechanics, but I can't find a single difference between the two data frames. If it's relevant, the two variables in the frames are both factors - same levels, same numeric coding for the levels, both just subsets of the same original data frame. Converting them to character vectors doesn't help.

Background (because I wouldn't mind help on this, either): I have records of drug treatments given to patients. Each treatment record essentially specifies a person and a date. A second table has a record for each drug and dose given during a particular treatment (usually, a few drugs are given each treatment). I'm trying to identify contiguous periods during which the person was taking the same combinations of drugs at the same doses.

The best plan I've come up with is to check the treatments chronologically. If the combination of drugs and doses for treatment[i] is identical to the combination at treatment[i-1], then treatment[i] is a part of the same phase as treatment[i-1]. Of course, if I can't compare drug/dose combinations, that's right out.
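The chronological check described above can be sketched without comparing whole data frames, by collapsing each treatment's drug/dose combination into a single string and comparing neighbouring strings. This is only a minimal sketch: the `doses` table, its column names, and the single-patient, already-sorted layout are all assumptions made for illustration, not the actual data.

```r
# Hypothetical data: one row per drug given at a treatment, for one patient,
# with treatment ids assumed to be in chronological order.
doses <- data.frame(
  treatment = c(1, 1, 2, 2, 3, 4, 4),
  drug      = c("A", "B", "B", "A", "A", "A", "C"),
  dose      = c(10, 20, 20, 10, 10, 10, 5)
)

# Build one order-independent signature per treatment, e.g. "A:10|B:20".
sig <- tapply(
  paste(doses$drug, doses$dose, sep = ":"),
  doses$treatment,
  function(x) paste(sort(x), collapse = "|")
)
treatment_ids <- names(sig)
sig <- as.vector(sig)  # drop array attributes, keep the plain strings

# A new phase starts at the first treatment and wherever the signature
# differs from the previous treatment's signature.
phase <- cumsum(c(TRUE, sig[-1] != sig[-length(sig)]))
names(phase) <- treatment_ids
phase
```

With more than one patient, the same comparison would need to be run within each patient's treatments separately.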

Matt Parker asked Apr 22 '10 00:04



1 Answer

Generally, in this situation it's useful to try all.equal(), which will give you some information about why two objects are not equivalent.
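As an illustration, here is a small sketch of one way two subsets of the same data frame can pass an element-wise comparison and still fail identical(). The data in `orig` is made up, and differing row names are only an assumed cause; the question does not confirm what differed in the real frames. The last lines also show why the identical(str(...), str(...)) check is uninformative: str() prints its output and returns NULL.

```r
# Hypothetical stand-in for the original data frame (two factor columns).
orig <- data.frame(
  drug = factor(c("A", "B", "A", "B")),
  dose = factor(c("10", "20", "10", "20"))
)

# Two subsets with the same values but different row names (1:2 vs 3:4).
this_tx <- orig[1:2, ]
last_tx <- orig[3:4, ]

all(this_tx == last_tx)       # element-wise comparison ignores row names
identical(this_tx, last_tx)   # FALSE: the row.names attributes differ

# all.equal() returns TRUE or a character vector describing the differences.
all.equal(this_tx, last_tx)

# Resetting the row names on both copies removes that difference.
a <- this_tx
b <- last_tx
rownames(a) <- NULL
rownames(b) <- NULL
identical(a, b)

# str() prints the structure and returns NULL, so comparing its
# return values compares NULL with NULL.
str_result <- str(this_tx)
is.null(str_result)
```

If all.equal() points at an attribute such as row names, resetting or ignoring that attribute before the comparison is usually enough.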

hadley answered Oct 03 '22 22:10
