
How do I use elements of a dataframe like hash keys / dictionary keys / primary keys?

Tags:

r

I have a dataframe in which I want to use certain values as hash keys / dictionary keys (or whatever you call it in your language of choice) for other values in that dataframe. Say I have a dataframe like this which I've read in from a large csv file (only first row shown):

  Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
1 Plate 1_A1    QN2200   A1       1.766       2.791 Both

which in R code would be:

 structure(list(Plate.name = structure(1L, .Label = "Plate 1_A1", class = "factor"), 
    QN.number = structure(1L, .Label = "QN2200", class = "factor"), 
    Well = structure(1L, .Label = "A1", class = "factor"), Allele.X.Rn = 1.766, 
    Allele.Y.Rn = 2.791, Call = structure(1L, .Label = "Both", class = "factor")), .Names = c("Plate.name", 
"QN.number", "Well", "Allele.X.Rn", "Allele.Y.Rn", "Call"), class = "data.frame", row.names = c(NA, 
-1L))

The QN.numbers are unique IDs in my dataset. How do I retrieve data using the QN.number as a reference for the other values? That is, I want to know the Call or the Allele.X.Rn for a given QN.number. It seems row.names might do the trick, but how would I use them in this instance?

arandomlypickedname asked Jul 25 '11 10:07


2 Answers

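With row.names it works like this: set the row names to the QN.number values, then index by row name (here d is your data frame):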

> row.names(d)=d$QN.number
> d["QN2200",]
       Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
QN2200 Plate 1_A1    QN2200   A1       1.766       2.791 Both
> d["QN2201",]
   Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
NA       <NA>      <NA> <NA>          NA          NA <NA>

You just pass the row name as the first index when subsetting. You can also pass several row names at once:

> d=data.frame(a=letters[1:10],b=runif(10))
> row.names(d)=d$a
> d[c("a","g","d"),]
  a         b
a a 0.6434431
g g 0.6724661
d d 0.9826392

I'm not sure how efficient this is, though: I don't know whether it does a sequential search for each row name or uses some faster indexing...
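
To get a single value such as the Call rather than a whole row, the column name can go in as the second index. Here is a minimal sketch, assuming a small made-up data frame d: only the QN2200 values come from the question, the QN2201 row is invented for illustration, and stringsAsFactors = FALSE is set so the results are plain character vectors rather than factors.

```r
# Hypothetical two-row data frame; only the QN2200 row comes from the question.
d <- data.frame(
  QN.number   = c("QN2200", "QN2201"),
  Allele.X.Rn = c(1.766, 0.512),
  Call        = c("Both", "Allele X"),
  stringsAsFactors = FALSE
)

# Row names of a data frame must be unique; duplicates raise an error here.
row.names(d) <- d$QN.number

# Row name as the first index, column name as the second.
call_value <- d["QN2200", "Call"]
x_value    <- d["QN2200", "Allele.X.Rn"]

# Several keys at once give a vector, in the order the keys were requested.
calls <- d[c("QN2201", "QN2200"), "Call"]

print(call_value)
print(x_value)
print(calls)
```

Because the row names have to be unique, this only works when the key column really is a unique ID, as the question says QN.number is.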

Spacedman answered Sep 22 '22 09:09


Use subset.

 subset(your_data, QN.number == "QN2200", Allele.X.Rn)

with provides an alternative; here the output is a vector rather than another data frame.

with(your_data, Allele.X.Rn[QN.number == "QN2200"])
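
The difference between the two return types can be checked directly. This is a small sketch on made-up data: your_data here is a hypothetical two-row data frame, and the ID QN9999 is invented to show what happens when a key is absent.

```r
# Hypothetical data; only the QN2200 row comes from the question.
your_data <- data.frame(
  QN.number   = c("QN2200", "QN2201"),
  Allele.X.Rn = c(1.766, 0.512),
  stringsAsFactors = FALSE
)

# subset() keeps the data frame structure; with() returns the bare column values.
as_df  <- subset(your_data, QN.number == "QN2200", Allele.X.Rn)
as_vec <- with(your_data, Allele.X.Rn[QN.number == "QN2200"])

print(class(as_df))
print(class(as_vec))

# An ID that is not present gives an empty result rather than NA.
missing_df  <- subset(your_data, QN.number == "QN9999", Allele.X.Rn)
missing_vec <- with(your_data, Allele.X.Rn[QN.number == "QN9999"])

print(nrow(missing_df))
print(length(missing_vec))
```

Note that a missing ID yields a zero-row data frame or a zero-length vector here, whereas indexing by a missing row name returns a row of NAs.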

Richie Cotton answered Sep 24 '22 09:09