
Is there a concept of shortcuts/aliases/pointers in R?

Tags:

r

I am working on a rather large dataset in R that is split across several data frames.

The problem is that I do some things with the whole set, but sometimes I only need to work with or modify parts of it, and my selectors are getting very clunky, e.g.:

aListOfItems$attribute4[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C"] <-
  aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C", "attribute5"] *
  aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C", "attribute7"]

(this sets attribute 4 to (attribute5 * attribute6) for a selected part of all entries.)

This is horrible to read, understand and edit.

Splitting these into different data frames is not really an option because of RAM, and because I refresh this data regularly, so rebuilding all the separate data frames would also be a pain.

So, is there any way to do something like

items_t6C <- &(aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2
 == 6 & aListOfItems$attribute3 == "C", ])

so I can use

items_t6C$attribute4 <- # do something

Alternatively, maybe it is possible to store such a selector in a string variable and use it?

racoonie asked Apr 05 '13 09:04


2 Answers

You can first construct a logical vector, give it a meaningful name, and use that in the command. It makes your script a bit longer, but much easier to read:

interesting_bit = with(aListOfItems, attribute1 &   
                                     attribute2 == 6 & 
                                     attribute3 == "C")

In addition, using a bit of indentation also makes the code much more readable.

aListOfItems$attribute4[interesting_bit] <-
     aListOfItems[interesting_bit, "attribute5"] *
     aListOfItems[interesting_bit, "attribute7"]

Using within improves readability further:

aListOfItems[interesting_bit,] = within(aListOfItems[interesting_bit,], {
      attribute4 = attribute5 * attribute7
   })

Also, for a logical vector there is no need to test explicitly for == TRUE:

interesting_bit = aListOfItems$attribute1 &   
         aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C"

This ultimately reduces this:

aListOfItems$attribute4[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C"] <-
  aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C", "attribute5"] *
  aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
  aListOfItems$attribute3 == "C", "attribute7"]

to this (note the additional use of with):

interesting_bit = with(aListOfItems, attribute1 &   
                                     attribute2 == 6 & 
                                     attribute3 == "C")
aListOfItems[interesting_bit,] = within(aListOfItems[interesting_bit,], {
      attribute4 = attribute5 * attribute7
   })

This code not only looks less daunting but also conveys at a glance what you are doing, which is very hard to work out from your original code.
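
To try the named-selector approach end to end, here is a minimal sketch on a toy data frame. The column values are made up for illustration; only the column names come from the question.

```r
# Toy data; the values are hypothetical and only serve to exercise the selector.
aListOfItems <- data.frame(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 3),
  attribute3 = c("C", "A", "C", "C"),
  attribute4 = c(0, 0, 0, 0),
  attribute5 = c(2, 3, 4, 5),
  attribute7 = c(10, 10, 10, 10),
  stringsAsFactors = FALSE
)

# Build the selector once and give it a meaningful name.
interesting_bit <- with(aListOfItems, attribute1 & attribute2 == 6 & attribute3 == "C")

# Reuse the selector on both sides of the assignment.
aListOfItems[interesting_bit, ] <- within(aListOfItems[interesting_bit, ], {
  attribute4 <- attribute5 * attribute7
})

# Only the rows matching all three conditions have a changed attribute4.
print(aListOfItems[, c("attribute1", "attribute2", "attribute3", "attribute4")])
```

Note that interesting_bit is a snapshot: if attribute1, attribute2 or attribute3 change later, the selector has to be rebuilt.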

Paul Hiemstra answered Oct 09 '22 17:10


The data.table package might be useful to you.

data.table works mostly by reference, especially when assigning and modifying columns. If you are hitting RAM limits, the efficiency gains from data.table can be dramatic.

Additionally, data.table has the functionality of with, within, by, subset, etc. built in, making calls much shorter and code more readable.

For example, the cumbersome statement above could be simplified to something like:

aDTofItems[attribute1 & attribute2==6 & attribute3=="C", # filter
           attribute4 := attribute5 * attribute6]        # assign
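
As a rough check that := really updates the table in place, the sketch below runs the same statement on a toy table and compares the table's memory address before and after. The data values are made up, and it assumes the data.table package is installed; address() is a helper exported by data.table.

```r
library(data.table)

# Toy table; the values are hypothetical.
aDTofItems <- data.table(
  attribute1 = c(TRUE, TRUE, FALSE),
  attribute2 = c(6, 6, 6),
  attribute3 = c("C", "A", "C"),
  attribute4 = c(0, 0, 0),
  attribute5 = c(2, 3, 4),
  attribute6 = c(10, 10, 10)
)

before <- address(aDTofItems)

aDTofItems[attribute1 & attribute2 == 6 & attribute3 == "C",  # filter
           attribute4 := attribute5 * attribute6]             # assign

# TRUE if the table was modified in place rather than copied.
same_object <- identical(before, address(aDTofItems))
print(same_object)
print(aDTofItems)
```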

Furthermore, if the attributes you are filtering on are the key of the table, then the line is even shorter:

aDTofItems[.(TRUE, 6, "C"),     # filter
           attribute4 := attribute5 * attribute6]   # assign
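
The keyed form only works after the key has been set. A minimal sketch, again on made-up data and assuming the three filter columns are the ones you want as the key:

```r
library(data.table)

# Toy table; the values are hypothetical.
aDTofItems <- data.table(
  attribute1 = c(TRUE, TRUE, FALSE),
  attribute2 = c(6, 6, 6),
  attribute3 = c("C", "A", "C"),
  attribute4 = c(0, 0, 0),
  attribute5 = c(2, 3, 4),
  attribute6 = c(10, 10, 10)
)

# setkey sorts the table by these columns (by reference) and marks them as the key.
setkey(aDTofItems, attribute1, attribute2, attribute3)

# .() values are matched against the key columns in order.
aDTofItems[.(TRUE, 6, "C"),
           attribute4 := attribute5 * attribute6]

print(aDTofItems)
```

Keep in mind that setkey reorders the rows, so any code that relies on the original row order needs adjusting.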

Assuming the structure of each element is comparable, you can coerce your list into a data.table using

aDTofItems <- rbindlist(aListOfItems)
# note, if you have factors in your list you should convert them to character before calling rbindlist

# or similarly, although a bit slower 
aDTofItems <- data.table(do.call(rbind, aListOfItems))
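
For illustration, here is rbindlist on a small hypothetical list of data frames that share the same columns (the contents are made up, and this assumes aListOfItems really is a list of such pieces):

```r
library(data.table)

# A hypothetical list of data frames with identical column names and types.
aListOfItems <- list(
  data.frame(attribute1 = TRUE,  attribute2 = 6, attribute3 = "C",
             stringsAsFactors = FALSE),
  data.frame(attribute1 = FALSE, attribute2 = 3, attribute3 = "A",
             stringsAsFactors = FALSE)
)

# Stack the pieces into a single data.table.
aDTofItems <- rbindlist(aListOfItems)

print(is.data.table(aDTofItems))
print(aDTofItems)
```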
Ricardo Saporta answered Oct 09 '22 17:10