I am working on a rather large dataset in R which is split across several dataframes.
The problem is that sometimes I work with the whole set and sometimes I just need to work with or modify parts of it, and my selectors are getting very clunky, e.g.
aListOfItems$attribute4[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
aListOfItems$attribute3 == "C"] <- aListOfItems[aListOfItems$attribute1 == TRUE &
aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute5"] *
aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
aListOfItems$attribute3 == "C", "attribute7"]
(this sets attribute 4 to (attribute5 * attribute6) for a selected part of all entries.)
This is horrible to read, understand and edit.
Splitting these into different dataframes is not really an option due to RAM, and because I refresh this data regularly, rebuilding all the separate dataframes would also be a pain.
So, is there any way to do something like
items_t6C <- &(aListOfItems([aListOfItems$attribute1 == true & aListOfItems$attribute2
== 6 & aListOfItems$attribute3 == "C"),]
so I can use
items_t6C$attribute4 <- # do something
Alternatively, maybe it is possible to store such a selector in a string-variable and use it?
You can first construct a logical vector, give it a meaningful name, and use that in the command. It makes your script a bit longer, but much easier to read:
interesting_bit = with(aListOfItems, attribute1 &
                                     attribute2 == 6 &
                                     attribute3 == "C")
Using a bit of indentation also makes the code much more readable.
aListOfItems$attribute4[interesting_bit] <-
    aListOfItems$attribute5[interesting_bit] *
    aListOfItems$attribute7[interesting_bit]
And using within does even more for readability:
aListOfItems[interesting_bit,] = within(aListOfItems[interesting_bit,], {
    attribute4 = attribute5 * attribute7
})
Also, for a logical there is no need to explicitly test for == TRUE:
interesting_bit = aListOfItems$attribute1 &
    aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C"
This ultimately reduces this:
aListOfItems$attribute4[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
aListOfItems$attribute3 == "C"] <- aListOfItems[aListOfItems$attribute1 == TRUE &
aListOfItems$attribute2 == 6 & aListOfItems$attribute3 == "C", "attribute5"] *
aListOfItems[aListOfItems$attribute1 == TRUE & aListOfItems$attribute2 == 6 &
aListOfItems$attribute3 == "C", "attribute7"]
to this (note the additional use of with):
interesting_bit = with(aListOfItems, attribute1 &
                                     attribute2 == 6 &
                                     attribute3 == "C")
aListOfItems[interesting_bit,] = within(aListOfItems[interesting_bit,], {
    attribute4 = attribute5 * attribute7
})
This code not only looks less daunting, but also instantly conveys what you are doing, which is very hard to divine from your original code.
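To try the approach above end to end, here is a minimal sketch on a small made-up data frame. The column names follow the question, but the values are invented purely for illustration, so treat this as a demonstration under those assumptions rather than as your actual data layout.

```r
# Toy data: the values are assumptions, made up for illustration only
aListOfItems <- data.frame(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 5),
  attribute3 = c("C", "B", "C", "C"),
  attribute4 = c(0, 0, 0, 0),
  attribute5 = c(2, 3, 4, 5),
  attribute7 = c(10, 10, 10, 10),
  stringsAsFactors = FALSE
)

# Build the named logical selector once and reuse it
interesting_bit <- with(aListOfItems,
                        attribute1 & attribute2 == 6 & attribute3 == "C")

# Update attribute4 only in the selected rows
aListOfItems[interesting_bit, ] <- within(aListOfItems[interesting_bit, ], {
  attribute4 <- attribute5 * attribute7
})

# Only the first row matches all three conditions
print(aListOfItems$attribute4)
```

Because the selector is just a logical vector, it can be reused for any later read or write on the same rows, as long as the data frame has not been reordered or refreshed in between.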
The data.table package might be useful to you.
data.table works mostly by reference, especially when assigning and modifying columns.
If you are hitting RAM limits, the efficiency gains from data.table can be dramatic.
Additionally, data.table has the functionality of with, within, by, subset, etc. built in, making calls much shorter and code more readable.
For example, the cumbersome statement above could be simplified to something like:
aDTofItems[attribute1 & attribute2==6 & attribute3=="C", # filter
attribute4 := attribute5 * attribute6] # assign
Furthermore, if the attributes you are filtering on are the key of the table, then the line is even shorter:
aDTofItems[.(TRUE, 6, "C"), # filter
attribute4 := attribute5 * attribute6] # assign
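The two calls above can be tried on a small made-up table. This is only a sketch: the values are invented for illustration, and it assumes the data.table package is installed. It uses the filter form for the assignment and then, after setting the key, the keyed form to look the same row up again.

```r
library(data.table)

# Toy table: the values are assumptions, made up for illustration only
aDTofItems <- data.table(
  attribute1 = c(TRUE, TRUE, FALSE, TRUE),
  attribute2 = c(6, 6, 6, 5),
  attribute3 = c("C", "B", "C", "C"),
  attribute4 = c(0, 0, 0, 0),
  attribute5 = c(2, 3, 4, 5),
  attribute6 = c(10, 10, 10, 10)
)

# Filter in i, assign by reference in j: no copy of the table is made
aDTofItems[attribute1 & attribute2 == 6 & attribute3 == "C",
           attribute4 := attribute5 * attribute6]

# With the three filter columns as the key, the same rows can be
# addressed by their key values (note that setkey reorders the rows)
setkey(aDTofItems, attribute1, attribute2, attribute3)
updated <- aDTofItems[.(TRUE, 6, "C"), attribute4]
print(updated)
```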
Assuming the structure of each element is comparable, you can coerce your list into a data.table using
aDTofItems <- rbindlist(aListOfItems)
# note, if you have factors in your list you should convert them to character before calling rbindlist
# or similarly, although a bit slower
aDTofItems <- data.table(do.call(rbind, aListOfItems))
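As a small illustration of the rbindlist call, the sketch below stacks a hypothetical list of two data frames (the name pieces and all values are made up for this example) into a single data.table. It assumes the data.table package is installed and that every element has the same columns.

```r
library(data.table)

# Hypothetical list of data frames sharing the same columns
pieces <- list(
  data.frame(attribute1 = c(TRUE, FALSE), attribute2 = c(6, 5),
             stringsAsFactors = FALSE),
  data.frame(attribute1 = TRUE, attribute2 = 6,
             stringsAsFactors = FALSE)
)

# Stack all elements row-wise into one data.table
combined <- rbindlist(pieces)
print(nrow(combined))
```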