I have a data frame that I read from a .csv file. It looks like this:
              job    name `phone number`
            <chr>   <chr>          <int>
 1      developer    john            654
 2      developer    mike            321
 3      developer  albert            987
 4        manager    dana            741
 5        manager     guy            852
 6        manager    anna            936
 7      developer     dan            951
 8      developer   shean            841
 9 administrative  rebeca            357
10 administrative  krissy            984
11 administrative   hilma            651
12 administrative    otis            325
13 administrative   piper            654
14        manager   mendy            984
15        manager corliss            321
DT = structure(list(job = c("developer", "developer", "developer", 
"manager", "manager", "manager", "developer", "developer", "administrative", 
"administrative", "administrative", "administrative", "administrative", 
"manager", "manager"), name = c("john", "mike", "albert", "dana", 
"guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", 
"piper", "mendy", "corliss"), phone = c(654L, 321L, 987L, 741L, 
852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L
)), .Names = c("job", "name", "phone"), row.names = c(NA, -15L
), class = "data.frame")
I want to transform it into a list of lists, where, for example:
myList$developer
would give me a list containing all developers, and then
myList$developer$john
would give me a list of phone numbers associated with developers named John. Is there any simple way of doing it?
If you're curious as to why I'd want to do something like that: the actual data frame I'm working with is huge, so finding a specific entry by 4 parameters (in this example I can find a specific entry with 2 parameters: job, name) takes way too much time using filter. I think that the hash table structure of a nested list might take a lot of time to build, but would be searchable in O(1), which definitely works for me. If I'm wrong and you have a better way of doing it I'd love to hear it too.
You can use a double split with lapply and the drop = TRUE argument for that. With drop = TRUE, factor levels that do not occur in the data are dropped, which prevents the creation of empty list elements.
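To make the effect of drop = TRUE concrete, here is a minimal sketch. The `toy` data frame and the variable names are hypothetical and not part of the question's data; it assumes `name` is stored as a factor (as it would be after `read.csv(..., stringsAsFactors = TRUE)`), because unused levels only arise for factors.

```r
# Hypothetical toy data (an assumption for illustration): `name` is a factor.
toy <- data.frame(job  = c("developer", "developer", "manager"),
                  name = factor(c("john", "mike", "dana")),
                  phonenumber = c(654L, 321L, 741L))

# Subsetting rows keeps all factor levels, including the unused "dana".
dev <- toy[toy$job == "developer", ]
levels(dev$name)

# Without drop = TRUE, split() creates an empty data frame for "dana".
with_empty <- split(dev, dev$name)
names(with_empty)
nrow(with_empty$dana)

# With drop = TRUE, only the names that actually occur are kept.
without_empty <- split(dev, dev$name, drop = TRUE)
names(without_empty)
```

If the columns are plain character vectors, as in the data used below, there are no unused levels to begin with, so drop = TRUE acts mainly as a safeguard.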
Using:
l <- split(dat, dat$job, drop = TRUE)
nestedlist <- lapply(l, function(x) split(x, x[['name']], drop = TRUE))
Or in one go:
nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
                     function(x) split(x, x[['name']], drop = TRUE))
gives:
> nestedlist
$administrative
$administrative$hilma
              job  name phonenumber
11 administrative hilma         651

$administrative$krissy
              job   name phonenumber
10 administrative krissy         984

$administrative$otis
              job name phonenumber
12 administrative otis         325

$administrative$piper
              job  name phonenumber
13 administrative piper         654

$administrative$rebeca
             job   name phonenumber
9 administrative rebeca         357


$developer
$developer$albert
        job   name phonenumber
3 developer albert         987

$developer$dan
        job name phonenumber
7 developer  dan         951

$developer$john
        job name phonenumber
1 developer john         654

$developer$mike
        job name phonenumber
2 developer mike         321

$developer$shean
        job  name phonenumber
8 developer shean         841


$manager
$manager$anna
      job name phonenumber
6 manager anna         936

$manager$corliss
       job    name phonenumber
15 manager corliss         321

$manager$dana
      job name phonenumber
4 manager dana         741

$manager$guy
      job name phonenumber
5 manager  guy         852

$manager$mendy
       job  name phonenumber
14 manager mendy         984
Used data:
dat <- structure(list(job = c("developer", "developer", "developer", "manager", "manager", "manager", "developer", "developer", "administrative", "administrative", "administrative", "administrative", "administrative", "manager", "manager"),
                      name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
                      phonenumber = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L)),
                 .Names = c("job", "name", "phonenumber"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
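Once the nested list exists, a lookup is a chain of list accesses. The sketch below rebuilds the list on a small subset of the data above so it runs on its own; `job_key`, `name_key`, and `phones` are hypothetical names introduced only for illustration. The second half shows a variant, not part of the approach above, that stores just the phone numbers at the leaves, which is closer to what the question describes.

```r
# Small subset of the data above, so the example is self-contained.
dat <- data.frame(job  = c("developer", "developer", "manager"),
                  name = c("john", "mike", "dana"),
                  phonenumber = c(654L, 321L, 741L),
                  stringsAsFactors = FALSE)

nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
                     function(x) split(x, x[["name"]], drop = TRUE))

# Each leaf is a one-row data frame; pull the phone number out of it.
nestedlist$developer$john$phonenumber   # 654

# The same lookup with keys held in variables (hypothetical names).
job_key  <- "developer"
name_key <- "john"
nestedlist[[job_key]][[name_key]]$phonenumber   # 654

# Variant (an assumption about what is wanted): keep only the phone
# numbers, so each leaf is an integer vector rather than a data frame.
phones <- lapply(split(dat, dat$job, drop = TRUE),
                 function(x) split(x$phonenumber, x$name, drop = TRUE))
phones$developer$john   # 654
```

Note that looking up an element of an R list by name scans the names rather than hashing them, so this is convenient but not a true O(1) hash table; environments created with new.env() are hashed by default and may be worth trying if lookup speed matters.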