
Data frame to nested list

I have a data frame that I read from a .csv file. It looks like this:

              job    name `phone number`
            <chr>   <chr>          <int>
 1      developer    john            654
 2      developer    mike            321
 3      developer  albert            987
 4        manager    dana            741
 5        manager     guy            852
 6        manager    anna            936
 7      developer     dan            951
 8      developer   shean            841
 9 administrative  rebeca            357
10 administrative  krissy            984
11 administrative   hilma            651
12 administrative    otis            325
13 administrative   piper            654
14        manager   mendy            984
15        manager corliss            321

DT = structure(list(job = c("developer", "developer", "developer", 
"manager", "manager", "manager", "developer", "developer", "administrative", 
"administrative", "administrative", "administrative", "administrative", 
"manager", "manager"), name = c("john", "mike", "albert", "dana", 
"guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", 
"piper", "mendy", "corliss"), phone = c(654L, 321L, 987L, 741L, 
852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L
)), .Names = c("job", "name", "phone"), row.names = c(NA, -15L
), class = "data.frame")

I want to transform it into a list of lists, where, for example:


would give me a list containing all developers, and then


would give me a list of phone numbers associated with developers named John. Is there any simple way of doing it?

In case you're curious why I'd want this: the actual data frame I'm working with is huge, so finding a specific entry by 4 parameters (in this example 2 parameters, job and name, are enough) takes far too long with filter. I expect that the hash-table-like structure of a nested list would take a long time to build but would then be searchable in O(1), which definitely works for me. If I'm wrong and you have a better way of doing it, I'd love to hear that too.

shayelk asked Sep 25 '17 14:09


1 Answer

You can use a double split with lapply and the drop = TRUE parameter for that. Using drop = TRUE drops factor levels that do not occur in the data being split, which prevents the creation of empty list elements.


l <- split(dat, dat$job, drop = TRUE)
nestedlist <- lapply(l, function(x) split(x, x[['name']], drop = TRUE))

Or in one go:

nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
                     function(x) split(x, x[['name']], drop = TRUE))
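To make the effect of drop = TRUE concrete, here is a minimal sketch on a toy data frame. The object `toy` and the unused level "intern" are made up for illustration and are not part of the original data; the columns are deliberately factors, since that is the case where unused levels can appear.

```r
# Toy data: `job` is a factor with a level ("intern") that never occurs.
toy <- data.frame(
  job  = factor(c("developer", "manager", "developer"),
                levels = c("developer", "manager", "intern")),
  name = c("john", "dana", "mike"),
  stringsAsFactors = TRUE   # `name` becomes a factor as well
)

# Without drop = TRUE, the unused level yields an empty data frame.
with_empty <- split(toy, toy$job)
names(with_empty)          # "developer" "manager" "intern"
nrow(with_empty$intern)    # 0

# With drop = TRUE, levels that do not occur are discarded.
without_empty <- split(toy, toy$job, drop = TRUE)
names(without_empty)       # "developer" "manager"

# The inner split is where this matters most: within the "developer" subset,
# `name` still carries the level "dana", which does not occur there.
dev <- without_empty$developer
names(split(dev, dev$name))               # "dana" "john" "mike"
names(split(dev, dev$name, drop = TRUE))  # "john" "mike"
```

With plain character columns, as in the data used in this answer, split only creates groups for values that are present, so drop = TRUE is harmless there and protects against the factor case.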


> nestedlist
$administrative
$administrative$hilma
              job  name phonenumber
11 administrative hilma         651

$administrative$krissy
              job   name phonenumber
10 administrative krissy         984

$administrative$otis
              job name phonenumber
12 administrative otis         325

$administrative$piper
              job  name phonenumber
13 administrative piper         654

$administrative$rebeca
             job   name phonenumber
9 administrative rebeca         357


$developer
$developer$albert
        job   name phonenumber
3 developer albert         987

$developer$dan
        job name phonenumber
7 developer  dan         951

$developer$john
        job name phonenumber
1 developer john         654

$developer$mike
        job name phonenumber
2 developer mike         321

$developer$shean
        job  name phonenumber
8 developer shean         841


$manager
$manager$anna
      job name phonenumber
6 manager anna         936

$manager$corliss
       job    name phonenumber
15 manager corliss         321

$manager$dana
      job name phonenumber
4 manager dana         741

$manager$guy
      job name phonenumber
5 manager  guy         852

$manager$mendy
       job  name phonenumber
14 manager mendy         984
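As a usage sketch, the nested list can then be queried by job and by name. The block below rebuilds the list from a shortened five-row copy of the data (an assumption made only to keep the example self-contained), and `job_key` / `name_key` are illustrative variable names.

```r
# Same shape as the answer's `dat`, shortened to five rows for illustration.
dat <- data.frame(
  job         = c("developer", "developer", "manager", "manager", "administrative"),
  name        = c("john", "mike", "dana", "guy", "otis"),
  phonenumber = c(654L, 321L, 741L, 852L, 325L),
  stringsAsFactors = FALSE
)

nestedlist <- lapply(split(dat, dat$job, drop = TRUE),
                     function(x) split(x, x[["name"]], drop = TRUE))

names(nestedlist)                        # "administrative" "developer" "manager"
names(nestedlist$developer)              # "john" "mike"
nestedlist$developer$john$phonenumber    # 654

# Programmatic lookup with keys held in variables; [[ ]] matches names exactly,
# whereas $ allows partial matching on lists.
job_key  <- "manager"
name_key <- "guy"
nestedlist[[job_key]][[name_key]]$phonenumber   # 852
```

Each leaf is still a data frame, so a name that occurs several times within one job keeps all of its rows.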

Used data:

dat <- structure(list(job = c("developer", "developer", "developer", "manager", "manager", "manager", "developer", "developer", "administrative", "administrative", "administrative", "administrative", "administrative", "manager", "manager"),
                      name = c("john", "mike", "albert", "dana", "guy", "anna", "dan", "shean", "rebeca", "krissy", "hilma", "otis", "piper", "mendy", "corliss"),
                      phonenumber = c(654L, 321L, 987L, 741L, 852L, 936L, 951L, 841L, 357L, 984L, 651L, 325L, 654L, 984L, 321L)),
                 .Names = c("job", "name", "phonenumber"), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15"))
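On the O(1) expectation from the question: as far as I know, looking up an element by name in an ordinary R list scans the names in order, whereas environments are hash tables. If lookup speed is the main concern, one possible alternative (not the split approach above) is to index row numbers in an environment under a composite key. This is only a sketch: `make_key`, `index` and `lookup` are hypothetical names, the data is a shortened five-row copy, and it assumes the separator "|" never occurs in the key columns.

```r
# Shortened five-row copy of the data, for illustration only.
dat <- data.frame(
  job         = c("developer", "developer", "manager", "manager", "administrative"),
  name        = c("john", "mike", "dana", "guy", "otis"),
  phonenumber = c(654L, 321L, 741L, 852L, 325L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one string key per (job, name) combination.
# Assumes "|" does not appear in either column.
make_key <- function(job, name) paste(job, name, sep = "|")

# Row indices grouped by composite key, copied into a hashed environment.
rows_by_key <- split(seq_len(nrow(dat)), make_key(dat$job, dat$name))
index <- list2env(rows_by_key, envir = new.env(hash = TRUE))

# Hypothetical helper: all rows for one (job, name) pair, zero rows if absent.
lookup <- function(job, name) {
  rows <- index[[make_key(job, name)]]   # NULL when the key is missing
  dat[rows, , drop = FALSE]
}

lookup("developer", "john")$phonenumber   # 654
nrow(lookup("developer", "nobody"))       # 0
```

The same idea extends to four key columns by pasting all four into the key. Whether it actually beats filter on the real data would need to be measured.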
Jaap answered Sep 27 '22 23:09
