Using the tidyverse a lot, I often face the challenge of turning named vectors into a data.frame/tibble with the columns being the names of the vector.
What is the preferred/tidyverse-y way of doing this?
EDIT: This is related to: this and this github-issue
So I want:

    require(tidyverse)
    vec <- c("a" = 1, "b" = 2)

to become this:

    # A tibble: 1 × 2
          a     b
      <dbl> <dbl>
    1     1     2

I can do this via e.g.:

    vec %>% enframe %>% spread(name, value)
    vec %>% t %>% as_tibble

Use case example:

    require(tidyverse)
    require(rvest)

    txt <- c('<node a="1" b="2"></node>',
             '<node a="1" c="3"></node>')

    txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(~t(.) %>% as_tibble)

Which gives

    # A tibble: 2 × 3
      a     b     c
      <chr> <chr> <chr>
    1 1     2     <NA>
    2 1     <NA>  3

For example, a vector x can be converted to a data frame with as.data.frame(x), and the same works for a matrix.
Tibbles vs data frames
There are two main differences in the usage of a data frame vs a tibble: printing and subsetting. Tibbles have a refined print method that shows only the first 10 rows, and all the columns that fit on screen. This makes it much easier to work with large data.
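The subsetting difference can be illustrated with a small sketch. The objects `df` and `tb` are made up for illustration, and the example assumes the tibble package is installed:

```r
library(tibble)

# Hypothetical example data, not taken from the question
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
tb <- as_tibble(df)

# Selecting a single column with `[`:
# a data frame drops to a bare vector, a tibble stays a tibble.
df_col <- df[, "a"]
tb_col <- tb[, "a"]

is.integer(df_col)  # the data frame result is a plain integer vector
is_tibble(tb_col)   # the tibble result is still a one-column tibble
```

This matters when a pipeline expects a data frame at every step, since a tibble will not silently turn into a vector.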
Use as_tibble() to turn an existing object into a tibble. Use enframe() to convert a named vector into a tibble.
Every tibble is a named list of vectors, each of the same length. These vectors form the tibble columns.
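As a rough sketch of the enframe() route mentioned above: enframe() gives a long, two-column tibble (one row per vector element), which is why the question needs an extra spread() step to get one column per name. The variable names here are illustrative only.

```r
library(tibble)

vec <- c(a = 1, b = 2)

# enframe() stores the names in a `name` column and the values in `value`
long <- enframe(vec)

# deframe() goes the other way: a two-column tibble back to a named vector
back <- deframe(long)

names(long)    # the two column names created by enframe()
is.list(long)  # a tibble is a list of equal-length column vectors
names(back)    # the original element names
```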
This is now directly supported using bind_rows (introduced in dplyr 0.7.0):

    library(tidyverse)
    vec <- c("a" = 1, "b" = 2)

    bind_rows(vec)
    #> # A tibble: 1 x 2
    #>       a     b
    #>   <dbl> <dbl>
    #> 1     1     2

This quote from https://cran.r-project.org/web/packages/dplyr/news.html explains the change:
bind_rows() and bind_cols() now accept vectors. They are treated as rows by the former and columns by the latter. Rows require inner names like c(col1 = 1, col2 = 2), while columns require outer names: col1 = c(1, 2). Lists are still treated as data frames but can be spliced explicitly with !!!, e.g. bind_rows(!!! x) (#1676).
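The inner versus outer naming rule from the quote might look like this in practice. This is a minimal sketch assuming a reasonably recent dplyr; `as_row` and `as_col` are names made up for the example.

```r
library(dplyr)

# Inner names: the vector's own element names become the column names
# of a single row.
as_row <- bind_rows(c(col1 = 1, col2 = 2))

# Outer names: the argument name becomes the column name, and the
# vector elements become the rows of that column.
as_col <- bind_cols(col1 = c(1, 2))

dim(as_row)  # one row, two columns
dim(as_col)  # two rows, one column
```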
This change means that the following line in the use case example:

    txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(~t(.) %>% as_tibble)

can be rewritten as

    txt %>% map(read_xml) %>% map(xml_attrs) %>% map_df(bind_rows)

which is also equivalent to

    txt %>% map(read_xml) %>% map(xml_attrs) %>% { bind_rows(!!! .) }

The equivalence of the different approaches is demonstrated in the following example:

    library(tidyverse)
    library(rvest)

    txt <- c('<node a="1" b="2"></node>',
             '<node a="1" c="3"></node>')

    temp <- txt %>% map(read_xml) %>% map(xml_attrs)

    # x, y, and z are identical
    x <- temp %>% map_df(~t(.) %>% as_tibble)
    y <- temp %>% map_df(bind_rows)
    z <- bind_rows(!!! temp)

    identical(x, y)
    #> [1] TRUE
    identical(y, z)
    #> [1] TRUE

    z
    #> # A tibble: 2 x 3
    #>   a     b     c
    #>   <chr> <chr> <chr>
    #> 1 1     2     <NA>
    #> 2 1     <NA>  3

The idiomatic way would be to splice the vector with !!! within a tibble() call so the named vector elements become column definitions:

    library(tibble)
    vec <- c("a" = 1, "b" = 2)

    tibble(!!!vec)
    #> # A tibble: 1 x 2
    #>       a     b
    #>   <dbl> <dbl>
    #> 1     1     2

Created on 2019-09-14 by the reprex package (v0.3.0)
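For comparison, newer versions of tibble also provide as_tibble_row(), which is not mentioned in the answers above. The sketch below assumes tibble 3.0.0 or later and only checks that both routes give a one-row tibble with the vector names as columns; `spliced` and `one_row` are illustrative names.

```r
library(tibble)

vec <- c(a = 1, b = 2)

# Splicing: each named element becomes its own column definition
spliced <- tibble(!!!vec)

# Assumption: as_tibble_row() is available (tibble >= 3.0.0)
one_row <- as_tibble_row(vec)

names(spliced)
names(one_row)
```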