My data looks like this:
data("Titanic")
df <- as.data.frame(Titanic)
How can I de-aggregate or reverse-summarise count/freq and expand the data set back to it's original non-count observation state?
For instance, I want 3rd, Male, Child, No
repeated 35 times and 1st, Female, Adult, Yes
repeated 140 times, etc, etc, in the dataframe.
Thanks in advance.
Chiming in with another tidyr
approach, using tidyr::uncount
.
library(tidyverse)
original <- tibble(x = c(1,1,1,2,2,2,4,4,4))
aggregated <- original %>% count(x)
deaggregated <- aggregated %>% uncount(weights = n)
You can do this with list columns and a few dplyr
/tidyr
/purrr
verbs. It's not as compact as other base R solutions may be, but for me it's easier to understand how things fit together and it works in a larger tidyverse
pipeflow.
As a check first-off, we're expecting 2,201 rows in the final data frame:
library(dplyr)
library(tidyr)
library(purrr)
sum(df$Freq)
#> [1] 2201
Converting this to a tibble
makes it easier to see and work with the list column. I'm using purrr::map
to move along the Freq
column, creating a vector of a dummy marker the length of the Freq
value. In this case, that marker is just "1"; it could instead be TRUE
or anything else. The point it just that it will create a vector of the length Freq
.
df %>%
as_tibble() %>%
mutate(obs = map(Freq, ~rep_len(1, .x)))
#> # A tibble: 32 x 6
#> Class Sex Age Survived Freq obs
#> <fct> <fct> <fct> <fct> <dbl> <list>
#> 1 1st Male Child No 0 <dbl [0]>
#> 2 2nd Male Child No 0 <dbl [0]>
#> 3 3rd Male Child No 35 <dbl [35]>
#> 4 Crew Male Child No 0 <dbl [0]>
#> 5 1st Female Child No 0 <dbl [0]>
#> 6 2nd Female Child No 0 <dbl [0]>
#> 7 3rd Female Child No 17 <dbl [17]>
#> 8 Crew Female Child No 0 <dbl [0]>
#> 9 1st Male Adult No 118 <dbl [118]>
#> 10 2nd Male Adult No 154 <dbl [154]>
#> # … with 22 more rows
Then tidyr::unnest
creates a row for each element in that dummy vector. After that, I drop those last 2 columns to just have the important categories of class, sex, age, and survival.
df %>%
as_tibble() %>%
mutate(obs = map(Freq, ~rep_len(1, .x))) %>%
unnest() %>%
select(-Freq, -obs)
#> # A tibble: 2,201 x 4
#> Class Sex Age Survived
#> <fct> <fct> <fct> <fct>
#> 1 3rd Male Child No
#> 2 3rd Male Child No
#> 3 3rd Male Child No
#> 4 3rd Male Child No
#> 5 3rd Male Child No
#> 6 3rd Male Child No
#> 7 3rd Male Child No
#> 8 3rd Male Child No
#> 9 3rd Male Child No
#> 10 3rd Male Child No
#> # … with 2,191 more rows
In the end, it is in fact a data frame of 2,201 rows.
Without packages we can repeat each row according to the frequencies given:
df2 <- df[rep(1:nrow(df), df[,5]),-5]
You can do this using the function untable
in reshape.
data("Titanic")
df <- as.data.frame(Titanic)
library(reshape)
newDf = untable(df[,1:4], num = df[,5])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With