Is there a way to use pivot_longer and pivot_wider on a subset of a variable? Here's an example. First, I'll create a data frame with the desired starting structure.
library(tidyverse)
# Assume this as starting df
arrests <- USArrests %>%
  as_tibble(rownames = "State") %>%
  pivot_longer(-State, names_to = "Crime", values_to = "Value") %>%
  group_by(State) %>%
  mutate(Total = sum(Value)) %>%
  ungroup()
arrests
# A tibble: 200 x 4
State Crime Value Total
<chr> <chr> <dbl> <dbl>
1 Alabama Murder 13.2 328.
2 Alabama Assault 236 328.
3 Alabama UrbanPop 58 328.
4 Alabama Rape 21.2 328.
5 Alaska Murder 10 366.
6 Alaska Assault 263 366.
7 Alaska UrbanPop 48 366.
8 Alaska Rape 44.5 366.
9 Arizona Murder 8.1 413.
10 Arizona Assault 294 413.
# ... with 190 more rows
So we are using the arrests data frame. Now I would like to fold "Total" into "Crime" so that "Total" is a value within Crime, just like "Murder".
I would also like to do the reverse. After "Total" is folded into "Crime", I want to use pivot_wider on "Crime", but only on rows where Crime == "Total".
Are these actions possible?
One option is add_row. After splitting the data by 'State' with group_split, loop over the resulting list with map_dfr, add a row (add_row from tibble) that holds the first value of the 'Total' column, and then remove the 'Total' column:
library(dplyr)
library(purrr)
library(tibble)
arrests2 <- arrests %>%
  group_split(State) %>%
  map_dfr(~ .x %>%
    add_row(State = .$State[1], Crime = 'Total',
            Value = .$Total[1]) %>%
    select(-Total))
arrests2
# A tibble: 250 x 3
# State Crime Value
# * <chr> <chr> <dbl>
# 1 Alabama Murder 13.2
# 2 Alabama Assault 236
# 3 Alabama UrbanPop 58
# 4 Alabama Rape 21.2
# 5 Alabama Total 328.
# 6 Alaska Murder 10
# 7 Alaska Assault 263
# 8 Alaska UrbanPop 48
# 9 Alaska Rape 44.5
#10 Alaska Total 366.
# … with 240 more rows
Another option is to summarise with the 'Total' value and then do a bind_rows:
arrests %>%
  group_by(State) %>%
  summarise(Crime = 'Total', Value = first(Total)) %>%
  bind_rows(arrests %>% select(-Total), .) %>%
  arrange(State)
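The summarise-then-bind idea can be checked on a small made-up table. This is only a sketch: toy, totals and folded are hypothetical names, and the values are invented for illustration.

```r
library(dplyr)

# Hypothetical toy data with the same shape as `arrests`
toy <- tibble(
  State = c("A", "A", "B", "B"),
  Crime = c("Murder", "Assault", "Murder", "Assault"),
  Value = c(1, 2, 3, 4)
) %>%
  group_by(State) %>%
  mutate(Total = sum(Value)) %>%
  ungroup()

# One 'Total' row per State, built from the existing Total column
totals <- toy %>%
  group_by(State) %>%
  summarise(Crime = "Total", Value = first(Total))

# Stack the detail rows and the total rows, then sort by State
folded <- toy %>%
  select(-Total) %>%
  bind_rows(totals) %>%
  arrange(State)

folded
```

The result should have one extra row per State, so the row count grows from 4 to 6 here.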
Or using pivot_longer
library(tidyr)
arrests %>%
  pivot_longer(cols = Value:Total) %>%
  mutate(Crime = replace(Crime, name == 'Total', 'Total')) %>%
  select(-name) %>%
  distinct()
# A tibble: 250 x 3
# State Crime value
# <chr> <chr> <dbl>
# 1 Alabama Murder 13.2
# 2 Alabama Total 328.
# 3 Alabama Assault 236
# 4 Alabama UrbanPop 58
# 5 Alabama Rape 21.2
# 6 Alaska Murder 10
# 7 Alaska Total 366.
# 8 Alaska Assault 263
# 9 Alaska UrbanPop 48
#10 Alaska Rape 44.5
# … with 240 more rows
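To see why the distinct() step matters in the pivot_longer version, here is a minimal sketch on invented data (toy and long are hypothetical names). Pivoting Value and Total together repeats each State's total once per crime, and distinct() collapses those repeats.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Total is already repeated on every row of a State
toy <- tibble(
  State = c("A", "A", "B", "B"),
  Crime = c("Murder", "Assault", "Murder", "Assault"),
  Value = c(1, 2, 3, 4),
  Total = c(3, 3, 7, 7)
)

long <- toy %>%
  pivot_longer(cols = Value:Total) %>%
  mutate(Crime = replace(Crime, name == "Total", "Total")) %>%
  select(-name)

n_before <- nrow(long)            # every Total appears once per crime
n_after  <- nrow(distinct(long))  # one Total row per State remains

c(before = n_before, after = n_after)
```

Note that distinct() would also collapse genuinely duplicated detail rows, which is not an issue here because each State/Crime pair occurs once.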
If we need to do the reverse, then, grouped by 'State', create the 'Total' column by extracting the 'Value' in the row where 'Crime' is 'Total', and filter out the rows where Crime is 'Total':
arrests2 %>%
  group_by(State) %>%
  mutate(Total = Value[Crime == 'Total']) %>%
  filter(Crime != 'Total')
# A tibble: 200 x 4
# Groups: State [50]
# State Crime Value Total
# <chr> <chr> <dbl> <dbl>
# 1 Alabama Murder 13.2 328.
# 2 Alabama Assault 236 328.
# 3 Alabama UrbanPop 58 328.
# 4 Alabama Rape 21.2 328.
# 5 Alaska Murder 10 366.
# 6 Alaska Assault 263 366.
# 7 Alaska UrbanPop 48 366.
# 8 Alaska Rape 44.5 366.
# 9 Arizona Murder 8.1 413.
#10 Arizona Assault 294 413.
# … with 190 more rows
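Since the question asked specifically about pivot_wider, the reverse step can also be sketched by widening only the rows where Crime == "Total" and joining the result back. This assumes one 'Total' row per State; folded, totals_wide and unfolded are hypothetical names and the data is made up.

```r
library(dplyr)
library(tidyr)

# Hypothetical folded data: 'Total' is a value of Crime
folded <- tibble(
  State = c("A", "A", "A", "B", "B", "B"),
  Crime = c("Murder", "Assault", "Total", "Murder", "Assault", "Total"),
  Value = c(1, 2, 3, 3, 4, 7)
)

# Widen only the 'Total' rows: this yields the columns State and Total
totals_wide <- folded %>%
  filter(Crime == "Total") %>%
  pivot_wider(names_from = Crime, values_from = Value)

# Attach the Total column back to the detail rows
unfolded <- folded %>%
  filter(Crime != "Total") %>%
  left_join(totals_wide, by = "State")

unfolded
```

Filtering first keeps pivot_wider from creating a column for every crime, which is what would happen if it were applied to the whole table.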
1) janitor Use adorn_totals from the janitor package, ignoring the Total column. Note that within a group_by pipeline the dot refers to the entire data set, not just the current group, unless we refer to it within a do, which is why we use do.
library(janitor)
res1 <- arrests %>%
  select(-Total) %>%
  group_by(State) %>%
  do(adorn_totals(select(., -State), "row")) %>%
  ungroup()
res1
giving:
# A tibble: 250 x 3
State Crime Value
<chr> <chr> <dbl>
1 Alabama Murder 13.2
2 Alabama Assault 236
3 Alabama UrbanPop 58
4 Alabama Rape 21.2
5 Alabama Total 328.
6 Alaska Murder 10
7 Alaska Assault 263
8 Alaska UrbanPop 48
9 Alaska Rape 44.5
10 Alaska Total 366.
# ... with 240 more rows
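The remark about the dot can be checked on a tiny example. This is a sketch with invented data (toy, in_mutate and in_do are hypothetical names), and it relies on the magrittr %>% pipe, not the native |> pipe.

```r
library(dplyr)

toy <- tibble(g = c("a", "a", "b"), x = 1:3)

# Inside mutate(), `.` is the pipe's left-hand side: the whole grouped table
in_mutate <- toy %>%
  group_by(g) %>%
  mutate(n_dot = nrow(.), n_group = n()) %>%
  ungroup()

# Inside do(), `.` is the data for the current group only
in_do <- toy %>%
  group_by(g) %>%
  do(data.frame(n_dot = nrow(.))) %>%
  ungroup()

in_mutate
in_do
```

In in_mutate, n_dot is the same on every row because it counts the full table, while n_group varies by group. In in_do, n_dot matches the group sizes.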
We can remove the Total rows and add a column
res1 %>% {
  left <- filter(., Crime != "Total")
  right <- filter(., Crime == "Total") %>% select(State, Total = Value)
  left_join(left, right, by = "State")
}
2) reshape2 The reshape2 package is a forerunner of the pivot_* functions. It has margins functionality built in, which seems not to have been carried over into the later spread/gather and pivot_* functions. This also works if we replace the library statement with library(data.table).
library(reshape2)
res2 <- dcast(arrests, State + Crime ~ "Value", fun.aggregate = sum,
value.var = "Value", margins = "Crime")
res2
giving:
State Crime Value
1 Alabama Assault 236.0
2 Alabama Murder 13.2
3 Alabama Rape 21.2
4 Alabama UrbanPop 58.0
5 Alabama (all) 328.4
6 Alaska Assault 263.0
7 Alaska Murder 10.0
8 Alaska Rape 44.5
9 Alaska UrbanPop 48.0
10 Alaska (all) 365.5
...etc...
To create a Total column and remove the total rows, create a factor that identifies each row as a Value or Total row, and then dcast the result to wide form, filling in NAs with na.locf.
library(reshape2)
library(zoo)
fac <- factor(res2$Crime == '(all)', labels = c("Value", "Total"))
dc <- dcast(res2, State + Crime ~ fac, value.var = "Value")
subset(na.locf(dc, fromLast = TRUE), Crime != '(all)')
or
left <- subset(res2, Crime != "(all)")
right <- subset(res2, Crime == "(all)", c(State, Value))
names(right) <- c("State", "Total")
merge(left, right, by = "State")
3) sqldf To use SQL, add a level column which is 0 for detail records and 1 for Total records, then union the details and totals and sort.
library(sqldf)
res3 <- sqldf("select State, Crime, Value from (
    select 0 as level, State, Crime, Value from arrests
    union
    select 1 as level, State, 'Total' as Crime, sum(Value) as Total from arrests
    group by State)
  order by State, level")
To remove the total rows and insert a Total column
sqldf("select State, Crime, Value, Total
from res3 a
left join (
select State, sum(Value) as Total
from res3
where Crime != 'Total'
group by State) using (State)
where Crime != 'Total'")
4) Base R This is straightforward in base R using xtabs and addmargins.
Total <- sum
tab <- addmargins(xtabs(Value ~ State + Crime, arrests), 2, FUN = Total)
DF <- as.data.frame(tab, responseName = "Value")
res4 <- DF[order(DF$State, DF$Crime == "Total"), ]
and, modifying (2), we can use the following to remove the Total rows and add a Total column:
left <- subset(res4, Crime != "Total")
right <- subset(res4, Crime == "Total", c(State, Value))
names(right) <- c("State", "Total")
merge(left, right, by = "State")
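The Total <- sum alias is what gives the margin its name, since addmargins labels the added margin after the function it is passed. A minimal base R sketch on invented data (toy, tab and DF_toy are hypothetical names) shows the effect:

```r
# Hypothetical toy data with the same columns as `arrests` (minus Total)
toy <- data.frame(
  State = c("A", "A", "B", "B"),
  Crime = c("Murder", "Assault", "Murder", "Assault"),
  Value = c(1, 2, 3, 4)
)

Total <- sum  # alias so that the added margin is labelled "Total"

# Cross-tabulate Value by State and Crime, then add a margin over Crime
tab <- addmargins(xtabs(Value ~ State + Crime, toy), 2, FUN = Total)
colnames(tab)

# Back to long form, with the margin appearing as a Crime level
DF_toy <- as.data.frame(tab, responseName = "Value")
DF_toy[order(DF_toy$State, DF_toy$Crime == "Total"), ]
```

Without the alias the margin would carry a different label, and the Crime == "Total" comparisons above would need to be adjusted to match.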