I have a dataframe with sample classifications:
Seq_ID Family Father Mother Sex Role Type
<chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 SSC02219 11000. 0 0 Male Father Parent
2 SSC02217 11000. 0 0 Female Mother Parent
3 SSC02254 11000. SSC02219 SSC02217 Male Proband Child
4 SSC02220 11000. SSC02219 SSC02217 Female Sibling Child
5 SSC02184 11001. 0 0 Male Father Parent
6 SSC02181 11001. 0 0 Female Mother Parent
7 SSC02178 11001. SSC02184 SSC02181 Male Proband Child
8 SSC03092 11002. 0 0 Male Father Parent
9 SSC03078 11002. 0 0 Female Mother Parent
10 SSC03070 11002. SSC03092 SSC03078 Female Proband Child
Currently, to go from a to b, I have to do this:
library(tidyverse)
library(janitor)
sample.df %>% tabyl(Role, Sex) %>%
adorn_totals(where=c("row", "col") ) %>%
as.tibble() %>% select(1,4,3,2) %>%
# Part 2
mutate(type=c("parent", "parent", "child", "child", " ")) %>%
inner_join(., group_by(., type) %>%
summarise(total=sum(Total))) %>%
select(5,6,1,2,3,4)
I feel like this is such a workaround for something very simple. Is there a more direct way to do the second part in dplyr?
a
b
Rotating or transposing R objects That is, you transpose the rows and columns. You simply use the t() command.
To create a frequency column for categorical variable in an R data frame, we can use the transform function by defining the length of categorical variable using ave function. The output will have the duplicated frequencies as one value in the categorical column is likely to be repeated.
Thus, to convert columns of an R data frame into rows we can use transpose function t. For example, if we have a data frame df with five columns and five rows then we can convert the columns of the df into rows by using as. data. frame(t(df)).
There are multiple ways to get the count of the frequency of all unique values in an R vector. To count the number of times each element or value is present in a vector use either table(), tabulate(), count() from plyr package, or aggregate() function.
Here is an option. as.tibble
is not necessary. mutate
with case_when
is more manageable when you have a lot of classes to assign to "parent" or "child". inner_join
is not required as we can use group_by
and mutate
to calculate the total
. Finally, I like to write down the column names when I use the select
function as it will easier for me to read in the future, but you can of course use column indices as long as you are confident that the column indices would be unchanged no matter what new analyses you may include in your pipe operation.
library(tidyverse)
library(janitor)
sample.df %>%
tabyl(Role, Sex) %>%
adorn_totals(where=c("row", "col")) %>%
select(Role, Total, Male, Female) %>%
# Part 2
mutate(type = case_when(
Role %in% c("Mother", "Father") ~"parent",
Role %in% c("Proband", "Sibling") ~"child",
TRUE ~" "
)) %>%
group_by(type) %>%
mutate(total = sum(Total)) %>%
ungroup() %>%
select(type, total, Role, Total, Male, Female)
# # A tibble: 5 x 6
# type total Role Total Male Female
# <chr> <dbl> <chr> <dbl> <dbl> <dbl>
# 1 parent 6. Father 3. 3. 0.
# 2 parent 6. Mother 3. 0. 3.
# 3 child 4. Proband 3. 2. 1.
# 4 child 4. Sibling 1. 0. 1.
# 5 " " 10. Total 10. 5. 5.
DATA
library(tidyverse)
library(janitor)
sample.df <- read.table(text = "Seq_ID Family Father Mother Sex Role Type
1 SSC02219 11000 0 0 Male Father Parent
2 SSC02217 11000 0 0 Female Mother Parent
3 SSC02254 11000 SSC02219 SSC02217 Male Proband Child
4 SSC02220 11000 SSC02219 SSC02217 Female Sibling Child
5 SSC02184 11001 0 0 Male Father Parent
6 SSC02181 11001 0 0 Female Mother Parent
7 SSC02178 11001 SSC02184 SSC02181 Male Proband Child
8 SSC03092 11002 0 0 Male Father Parent
9 SSC03078 11002 0 0 Female Mother Parent
10 SSC03070 11002 SSC03092 SSC03078 Female Proband Child ",
header = TRUE, stringsAsFactors = FALSE)
sample.df <- as_tibble(sample.df)
Another option could be using knitr
library(janitor)
library(tidyverse)
library(kableExtra)
library(knitr)
sample.df %>%
tabyl(Role, Sex) %>%
adorn_totals(where=c("row", "col")) %>%
# Part 2
mutate(type=case_when(
Role %in% c('Father', 'Mother') ~ 'parent',
Role %in% c('Proband', 'Sibling') ~ 'child',
TRUE ~ ''
)) %>%
group_by(type) %>%
mutate(total=sum(Total)) %>%
ungroup() %>%
kable("html") %>%
kable_styling(c("striped", "bordered")) %>%
collapse_rows(columns = c(5,6))
Output is:
Sample data:
sample.df <- structure(list(Seq_ID = c("SSC02219", "SSC02217", "SSC02254",
"SSC02220", "SSC02184", "SSC02181", "SSC02178", "SSC03092", "SSC03078",
"SSC03070"), Family = c(11000L, 11000L, 11000L, 11000L, 11001L,
11001L, 11001L, 11002L, 11002L, 11002L), Father = c("0", "0",
"SSC02219", "SSC02219", "0", "0", "SSC02184", "0", "0", "SSC03092"
), Mother = c("0", "0", "SSC02217", "SSC02217", "0", "0", "SSC02181",
"0", "0", "SSC03078"), Sex = c("Male", "Female", "Male", "Female",
"Male", "Female", "Male", "Male", "Female", "Female"), Role = c("Father",
"Mother", "Proband", "Sibling", "Father", "Mother", "Proband",
"Father", "Mother", "Proband"), Type = c("Parent", "Parent",
"Child", "Child", "Parent", "Parent", "Child", "Parent", "Parent",
"Child")), row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), class = c("tbl_df", "tbl", "data.frame"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With