How to create dummies based on two columns in R

Tags:

r

dummy-variable

Assume I have a dataframe where Gender takes F (female) or M (male), and Race takes A (Asian), W (White), B (Black) or H (Hispanic):

| id | Gender | Race |
| --- | ----- | ---- |
| 1   | F    | W |
| 2   | F    | B |
| 3   | M    | A |
| 4   | F    | B |
| 5   | M    | W |
| 6   | M    | B |
| 7   | F    | H |

I want to add a set of dummy columns based on Gender and Race, so that the dataframe looks like this:

| id | Gender | Race | F_W | F_B | F_A | F_H | M_W | M_B | M_A | M_H |
| --- | ----- | ---- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1   | F    | W   |  1  |  0  |  0  |  0  |  0  |  0  |  0  |  0  |
| 2   | F    | B   |  0  |  1  |  0  |  0  |  0  |  0  |  0  |  0  |
| 3   | M    | A   |  0  |  0  |  0  |  0  |  0  |  0  |  1  |  0  |
| 4   | F    | B   |  0  |  1  |  0  |  0  |  0  |  0  |  0  |  0  |
| 5   | M    | W   |  0  |  0  |  0  |  0  |  1  |  0  |  0  |  0  |
| 6   | M    | B   |  0  |  0  |  0  |  0  |  0  |  1  |  0  |  0  |
| 7   | F    | H   |  0  |  0  |  0  |  1  |  0  |  0  |  0  |  0  |

My actual data contains many more categories than this example, so I would appreciate a neat, general way to do it. The language is R. Thank you for your help.

asked Jul 17 '21 13:07

xxx

5 Answers

Apart from the column names, you can get this with the model.matrix function and a formula expressing just the interaction terms, and subtracting an intercept:

> dm = cbind(d,model.matrix(~Gender:Race-1, data=d))
> dm
   id Gender Race GenderF:RaceA GenderM:RaceA GenderF:RaceB GenderM:RaceB
1   1      F    H             0             0             0             0
2   2      M    H             0             0             0             0
3   3      M    W             0             0             0             0
4   4      F    H             0             0             0             0
5   5      M    H             0             0             0             0
[etc]

If you care about the exact names, it's easy enough to sort them out with a bit of string processing.

> names(dm)[-(1:3)] = sub("Gender","",sub("Race","",sub(":","_",names(dm)[-(1:3)])))
> dm
   id Gender Race F_A M_A F_B M_B F_H M_H F_W M_W
1   1      F    H   0   0   0   0   1   0   0   0
2   2      M    H   0   0   0   0   0   1   0   0
3   3      M    W   0   0   0   0   0   0   0   1
4   4      F    H   0   0   0   0   1   0   0   0
5   5      M    H   0   0   0   0   0   1   0   0
6   6      F    H   0   0   0   0   1   0   0   0
7   7      F    H   0   0   0   0   1   0   0   0
8   8      M    A   0   1   0   0   0   0   0   0
9   9      M    W   0   0   0   0   0   0   0   1
10 10      F    B   0   0   1   0   0   0   0   0

If you care about the column order....
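For the column order, one possible approach (a sketch, not part of the answer above) is to fix the factor levels up front and then index the matrix columns by name. The level order below is an assumption taken from the desired output in the question, and `df`, `df_f`, `mm` and `wanted` are illustrative names:

```r
# Example data from the question
df <- data.frame(
  id = 1:7,
  Gender = c("F", "F", "M", "F", "M", "M", "F"),
  Race = c("W", "B", "A", "B", "W", "B", "H")
)

# Assumed level order, copied from the question's desired output
df_f <- transform(
  df,
  Gender = factor(Gender, levels = c("F", "M")),
  Race = factor(Race, levels = c("W", "B", "A", "H"))
)

# Interaction-only model matrix without an intercept: one column per pair
mm <- model.matrix(~ Gender:Race - 1, data = df_f)

# "GenderF:RaceW" -> "F_W"
colnames(mm) <- gsub("Gender|Race", "", sub(":", "_", colnames(mm)))

# Wanted order: every race for F, then every race for M
wanted <- as.vector(t(outer(levels(df_f$Gender), levels(df_f$Race),
                            paste, sep = "_")))

dm <- cbind(df, mm[, wanted])
```

Because `model.matrix` builds a column for every combination of levels, the all-zero columns `F_A` and `M_H` are present as well.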

answered Nov 15 '22 05:11

Spacedman

A base R option with table

 cbind(df1, as.data.frame.matrix(table(transform(df1, 
    GenderRace = paste(Gender, Race, sep = "_"))[c("id", "GenderRace")])))
  id Gender Race F_B F_H F_W M_A M_B M_W
1  1      F    W   0   0   1   0   0   0
2  2      F    B   1   0   0   0   0   0
3  3      M    A   0   0   0   1   0   0
4  4      F    B   1   0   0   0   0   0
5  5      M    W   0   0   0   0   0   1
6  6      M    B   0   0   0   0   1   0
7  7      F    H   0   1   0   0   0   0

data

df1 <- structure(list(id = 1:7, Gender = c("F", "F", "M", "F", "M", 
"M", "F"), Race = c("W", "B", "A", "B", "W", "B", "H")), 
class = "data.frame", row.names = c(NA, 
-7L))

answered Nov 15 '22 05:11

akrun

Another base R option with xtabs

cbind(
    df,
    as.data.frame.matrix(
        xtabs(
            ~ id + q,
            transform(
                df,
                q = paste0(Gender, "_", Race)
            )
        )
    )
)

gives

  id Gender Race F_B F_H F_W M_A M_B M_W
1  1      F    W   0   0   1   0   0   0
2  2      F    B   1   0   0   0   0   0
3  3      M    A   0   0   0   1   0   0
4  4      F    B   1   0   0   0   0   0
5  5      M    W   0   0   0   0   0   1
6  6      M    B   0   0   0   0   1   0
7  7      F    H   0   1   0   0   0   0
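If the combinations that never occur (here `F_A` and `M_H`) should also show up as all-zero columns, one variation is to build the key with `interaction()` instead of `paste0()`. This is only a sketch: it relies on `xtabs()` keeping unused factor levels, which is its default, and `res` and `dummies` are illustrative names:

```r
# Example data from the question
df <- data.frame(
  id = 1:7,
  Gender = c("F", "F", "M", "F", "M", "M", "F"),
  Race = c("W", "B", "A", "B", "W", "B", "H")
)

# interaction() returns a factor whose levels are all Gender x Race pairs;
# lex.order = TRUE makes the second factor vary fastest (F_A, F_B, ..., M_W)
q <- interaction(df$Gender, df$Race, sep = "_", lex.order = TRUE)

# Cross-tabulate id against the combined key; empty pairs become zero columns
dummies <- as.data.frame.matrix(xtabs(~ id + q, data = cbind(df, q = q)))

res <- cbind(df, dummies)
```

The column order follows the factor levels, so it can be changed by releveling `q` before calling `xtabs()`.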

answered Nov 15 '22 06:11

ThomasIsCoding

I think you can use the following solution. It produces two fewer columns than your desired output (F_A and M_H), but those would be all zeros anyway: pivot_wider only creates columns for the combinations that actually occur in the data set.

library(dplyr)
library(tidyr)

df %>%
  mutate(grp = 1) %>%
  pivot_wider(names_from = c(Gender, Race), values_from = grp, 
              values_fill = 0, names_glue = "{Gender}_{Race}") %>%
  right_join(df, by = "id") %>%
  relocate(id, Gender, Race)

# A tibble: 7 x 9
     id Gender Race    F_W   F_B   M_A   M_W   M_B   F_H
  <int> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 F      W         1     0     0     0     0     0
2     2 F      B         0     1     0     0     0     0
3     3 M      A         0     0     1     0     0     0
4     4 F      B         0     1     0     0     0     0
5     5 M      W         0     0     0     1     0     0
6     6 M      B         0     0     0     0     1     0
7     7 F      H         0     0     0     0     0     1
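If the two missing columns are wanted after all, `pivot_wider` has a `names_expand` argument that may help. The sketch below assumes tidyr 1.2.0 or later; the helper columns `g` and `r` are throwaway copies introduced here so that the original `Gender` and `Race` survive the pivot without a join:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  id = 1:7,
  Gender = c("F", "F", "M", "F", "M", "M", "F"),
  Race = c("W", "B", "A", "B", "W", "B", "H")
)

res <- df %>%
  # g and r are hypothetical helper copies of the key columns
  mutate(grp = 1, g = Gender, r = Race) %>%
  pivot_wider(
    names_from = c(g, r),   # joined with the default names_sep = "_"
    values_from = grp,
    values_fill = 0,
    names_expand = TRUE     # also create columns for unobserved pairs
  )
```

With `names_expand = TRUE` the new columns cover every pairing of the observed `Gender` and `Race` values, so a category that never appears in the data at all would still be absent.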

answered Nov 15 '22 07:11

Anoushiravan R

In addition to the tidyverse solution from Anoushiravan R, here is another option with unite, pivot_wider, across and case_when:

library(tidyverse)

df %>% 
  unite(comb, Gender:Race, remove = FALSE) %>% 
  pivot_wider(
    names_from = comb,
    values_from = comb
  ) %>% 
  mutate(across(c(F_W, F_B, M_A, M_W, M_B, F_H), 
                ~ case_when(is.na(.) ~ 0, 
                            TRUE ~ 1)))

Output:

  id    Gender Race    F_W   F_B   M_A   M_W   M_B   F_H
  <chr> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1     F      W         1     0     0     0     0     0
2 2     F      B         0     1     0     0     0     0
3 3     M      A         0     0     1     0     0     0
4 4     F      B         0     1     0     0     0     0
5 5     M      W         0     0     0     1     0     0
6 6     M      B         0     0     0     0     1     0
7 7     F      H         0     0     0     0     0     1
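Since the question mentions many more categories, typing out every dummy column inside `across()` may not scale. A possible variation, sketched here rather than taken from the answer, selects the new columns by excluding the three original ones and turns non-missing cells into 1:

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  id = 1:7,
  Gender = c("F", "F", "M", "F", "M", "M", "F"),
  Race = c("W", "B", "A", "B", "W", "B", "H")
)

res <- df %>%
  unite(comb, Gender:Race, remove = FALSE) %>%
  pivot_wider(names_from = comb, values_from = comb) %>%
  # every column except the originals is a dummy: NA -> 0, anything else -> 1
  mutate(across(-c(id, Gender, Race), ~ as.numeric(!is.na(.x))))
```

The negative selection assumes the data frame holds no other columns besides `id`, `Gender` and `Race`; with extra columns the exclusion list would need to grow accordingly.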

answered Nov 15 '22 05:11

TarJae
