Transform each factor level of a column into a column containing just `0` or `1`

Tags:

dataframe

r

dplyr

I'm trying to turn each level of a factor column into its own column containing just 0 or 1. There is probably a function for that, or someone has already asked, but I couldn't find it. Here is a simple example showing what I need:

test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                  measure1 = c(1:9))

# expected result:
#   group_A group_B group_C measure1
# 1       1       0       0        1
# 2       1       0       0        2
# 3       1       0       0        3
# 4       0       1       0        4
# 5       0       1       0        5
# 6       0       0       1        6
# 7       0       0       1        7
# 8       0       0       1        8
# 9       0       0       1        9

Any hints on how I can do that?


asked Dec 03 '21 19:12

DR15


4 Answers

We can use dummy_cols() from the fastDummies package:

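library(fastDummies)
library(dplyr)

test %>% 
    # dummy_cols() names new columns <column>_<level>, so rename first to get group_A, group_B, ...
    rename(group = 'my_groups') %>%
    # one 0/1 column per level; drop the original group column
    dummy_cols('group', remove_selected_columns = TRUE) %>%
    select(starts_with('group'), measure1)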

Output:

 group_A group_B group_C measure1
1       1       0       0        1
2       1       0       0        2
3       1       0       0        3
4       0       1       0        4
5       0       1       0        5
6       0       0       1        6
7       0       0       1        7
8       0       0       1        8
9       0       0       1        9


answered Nov 15 '22 05:11

akrun

Fortunately, there's a one-function base R solution.

This kind of problem comes up a lot, and model.matrix() is built for exactly this: it expands each factor (or character) column into one indicator column per level.

# the "+ 0" is to avoid adding a column for the intercept.

model.matrix(~ my_groups + measure1 + 0, data=test)

Output:

  my_groupsA my_groupsB my_groupsC measure1
1          1          0          0        1
2          1          0          0        2
3          1          0          0        3
4          0          1          0        4
5          0          1          0        5
6          0          0          1        6
7          0          0          1        7
8          0          0          1        8
9          0          0          1        9
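model.matrix() returns a numeric matrix rather than a data frame, and it names the dummy columns my_groupsA, my_groupsB, and so on. If you want exactly the layout from the question, a small base R post-processing step should do it. This is a sketch assuming the question's `test` data; the names `mm` and `res` are just illustrative. The last lines show why the "+ 0" matters: with the intercept kept, the first level becomes the reference level and gets no column.

```r
# Example data from the question
test <- data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                   measure1 = 1:9)

# Dummy-code my_groups; "+ 0" drops the intercept so every level gets a column
mm <- model.matrix(~ my_groups + measure1 + 0, data = test)

# Convert the numeric matrix to a data frame
res <- as.data.frame(mm)

# Rename my_groupsA, my_groupsB, ... to group_A, group_B, ... as in the question
names(res) <- sub("^my_groups", "group_", names(res))
res

# For comparison: keeping the intercept makes level A the reference level,
# so only my_groupsB and my_groupsC get indicator columns
colnames(model.matrix(~ my_groups + measure1, data = test))
```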

answered Nov 15 '22 07:11

Jason

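Here's a base R solution: it builds an all-pairs comparison matrix with expand.grid(), keeps one row per group, and then adds the required names.

# compare every value of my_groups with every other value (n x n pairs)
combos <- expand.grid(test$my_groups, test$my_groups)
# n x n 0/1 matrix: column j flags the rows in the same group as row j
eq <- matrix(as.numeric(do.call("==", combos)), nrow(test))
# the matrix is symmetric, so unique() keeps one row per group;
# transposing gives one column per group
res <- data.frame(t(unique(eq)), test$measure1)

colnames(res) <- c(paste0("group_", unique(test$my_groups)), colnames(test)[2])

res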
  group_A group_B group_C measure1
1       1       0       0        1
2       1       0       0        2
3       1       0       0        3
4       0       1       0        4
5       0       1       0        5
6       0       0       1        6
7       0       0       1        7
8       0       0       1        8
9       0       0       1        9
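The expand.grid() approach compares all n x n pairs of rows. A lighter variant of the same idea, sketched here as a swapped-in technique rather than the answer's own code, compares each row only against the k distinct groups with outer(), giving an n x k matrix directly. It assumes the question's `test` data; `levels_seen`, `ind`, and `res2` are illustrative names.

```r
# Example data from the question
test <- data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                   measure1 = 1:9)

groups <- test$my_groups
levels_seen <- unique(groups)   # groups in order of first appearance

# n x k logical matrix: entry [i, j] is TRUE when row i belongs to group j;
# multiplying by 1 turns TRUE/FALSE into 1/0
ind <- 1 * outer(groups, levels_seen, "==")
colnames(ind) <- paste0("group_", levels_seen)

# bind the indicator columns to the original measure
res2 <- data.frame(ind, measure1 = test$measure1)
res2
```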

answered Nov 15 '22 05:11

Andre Wildberg

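We can try this using tidyr and purrr (plus dplyr for bind_cols()). Note that dummyfy uses the measure1 values as row positions, so this only works because measure1 is exactly 1:nrow(test).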

library(tidyverse)

test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                  measure1 = c(1:9))

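# turn a vector of row positions into a 0/1 vector of length nrow(test)
# (relies on measure1 holding the row numbers 1:nrow(test))
dummyfy <- as_mapper(~{
  len_row <- vector('numeric', nrow(test))
  len_row[.] <- c(1)
  len_row
})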

data <- pivot_wider(test, names_from =  my_groups, values_from = measure1)
#> Warning: Values are not uniquely identified; output will contain list-cols.
#> * Use `values_fn = list` to suppress this warning.
#> * Use `values_fn = length` to identify where the duplicates arise
#> * Use `values_fn = {summary_fun}` to summarise duplicates

map(data, ~reduce(., c)) %>%
  map_dfr(dummyfy) %>% 
  bind_cols(test[-1])
#> # A tibble: 9 × 4
#>       A     B     C measure1
#>   <dbl> <dbl> <dbl>    <int>
#> 1     1     0     0        1
#> 2     1     0     0        2
#> 3     1     0     0        3
#> 4     0     1     0        4
#> 5     0     1     0        5
#> 6     0     0     1        6
#> 7     0     0     1        7
#> 8     0     0     1        8
#> 9     0     0     1        9

#equivalent using across():
# (dplyr >= 1.1.0 deprecates multi-row summarise() results in favor of reframe())

data %>%
  summarise(across(everything(), ~reduce(., c) %>% dummyfy)) %>%
  bind_cols(test[-1])
#> # A tibble: 9 × 4
#>       A     B     C measure1
#>   <dbl> <dbl> <dbl>    <int>
#> 1     1     0     0        1
#> 2     1     0     0        2
#> 3     1     0     0        3
#> 4     0     1     0        4
#> 5     0     1     0        5
#> 6     0     0     1        6
#> 7     0     0     1        7
#> 8     0     0     1        8
#> 9     0     0     1        9

Created on 2021-12-03 by the reprex package (v2.0.1)
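Because the approach above uses measure1 as row positions, it breaks if measure1 holds arbitrary values. A tidyverse sketch that avoids this adds an explicit row id and lets pivot_wider() fill the gaps with 0. It assumes a recent tidyr (1.1.0 or later, where values_fill accepts a single value) and the question's `test` data; `row_id`, `value`, and `res3` are illustrative names, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
test <- data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                   measure1 = 1:9)

res3 <- test %>%
  # row_id keeps every row distinct, so pivot_wider() never combines duplicates
  mutate(row_id = row_number(), value = 1) %>%
  # one column per group; cells for other groups are filled with 0
  pivot_wider(names_from = my_groups, values_from = value,
              values_fill = 0, names_prefix = "group_") %>%
  # drop the helper id and order columns as in the question
  select(starts_with("group_"), measure1)
res3
```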

answered Nov 15 '22 05:11

jpdugo17
