How to repeat rows by their value by multiple columns and divide back

Tags: dataframe, r, repeat

Let's say I have this dataframe:

> df <- data.frame(A=1:5, B=c(0, 0, 3, 0, 0), C=c(1, 0, 0, 1, 0), D=c(0, 2, 0, 0, 1))
> df
  A B C D
1 1 0 1 0
2 2 0 0 2
3 3 3 0 0
4 4 0 1 0
5 5 0 0 1

How would I go about converting it to:

  A B C D
1 1 0 1 0
2 2 0 0 1
3 2 0 0 1
4 3 1 0 0
5 3 1 0 0
6 3 1 0 0
7 4 0 1 0
8 5 0 0 1

As you can see, some rows contain the values 2 and 3. I want to repeat each of those rows that many times and change the values back to 1. How would I do that?

I also want to duplicate the A column accordingly, as you can see.

I tried:

replace(df[rep(rownames(df), select(df, -A)),], 2, 1)

But it gives me an error.

asked Oct 13 '21 03:10

U12-Forward

2 Answers

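One option is to compute the row-wise maximum of columns B, C and D with pmax, repeat each row that many times with uncount, and then use pmin to cap every value greater than 1 at 1.

library(dplyr)
library(tidyr)

df %>%
  # number of copies per row = largest value among B, C and D
  mutate(repeat_row = pmax(B, C, D)) %>%
  # repeat each row repeat_row times; uncount() drops the count column by default
  uncount(repeat_row) %>%
  # cap every column except A at 1
  mutate(across(-A, ~ pmin(.x, 1)))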

#  A B C D
#1 1 0 1 0
#2 2 0 0 1
#3 2 0 0 1
#4 3 1 0 0
#5 3 1 0 0
#6 3 1 0 0
#7 4 0 1 0
#8 5 0 0 1
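For readers who prefer to avoid extra packages, the same three steps (row-wise maximum, repeat, cap at 1) can probably be sketched in base R as well. This is only an illustration and not part of the original answer; the names n and out are made up for the example, and it assumes the repeat counts are non-negative whole numbers as in the question.

```r
df <- data.frame(A = 1:5,
                 B = c(0, 0, 3, 0, 0),
                 C = c(1, 0, 0, 1, 0),
                 D = c(0, 2, 0, 0, 1))

# n (hypothetical name): how often each row should appear
n <- pmax(df$B, df$C, df$D)

# Repeat row i n[i] times by indexing with repeated row numbers
out <- df[rep(seq_len(nrow(df)), times = n), ]

# Cap every column except A at 1
out[-1] <- lapply(out[-1], pmin, 1)

# Drop the duplicated row names created by the indexing step
rownames(out) <- NULL

print(out)
```

Indexing with rep(seq_len(nrow(df)), times = n) plays the role of uncount here, and resetting the row names only affects how the result is printed.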

answered Nov 15 '22 00:11

Ronak Shah

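Apparently, each row has just one value > 0 in columns B to D, so the rowSums of those columns give the number of repeats per row. We can pass them to a replicate call on each row, with columns B to D binarized using > 0. Because Map iterates over the columns of a data frame rather than its rows, we transpose once so that rows become columns, and transpose back at the end. The rest is cosmetics.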

t(do.call(cbind, Map(replicate,
                     rowSums(df[-1]), 
                     as.data.frame(t(cbind(df[1], df[-1] > 0)))))) |>
  as.data.frame() |>
  setNames(names(df))
#   A B C D
# 1 1 0 1 0
# 2 2 0 0 1
# 3 2 0 0 1
# 4 3 1 0 0
# 5 3 1 0 0
# 6 3 1 0 0
# 7 4 0 1 0
# 8 5 0 0 1

Note: this uses the native pipe |>, which requires R >= 4.1.
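Since this approach relies on every row having exactly one positive value in columns B to D, it may be worth checking that assumption before using it on other data. A minimal sketch of such a check follows; it is not part of the original answer, and positives, single_positive and counts_agree are hypothetical names chosen for illustration.

```r
df <- data.frame(A = 1:5,
                 B = c(0, 0, 3, 0, 0),
                 C = c(1, 0, 0, 1, 0),
                 D = c(0, 2, 0, 0, 1))

# Number of positive entries per row in columns B to D
positives <- rowSums(df[-1] > 0)

# TRUE only if every row has exactly one positive entry
single_positive <- all(positives == 1)

# Under that condition the row sum equals the row maximum,
# so rowSums() can stand in for pmax() as the repeat count
counts_agree <- all(rowSums(df[-1]) == pmax(df$B, df$C, df$D))

print(single_positive)
print(counts_agree)
```

If single_positive were FALSE, the row sums would overcount the repeats, and a pmax-based count would likely be the safer choice.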

answered Nov 15 '22 00:11

jay.sf
