All combinations of letters/numbers under specific conditions

Tags: for-loop, dataframe, r

I created these vectors:

Letters <- c("A","C","E","G","H","J","K")  
Numbers <- c(0,1,2,3,4,6,7,9) 
AlphaNumeric <- c(Letters, Numbers)

I would like to get a data frame of all 3-element combinations (e.g. AA1, G26, etc.) of the elements above that satisfy three conditions:

1.) The first element is a letter

2.) The second element is a number or the SAME letter as the first element

3.) The third element is a number

Approach: I have tried to use expand.grid() and successfully managed to get ALL combinations with 3 elements. Then I tried expand.grid(x = Letters, y = AlphaNumeric, z = Numbers) and managed to achieve 1.) and 3.) but failed to manage 2.) so far.

Unsatisfying Solution: I have figured out a way of doing this with a for-loop, but I guess there must be a much easier way to do it than:

   LNN <- expand.grid(x = Letters, y = Numbers, z = Numbers)

   for (Element in Letters) {
       currentLLN <- expand.grid(x = Element, y = Element, z = Numbers)
       LNN <- merge(LNN, currentLLN, all = TRUE)
   }

Any help would be greatly appreciated, thank you, Christian

asked Feb 23 '18 15:02

Christian Schano

2 Answers

You could create two data frames, one where the second element is a number and one where the second element is the same as the first element, and then rbind them. An example is given below; note that I have reduced your example data for illustration purposes.

Letters <- LETTERS[1:3]  
Numbers <- c(1,2)

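# Case 1: the second element repeats the first letter
df1 <- expand.grid(v1 = Letters, v3 = Numbers, stringsAsFactors = FALSE)
df1$v2 <- df1$v1
df1 <- df1[, c('v1', 'v2', 'v3')]  # restore the column order
# Case 2: the second element is a number (as character, to match df1$v2)
df2 <- expand.grid(v1 = Letters, v2 = as.character(Numbers), v3 = Numbers, stringsAsFactors = FALSE)
df <- rbind(df1, df2)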

Output:

> df
   v1 v2 v3
1   A  A  1
2   B  B  1
3   C  C  1
4   A  A  2
5   B  B  2
6   C  C  2
7   A  1  1
8   B  1  1
9   C  1  1
10  A  2  1
11  B  2  1
12  C  2  1
13  A  1  2
14  B  1  2
15  C  1  2
16  A  2  2
17  B  2  2
18  C  2  2
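
As a rough sketch that is not part of the original answer, the same two-grid idea can be applied to the full vectors from the question, with a pasted column added to produce strings such as AA1. The names `same_letter`, `with_number`, `combos` and `code` are made up for illustration.

```r
Letters <- c("A", "C", "E", "G", "H", "J", "K")
Numbers <- c(0, 1, 2, 3, 4, 6, 7, 9)

# Grid 1: letter, same letter, number
same_letter <- expand.grid(v1 = Letters, v3 = Numbers, stringsAsFactors = FALSE)
same_letter$v2 <- same_letter$v1
same_letter <- same_letter[, c("v1", "v2", "v3")]

# Grid 2: letter, number, number (v2 as character so both grids share column types)
with_number <- expand.grid(v1 = Letters, v2 = as.character(Numbers),
                           v3 = Numbers, stringsAsFactors = FALSE)

combos <- rbind(same_letter, with_number)

# Hypothetical extra column: the three elements pasted into one string
combos$code <- paste0(combos$v1, combos$v2, combos$v3)

nrow(combos)
```

With 7 letters and 8 numbers this should give 7 * 8 + 7 * 8 * 8 = 504 rows, including AA1 and G26 from the question.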

Hope this helps!

Both answers run very fast, and Parfait's solution is a nice one that I certainly do not want to discredit. Still, I think it is worth pointing out that creating extra combinations and then subsetting becomes a bigger issue as your data grows. A benchmark is shown below.

Letters <- c(LETTERS[1:26],letters[1:4])
Numbers <- seq(30)
AlphaNumeric <- c(Letters, Numbers)


f_flo <- function()
{
  df1 = expand.grid(v1=Letters,v3=Numbers,stringsAsFactors = F)
  df1$v2 = df1$v1
  df1 = df1[,c('v1','v2','v3')]
  df2 = expand.grid(v1=Letters,v2=as.character(Numbers),v3=Numbers, stringsAsFactors = F)
  df = rbind(df1,df2)
}

f_parfait <- function()
{
  df <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers, stringsAsFactors = FALSE)
  sub <- subset(df,  (x == y | grepl("[0-9]", y)) &  grepl("[0-9]", z) )
  sub <- with(sub, sub[order(x, y, z),])   # SORT DATAFRAME
  rownames(sub) <- NULL                    # RESET ROWNAMES
  sub                                      # RETURN THE DATAFRAME (the assignment above evaluates to NULL)
}

library(dplyr)
one_letter <- function(l) {
  expand.grid(l, c(l, Numbers), Numbers, stringsAsFactors = FALSE)
}

f_stibu <- function(){
  df <- bind_rows(lapply(Letters, one_letter))
}


library(microbenchmark)
library(ggplot2)

run_times = microbenchmark(f_flo(),f_parfait(),f_stibu())
autoplot(run_times)

Results:

Unit: milliseconds
        expr        min         lq       mean     median         uq       max neval cld
     f_flo()   1.900719   2.047591   3.666935   2.314258   3.922053  78.74793   100  a 
 f_parfait() 138.028364 142.529904 152.876116 144.159444 146.835958 246.92318   100   b
   f_stibu()   4.130464   4.333130   5.169664   4.585028   6.209233  10.23139   100  a

[Image: plot of the benchmark run times produced by autoplot(run_times)]
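
To make the "extra combinations" point concrete, here is a small sketch that only counts rows. `grid_sizes` is a hypothetical helper, not something from either answer. It compares the size of the full `expand.grid()` result with the number of rows that satisfy the three conditions.

```r
# Hypothetical helper: rows in the full grid versus rows that meet the conditions
grid_sizes <- function(n_letters, n_numbers) {
  c(
    # x: any letter, y: any letter or number, z: any number
    full_grid = n_letters * (n_letters + n_numbers) * n_numbers,
    # x: any letter, y: the same letter or any number, z: any number
    kept = n_letters * (n_numbers + 1) * n_numbers
  )
}

benchmark_sizes <- grid_sizes(30, 30)  # sizes used in the benchmark
question_sizes  <- grid_sizes(7, 8)    # vectors from the question
benchmark_sizes
question_sizes
```

For the benchmark data the full grid has 54000 rows, of which 27900 are kept. The share of discarded rows grows with the number of letters. Row counts are only part of the picture, since the subsetting version also runs `grepl` and sorts the result.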

answered Nov 02 '22 22:11

Florian

Simply subset your expand.grid() dataframe with grepl calls:

df <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers, stringsAsFactors = FALSE)

sub <- subset(df,  (x == y | grepl("[0-9]", y)) )

sub <- with(sub, sub[order(x, y, z),])   # SORT DATAFRAME
rownames(sub) <- NULL                    # RESET ROWNAMES

head(sub, 10)    
#    x y z
# 1  A 0 0
# 2  A 0 1
# 3  A 0 2
# 4  A 0 3
# 5  A 0 4
# 6  A 0 6
# 7  A 0 7
# 8  A 0 9
# 9  A 1 0
# 10 A 1 1
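
A quick check, added here as a sketch and not part of the original answer: assuming the vectors from the question, the filtered grid should contain exactly the same combinations as building the two cases separately. The names `full`, `kept`, `lln`, `lnn`, `codes_subset` and `codes_built` are illustrative.

```r
Letters <- c("A", "C", "E", "G", "H", "J", "K")
Numbers <- c(0, 1, 2, 3, 4, 6, 7, 9)
AlphaNumeric <- c(Letters, Numbers)  # coerced to a character vector

# Filtering approach: build everything, keep rows where y is a digit or equals x
full <- expand.grid(x = Letters, y = AlphaNumeric, z = Numbers,
                    stringsAsFactors = FALSE)
kept <- subset(full, x == y | grepl("[0-9]", y))
codes_subset <- sort(paste0(kept$x, kept$y, kept$z))

# Constructive approach: one grid per case, no filtering
lln <- expand.grid(x = Letters, z = Numbers, stringsAsFactors = FALSE)
lnn <- expand.grid(x = Letters, y = as.character(Numbers), z = Numbers,
                   stringsAsFactors = FALSE)
codes_built <- sort(c(paste0(lln$x, lln$x, lln$z),
                      paste0(lnn$x, lnn$y, lnn$z)))

identical(codes_subset, codes_built)
```

Comparing pasted strings sidesteps differences in column names and row order between the two results.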

answered Nov 02 '22 23:11

Parfait
