Sorting binary Sequences with R

Tags: math, r, statistics

Imagine the following sequences:

0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111

I want to sort the sequences in this order, based on their similarity to the first one:

0000
0001
0010
0100
1000
0011
...

Lines 2, 3, 4 and 5 are equally similar to line 1 because each differs from it by exactly one bit, so they could appear in any order among themselves, for example 3, 2, 5, 4.

Line 6 comes next because it differs from line 1 by two bits.

Could this be done with R?

Hans-Christian Willibald asked Jul 14 '16 20:07


3 Answers

Let

x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", 
       "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111")

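1) Using the digitsum function from this answer, which sums the decimal digits of a number (for these strings, that is the number of 1s):

# Sum of the decimal digits of a single non-negative integer
digitsum <- function(x) sum(floor(x / 10^(0:(nchar(x) - 1))) %% 10)
# as.numeric() drops the leading zeros, which does not change the digit sum
x[order(sapply(as.numeric(x), digitsum))]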
#  [1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" "1001" "1010" "1100"
# [12] "0111" "1011" "1101" "1110" "1111"

2) Using regular expressions:

# Remove the zeros, then sort what is left ("" < "1" < "11" < ...)
x[order(gsub("0", "", x))]
#  [1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" "1001" "1010" "1100"
# [12] "0111" "1011" "1101" "1110" "1111"
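
Approach 2 works because the leftover strings of 1s happen to sort alphabetically in the same order as their lengths. A minimal sketch that makes the sort key explicit, using base R only, could count the remaining characters instead. The variable names `ones` and `sorted` are only illustrative:

```r
x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
       "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111")

# Delete every "0" and measure what is left: the number of 1s per string.
ones <- nchar(gsub("0", "", x, fixed = TRUE))

# order() leaves ties in their original order, so strings with the
# same count keep the order they had in x.
sorted <- x[order(ones)]
print(sorted)
```

A numeric key like this is also easy to reuse, for example to split the strings into groups with the same number of 1s.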

Julius Vainora answered Sep 30 '22 10:09


Since we are talking about string distances, you might want to use the stringdist function from the stringdist package to do this:

library(stringdist)
x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111", 
       "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111")

# stringdistmatrix(x, '0000') computes the distance from every element of x
# to the reference string '0000' (the lowest value in this case)
distances <- stringdistmatrix(x, '0000')

#use the distances to order the vector
x[order(distances)]
#[1] "0000" "0001" "0010" "0100" "1000" "0011" "0101" "0110" 
#    "1001" "1010" "1100" "0111" "1011" "1101" "1110" "1111"

Or in one go:

x[order(stringdist(x, '0000'))]
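
The question defines similarity as the number of differing bits, which is the Hamming distance (stringdist also offers method = "hamming" for this). As a rough sketch of the same idea without any package, and with a reference string other than '0000', something like the following could work. The helper name `hamming_to` is hypothetical, and it assumes all strings have the same length as the reference:

```r
# Hypothetical helper: number of positions where each string differs from ref.
hamming_to <- function(x, ref) {
  stopifnot(all(nchar(x) == nchar(ref)))   # Hamming distance needs equal lengths
  ref_bits <- strsplit(ref, "")[[1]]
  vapply(strsplit(x, ""), function(bits) sum(bits != ref_bits), integer(1))
}

x <- c("0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
       "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111")

# Sort by distance to an arbitrary reference sequence.
d <- hamming_to(x, "0101")
by_ref <- x[order(d)]
print(by_ref)
```

With "0000" as the reference, the distances reduce to the number of 1s in each string, so the result matches the ordering shown above.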

LyzandeR answered Sep 30 '22 09:09


Well, here's what I tried. Give it a shot and see if it suits your needs. It depends on the stringr package:

library('stringr')
# Creates a small test data frame to mimic the data you have.
df <- data.frame(numbers = c('0000', '0001', '0010', '0011', '0100', '0101', '0111', '1000'), stringsAsFactors = FALSE)
df$count <- str_count(df$numbers, '1') # Counts instances of 1 occurring in each string
df[with(df, order(count)), ] # Orders data frame by number of counts.

  numbers count
1    0000     0
2    0001     1
3    0010     1
5    0100     1
8    1000     1
4    0011     2
6    0101     2
7    0111     3
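
The question allows any order within a group of equally distant strings. If a reproducible order is preferred, one option is to pass a second key to order() as a tie-breaker. The sketch below assumes a shuffled input (made up for illustration) and counts the 1s with base R rather than stringr so that it runs without extra packages:

```r
# Made-up, shuffled input to show that the result does not depend on input order.
df <- data.frame(numbers = c("1000", "0111", "0001", "0101",
                             "0000", "0011", "0100", "0010"),
                 stringsAsFactors = FALSE)

# Number of 1s per string (same idea as str_count(df$numbers, '1')).
df$count <- nchar(gsub("0", "", df$numbers, fixed = TRUE))

# Primary key: count of 1s. Secondary key: the string itself, to break ties.
sorted_df <- df[order(df$count, df$numbers), ]
print(sorted_df)
```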

Sam answered Sep 30 '22 10:09