Given the following three sequences of numbers, I would like to figure out how to group the numbers to find the closest relations between them. <pre class="prettyprint"><code>1,2,3,4 4,3,5 2,1,3 ... </code></pre> I'm not sure what the algorithm(s) I'm looking for are called, but we can see stronger relations with some of the numbers than with others. These numbers appear together twice: <pre class="prettyprint"><code>1 & 2 1 & 3 2 & 3 3 & 4 </code></pre> Together once: <pre class="prettyprint"><code>1 & 4 2 & 4 3 & 5 4 & 5 </code></pre> So for example, we can see there must be a relationship between <code>1, 2, & 3</code> since they all appear together at least twice. You could also say that <code>3 & 4</code> are closely related since they also appear twice. However, the algorithm might pick <code>[1,2,3]</code> (over <code>[3,4]</code>) since it's a bigger grouping (more inclusive). We can form any of the following groupings if we stick the numbers used most often together in a group: <pre class="prettyprint"><code>[1,2,3] & [4,5] [1,2] & [3,4] & [5] [1,2] & [3,4,5] [1,2] & [3,4] & [5] </code></pre> If duplicates are allowed, you could even end up with the following groups: <pre class="prettyprint"><code>[1,2,3,4] [1,2,3] [3,4] [5] </code></pre> I can't say which grouping is most "correct", but all four of these combos all find different ways of semi-correctly grouping the numbers. I'm not looking for a specific grouping - just a general cluster algorithm that works fairly well and is easy to understand. I'm sure there are many other ways to use the occurrence count to group them as well. What would be a good base grouping algorithm for these? Samples in Go, Javascript, or PHP are preferred.

As already been mentioned it's about clique. If you want exact answer you will face Maximum Clique Problem which is NP-complete. So all below make any sense only if alphabet of your symbols(numbers) has reasonable size. In this case strait-forward, not very optimised algorithm for Maximum Clique Problem in pseudo-code would be <pre class="prettyprint"><code>Function Main Cm ← ∅ // the maximum clique Clique(∅,V) // V vertices set return Cm End function Main Function Clique(set C, set P) // C the current clique, P candidat set if (|C| > |Cm|) then Cm ← C End if if (|C|+|P|>|Cm|)then for all p ∈ P in predetermined order, do P ← P \ {p} Cp ←C ∪ {p} Pp ←P ∩ N(p) //N(p) set of the vertices adjacent to p Clique(Cp,Pp) End for End if End function Clique </code></pre> Because of Go is my language of choice here is implementation <pre class="prettyprint"><code>package main import ( "bufio" "fmt" "sort" "strconv" "strings" ) var adjmatrix map[int]map[int]int = make(map[int]map[int]int) var Cm []int = make([]int, 0) var frequency int //For filter type resoult [][]int var res resoult var filter map[int]bool = make(map[int]bool) var bf int //For filter //That's for sorting func (r resoult) Less(i, j int) bool { return len(r[i]) > len(r[j]) } func (r resoult) Swap(i, j int) { r[i], r[j] = r[j], r[i] } func (r resoult) Len() int { return len(r) } //That's for sorting //Work done here func Clique(C []int, P map[int]bool) { if len(C) >= len(Cm) { Cm = make([]int, len(C)) copy(Cm, C) } if len(C)+len(P) >= len(Cm) { for k, _ := range P { delete(P, k) Cp := make([]int, len(C)+1) copy(Cp, append(C, k)) Pp := make(map[int]bool) for n, m := range adjmatrix[k] { _, ok := P[n] if ok && m >= frequency { Pp[n] = true } } Clique(Cp, Pp) res = append(res, Cp) //Cleanup resoult bf := 0 for _, v := range Cp { bf += 1 << uint(v) } _, ok := filter[bf] if !ok { filter[bf] = true res = append(res, Cp) } //Cleanup resoult } } } //Work done here func main() { var toks []string var numbers []int var number int //Input parsing StrReader := strings.NewReader(`1,2,3 4,3,5 4,1,6 4,2,7 4,1,7 2,1,3 5,1,2 3,6`) scanner := bufio.NewScanner(StrReader) for scanner.Scan() { toks = strings.Split(scanner.Text(), ",") numbers = []int{} for _, v := range toks { number, _ = strconv.Atoi(v) numbers = append(numbers, number) } for k, v := range numbers { for _, m := range numbers[k:] { _, ok := adjmatrix[v] if !ok { adjmatrix[v] = make(map[int]int) } _, ok = adjmatrix[m] if !ok { adjmatrix[m] = make(map[int]int) } if m != v { adjmatrix[v][m]++ adjmatrix[m][v]++ if adjmatrix[v][m] > frequency { frequency = adjmatrix[v][m] } } } } } //Input parsing P1 := make(map[int]bool) //Iterating for frequency of appearance in group for ; frequency > 0; frequency-- { for k, _ := range adjmatrix { P1[k] = true } Cm = make([]int, 0) res = make(resoult, 0) Clique(make([]int, 0), P1) sort.Sort(res) fmt.Print(frequency, "x-times ", res, " ") } //Iterating for frequency of appearing together } </code></pre> And here you can see it works https://play.golang.org/p/ZiJfH4Q6GJ and play with input data. But once more, this approach is for reasonable size alphabet(and input data of any size).

Grouping numbers based on occurrences?

Tags:

algorithm

go

graph-algorithm

Given the following three sequences of numbers, I would like to figure out how to group the numbers to find the closest relations between them.

1,2,3,4 4,3,5 2,1,3 ...

I'm not sure what the algorithm(s) I'm looking for are called, but we can see stronger relations with some of the numbers than with others.

These numbers appear together twice:

1 & 2 1 & 3 2 & 3 3 & 4

Together once:

1 & 4 2 & 4 3 & 5 4 & 5

So for example, we can see there must be a relationship between 1, 2, & 3 since they all appear together at least twice. You could also say that 3 & 4 are closely related since they also appear twice. However, the algorithm might pick [1,2,3] (over [3,4]) since it's a bigger grouping (more inclusive).

We can form any of the following groupings if we stick the numbers used most often together in a group:

[1,2,3] & [4,5] [1,2]   & [3,4]   & [5] [1,2]   & [3,4,5] [1,2]   & [3,4]   & [5]

If duplicates are allowed, you could even end up with the following groups:

[1,2,3,4] [1,2,3] [3,4] [5]

I can't say which grouping is most "correct", but all four of these combos all find different ways of semi-correctly grouping the numbers. I'm not looking for a specific grouping - just a general cluster algorithm that works fairly well and is easy to understand.

I'm sure there are many other ways to use the occurrence count to group them as well. What would be a good base grouping algorithm for these? Samples in Go, Javascript, or PHP are preferred.

804

asked Apr 18 '15 17:04

Xeoncross

2 Answers

Each of your three sequences can be understood as a clique in a multigraph. Within a clique, every vertex is connected to every other vertex.

The following graph represents your sample case with the edges in each clique colored red, blue, and green, respectively.

Multigraph with five vertices and three cliques

As you have already shown, we can classify pairs of vertices according to the number of edges between them. In the illustration, we can see that four pairs of vertices are connected by two edges each, and four other pairs of vertices are connected by one edge each.

We can go on to classify vertices according to the number of cliques in which they appear. In some sense we are ranking vertices according to their connectedness. A vertex that appears in k cliques can be thought of as connected to the same degree as other vertices that appear in k cliques. In the image, we see three groups of vertices: vertex 3 appears in three cliques; vertices 1, 2, and 4 each appear in two cliques; vertex 5 appears in one clique.

The Go program below computes the edge classification as well as the vertex classification. The input to the program contains, on the first line, the number of vertices n and the number of cliques m. We assume that the vertices are numbered from 1 to n. Each of the succeeding m lines of input is a space-separated list of vertices belonging to a clique. Thus, the problem instance given in the question is represented by this input:

5 3 1 2 3 4 4 3 5 2 1 3

The corresponding output is:

Number of edges between pairs of vertices:     2 edges: (1, 2) (1, 3) (2, 3) (3, 4)     1 edge:  (1, 4) (2, 4) (3, 5) (4, 5)  Number of cliques in which a vertex appears:     3 cliques: 3     2 cliques: 1 2 4     1 clique:  5

And here is the Go program:

package main  import (         "bufio"         "fmt"         "os"         "strconv"         "strings" )  func main() {         // Set up input and output.         reader := bufio.NewReader(os.Stdin)         writer := bufio.NewWriter(os.Stdout)         defer writer.Flush()          // Get the number of vertices and number of cliques from the first line.         line, err := reader.ReadString('\n')         if err != nil {                 fmt.Fprintf(os.Stderr, "Error reading first line: %s\n", err)                 return         }         var numVertices, numCliques int         numScanned, err := fmt.Sscanf(line, "%d %d", &numVertices, &numCliques)         if numScanned != 2 || err != nil {                 fmt.Fprintf(os.Stderr, "Error parsing input parameters: %s\n", err)                    return         }          // Initialize the edge counts and vertex counts.         edgeCounts := make([][]int, numVertices+1)         for u := 1; u <= numVertices; u++ {                 edgeCounts[u] = make([]int, numVertices+1)         }         vertexCounts := make([]int, numVertices+1)          // Read each clique and update the edge counts.         for c := 0; c < numCliques; c++ {                 line, err = reader.ReadString('\n')                 if err != nil {                         fmt.Fprintf(os.Stderr, "Error reading clique: %s\n", err)                         return                 }                 tokens := strings.Split(strings.TrimSpace(line), " ")                 clique := make([]int, len(tokens))                 for i, token := range tokens {                         u, err := strconv.Atoi(token)                         if err != nil {                                 fmt.Fprintf(os.Stderr, "Atoi error: %s\n", err)                                 return                         }                         vertexCounts[u]++                         clique[i] = u                         for j := 0; j < i; j++ {                                 v := clique[j]                                 edgeCounts[u][v]++                                 edgeCounts[v][u]++                         }                 }         }          // Compute the number of edges between each pair of vertices.         count2edges := make([][][]int, numCliques+1)         for u := 1; u < numVertices; u++ {                 for v := u + 1; v <= numVertices; v++ {                         count := edgeCounts[u][v]                         count2edges[count] = append(count2edges[count],                                 []int{u, v})                 }         }         writer.WriteString("Number of edges between pairs of vertices:\n")         for count := numCliques; count >= 1; count-- {                 edges := count2edges[count]                 if len(edges) == 0 {                         continue                 }                 label := "edge"                 if count > 1 {                         label += "s:"                 } else {                         label += ": "                 }                 writer.WriteString(fmt.Sprintf("%5d %s", count, label))                 for _, edge := range edges {                         writer.WriteString(fmt.Sprintf(" (%d, %d)",                                 edge[0], edge[1]))                 }                 writer.WriteString("\n")         }          // Group vertices according to the number of clique memberships.         count2vertices := make([][]int, numCliques+1)         for u := 1; u <= numVertices; u++ {                 count := vertexCounts[u]                 count2vertices[count] = append(count2vertices[count], u)         }         writer.WriteString("\nNumber of cliques in which a vertex appears:\n")         for count := numCliques; count >= 1; count-- {                 vertices := count2vertices[count]                 if len(vertices) == 0 {                         continue                 }                 label := "clique"                 if count > 1 {                         label += "s:"                 } else {                         label += ": "                 }                 writer.WriteString(fmt.Sprintf("%5d %s", count, label))                 for _, u := range vertices {                         writer.WriteString(fmt.Sprintf(" %d", u))                 }                 writer.WriteString("\n")         } }

199

answered Nov 09 '22 16:11

Michael Laszlo

As already been mentioned it's about clique. If you want exact answer you will face Maximum Clique Problem which is NP-complete. So all below make any sense only if alphabet of your symbols(numbers) has reasonable size. In this case strait-forward, not very optimised algorithm for Maximum Clique Problem in pseudo-code would be

Function Main     Cm ← ∅                   // the maximum clique     Clique(∅,V)              // V vertices set     return Cm End function Main  Function Clique(set C, set P) // C the current clique, P candidat set     if (|C| > |Cm|) then         Cm ← C     End if     if (|C|+|P|>|Cm|)then         for all p ∈ P in predetermined order, do             P ← P \ {p}             Cp ←C ∪ {p}             Pp ←P ∩ N(p)        //N(p) set of the vertices adjacent to p             Clique(Cp,Pp)         End for     End if End function Clique

Because of Go is my language of choice here is implementation

package main  import (     "bufio"     "fmt"     "sort"     "strconv"     "strings" )  var adjmatrix map[int]map[int]int = make(map[int]map[int]int) var Cm []int = make([]int, 0) var frequency int   //For filter type resoult [][]int var res resoult var filter map[int]bool = make(map[int]bool) var bf int //For filter   //That's for sorting func (r resoult) Less(i, j int) bool {     return len(r[i]) > len(r[j]) }  func (r resoult) Swap(i, j int) {     r[i], r[j] = r[j], r[i] }  func (r resoult) Len() int {     return len(r) } //That's for sorting   //Work done here func Clique(C []int, P map[int]bool) {     if len(C) >= len(Cm) {          Cm = make([]int, len(C))         copy(Cm, C)     }     if len(C)+len(P) >= len(Cm) {         for k, _ := range P {             delete(P, k)             Cp := make([]int, len(C)+1)             copy(Cp, append(C, k))             Pp := make(map[int]bool)             for n, m := range adjmatrix[k] {                 _, ok := P[n]                 if ok && m >= frequency {                     Pp[n] = true                 }             }             Clique(Cp, Pp)              res = append(res, Cp)             //Cleanup resoult             bf := 0             for _, v := range Cp {                 bf += 1 << uint(v)             }             _, ok := filter[bf]             if !ok {                 filter[bf] = true                 res = append(res, Cp)             }             //Cleanup resoult         }     } } //Work done here  func main() {     var toks []string     var numbers []int     var number int   //Input parsing     StrReader := strings.NewReader(`1,2,3 4,3,5 4,1,6 4,2,7 4,1,7 2,1,3 5,1,2 3,6`)     scanner := bufio.NewScanner(StrReader)     for scanner.Scan() {         toks = strings.Split(scanner.Text(), ",")         numbers = []int{}         for _, v := range toks {             number, _ = strconv.Atoi(v)             numbers = append(numbers, number)          }         for k, v := range numbers {             for _, m := range numbers[k:] {                 _, ok := adjmatrix[v]                 if !ok {                     adjmatrix[v] = make(map[int]int)                 }                 _, ok = adjmatrix[m]                 if !ok {                     adjmatrix[m] = make(map[int]int)                 }                 if m != v {                     adjmatrix[v][m]++                     adjmatrix[m][v]++                     if adjmatrix[v][m] > frequency {                         frequency = adjmatrix[v][m]                     }                 }              }         }     }     //Input parsing      P1 := make(map[int]bool)       //Iterating for frequency of appearance in group     for ; frequency > 0; frequency-- {         for k, _ := range adjmatrix {             P1[k] = true         }         Cm = make([]int, 0)         res = make(resoult, 0)         Clique(make([]int, 0), P1)         sort.Sort(res)         fmt.Print(frequency, "x-times ", res, " ")     }     //Iterating for frequency of appearing together }

And here you can see it works https://play.golang.org/p/ZiJfH4Q6GJ and play with input data. But once more, this approach is for reasonable size alphabet(and input data of any size).

answered Nov 09 '22 15:11

Uvelichitel

Related questions
                            
                                What is the fastest way to find Nth biggest number of an INT array?
                            
                                Algorithm to find k smallest numbers in array of n items
                            
                                All factors of a given number
                            
                                Changing integer to binary string of digits
                            
                                How to calculate a standard deviation [array] [duplicate]
                            
                                Given a list of numbers and a number k, return whether any two numbers from the list add up to k
                            
                                Help me understand Inorder Traversal without using recursion
                            
                                How to change a negative number to zero in python without using decision structures
                            
                                Boyer Moore Algorithm Understanding and Example?
                            
                                How to efficiently rack up billiards for an 8-ball game?
                            
                                Is there an efficient algorithm for segmentation of handwritten text?
                            
                                What is a good open source B-tree implementation in C? [closed]
                            
                                How to optimally solve the flood fill puzzle?
                            
                                What's the fastest way to brush up on algorithms for a technical interview (on Monday)? [closed]
                            
                                Fast way of getting the dominant color of an image [closed]
                            
                                Real world applications of Binary heaps and Fibonacci Heaps [closed]
                            
                                How does heap compaction work quickly?
                            
                                Implementation of a work stealing queue in C/C++? [closed]
                            
                                Help Understanding Cross Validation and Decision Trees
                            
                                Bad implementation of Enumerable.Single?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With