Understanding how to construct a higher-order Markov chain

Suppose I want to predict whether a person belongs to class1 = healthy or class2 = fever. I have a data set whose observations come from the domain {normal, cold, dizzy}.

The transition matrix would contain the transition probabilities estimated from our training data set, while the initial vector would contain the probability that a person starts (day 1) in a state x from the domain {normal, cold, dizzy}; this is also estimated from the training set.

If I want to build a first-order Markov chain, I would generate a 3x3 transition matrix and a 1x3 initial vector per class, like so:

> TransitionMatrix
       normal cold dizzy
normal     NA   NA    NA
cold       NA   NA    NA
dizzy      NA   NA    NA

> InitialVector
     normal cold dizzy
[1,]     NA   NA    NA

The NA entries will be filled with the corresponding probabilities.
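To make this concrete, here is a rough R sketch of how I imagine these estimates being filled in from training sequences (the training data below is made up purely for illustration):

states <- c("normal", "cold", "dizzy")

## made-up training sequences for one class (e.g. healthy)
train <- list(
  c("normal", "normal", "cold", "dizzy"),
  c("normal", "cold", "cold", "normal"),
  c("dizzy", "cold", "normal", "normal")
)

## 1x3 initial vector: relative frequency of each state on day 1
day1 <- sapply(train, function(s) s[1])
InitialVector <- prop.table(table(factor(day1, levels = states)))

## 3x3 transition matrix: count state -> next-state pairs, then row-normalise
## (rows with no observed outgoing transitions would need smoothing)
TransitionMatrix <- matrix(0, 3, 3, dimnames = list(states, states))
for (s in train) {
  for (i in seq_len(length(s) - 1)) {
    TransitionMatrix[s[i], s[i + 1]] <- TransitionMatrix[s[i], s[i + 1]] + 1
  }
}
TransitionMatrix <- TransitionMatrix / rowSums(TransitionMatrix)

The same would be repeated with the fever sequences to get the second class's matrix and vector.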

1 - My question is about transition matrices in higher-order chains. For example, in a second-order MC, would we have a transition matrix of size |domain|² x |domain|², like so:

               normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
normal->normal             NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->cold               NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->dizzy              NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->normal               NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->cold                 NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->dizzy                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->normal              NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->cold                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->dizzy               NA           NA            NA           NA         NA          NA            NA          NA           NA

Here cell (1,1) represents the following sequence: normal->normal->normal->normal.

Or would it instead be just |domain|² x |domain|, like so:

               normal cold dizzy
normal->normal     NA   NA    NA
normal->cold       NA   NA    NA
normal->dizzy      NA   NA    NA
cold->normal       NA   NA    NA
cold->cold         NA   NA    NA
cold->dizzy        NA   NA    NA
dizzy->normal      NA   NA    NA
dizzy->cold        NA   NA    NA
dizzy->dizzy       NA   NA    NA

Here cell (1,1) represents normal->normal->normal, which is different from the previous representation.

2 - What about the initial vector for an MC of order 2? Would we need two initial vectors of size 1 x |domain|, like so:

     normal cold dizzy
[1,]     NA   NA    NA

This would lead to two initial vectors per class: the first giving the probability of each of {normal, cold, dizzy} occurring on the first day for the healthy/fever class, and the second giving the probability of occurrence on the second day. In total this would give 4 initial vectors.

Or would we just need one initial vector of size 1 x |domain|², like so:

    normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
[1,]             NA           NA            NA           NA         NA          NA            NA          NA           NA

I can see how the second way of representing the initial vector could be problematic if we want to classify an observation with only one state.
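One workaround I can think of (just a sketch, the placeholder values and names are mine) would be to marginalise the 1 x |domain|² vector down to a day-1 distribution when the observation contains a single state:

states <- c("normal", "cold", "dizzy")
pairs  <- as.vector(outer(states, states, paste, sep = "->"))

## hypothetical 1 x 9 initial distribution over ordered pairs (uniform placeholder)
init2 <- setNames(rep(1/9, length(pairs)), pairs)

## marginal probability of the day-1 state, usable for a one-state observation
day1 <- tapply(init2, sub("->.*", "", pairs), sum)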

Asked Aug 15 '16 by Imlerith

1 Answer

Say the set of states is S. Typically, for an nth-order chain:

  1. The transition matrix has dimensions |S|ⁿ x |S|. This is because, given the current history of n states, we need the probability of the single next state. It is true that this single next state induces another compound history state of length n, but the transition itself is to the single next state. See, e.g., this example in Wikipedia.

  2. The initial distribution is a distribution over |S|ⁿ elements (your second option).
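To make both points concrete, here is a rough R sketch for order n = 2; the training sequences and variable names are only illustrative, not taken from the question:

states <- c("normal", "cold", "dizzy")
pairs  <- as.vector(outer(states, states, paste, sep = "->"))

## made-up training sequences for one class
train <- list(
  c("normal", "normal", "cold", "dizzy", "dizzy"),
  c("normal", "cold", "cold", "normal", "cold")
)

## |S|^2 x |S| transition matrix: rows = (state[t-1], state[t]), cols = state[t+1]
trans2 <- matrix(0, nrow = length(pairs), ncol = length(states),
                 dimnames = list(pairs, states))
## initial distribution over the |S|^2 compound states (first two observations)
init2 <- setNames(numeric(length(pairs)), pairs)

for (s in train) {
  first_pair <- paste(s[1], s[2], sep = "->")
  init2[first_pair] <- init2[first_pair] + 1
  for (i in seq_len(length(s) - 2)) {
    from_pair <- paste(s[i], s[i + 1], sep = "->")
    trans2[from_pair, s[i + 2]] <- trans2[from_pair, s[i + 2]] + 1
  }
}

init2 <- init2 / sum(init2)
rs <- rowSums(trans2)
trans2[rs > 0, ] <- trans2[rs > 0, ] / rs[rs > 0]   # row-normalise observed rows only

Classifying a new sequence would then amount to multiplying init2[pair of the first two states] by the relevant trans2 entries (one factor per later observation) under each class's model, and picking the class with the larger likelihood.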

Answered Nov 10 '22 by Ami Tavory