Transition matrix

Tags: r, matrix

Consider the following dataframe:

 df = data.frame(cusip = paste("A", 1:10, sep = ""), xt = c(1,2,3,2,3,5,2,4,5,1), xt1 = c(1,4,2,1,1,4,2,2,2,5))

The data is divided into five states (quantiles, in reality): 1, 2, 3, 4, 5. The column xt gives the state at time t, and the column xt1 gives the state at time t+1.

I would like to compute a transition matrix for the five states. Its entries would mean the following:

  • (Row, Col) = (1,1) : % of cusips that were in quantile 1 at time t and stayed in quantile 1 at time t+1
  • (Row, Col) = (1,2) : % of cusips that were in quantile 1 at time t and moved to quantile 2 at time t+1
  • etc.

I am really not sure how to do this in an efficient way. I have the feeling the answer is trivial, but I just can't get my head around it.

Could anyone please help?

Mayou asked Jan 27 '14 21:01


1 Answer

res <- with(df, table(xt, xt1)) ## table() to form transition matrix
res/rowSums(res)                ## /rowSums() to normalize by row
#    xt1
# xt          1         2         4         5
#   1 0.5000000 0.0000000 0.0000000 0.5000000
#   2 0.3333333 0.3333333 0.3333333 0.0000000
#   3 0.5000000 0.5000000 0.0000000 0.0000000
#   4 0.0000000 1.0000000 0.0000000 0.0000000
#   5 0.0000000 0.5000000 0.5000000 0.0000000

## As an alternative to the 2nd line above, use sweep(), which doesn't rely on
## implicit recycling of the vector returned by rowSums(res)
sweep(res, MARGIN = 1, STATS = rowSums(res), FUN = `/`)
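
Note that the output above has no column for state 3, because no cusip in the sample data has xt1 equal to 3. If a full 5 x 5 matrix is wanted, one possible approach is to convert both columns to factors with explicit levels before tabulating, so that table() keeps the empty state. The sketch below assumes the state space is exactly 1 to 5; the names states, counts and trans are illustrative and not from the original answer. It also uses prop.table() for the row normalization.

```r
# Data from the question
df <- data.frame(cusip = paste("A", 1:10, sep = ""),
                 xt  = c(1, 2, 3, 2, 3, 5, 2, 4, 5, 1),
                 xt1 = c(1, 4, 2, 1, 1, 4, 2, 2, 2, 5))

# Assumption: the state space is exactly the quantiles 1..5
states <- 1:5

# Fixing the factor levels makes table() keep states that have zero counts
counts <- with(df, table(xt  = factor(xt,  levels = states),
                         xt1 = factor(xt1, levels = states)))

# Row-normalize: prop.table() with margin = 1 divides each row by its row sum
trans <- prop.table(counts, margin = 1)
trans
```

If some state never occurs at time t, its row sum is zero and that row comes out as NaN; whether to leave it that way or replace it with zeros is a modelling choice.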
Josh O'Brien answered Sep 26 '22 13:09