What is an efficient way to test if rows in a matrix are sorted? [Update: see Aaron's Rcpp answer - straightforward & very fast.] I am porting some code that uses <code>issorted(,'rows')</code> from Matlab. As it seems that <code>is.unsorted</code> does not extend beyond vectors, I'm writing or looking for something else. The naive method is to check that the sorted version of the matrix (or data frame) is the same as the original, but that's obviously inefficient. NB: For sorting, a la <code>sortrows()</code> in Matlab, my code essentially uses <code>SortedDF <- DF[do.call(order, DF),]</code> (it's wrapped in a larger function that converts matrices to data frames, passes parameters to <code>order</code>, etc.). I wouldn't be surprised if there are faster implementations (data table comes to mind). <hr> Update 1: To clarify: I'm not testing for sorting intra-row or intra-columns. (Such sorting generally results in an algebraically different matrix.) As an example for creating an unsorted matrix: <pre class="prettyprint"><code>set.seed(0) x <- as.data.frame(matrix(sample(3, 60, replace = TRUE), ncol = 6, byrow = TRUE)) </code></pre> Its sorted version is: <pre class="prettyprint"><code>y <- x[do.call(order, x),] </code></pre> A proper test, say <code>testSorted</code>, would return <code>FALSE</code> for <code>testSorted(x)</code> and <code>TRUE</code> for <code>testSorted(y)</code>. Update 2: The answers below are all good - they are concise and do the test. Regarding efficiency, it looks like these are sorting the data after all. I've tried these with rather large matrices, such as 1M x 10, (just changing the creation of <code>x</code> above) and all have about the same time and memory cost. What's peculiar is that they all consume more time for unsorted objects (about 5.5 seconds for 1Mx10) than for sorted ones (about 0.5 seconds for <code>y</code>). This suggests they're sorting before testing. I tested by creating a <code>z</code> matrix: <pre class="prettyprint"><code>z <- y z[,2] <- y[,1] z[,1] <- y[,2] </code></pre> In this case, all of the methods take about 0.85 seconds to complete. Anyway, finishing in 5.5 seconds isn't terrible (in fact, that seems to be right about the time necessary to sort the object), but knowing that a sorted matrix is 11X faster suggests that a test that doesn't sort could be even faster. In the case of the 1M row matrix, the first three rows of <code>x</code> are: <pre class="prettyprint"><code> V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 1 3 1 2 2 3 1 3 3 2 2 2 1 1 1 3 2 3 2 3 3 2 3 3 3 1 2 1 1 2 1 2 3 </code></pre> There's no need to look beyond row 2, though vectorization isn't a bad idea. (I've also added the <code>byrow</code> argument for the creation of <code>x</code>, so that row values don't depend on the size of <code>x</code>.) Update 3: Another comparison for this testing can be found with the <code>sort -c</code> command in Linux. If the file is already written (using <code>write.table()</code>), with 1M rows, then <code>time sort -c myfile.txt</code> takes 0.003 seconds for the unsorted data and 0.101 seconds for the sorted data. I don't intend to write out to a file, but it's a useful comparison. Update 4: Aaron's Rcpp method bested all other methods offered here and that I've tried (including the <code>sort -c</code> comparison above: in-memory is expected to beat on-disk). As for the ratio relative to other methods, it's hard to tell: the denominator is too small to give an accurate measurement, and I've not extensively explored <code>microbenchmark</code>. The speedups can be very large (4-5 orders of magnitude) for some matrices (e.g. one made with <code>rnorm</code>), but this is misleading - checking can terminate after only a couple of rows. I've had speedups with the example matrices of about 25-60 for the unsorted and about 1.1X for the sorted, as the competing methods were already very fast if the data is sorted. Since this does the right thing (i.e. no sorting, just testing), and does it very quickly, it's the accepted answer.

If y is sorted then do.call(order,y) returns 1:nrow(y). <pre class="prettyprint"><code> testSorted = function(y){all(do.call(order,y)==1:nrow(y))} </code></pre> note this doesn't compare the matrices, but it doesn't dip out as soon as it finds a non-match.

Well, why don't you use: <pre class="prettyprint"><code>all(do.call(order, y)==seq(nrow(y))) </code></pre> That avoids creating the ordered matrix, and ensures it checks your style of ordering.

Testing if rows of a matrix or data frame are sorted in R

Q: How do I check if a list is sorted in R?

unsorted() function in R Language is used to check if an object is sorted or not. It returns False if the object is sorted otherwise True.

Q: How do I sort a row in a matrix in R?

To sort each row of an R data frame in increasing order, we can use apply function for sorting the columns and then transpose the output. For example, if we have a data frame called df that contains 5 columns then each row of df can be sorted in increasing order by using the command t(apply(df,1,sort)).

Q: How do I select specific rows with conditions in R?

By Using subset() R base also provides a subset() function that can be used to select rows based on the logical condition of a column.

Tags:

sorting

r

matlab

What is an efficient way to test if rows in a matrix are sorted? [Update: see Aaron's Rcpp answer - straightforward & very fast.]

I am porting some code that uses issorted(,'rows') from Matlab. As it seems that is.unsorted does not extend beyond vectors, I'm writing or looking for something else. The naive method is to check that the sorted version of the matrix (or data frame) is the same as the original, but that's obviously inefficient.

NB: For sorting, a la sortrows() in Matlab, my code essentially uses SortedDF <- DF[do.call(order, DF),] (it's wrapped in a larger function that converts matrices to data frames, passes parameters to order, etc.). I wouldn't be surprised if there are faster implementations (data table comes to mind).

Update 1: To clarify: I'm not testing for sorting intra-row or intra-columns. (Such sorting generally results in an algebraically different matrix.)

As an example for creating an unsorted matrix:

set.seed(0)
x <- as.data.frame(matrix(sample(3, 60, replace = TRUE), ncol = 6, byrow = TRUE))

Its sorted version is:

y <- x[do.call(order, x),]

A proper test, say testSorted, would return FALSE for testSorted(x) and TRUE for testSorted(y).

Update 2: The answers below are all good - they are concise and do the test. Regarding efficiency, it looks like these are sorting the data after all.

I've tried these with rather large matrices, such as 1M x 10, (just changing the creation of x above) and all have about the same time and memory cost. What's peculiar is that they all consume more time for unsorted objects (about 5.5 seconds for 1Mx10) than for sorted ones (about 0.5 seconds for y). This suggests they're sorting before testing.

I tested by creating a z matrix:

z <- y
z[,2] <- y[,1]
z[,1] <- y[,2]

In this case, all of the methods take about 0.85 seconds to complete. Anyway, finishing in 5.5 seconds isn't terrible (in fact, that seems to be right about the time necessary to sort the object), but knowing that a sorted matrix is 11X faster suggests that a test that doesn't sort could be even faster. In the case of the 1M row matrix, the first three rows of x are:

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  3  1  2  2  3  1  3  3  2   2
2  1  1  1  3  2  3  2  3  3   2
3  3  3  1  2  1  1  2  1  2   3

There's no need to look beyond row 2, though vectorization isn't a bad idea.

(I've also added the byrow argument for the creation of x, so that row values don't depend on the size of x.)

Update 3: Another comparison for this testing can be found with the sort -c command in Linux. If the file is already written (using write.table()), with 1M rows, then time sort -c myfile.txt takes 0.003 seconds for the unsorted data and 0.101 seconds for the sorted data. I don't intend to write out to a file, but it's a useful comparison.

Update 4: Aaron's Rcpp method bested all other methods offered here and that I've tried (including the sort -c comparison above: in-memory is expected to beat on-disk). As for the ratio relative to other methods, it's hard to tell: the denominator is too small to give an accurate measurement, and I've not extensively explored microbenchmark. The speedups can be very large (4-5 orders of magnitude) for some matrices (e.g. one made with rnorm), but this is misleading - checking can terminate after only a couple of rows. I've had speedups with the example matrices of about 25-60 for the unsorted and about 1.1X for the sorted, as the competing methods were already very fast if the data is sorted.

Since this does the right thing (i.e. no sorting, just testing), and does it very quickly, it's the accepted answer.

853

asked Sep 29 '11 14:09

Iterator

2 Answers

If y is sorted then do.call(order,y) returns 1:nrow(y).

 testSorted = function(y){all(do.call(order,y)==1:nrow(y))}

note this doesn't compare the matrices, but it doesn't dip out as soon as it finds a non-match.

answered Oct 29 '22 11:10

Spacedman

Well, why don't you use:

all(do.call(order, y)==seq(nrow(y)))

That avoids creating the ordered matrix, and ensures it checks your style of ordering.

answered Oct 29 '22 11:10

Nick Sabbe

Related questions
                            
                                Rstudio shiny ggvis tooltip on mouse hover
                            
                                Create column with grouped values based on another column
                            
                                What is the difference between rel error and x error in a rpart decision tree? [closed]
                            
                                ggplot2: can't sort x axis by y value
                            
                                Locking R shiny dashboard sidebar (shinydashboard)
                            
                                change value to percentage of row in R [duplicate]
                            
                                Creating a contingency table using multiple columns in a data frame in R
                            
                                Hide/show outputs Shiny R
                            
                                Perform an operation on a vector using the previous value after an initial value
                            
                                R: `ID : Coercing LHS to a list` in adding an ID column, why?
                            
                                Move axis labels in between plot and facet strip
                            
                                group by in dplyr and calculating percentages
                            
                                How to convert list of list into a tibble (dataframe)
                            
                                Count consecutive TRUE values within each block separately [duplicate]
                            
                                merge data frame and named vector
                            
                                How to sort rows of a data frame based on a vector using dplyr pipe
                            
                                Querying Oracle DB from Revolution R using RODBC
                            
                                Deleting specific rows from a data frame
                            
                                Paste together two character vectors of different lengths
                            
                                k-means: Same clusters for every execution

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With