
How can I remove a column from a sparse matrix efficiently?

If I am using the sparse.lil_matrix format, how can I remove a column from the matrix easily and efficiently?

Brandon Pelfrey asked Mar 03 '10 03:03


2 Answers

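A much simpler and faster approach is to slice out the columns you want to keep. You might not even need the conversion to CSR, but I know for sure that column slicing works on CSR matrices, and converting between the formats shouldn't be an issue. Here col_list is the list of column indices to keep, i.e. every index except the one being removed: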

from scipy import sparse

x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:,col_list])
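
For what it's worth, here is a minimal sketch of how col_list could be built when the goal is to drop a single column. The example matrix and the name drop are made up for illustration; the slicing line itself is the one from above.

```python
import numpy as np
from scipy import sparse

# Small example matrix in LIL format (values chosen arbitrarily).
x = sparse.lil_matrix(np.array([[1, 0, 2, 0],
                                [0, 3, 0, 4],
                                [5, 0, 0, 6]]))

drop = 2  # hypothetical: index of the column to remove

# Keep every column index except the one being dropped.
col_list = [j for j in range(x.shape[1]) if j != drop]

# Convert to CSR, pick the kept columns, convert back to LIL.
x_new = sparse.lil_matrix(sparse.csr_matrix(x)[:, col_list])

print(x_new.shape)
print(x_new.toarray())
```

Note that this builds a new matrix rather than modifying x in place, so the original stays untouched until you rebind the name.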

Newmu answered Oct 17 '22 02:10


I've been wanting this myself, and in truth there isn't a great built-in way to do it yet. Here's one way: I chose to subclass lil_matrix and add a removecol method. If you prefer, you can instead add removecol directly to the lil_matrix class in your lib/site-packages/scipy/sparse/lil.py file. Here's the code:

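from bisect import bisect_left

from scipy import sparse


class lil2(sparse.lil_matrix):
    def removecol(self, j):
        """Remove column j in place, shifting later columns left by one."""
        if j < 0:
            j += self.shape[1]  # allow negative indices, as with lists

        if j < 0 or j >= self.shape[1]:
            raise IndexError('column index out of bounds')

        rows = self.rows  # per-row sorted lists of column indices
        data = self.data  # per-row lists of the matching values
        for i in range(self.shape[0]):
            # First stored entry in row i whose column index is >= j
            pos = bisect_left(rows[i], j)
            if pos == len(rows[i]):
                continue  # nothing stored at or after column j
            elif rows[i][pos] == j:
                # Drop the entry stored in column j itself
                rows[i].pop(pos)
                data[i].pop(pos)
                if pos == len(rows[i]):
                    continue
            # Entries to the right of j move one column to the left
            for pos2 in range(pos, len(rows[i])):
                rows[i][pos2] -= 1

        self._shape = (self._shape[0], self._shape[1] - 1)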

I have tried it out and don't see any bugs. I certainly think that it is better than slicing the column out, which just creates a new matrix as far as I know.
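
To make the per-row step easier to follow, here is a stand-alone sketch that does the same thing on plain Python lists, mirroring how lil_matrix keeps, for each row, a sorted list of column indices plus a parallel list of values. The helper name drop_column_from_row is made up for illustration and is not part of SciPy.

```python
from bisect import bisect_left


def drop_column_from_row(cols, vals, j):
    """Remove column j from one LIL-style row, in place.

    cols: sorted column indices of the row's stored entries
    vals: values stored at those columns (same length as cols)
    """
    pos = bisect_left(cols, j)  # first stored index >= j
    if pos < len(cols) and cols[pos] == j:
        cols.pop(pos)  # drop the entry that lives in column j
        vals.pop(pos)
    for k in range(pos, len(cols)):
        cols[k] -= 1  # entries right of j shift left by one


# A row with stored entries at columns 0, 2 and 5.
cols, vals = [0, 2, 5], [1.0, 7.0, 3.0]
drop_column_from_row(cols, vals, 2)
print(cols, vals)  # prints [0, 4] [1.0, 3.0]
```

Because the index list is sorted, the binary search finds the cut point quickly, and only the entries to the right of the removed column need to be touched.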

I decided to write a removerow method as well, but I don't think it is as good as removecol: I'm limited by not being able to remove one row from an ndarray in the way that I would like. Here is removerow, which can be added to the class above:

    def removerow(self, i):
        # Note: needs `import numpy` alongside the imports above.
        if i < 0:
            i += self.shape[0]

        if i < 0 or i >= self.shape[0]:
            raise IndexError('row index out of bounds')

        # numpy.delete returns new arrays with entry i left out
        self.rows = numpy.delete(self.rows, i, 0)
        self.data = numpy.delete(self.data, i, 0)
        self._shape = (self._shape[0] - 1, self._shape[1])

Perhaps I should submit these functions to the SciPy repository.

Justin Peel answered Oct 17 '22 02:10