Scipy: Linear programming with sparse matrices

Q: What is sparse matrix in SciPy?

Matrices that mostly contain zeroes are said to be sparse. Sparse matrices are commonly used in applied machine learning (such as in data containing data-encodings that map categories to count) and even in whole subfields of machine learning such as natural language processing (NLP).

Q: What is the SciPy function which creates a sparse matrix?

from scipy.sparse import csc_matrix. # Creating a 3 * 4 sparse matrix. sparseMatrix = csc_matrix(( 3 , 4 ), dtype = np.int8).toarray() # Print the sparse matrix.

Q: How do you solve sparse matrix?

Factorization or decomposition. Depending on the properties of the matrix A, different factorizations are used: For an n\times n symmetric positive definite matrix, the Cholesky factorization A=LL^T is usually computed, where L is a lower triangular (sparse) matrix.

Q: What is SciPy optimize used for?

SciPy optimize provides functions for minimizing (or maximizing) objective functions, possibly subject to constraints. It includes solvers for nonlinear problems (with support for both local and global optimization algorithms), linear programing, constrained and nonlinear least-squares, root finding, and curve fitting.

Tags:

python

scipy

sparse-matrix

linear-programming

I want to solve a linear program in python. The number of variables (I will call it N from now on) is very large (~50000) and in order to formulate the problem in the way scipy.optimize.linprog requires it, I have to construct two N x N matrices (A and B below). The LP can be written as

minimize: c.x
subject to:
    A.x <= a
    B.x  = b
    x_i >= 0 for all i in {0, ..., n}

whereby . denotes the dot product and a, b, and c are vectors with length N.

My experience is that constructing such large matrices (A and B have both approx. 50000x50000 = 25*10^8 entries) comes with some issues: If the hardware is not very strong, NumPy may refuse to construct such big matrices at all (see for example Very large matrices using Python and NumPy) and even if NumPy creates the matrix without problems, there is a huge performance issue. This is natural regarding the huge amount of data NumPy has to deal with.

However, even though my linear program comes with N variables, the matrices I work with are very sparse. One of them has only entries in the very first row, the other one only in the first M rows, with M < N/2. Of course I would like to exploit this fact.

As far as I have read (e.g. Trying to solve a Scipy optimization problem using sparse matrices and failing), scipy.optimize.linprog does not work with sparse matrices. Therefore, I have the following questions:

Is it actually true that SciPy does not offer any possibility to solve a linear program with sparse matrices? (If not, how can I do it?)
Do you know any alternative library that will solve the problem more effectively than SciPy with non-sparse matrices? (The library suggested in the thread above seems to be not flexible enough for my purposes - as far as I understand its documentation)
Can it be expected that a new implementation of the simplex algorithm (using plain Python, no C) that exploits the sparsity of the matrices will be more efficient than SciPy with non-sparse matrices?

681

asked Jan 15 '16 23:01

Samufi

2 Answers

I would say forming a dense matrix (or two) to solve a large sparse LP is probably not the right thing to do. When solving a large sparse LP it is important to use a solver that has facilities to handle such problems and also to generate the model in a way that does not create explicitly any of these zero elements.

Writing a stable, fast, sparse Simplex LP solver in Python as a replacement for the SciPy dense solver is not a trivial exercise. Moreover a solver written in pure Python may not perform as well.

For the size you indicate, although not very, very large (may be large medium sized model would be a good classification) you may want to consider a commercial solver like Cplex, Gurobi or Mosek. These solvers are very fast and very reliable (they solve basically any LP problem you throw at them). They all have Python APIs. The solvers are free or very cheap for academics.

If you want to use an Open Source solver, you may want to look at the COIN CLP solver. It also has a Python interface.

If your model is more complex then you also may want to consider to use a Python modeling tool such as Pulp or Pyomo (Gurobi also has good modeling support in Python).

153

answered Oct 17 '22 15:10

Erwin Kalvelagen

I can't believe nobody has pointed you in the direction of PuLP! You will be able to create your problem efficiently, like so:

import pulp
prob = pulp.LpProblem("test problem",pulp.LpMaximize)
x = pulp.LpVariable.dicts('x', range(5), lowBound=0.0)
prob += pulp.lpSum([(ix+1)*x[ix] for ix in range(5)]), "objective"
prob += pulp.lpSum(x)<=3, "capacity"

prob.solve()
for k, v in prob.variablesDict().iteritems():
    print k, v.value()

PuLP is fantastic, comes with a very decent solver (CBC) and can be hooked up to open source and commercial solvers. I am currently using it in production for a forestry company and exploring Dippy for the hardest (integer) problems we have. Best of luck!

answered Oct 17 '22 15:10

Sergio Lucero

Related questions
                            
                                Relative shebang: How to write an executable script running portable interpreter which comes with it
                            
                                Can I save a range to a variable?
                            
                                Converting some columns of a matrix from float to int
                            
                                What's the best practice for installing development versions of Python modules in Anaconda?
                            
                                How to expose a function returning a C++ object to Python without copying the object?
                            
                                How to register a Custom Form with Django Admin
                            
                                TensorFlow initializing Tensor of ones
                            
                                Chaining difference in ES6 Promises and PEP3148 Futures
                            
                                Python setup.py test dependencies for custom test command
                            
                                Pydub from_mp3 gives [Errno 2] No such file or directory
                            
                                skflow regression predict multiple values
                            
                                django.db.utils.OperationalError: (1046, 'No database selected')
                            
                                BeautifulSoup: how to ignore spurious end tags
                            
                                Django + Auth0 JWT authentication refusing to decode
                            
                                What are triple equal signs and ubuntu2 in Python pip freeze?
                            
                                nltk StanfordNERTagger : How to get proper nouns without capitalization
                            
                                Speeding up past-60-day mean in pandas
                            
                                Get output colored text with Popen
                            
                                Sort a nested list by two elements [duplicate]
                            
                                Tensorflow ArgumentError Running CIFAR-10 example

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With