This question has been thoroughly discussed and updated on OR exchange, where I cross-posted it.
When I run CPLEX 12.5.0.0 from the command line:
cplex -f my_instance.lp
an optimal integer solution is found in 19056.99 ticks.
But through the Python API, on the very same instance:
import cplex
problem = cplex.Cplex("my_instance.lp")
problem.solve()
the solve now takes 97407.10 ticks (more than 5 times slower).
In both cases, the mode is parallel and deterministic, using up to 2 threads. Wondering whether this poor performance was due to some Python threading overhead, I tried:
problem = cplex.Cplex("my_instance.lp")
problem.parameters.threads.set(1)
problem.solve()
This run required 46513.04 ticks, i.e., using one thread was twice as fast as using two!
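(As a sanity check, the effective parallel settings can also be queried directly from the Python API. The snippet below is only a sketch and assumes the standard parameters accessors; attribute names may differ slightly between CPLEX versions.)

import cplex

problem = cplex.Cplex("my_instance.lp")
# Query the parameters that govern parallelism (defaults unless changed elsewhere).
print("threads       =", problem.parameters.threads.get())   # 0 means "automatic"
print("parallel mode =", problem.parameters.parallel.get())  # 1 = deterministic, 0 = auto, -1 = opportunistic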
Being new to CPLEX and LP in general, I find these results quite confusing. Is there a way to improve the Python API's performance, or should I switch to a more mature API (e.g., Java or C++)?
Here are the full details of the two-thread runs, starting with the (common) preamble:
Tried aggregator 3 times.
MIP Presolve eliminated 2648 rows and 612 columns.
MIP Presolve modified 62 coefficients.
Aggregator did 13 substitutions.
Reduced MIP has 4229 rows, 1078 columns, and 13150 nonzeros.
Reduced MIP has 1071 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.06 sec. (18.79 ticks)
Probing fixed 24 vars, tightened 0 bounds.
Probing time = 0.08 sec. (18.12 ticks)
Tried aggregator 1 time.
MIP Presolve eliminated 87 rows and 26 columns.
MIP Presolve modified 153 coefficients.
Reduced MIP has 4142 rows, 1052 columns, and 12916 nonzeros.
Reduced MIP has 1045 binaries, 7 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.05 sec. (11.67 ticks)
Probing time = 0.01 sec. (1.06 ticks)
Clique table members: 4199.
MIP emphasis: balance optimality and feasibility.
MIP search method: dynamic search.
Parallel mode: deterministic, using up to 2 threads.
Root relaxation solution time = 0.20 sec. (91.45 ticks)
Results from the command line:
GUB cover cuts applied: 1
Clique cuts applied: 3
Cover cuts applied: 2
Implied bound cuts applied: 38
Zero-half cuts applied: 7
Gomory fractional cuts applied: 2
Root node processing (before b&c):
Real time = 5.27 sec. (2345.14 ticks)
Parallel b&c, 2 threads:
Real time = 35.15 sec. (16626.69 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 40.41 sec. (18971.82 ticks)
Results from the Python API:
Clique cuts applied: 33
Cover cuts applied: 1
Implied bound cuts applied: 4
Zero-half cuts applied: 10
Gomory fractional cuts applied: 4
Root node processing (before b&c):
Real time = 6.42 sec. (2345.36 ticks)
Parallel b&c, 2 threads:
Real time = 222.28 sec. (95061.73 ticks)
Sync time (average) = 0.01 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 228.70 sec. (97407.10 ticks)
You might try disabling the presolver and cuts in both cases, then rerun the experiment to test whether the Python API itself is throttling performance. If the results match once cuts are disabled, look into the Python API's cut-parameter defaults and tuning.
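For example, in the Python API this could look roughly as follows (a sketch only, not tested against your instance; the parameter names follow the usual parameters hierarchy and may differ slightly between CPLEX versions):

import cplex

problem = cplex.Cplex("my_instance.lp")

# Turn presolve off.
problem.parameters.preprocessing.presolve.set(0)

# Disable the cut families that appear in your logs; a value of -1 turns a cut type off.
for cuts in (problem.parameters.mip.cuts.cliques,
             problem.parameters.mip.cuts.covers,
             problem.parameters.mip.cuts.gubcovers,
             problem.parameters.mip.cuts.implied,
             problem.parameters.mip.cuts.zerohalfcut,
             problem.parameters.mip.cuts.gomory):
    cuts.set(-1)

problem.solve()

The corresponding set commands in the interactive optimizer should give you the same configuration on the command-line side, so both runs start from identical settings.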
In my opinion, C++ is preferable for performance, but it can add serious development time.