Thursday, October 20, 2016

Excel and very large data files

Tables with many rows (much more than the one-million-row limit in Excel) can be loaded into Excel’s Power Pivot Data Model. Here is an example from Statistics New Zealand with a CSV file of more than 23 million records.


Actually Excel stores this data quite efficiently:


Note: wc reports one extra row because of the header in line 1.
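Counting the data rows without loading the file into Excel is also easy; a small Python sketch (the sample data is made up, the real file has 23+ million records):

```python
import csv, io

def data_row_count(f):
    """Count data rows in a CSV stream, excluding the header line
    (wc -l would report this number plus one)."""
    reader = csv.reader(f)
    next(reader)                  # skip the header row
    return sum(1 for _ in reader)

# tiny stand-in for a large CSV file
sample = io.StringIO("year,region,value\n2015,Auckland,1\n2015,Wellington,2\n")
print(data_row_count(sample))     # -> 2
```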


Wednesday, October 19, 2016

Microsoft Open Sources Gradient Boosting Machine code

Microsoft abandoned their MS Solver Foundation framework; are they now focusing on machine learning (and R) instead?
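As a refresher, the core gradient boosting idea is simple: repeatedly fit a weak learner (here a one-dimensional decision stump) to the residuals of the current ensemble. A toy Python sketch for squared-error loss (data and all names are made up, just for illustration):

```python
def fit_stump(x, r):
    """Find the split s on x minimizing squared error of a two-leaf fit to r."""
    best = None
    for s in sorted(set(x))[:-1]:          # splitting at the max leaves no right side
        left = [ri for xi, ri in zip(x, r) if xi <= s]
        right = [ri for xi, ri in zip(x, r) if xi > s]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (lmean if xi <= s else rmean)) ** 2
                  for xi, ri in zip(x, r))
        if best is None or sse < best[0]:
            best = (sse, s, lmean, rmean)
    _, s, lmean, rmean = best
    return lambda xi: lmean if xi <= s else rmean

def boost(x, y, rounds=20, eta=0.5):
    """Gradient boosting with squared-error loss: each stump fits the residuals."""
    pred = [sum(y) / len(y)] * len(y)      # start from the mean
    for _ in range(rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]   # residuals = negative gradient
        stump = fit_stump(x, r)
        pred = [pi + eta * stump(xi) for pi, xi in zip(pred, x)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
pred = boost(x, y)
mse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)
```

The boosted ensemble drives the training error far below that of the constant (mean) model.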


  1. Alexey Natekin, Alois Knoll, ”Gradient Boosting Machines, A Tutorial”, Frontiers in Neurorobotics, 2013,

Commercial Vehicle Routing


An interesting approach to providing vehicle routing services: you send in a spreadsheet with the data (addresses) and get the ordering back.

  • The CEO has a degree from Erasmus University in Rotterdam
  • They got some funding to get started ($1.2M).
  • Article in Fast Company:
  • One more article:
  • This is based on a “Bee Inspired” heuristic. There are quite a few of them:
    artificial bee colony (ABC) algorithm, honeybees mating optimization (HBMO) algorithm, artificial beehive algorithm (ABHA), bee colony optimization (BCO) algorithm, bee colony inspired algorithm (BCiA), bee swarm optimization (BSO) algorithm, bee system (BS) algorithm, BeeHive algorithm, bees algorithm (BeA), bees life algorithm (BLA), bumblebees algorithm, honeybee social foraging (HBSF) algorithm, OptBees algorithm, simulated bee colony (SBC) algorithm, virtual bees algorithm (VBA), and wasp swarm optimization (WSO) algorithm.  This list is from (1)
  • It looks like simple heuristics are preferred over sophisticated mathematical programming based algorithms.
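To illustrate why simple heuristics are attractive here: a nearest-neighbor ordering of stops (a toy Python sketch of my own, not the company’s actual algorithm) takes only a few lines:

```python
import math

def nearest_neighbor_route(points, start=0):
    """Greedy route: repeatedly visit the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    route = [start]
    while unvisited:
        last = points[route[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        route.append(nxt)
        unvisited.remove(nxt)
    return route

stops = [(0, 0), (5, 5), (1, 0), (6, 5), (2, 1)]
print(nearest_neighbor_route(stops))  # -> [0, 2, 4, 1, 3]
```

Of course such a greedy ordering can be far from optimal; that is exactly the trade-off the post is about.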


  1. Bo Xing, Wen-Jing Gao, “Bee Inspired Algorithms”, in Innovative Computational Intelligence: A Rough Guide to 134 Clever Algorithms, 2013.

Tuesday, October 18, 2016

MIP Modeling: if constraint


Could you please explain your logic with the binary variable delta for a more general case, for instance if y = {a: x>0, b: x<0}?

If we assume \(y\) can be either \(a\) or \(b\) (constants) when \(x=0\), and \(x \in [L,U]\)  (with \(L<0\) and \(U>0\)) then we can write

\[\begin{matrix}y = \begin{cases}a&x\ge 0\\b&x\le 0 \end{cases}
& \Longrightarrow
& \boxed{\begin{align}&L(1-\delta) \le x \le U \delta\\&y=a\delta + b (1-\delta)\\ &\delta \in \{0,1\} \end{align}}
\end{matrix} \]

I am not sure if the left part is formulated mathematically correctly. Maybe I should say:

\[y = \begin{cases} a&x> 0\\
                   b&x< 0 \\
                   a \text{ or } b & x=0 \end{cases}\]

If we want to exclude \(x=0\) then we need to add some small numbers:

\[\begin{matrix}y = \begin{cases}a&x> 0\\b&x < 0 \end{cases}
& \Longrightarrow
& \boxed{\begin{align}&\varepsilon + (L-\varepsilon)(1-\delta) \le x \le -\varepsilon +(U+\varepsilon)\delta\\&y=a\delta + b (1-\delta)\\&\delta \in \{0,1\}\end{align}}
\end{matrix} \]
Of course, in many cases we can use the approximation \(U+\varepsilon \approx U\) (and similarly for \(L\)); I was being a bit precise here.

If \(a\) and \(b\) are variables (instead of constants) things become somewhat more complicated. I believe the following is correct:

\[\boxed{\begin{align}&\varepsilon + (L-\varepsilon)(1-\delta) \le x \le -\varepsilon +(U+\varepsilon)\delta\\
&a+M_1(1-\delta)\le y \le a+ M_2(1-\delta)\\
&b+M_3\delta\le y \le b+M_4\delta\\
&\delta \in \{0,1\}\\
&M_1 = b^{LO}-a^{UP}\\
&M_2 = b^{UP}-a^{LO}\\
&M_3 = a^{LO}-b^{UP}\\
&M_4 = a^{UP}-b^{LO}\end{align}}\]

I used here \(a \in [a^{LO},a^{UP}]\) and \(b \in [b^{LO},b^{UP}]\).
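To build some confidence in the first formulation (constant \(a\), \(b\)), we can enumerate its feasible region numerically. A small Python sketch with made-up values for \(a\), \(b\), \(L\), \(U\):

```python
# Check the formulation  L(1-delta) <= x <= U*delta,  y = a*delta + b*(1-delta)
# against the intended rule: y = a if x >= 0, y = b if x <= 0 (either at x = 0).
a, b, L, U = 10.0, 20.0, -5.0, 5.0

def feasible_y(x):
    """All y values the MIP formulation allows for a given x."""
    ys = set()
    for delta in (0, 1):
        if L * (1 - delta) <= x <= U * delta:
            ys.add(a * delta + b * (1 - delta))
    return ys

assert feasible_y(3.0) == {a}       # x > 0 forces y = a
assert feasible_y(-3.0) == {b}      # x < 0 forces y = b
assert feasible_y(0.0) == {a, b}    # at x = 0 either value is allowed
```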

Saturday, October 15, 2016

MIP Modeling: from Sudoku to KenKen

The Mixed Integer Programming formulations for solving the Sudoku and KenKen puzzles share a major trait: the basic data structure is a binary decision variable:

\[x_{i,j,k} = \begin{cases}1 & \text{if cell $(i,j)$ has the value $k$}\\
                                    0 & \text{otherwise}\end{cases}\]

Such a binary variable allows us to easily formulate so-called ‘all-different’ constructs (1). E.g., if a row \(i\) with \(n\) cells must contain each of the values \(1..n\) exactly once, we can write:

\[\boxed{\begin{align}&\sum_j x_{i,j,k} = 1 &&\forall i, k&& \text{value $k$ appears once in row $i$}\\
                       &\sum_k x_{i,j,k} = 1&&\forall i, j&&\text{cell $(i,j)$ contains one value}\\
                       &x_{i,j,k} \in \{0,1\}\end{align}}\]

If we want to recover the value of a cell \((i,j)\) we can introduce a variable \(v_{i,j}\) with:

\[v_{i,j} = \sum_k k \cdot x_{i,j,k}\]

Detail: this is essentially what is sometimes called a set of accounting rows. We could also do this in post-processing outside the optimization model.
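In code, recovering a cell value from the one-hot variables is a single weighted sum (a small Python sketch, not part of the GAMS model):

```python
def cell_value(x_ij):
    """Recover v = sum_k k*x_k from the one-hot vector (x_1, ..., x_n)."""
    assert sum(x_ij) == 1, "exactly one value must be selected"
    return sum(k * xk for k, xk in enumerate(x_ij, start=1))

print(cell_value([0, 0, 0, 1, 0, 0, 0, 0, 0]))  # -> 4
```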

Below I extend this basic structure to a model that solves the Sudoku problem. Next we tackle the KenKen problem. We do this along the lines of (6) but we use a different, more straightforward logarithmic linearization technique for the multiplications and divisions. 



On the left we have a Sudoku puzzle and on the right we see the solution (2).

In Sudoku we need uniqueness along the rows, the columns and inside the 3 x 3 sub-squares. The constraints that enforce these uniqueness conditions are:

\[\boxed{\begin{align}&\sum_j x_{i,j,k} = 1 &&\forall i, k&& \text{value $k$ appears once in row $i$}\\
&\sum_i x_{i,j,k} = 1 &&\forall j, k&& \text{value $k$ appears once in column $j$}\\
&\sum_{i,j|u_{s,i,j}} x_{i,j,k} = 1 &&\forall s, k&& \text{value $k$ appears once in square $s$}\\
                       &\sum_k x_{i,j,k} = 1&&\forall i, j&&\text{cell $(i,j)$ contains one value}\end{align}}\]

Here \(u_{s,i,j}\) contains the elements \(i,j\) for square \(s\). With this we have two issues left: How can we build up \(u_{s,i,j}\) and how can we fix the given values in the Sudoku problem.

Populate areas

We number the squares as follows:


Let’s see if we can populate \(u_{s,i,j}\) without entering data manually and without using loops. We can invent a function \(s=f(i,j)\) that takes a row and column number and returns the number of the square as follows:

\[f(i,j) = 3\left\lceil i/3 \right\rceil + \left\lceil j/3 \right\rceil - 3\]

Here the funny brackets indicate the ceiling function. E.g. \(f(4,7)= 3\left\lceil 4/3 \right\rceil + \left\lceil 7/3 \right\rceil -3 = 3\cdot 2+3-3=6\). This means we can populate \(u\) as in:
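The post implements this in GAMS; as a quick check of the formula itself, a small Python sketch (the function name `square` is mine):

```python
import math
from collections import Counter

def square(i, j):
    """Map cell (i,j) (1-based) to its 3x3 sub-square number 1..9:
    s = 3*ceil(i/3) + ceil(j/3) - 3."""
    return 3 * math.ceil(i / 3) + math.ceil(j / 3) - 3

assert square(4, 7) == 6                       # the worked example in the text
counts = Counter(square(i, j) for i in range(1, 10) for j in range(1, 10))
assert counts == {s: 9 for s in range(1, 10)}  # each square holds 9 cells
```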


When we take a step back, we can consider rows and columns as just another particularly shaped block. So we add to our set \(u\) the individual rows and columns:


This means we can reduce the model equations to just:

\[\boxed{\begin{align}&\sum_{i,j|u_{a,i,j}} x_{i,j,k} = 1 &&\forall a, k&& \text{value $k$ appears once in area $a$}\\
                       &\sum_k x_{i,j,k} = 1&&\forall i, j&&\text{cell $(i,j)$ contains one value}\\
                       &x_{i,j,k} \in \{0,1\}\end{align}}\]

Fixing initial values

If we use the variables \(v_{i,j}\) in the model, we can use them to fix the initial values.


If we don’t have the \(v\) variables in the model we can fix the underlying \(x\) variables:


Complete model


The input is most conveniently organized as a table:

table v0(i,j)
      c1  c2  c3  c4  c5  c6  c7  c8  c9
r1     5   3           7
r2     6           1   9   5
r3         9   8                   6
r4     8               6               3
r5     4           8       3           1
r6     7               2               6
r7         6                   2   8
r8                 4   1   9           5
r9                     8           7   9


The output looks like:

----     49 PARAMETER v 

         c1       c2       c3       c4       c5       c6       c7       c8       c9

r1        5        3        4        6        7        8        9        1        2
r2        6        7        2        1        9        5        3        4        8
r3        1        9        8        3        4        2        5        6        7
r4        8        5        9        7        6        1        4        2        3
r5        4        2        6        8        5        3        7        9        1
r6        7        1        3        9        2        4        8        5        6
r7        9        6        1        5        3        7        2        8        4
r8        2        8        7        4        1        9        6        3        5
r9        3        4        5        2        8        6        1        7        9
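As a sanity check we can encode this output as the binary variable \(x_{i,j,k}\) and verify the all-different condition for every area. A small Python sketch (not part of the original GAMS model):

```python
solution = [
    [5,3,4,6,7,8,9,1,2],
    [6,7,2,1,9,5,3,4,8],
    [1,9,8,3,4,2,5,6,7],
    [8,5,9,7,6,1,4,2,3],
    [4,2,6,8,5,3,7,9,1],
    [7,1,3,9,2,4,8,5,6],
    [9,6,1,5,3,7,2,8,4],
    [2,8,7,4,1,9,6,3,5],
    [3,4,5,2,8,6,1,7,9],
]

# x[i][j][k] = 1 iff cell (i,j) holds value k+1 (0-based indices here)
x = [[[1 if solution[i][j] == k + 1 else 0 for k in range(9)]
      for j in range(9)] for i in range(9)]

def area_ok(cells):
    """Each value k must appear exactly once over the given cells."""
    return all(sum(x[i][j][k] for i, j in cells) == 1 for k in range(9))

rows    = [[(i, j) for j in range(9)] for i in range(9)]
cols    = [[(i, j) for i in range(9)] for j in range(9)]
squares = [[(3*(s//3)+di, 3*(s%3)+dj) for di in range(3) for dj in range(3)]
           for s in range(9)]
print(all(area_ok(a) for a in rows + cols + squares))  # -> True
```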



On the left we have a KenKen puzzle and on the right we see the solution (4).

In the KenKen puzzle we need to have unique values along the rows and the columns. As we saw in the Sudoku example, this is straightforward. We can formulate this as:

\[\boxed{\begin{align}&\sum_j x_{i,j,k} = 1 &&\forall i, k&& \text{value $k$ appears once in row $i$}\\
&\sum_i x_{i,j,k} = 1 &&\forall j, k&& \text{value $k$ appears once in column $j$}\\
&\sum_k x_{i,j,k} = 1&&\forall i, j&&\text{cell $(i,j)$ contains one value}\\
                       &x_{i,j,k} \in \{0,1\}\end{align}}\]

As in the Sudoku example, we can combine the first two constraints into a single one:

\[\sum_{i,j|u_{s,i,j}} x_{i,j,k} = 1 \>\>\forall s,k\]

In addition to these constraints we have a slew of restrictions related to collections of cells - these are called the cages. Each cage has a rule and a given answer. Let’s discuss them in detail:

  1. n: this is a single cell with a given value. This case is not present in the data set of our example above. The rule is easy: just put the number in the cell \((i,j)\): \(v_{i,j} = n\),
    For this example we would fix this cell to the value 1.
  2. n+ indicates we add up the values in the cage and the result should be n. This is a simple linear constraint: \[\sum_{i,j|C_{i,j}} v_{i,j} = n\]
  3. n- is a cage with just two cells where the absolute value difference between the values of the two cells is n. I.e. \[|v_{i_1,j_1}- v_{i_2,j_2}|=n\] In (6) it is proposed to model this in a linear fashion with the help of an additional binary variable as: \[v_{i_1,j_1}- v_{i_2,j_2}=n(2 \delta-1)\] where \(\delta \in \{0,1\}\). 
  4. n* indicates we multiply all values in the cage: \[\prod_{i,j|C_{i,j}} v_{i,j} = n\] In (6) a number of possible linearizations are mentioned, some of them rather complicated with such things as prime number factorizations. Let me suggest a much simpler approach using logarithms. We have: \[\begin{align}&\prod_{i,j|C_{i,j}} v_{i,j} = n\\ \Longrightarrow\>\> & \sum_{i,j|C_{i,j}} \log(v_{i,j}) = \log(n)\\ \Longrightarrow\>\> & \sum_{i,j|C_{i,j}} \sum_k \log(k) x_{i,j,k} = \log(n)\end{align}\] The last constraint is linear in the decision variables \(x_{i,j,k}\)! We have used here that \(\log(v_{i,j})=\sum_k \log(k) x_{i,j,k}\). This simpler linearization is not mentioned in (6).
    Note that we can always evaluate \(\log(k)\) and \(\log(n)\) as \(k\ge 1\) and \(n \ge 1\).
  5. n/ indicates division. This operation has two cells and can be described more formally: \[\frac{v_{i_1,j_1}}{v_{i_2,j_2}}=n \>\text{or}\>\frac{v_{i_2,j_2}}{v_{i_1,j_1}}=n\] In (6) the following linearization is proposed:\[\begin{align} &v_{i_1,j_1}-n\cdot v_{i_2,j_2} \ge 0-M\cdot \delta\\&v_{i_1,j_1}-n\cdot v_{i_2,j_2} \le 0+M\cdot \delta\\&v_{i_2,j_2}-n\cdot v_{i_1,j_1} \ge 0-M\cdot (1-\delta)\\&v_{i_2,j_2}-n\cdot v_{i_1,j_1} \le 0+M\cdot (1-\delta)\\&\delta\in\{0,1\}\end{align}\] Again I would suggest a simpler linearization: \[\begin{align}&\frac{v_{i_1,j_1}}{v_{i_2,j_2}}=n \>\text{or}\>\frac{v_{i_2,j_2}}{v_{i_1,j_1}}=n\\ \Longrightarrow\>\>&\log(v_{i_1,j_1})-\log(v_{i_2,j_2})=\log(n) \> \text{or} \>\log(v_{i_2,j_2})-\log(v_{i_1,j_1})=\log(n)\\ \Longrightarrow\>\>&|\log(v_{i_1,j_1})-\log(v_{i_2,j_2})|=\log(n)\\  \Longrightarrow\>\>&\log(v_{i_1,j_1})-\log(v_{i_2,j_2})=\log(n)(2\delta-1)\\  \Longrightarrow\>\>&\sum_k \log(k) x_{i_1,j_1,k}-\sum_k \log(k) x_{i_2,j_2,k}=\log(n)(2\delta-1)\end{align}\]
    I believe we can assume \(n\ge 1\) even for division, hence we can safely apply the logarithm.
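A quick numerical illustration of the logarithmic trick (a Python sketch using cage values taken from the solution shown further below; `log_constraint_lhs` is a made-up helper name):

```python
import math

def log_constraint_lhs(values):
    """With one-hot x, sum_k log(k)*x_{i,j,k} equals log(v_{i,j});
    summing over the cage cells gives the LHS of the linearized constraint."""
    return sum(math.log(v) for v in values)

# cage7 (240*): the solved cell values are 4, 5, 3, 4 and 4*5*3*4 = 240
assert math.isclose(log_constraint_lhs([4, 5, 3, 4]), math.log(240))
# cage2 (2/): solved values 6 and 3; |log 6 - log 3| = log 2
assert math.isclose(abs(math.log(6) - math.log(3)), math.log(2))
```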
Complete Model

We have now all the pieces to assemble the complete model.


The sets are almost identical to the Sudoku model. We have introduced new sets \(op\), \(p\) and \(c\). The unique values will be handled through a set \(u\) as before, only now this set is simpler: only rows and columns (no squares).

We have a somewhat complicated parameter that holds the problem data:


The data can be entered as follows:

parameter operation(c,op,i,j) /
cage1 .plus.(r1,r2).c1              11
cage2 .div .r1.(c2,c3)               2
cage3 .min .r2.(c2,c3)               3
cage4 .mul .(r1,r2).c4              20
cage5 .mul .(r1.c5, (r1,r2,r3).c6)   6
cage6 .div .(r2,r3).c5               3
cage7 .mul .(r3,r4).(c1,c2)        240
cage8 .mul .r3.(c3,c4)               6
cage9 .mul .(r4,r5).c3               6
cage10.plus.((r4,r5).c4, r5.c5)      7
cage11.mul .r4.(c5,c6)              30
cage12.mul .r5.(c1,c2)               6
cage13.plus.(r5,r6).c6               9
cage14.plus.r6.(c1*c3)               8
cage15.div .r6.(c4,c5)               2
/;

We need to extract data from this data structure to make things easier:


The sets \(cells\), \(cellp\) and the parameter \(n\) look like:

----     53 SET cells  cells in cage

                   c1          c2          c3          c4          c5          c6

cage1 .r1         YES
cage1 .r2         YES
cage2 .r1                     YES         YES
cage3 .r2                     YES         YES
cage4 .r1                                             YES
cage4 .r2                                             YES
cage5 .r1                                                         YES         YES
cage5 .r2                                                                     YES
cage5 .r3                                                                     YES
cage6 .r2                                                         YES
cage6 .r3                                                         YES
cage7 .r3         YES         YES
cage7 .r4         YES         YES
cage8 .r3                                 YES         YES
cage9 .r4                                 YES
cage9 .r5                                 YES
cage10.r4                                             YES
cage10.r5                                             YES         YES
cage11.r4                                                         YES         YES
cage12.r5         YES         YES
cage13.r5                                                                     YES
cage13.r6                                                                     YES
cage14.r6         YES         YES         YES
cage15.r6                                             YES         YES

----     53 SET cellp  ordered cell numbers

cage1 .p1.r1.c1
cage1 .p2.r2.c1
cage2 .p1.r1.c2
cage2 .p2.r1.c3
cage3 .p1.r2.c2
cage3 .p2.r2.c3
cage4 .p1.r1.c4
cage4 .p2.r2.c4
cage5 .p1.r1.c5
cage5 .p2.r1.c6
cage5 .p3.r2.c6
cage5 .p4.r3.c6
cage6 .p1.r2.c5
cage6 .p2.r3.c5
cage7 .p1.r3.c1
cage7 .p2.r3.c2
cage7 .p3.r4.c1
cage7 .p4.r4.c2
cage8 .p1.r3.c3
cage8 .p2.r3.c4
cage9 .p1.r4.c3
cage9 .p2.r5.c3

----     53 PARAMETER n  value for each cage

cage1   11.000,    cage2    2.000,    cage3    3.000,    cage4   20.000,    cage5    6.000,    cage6    3.000
cage7  240.000,    cage8    6.000,    cage9    6.000,    cage10   7.000,    cage11  30.000,    cage12   6.000
cage13   9.000,    cage14   8.000,    cage15   2.000

The variables in the model are as follows:


The loop to fix some cells is not used in this data set as we don’t have fixed values in the data. The rest of the model looks like:


The summations in the subtraction and division equations are a little bit misleading: they just pick the value of the first (p1) and second (p2) cell in cage \(c\). In the division equation we pick up the logarithm of the value. Here are the actually generated MIP rows:

---- subtraction  =E=  cages with subtraction

subtraction(cage3)..  v(r2,c2) - v(r2,c3) + 6*delta(cage3) =E= 3 ; (LHS = 0, INFES = 3 ****)

---- division  =E=  cages with division

division(cage2)..  logv(r1,c2) - logv(r1,c3) + 1.38629436111989*delta(cage2) =E= 0.693147180559945 ;
      (LHS = 0, INFES = 0.693147180559945 ****)
division(cage6)..  logv(r2,c5) - logv(r3,c5) + 2.19722457733622*delta(cage6) =E= 1.09861228866811 ;
      (LHS = 0, INFES = 1.09861228866811 ****)
division(cage15)..  logv(r6,c4) - logv(r6,c5) + 1.38629436111989*delta(cage15) =E= 0.693147180559945 ;
      (LHS = 0, INFES = 0.693147180559945 ****)

In equations \(calclogv\) we could restrict the calculation of \(logv\) to only those cells \((i,j)\) where we have a multiplication or division. Similarly, we could calculate \(v_{i,j}\) only for cells involved in an addition or subtraction. As it stands, we calculate too many \(logv\)’s and \(v\)'s. For this small case this is not an issue. Furthermore, a good LP/MIP presolver will probably detect that such a variable is not used anywhere else, and will remove the variable and the accompanying equation.

The final result looks like:

----    101 VARIABLE v.L  value

            c1          c2          c3          c4          c5          c6

r1       5.000       6.000       3.000       4.000       1.000       2.000
r2       6.000       1.000       4.000       5.000       2.000       3.000
r3       4.000       5.000       2.000       3.000       6.000       1.000
r4       3.000       4.000       1.000       2.000       5.000       6.000
r5       2.000       3.000       6.000       1.000       4.000       5.000
r6       1.000       2.000       5.000       6.000       3.000       4.000
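We can check this solution against the cage data with a few lines of Python (a sketch I added; the cage list is hand-transcribed from the parameter data above):

```python
from functools import reduce

v = [
    [5,6,3,4,1,2],
    [6,1,4,5,2,3],
    [4,5,2,3,6,1],
    [3,4,1,2,5,6],
    [2,3,6,1,4,5],
    [1,2,5,6,3,4],
]

# (operation, target n, 1-based cells) for each cage
cages = [
    ("plus", 11, [(1,1),(2,1)]),
    ("div",   2, [(1,2),(1,3)]),
    ("min",   3, [(2,2),(2,3)]),
    ("mul",  20, [(1,4),(2,4)]),
    ("mul",   6, [(1,5),(1,6),(2,6),(3,6)]),
    ("div",   3, [(2,5),(3,5)]),
    ("mul", 240, [(3,1),(3,2),(4,1),(4,2)]),
    ("mul",   6, [(3,3),(3,4)]),
    ("mul",   6, [(4,3),(5,3)]),
    ("plus",  7, [(4,4),(5,4),(5,5)]),
    ("mul",  30, [(4,5),(4,6)]),
    ("mul",   6, [(5,1),(5,2)]),
    ("plus",  9, [(5,6),(6,6)]),
    ("plus",  8, [(6,1),(6,2),(6,3)]),
    ("div",   2, [(6,4),(6,5)]),
]

def cage_ok(op, n, cells):
    vals = [v[i-1][j-1] for i, j in cells]
    if op == "plus":
        return sum(vals) == n
    if op == "mul":
        return reduce(lambda a, b: a * b, vals) == n
    if op == "min":                 # two cells, absolute difference
        return abs(vals[0] - vals[1]) == n
    if op == "div":                 # two cells, quotient in either direction
        return max(vals) == n * min(vals)
    return False

print(all(cage_ok(*c) for c in cages))  # -> True
```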


To solve the KenKen puzzle as a Mixed Integer Program, we can start with a data structure similar to the one used in the Sudoku formulation: \(x_{i,j,k}\in\{0,1\}\). Using different linearizations from the ones shown in (6) we can then solve the KenKen problem easily with a MIP solver.


  1. All-different and Mixed Integer Programming.
  2. Sudoku.
  3. Sudoku Puzzle.
  4. KenKen.
  5. KenKen Puzzle.
  6. Vardges Melkonian, “An Integer Programming Model for the KenKen Problem”, American Journal of Operations Research, 2016, 6, 213-225. [link] Contains a complete AMPL model.
  7. Andrew C. Bartlett, Timothy P. Chartier, Amy N. Langville, Timothy D. Rankin, “An Integer Programming Model for the Sudoku Problem”, The Journal of Online Mathematics and Its Applications, Vol.8, 2008, [link]

Wednesday, October 12, 2016

Optimization: Retraction


We, the Editors and Publishers of Optimization: A Journal of Mathematical Programming and Operations Research, are retracting the following article:

Fridoun Moradlou and Sattar Alizadeh, Strong Convergence Theorem by a New Iterative Method for Equilibrium Problems and Symmetric Generalized Hybrid Mappings,

We are now aware of a substantially similar version of this article which was previously submitted to and published in Mediterranean Journal of Mathematics by the same authors:

Sattar Alizadeh and Fridoun Moradlou, A Strong Convergence Theorem for Equilibrium Problems and Generalized Hybrid Mappings, Mediterranean Journal of Mathematics, DOI: 10.1007/s00009-014-0462-6,

This action constitutes a breach of warranties made by the authors with respect to originality and of our policy on publishing ethics and integrity. We note we received, peer-reviewed, accepted, and published the article in good faith based on these warranties, and censure this action.

Publishers seem to be becoming more aggressive (maybe assertive is a better word) about these things and retract more often. The web site offers some fascinating reading.

Monday, October 10, 2016

One more P=NP proof?


Abstract. We present a polynomial-sized linear program (LP) for the \(n\)-city TSP drawing upon "complex flow" modeling ideas by the first two authors who used an \(O(n^9) \times O(n^8)\) model*. Here we have only \(O(n^5)\) variables and \(O(n^4)\) constraints. We do not model explicit cycles of the cities, and our modeling does not involve the city-to-city variables-based, traditional TSP polytope referred to in the literature as "The TSP Polytope." Optimal TSP objective value and tours are achieved by solving our proposed LP. In the case of a unique optimum, the integral solution representing the optimal tour is obtained using any LP solver (solution algorithm). In the case of alternate optima, an LP solver (e.g., an interior-point solver) may stop with a fractional (interior-point) solution, which (we prove) is a convex combination of alternate optimal TSP tours. In such cases, one of the optimal tours can be trivially retrieved from the solution using a simple iterative elimination procedure we propose. We have solved over a million problems with up to 27 cities using the barrier methods of CPLEX, consistently obtaining all integer solutions. Since LP is solvable in polynomial time and we have a model which is of polynomial size in \(n\), the paper is thus offering (although, incidentally) a proof of the equality of the computational complexity classes "P" and "NP". The non-applicability and numerical refutations of existing extended formulations results (such as Braun et al. (2015) or Fiorini et al. (2015) in particular) are briefly discussed in an appendix. [*: Advances in Combinatorial Optimization: Linear Programming Formulation of the Traveling Salesman and Other Hard Combinatorial Optimization Problems (World Scientific, January 2016).]

Unfortunately there are about as many \(P=NP\) proofs around as there are \(P\ne NP\) proofs. See (1) for a large list. The first author is already on the list with a few earlier papers.

As a practitioner, all of this may not be so important, but there are cases where I see complexity theory being somewhat misused or misunderstood in the following sense. When confronted with a combinatorial problem, a common reaction is to dismiss Mixed Integer Programming a priori in favor of some heuristic, often accompanied by some mumblings about NP-hard problems. I believe complexity theory says very little about solving a given problem of a given size with an actual MIP solver. Notice also that many state-of-the-art MIP solvers themselves contain many heuristics. In practice, for large difficult problems, MIP solvers can often (but not always) deliver good solutions quickly, with the added advantage of giving an indication of how close we are to a possible optimal solution (the gap).


  1. Gerhard J. Woeginger, The P-versus-NP page,

Saturday, October 8, 2016

An alternative linear programming solution technique



Linear superiorization considers linear programming problems but instead of attempting to solve them with linear optimization methods it employs perturbation resilient feasibility-seeking algorithms and steers them toward reduced (not necessarily minimal) target function values. The two questions that we set out to explore experimentally are (i) Does linear superiorization provide a feasible point whose linear target function value is lower than that obtained by running the same feasibility-seeking algorithm without superiorization under identical conditions? and (ii) How does linear superiorization fare in comparison with the Simplex method for solving linear programming problems? Based on our computational experiments presented here, the answers to these two questions are: “yes” and “very well”, respectively.




Linear superiorization (abbreviated: LinSup) considers linear programming (LP) problems wherein the constraints as well as the objective function are linear. It allows to steer the iterates of a feasibility-seeking iterative process toward feasible points that have lower (not necessarily minimal) values of the objective function than points that would have been reached by the same feasibility-seeking iterative process without superiorization. Using a feasibility-seeking iterative process that converges even if the linear feasible set is empty, LinSup generates an iterative sequence that converges to a point that minimizes a proximity function which measures the linear constraints violation. In addition, due to LinSup’s repeated objective function reduction steps such a point will most probably have a reduced objective function value. We present an exploratory experimental result that illustrates the behavior of LinSup on an infeasible LP problem.

The Huffington Post and the Future of Statistical Computing

Via the Huffington Post copied from Quora:


“How do you see statistical computing evolving over the next ten to twenty years?” originally appeared on Quora: the knowledge sharing network where compelling questions are answered by people with unique insights.

Answer by Hadley Wickham, Chief Scientist, RStudio, on Quora:

“How do you see statistical computing evolving over the next ten to twenty years?”

“Prediction is very difficult, especially about the future.” - Niels Bohr. I probably shouldn’t answer this question because I’m likely to make predictions that will come back to haunt me. (In 2004, Bill Gates predicted that “spam will be a thing of the past in two years time.”) I’ll take a stab at it anyway:

  1. Open source and open science will continue to gain importance. It will become increasingly difficult for closed source statistical environments to survive.
  2. We are likely to see at least one alternative implementation to R that has substantially better performance. The most promising candidate at this time is rho.
  3. Data storage tools (like databases) will continue to gain more stat/ML tools. The needs of data science will drive much innovation in database space over the next ten years.
  4. A new generation of tools (including both IDEs and ML algorithms) will more fluently combine the strengths of humans and computers.

Thursday, October 6, 2016

Scheduling Business Dinners: an MINLP, MIQCP and MIP model

In (1) a problem is discussed where an optimization model can help us.

We want to set up \(N\) business dinners attended by customers and suppliers with the following peculiarities:

  • there are \(T\) tables,
  • we need to seat \(C\) customers and \(S\) suppliers,
  • at most \(Maxc\) customers and at most \(Maxs\) suppliers can sit at a single table,
  • each customer \(c\) and supplier \(s\) must sit at the same table exactly once,
  • two suppliers can sit at the same table at most once

We want to minimize the number of dinners \(N\) we need to organize such that all these constraints are met.

Restaurant Flo in Barcelona, Spain [link]

The data

We use the example data from (1):

  • 6 customers, 5 suppliers
  • \(Maxc=3\), \(Maxs=2\)
  • 2 tables

A (non-optimal) solution with 8 dinner rounds is given:


An MINLP model

We start with our central binary variable \(x_{i,t,n}\in\{0,1\}\) indicating if person \(i\) sits at table \(t\) in dinner round \(n\). With this we can develop a Mixed Integer Nonlinear Programming model:

\[\begin{align}\min\>&\sum_n y_n && && \text{minimize number of dinner rounds} \\
        &\sum_t x_{i,t,n} \le 1 &&\forall i,n&&\text{sit at one table}\\
        &\sum_s x_{s,t,n} \le Maxs &&\forall t,n&&\text{limit suppliers at table}\\
        &\sum_c x_{c,t,n} \le Maxc &&\forall t,n&&\text{limit customers at table}\\
      * \> &\sum_{t,n} x_{c,t,n}\cdot x_{s,t,n} = 1&&\forall c,s&&\text{customers and suppliers meet exactly once}\\
     * \> &\sum_{t,n} x_{s1,t,n} \cdot x_{s2,t,n} \le 1&&\forall s1\ne s2&&\text{suppliers sit at most once at the same table}\\
     * \> &y_n = \max_{i,t} \{x_{i,t,n}\} && \forall n &&\text{we need a dinner round}\\
       &x_{i,t,n},y_n \in \{0,1\}\end{align}\]

The equations marked with a * are nonlinear. The variable \(y_n\) indicates whether we need a dinner round \(n\).


A non-convex MIQCP model

To implement the above MINLP we can apply a few modifications:

  • When checking whether suppliers \(s1,s2\) sit at the same table, we can skip the symmetric pair \(s2,s1\). Hence we only need to check suppliers \(s1,s2\) with \(s1<s2\). This saves us a number of equations.
  • The constraint \(y_n = \max_{i,t} \{x_{i,t,n}\}\) can be written as a collection of inequalities \(y_n \ge x_{i,t,n}\) (we use the objective to help drive down \(y_n\)).
  • Optionally we can require that \(y_n \ge y_{n+1}\). This makes the solutions more readable (the used dinner rounds come first) but also reduces symmetry.
  • There are more symmetries in the model we can try to alleviate, e.g. ordering dinner rounds and tables by some statistic.

We are now only left with quadratic terms in the constraints. Such a model is called a Mixed Integer Quadratically Constrained Problem (MIQCP). Although solvers like Cplex and Gurobi can solve convex MIQCPs, this problem is unfortunately non-convex. That means we need a global solver. The model below can be solved quickly with the solvers GloMIQO, Antigone and Baron. Interestingly, the global solver Couenne struggles mightily with this model: it has trouble finding a feasible solution.


A linearized MIP model

The binary multiplication can be linearized using a standard reformulation:

\[\begin{matrix}\boxed{\begin{align}&z=x\cdot y\\
                             &x,y,z\in \{0,1\}\end{align}}&\Longleftrightarrow&
\boxed{\begin{align}&z \le x\\&z \le y\\&z\ge x+y-1\\&x,y,z\in \{0,1\}\end{align}}\end{matrix}\]

Note that we need to apply this reformulation on the individual product terms in the two quadratic constraints. This means we need to introduce a number of additional variables in the model.
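The equivalence of the two boxes is easy to confirm by enumerating the four 0/1 combinations (a small Python sketch, not part of the model):

```python
from itertools import product

def linearization_ok():
    """For all binary x, y: z = x*y is the unique z in {0,1} satisfying
       z <= x, z <= y, z >= x + y - 1."""
    for x, y in product((0, 1), repeat=2):
        feasible = [z for z in (0, 1)
                    if z <= x and z <= y and z >= x + y - 1]
        if feasible != [x * y]:
            return False
    return True

print(linearization_ok())  # -> True
```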

We can simplify the binary multiplication further in the supplier case: there we only need to make sure that \(z \ge x+y-1\). That will save us some constraints. The reason that we can do this is that we have an inequality of the form \(\sum_i x_i \cdot y_i \le 1\), so we are only interested in the product terms that are forced to be one. Hence we have the two different cases:

\[\begin{matrix} \displaystyle\sum_i x_i\cdot y_i = 1 & \Longrightarrow &
\boxed{\begin{align} &z_i \le x_i\\ &z_i \le y_i\\&z_i \ge x_i+y_i-1\\&\sum_i z_i=1\end{align}}\\
\displaystyle\sum_i x_i\cdot y_i \le 1 & \Longrightarrow &
\boxed{\begin{align}&z_i \ge x_i+y_i-1\\&\sum_i z_i\le 1 \end{align}}\end{matrix}\]

I should note that I assume \(x_i,y_i,z_i \in \{0,1\}\). It is nice to see both constructs next to each other in one small model. The complete linearized model can look like:  



This model can now be solved with any MIP solver, including Cplex and Gurobi which were out of the picture for the MIQCP model formulation.

An optimal solution with 3 dinner rounds is:




  1. Alejandra Estanislao, Frédéric Meunier, “A Business Dinner Problem”, Journal of Combinatorial Mathematics and Combinatorial Computing, 97, 173-188, 2016. [link]