Saturday, September 24, 2016

LP/MIP models in R


OMPR (Optimization Modelling Package in R) is a DSL to model and solve Mixed Integer Linear Programs. It is inspired by the excellent Jump project in Julia.

Here are some problems you could solve with this package:

  • What is the cost minimal way to visit a set of clients and return home afterwards?
  • What is the optimal conference time table subject to certain constraints (e.g. availability of a projector)?
  • If you run a radio station :) What is the optimal way to play music such that your users do not have to listen to the same songs too often?

The Wikipedia article gives a good starting point if you would like to learn more about the topic.

This is a beta version. Currently working towards a first stable version for CRAN. At the moment not recommended for production systems / important analyses. Although most obvious bugs should be gone. Happy to get bug reports or feedback.

Current version: 0.3.0

Looks interesting.

Wednesday, September 21, 2016

LP Model with 150 billion equations?!topic/gurobi/zdO8cqJIIv4.

My usual rule of thumb is that up to 10 million equations is doable on normal PC hardware. This is 15000 times over that amount. I am using here the US notion of a billion (i.e. \(10^9\) instead of \(10^{12}\)).

In some cases I feel models are overly detailed. Often the answer to results that don’t match expectations is: “we need to add more detail”. I am convinced this is not always the correct conclusion. Of course building and solving large models is more sexy than small models. But much insight can be learned from really small models.

A problem with \(1.5 \times 10^{11}\) equations probably needs a ton a of memory, say in the order of hundreds of terabytes. Is there a machine with with several hundred terabyte of memory? The Titan supercomputer has about 500 terabyte memory (the link expresses memory in Tebibytes – I had to look that up).

Tuesday, September 20, 2016

Gender-based Seating Arrangement

From this post:

I'm trying to solve a special assignment problem, but I cannot handle it.

The problem is to assign seats to audiences. Each audience has a ranking of the seats based on his/her own preference. I want to assign the seats to the audiences such that the overall satisfaction is maximized.

The above problem is nothing but an assignment problem, which can be solved by the Hungarian algorithm. Now, things becomes a little more complicated. The audiences are divided into male audiences and female audiences. When assigning the seats, there should be at least n empty seats between each male and female. I assume that there are enough seats to accommodate all audiences and that all seats are located in a row.

This looks like an assignment problem with some side-constraints. The assignment part is boilerplate:

\[\boxed{\begin{align}\max\>&\sum_{i,j}s_{i,j}x_{i,j}\\&\sum_j x_{i,j}=1 \>\forall i\\&\sum_i x_{i,j}\le 1\>\forall j\\&x_{i,j}\in \{0,1\}\end{align}}\]

Here \(i\) indicates spectators and \(j\) refers to seats. Modeling the constraint that females cannot sit next to males is more challenging and interesting (from a strict modeling perspective that is: these seating arrangements do not look very attractive to me).

To experiment with some formulations I generate some random data for the preferences \(s_{i,j}\):


If we want to have \(n\) empty seats in between the males and females, then for \(m=m_F+m_M\) persons we need at least \(m+n\) seats.

Quadratic Model

Probably the easiest is to use a quadratic model. First we introduce a distance matrix \(d_{j,j’}\) indicating 1 + the number of seats between seats \(j\) and \(j’\). I.e. this matrix looks like:


Then we need to construct a set \(check_{i,i’}\) which has all unique female-male combinations we need to check. This set arranged as a matrix looks like:


With this data we can now write our side-constraint as:

\[\sum_{j\ne j’} d_{j,j’} x_{i,j} x_{i’,j’} \ge n+1 \>\> \forall check(i,i’) \]

i.e.: for every combination \((i,i’)\) where \(i\) is a female and \(i’\) is a male, placed in seats \((j,j’)\), make sure the distance \(d_{j,j’}\) of their seats is at least \(n+1\).

This formulation is quadratic but unfortunately not convex (the Q matrix of quadratic coefficients is not negative semidefinite). We are lucky: some solvers like Cplex and Gurobi accept it nevertheless: they apply some reformulations behind the scenes. Gurobi really, really wants to let you know and prints:

Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals
Modified 30 Q diagonals

The optimal assignment for \(n=1\) looks like:



The multiplication of two binary variables can be easily linearized. A standard way to linearize \(z=x\cdot y\) with \(x,y \in \{0,1\}\) is:

\[\boxed{\begin{align}&z \le x\\ &z \le y \\ & z\ge x+y-1\\& x,y,z \in \{0,1\}\end{align}}\]

In our case we can simplify a bit:

\[\boxed{\begin{align}&\sum_{j,j’} d_{j,j’} y_{i,j,i’,j’} \ge n+1\\ & y_{i,j,i’,j’} \le x_{i,j}\\ &  y_{i,j,i’,j’} \le x_{i’,j’}\\ & y_{i,j,i’,j’} \in \{0,1\} \end{align}} \]

This formulation leads to a very large model:


Although this linearized formulation finds the optimal solution, clearly, this is not the way to go.

Forbid patterns

A different formulation can be based on forbidding certain patterns in the seating. For \(n=1\) we do not allow F,M or M,F to appear.

The first thing we do is to introduce a variable that tells us if set \(j\) is occupied by a male or female (or empty). We can do this by using two binary variables \(m_j\) and \(f_j\), or we can use a single variable \(g_{j,fm} \in \{0,1\} \) where \(fm = \{f,m\}\) is a set indicating the gender. 

To calculate \(g_{j,fm}\) we can use a simple summation over the females or males only, depending on what \(fm\) is:

\[g_{j,fm}=\sum_{i|gender_{i,fm}} x_{i,j} \]

Variable \(g_{i,fm}\) will look like:


This is kind of an aggregation of the variable \(x_{i,j}\). Based on \(g_{j,fm}\) we can formulate a set of equations, again assuming \(n=1\):


Of course I did not write these individual equations. Rather I created and populated the 4 dimensional set \(pattern_{k,j,j’,fm}\) and introduced just one equation block:

\[\sum_{j’,fm|pattern_{k,j,j’,fm}} g_{j’,fm} \le 1 \>\>\forall j,k\]

This linear formulation has 83 variables and 180 equations and solves very fast. For \(n>1\) we can formulate generalizations of this scheme.

Different problem: no males next to each other

A slightly different problem is actually easier to formulate: make sure no males are sitting next to each other (because they start to fight?). Also let’s apply the same rule for the women. Again we can use the scheme where we forbid patterns. Here we forbid F,F and M,M.  This we can do by only allowing up to one F or M in two consecutive seats.

First we create a simple set that gives us the groups of two seats next to each other. Call this set \(group_{j,j’}\). It looks like:


We can formulate the spacing equation simply as:

\[\sum_{i|gender_{i,fm}} \sum_{j’|group_{j,j’}} x_{i,j’} \le 1\>\> \forall j,fm\]

Actually we can improve on this a little bit. Using extra variables \(g_{j,fm}\) indicating the gender of the person in seat \(j\) as defined in the previous paragraph we can write:

\[\begin{align}&g_{j,fm}=\sum_{i|gender_{i,fm}} x_{i,j}\>\> \forall j,fm\\&\sum_{j’|group_{j,j’}} g_{j’,fm} \le 1\>\>\forall j,fm \end{align}\]

Although we add extra variables and equations, this makes the model more readable, and actually sparser (i.e. fewer non-zero elements). Often larger and sparser is better than smaller and denser.

The results looks like:


GAMS and Trump

A new study based on an input-output model (written in GAMS) indicates that the trade policies proposed by presidential candidate Donald Trump could have substantial negative effects.

The principle of comparative advantages in trade was called by Paul Samuelson: an economic idea that is both universally true and not obvious. The not so intuitive part of this seems prevalent in the political discussions and on the campaign trail: most arguments are not  based on economics.

  1. Marcus Noland, Gary Clyde Hufbauer, Sherman Robinson, and Tyler Moran, “Assessing Trade Agendas in the US Presidential Campaign”,
  2. Donald J. Boudreaux, “Comparative Advantage”, The Concise Encyclopedia of Economics,
  3. Erwin Kalvelagen, “Solving Systems of Linear Equations with GAMS”, Has an example demonstrating some techniques relating to Input-Output analysis.

Wednesday, September 14, 2016

A New Architecture for Optimization Modeling Frameworks


AbstractWe propose a new architecture for optimization modeling frameworks in which solvers are expressed as computation graphs in a framework like TensorFlow rather than as standalone programs built on a low-level linear algebra interface. Our new architecture makes it easy for modeling frameworks to support high performance computational platforms like GPUs and distributed clusters, as well as to generate solvers specialized to individual problems. Our approach is particularly well adapted to first-order and indirect optimization algorithms. We introduce cvxflow, an open-source convex optimization modeling framework in Python based on the ideas in this paper, and show that it outperforms the state of the art.

The issues addressed in this paper are typically not the ones I am struggling with most of the time. Once more an indication that “optimization modeling” has a very different meaning for different modelers. 

Monday, September 12, 2016

Getting rid of fly ash from coal-fired power plants

Fly ash is a by-product of coal burning power plants.

Fly ash at Cockenzie Power Station. Image by Richard Webb.

In (1) a model is presented where a group of six Chinese power plants tries to optimally sell and ship fly ash to different markets. The leftover fly ash needs to be disposed of at a certain cost. Typically disposal means using landfills. A well-known use of fly ash is as a replacement for portland cement in producing concrete. Fly ash can make the concrete stronger and more durable (2). The plants are operating under single ownership so they coordinate their actions: there is no competitive behavior between the plants and we are interested to optimize the total profit of the group.

Bricks made from Fly Ash. Image by Thamizhpparithi Maari.

The model and the paper are not very interesting. The model is rather simple, the English text in the paper is often difficult to follow and I believe the model is not correctly categorized as a Goal Programming and Scheduling Model (it is neither a scheduling model nor a goal programming model). The LP model in (1) is just:


Here \(x_{i,j}\) is a non-negative variable indicating the optimal shipping patterns from plant \(i\) to market \(j\), and \(y_i\) is the amount of fly ash we needed to dispose of. The total output of each plant is \(q_i\). Note that market price \(p_j\) and the demand are given. This is essentially a transportation model plus a little bit of logic to handle the split between selling and disposal. The results are:


This is where the paper (1) stops. However I think we can make something more interesting out of this.

Although the disposal cost are relatively small compared to the overall revenue and profit, an economist would tell you we can do better than this. By lowering the price we can increase demand and thus be able to reduce the amount we need to dispose of. In the following we will set up a scenario to see if this economist is correct. Let’s say that the demand responds to a price decrease according to a fixed price elasticity \(\eta<0\). I.e. we have:

\[\frac{\partial \ln d_j}{\partial \ln p_j} = \eta\]

An example demand curve can look like the picture below. Although the demand curve is expressing demand as a function of price: \(d = f(p)\), economists typically make graphs with the price \(p\) on the \(y\)-axis and the quantity \(d\) on the \(x\)-axis. 


We assume here a constant price elasticity. This will result in a nonlinear demand curve. In the model we can now use the variables \(p_j\) to calculate the demand:

\[d_j = \beta_j \cdot p_j^{\eta} \]

The constant \(\beta_j\) can be found by plugging in our original constants for the price and the demand (this is called “calibration”). I.e.:

\[\beta_j := \frac{d^0_j}{(p^0_j)^\eta}\]

Assuming we don’t want to increase prices on existing customers we have: \(p_i \le p^0_j\). Quite the opposite: we want to lower prices in exchange for additional demand. Let’s further assume \(\eta = –2\) i.e. a 1% decrease in price leads to a 2% increase in demand. The model now looks like:


Note that prices (and thus the derived demand) are now endogenous instead of exogenous. As this is now an NLP model I would recommend to set some initial values for variables \(p\) and \(x\) (these are the non-linear variables). The results are:



So I would recommend to send some salespeople to markets 4, 6 and 8 and see if in exchange for a price drop the buyers are willing to purchase more. As an incentive, some of the profit improvement can be used as a bonus for these sales teams.

Of course if markets start to trade between them, this scheme may need some refinement. Another question: what about this \(\eta=-2\) elasticity? What if \(\eta=-1\) or \(\eta=-0.5\)? I ran these cases, and for \(\eta=-1\) we get rid of a some but not all disposal. The profit increases by a smaller amount than with \(\eta=-2\). With \(\eta=-0.5\) we don’t see any change compared to the original model (we have the same amount of ash to dispose of, and obtain the same profit).

  1. Ming Fu, Lu Wang, Yong-Lan Xu, Dong-Xiao Niu, Tian-Nan Ma,"The Study of Fly Ash Scheduling Optimization Model in Coal-Fired Power Plant". In Steven Y. Liang, Energy and Mechanical Engineering, Proceedings of 2015 International Conference on Energy and Mechanical Engineering.

Thursday, September 8, 2016




The StructJuMP package provides a parallel algebraic modeling framework for block structured optimization models in Julia. StructJuMP, originally known as StochJuMP, is tailored to two-stage stochastic optimization problems and uses MPI to enable a parallel, distributed memory instantiation of the problem. StructJuMP.jl is an extension of the JuMP.jl package, which is as fast as AMPL and faster than any other modeling tools such as GAMS and Pyomo (see this).

Nonlinear solvers

Problems modeled in StructJuMP models can be solved in parallel using the PIPS-NLP parallel optimization solver. In addition, StructJuMP models can be solved (in serial only) using Ipopt. The solver interface and glue code for PIPS-NLP and Ipopt are located at StructJuMPSolverInterface.

The readme is at:

Friday, September 2, 2016


Suppose we want to implement an interpolation scheme for some 2D function \(z=f(x,y)\). We assume we have some data points \((x_i,y_j,z_{i,j})\) where the \((x_i,y_j)\) values form a grid. The grid points do not have to be equidistant, but the rows and columns must align:


All these three examples are valid for this approach.

A 2D SOS2 based interpolation scheme can be expressed as:


We force that up to four neighboring elements can be non-zero in the matrix \(\lambda_{i,j}\). This is accomplished by calculating row- and column-sums and then saying up to two consecutive elements of these vectors are nonzero. This scheme is rather simple but it actually works.

A problem

As indicated in the comments below, this formulation will not always give you unique solutions.

Consider the really small grid \(x=[0,1]\), \(y=[0,1]\) with the following data (using \(z=x\cdot y\)):

\[\begin{matrix}x&y&z\\ \hline 0&0&0\\0&1&0\\1&0&0\\1&1&1\end{matrix}\]

Our grid of \((x,y,z)\) values now looks like:

\[\left(\begin{matrix}(0,1,0)&(1,1,1)\\  (0,0,0)&(1,0,0)\end{matrix}\right)\]

Assume we want to find an approximation for \(f(0.5, 0.5)\). Using the above formulation we can see the following values:

\[\lambda=\left(\begin{matrix}0&0.5\\0.5&0\end{matrix}\right)\] \(z=0.5\)
\[\lambda=\left(\begin{matrix}0.25&0.25\\0.25&0.25\end{matrix}\right)\] \(z=0.25\)
\[\lambda=\left(\begin{matrix}0.5&0\\0&0.5\end{matrix}\right)\] \(z=0\)

So depending on the situation we can find different interpolated values for \(z\). Obviously this is not very satisfactory. In the next paragraphs I will discuss some countermeasures we can take.


Instead of interpolating between four points we can also interpolate between three points. Schematically our grid becomes:


Note that we also could have chosen diagonals that slope down (I discuss this in more detail later). If we have \(m\) \(x\)-values and \(n\) \(y\)-values, we end up with \(n+m-1\) diagonals. We can see this easily from:


Basically the idea is to select also two neighboring diagonals. This is again easily expressed using a SOS2 set. The \(\lambda\)’s along a diagonal are described by \(\lambda_{i,i+t}\) where \(t\) is a fixed shift or offset (note that \(t\) will be negative as well as zero and positive). Obviously, the diagonal that passes through \(\lambda_{1,1}\), \(\lambda_{2,2}\), etc., has \(t=0\). I.e. the condition to add is:

\[\boxed{\begin{align}&\textbf{sos2 variable }\beta_t\\&\beta_t = \sum_i \lambda_{i,i+t}\end{align}}\] (1)

Does this help us with the non-uniqueness? The solutions with

\[\lambda=\left(\begin{matrix}0.25&0.25\\0.25&0.25\end{matrix}\right)\] \(z=0.25\)
\[\lambda=\left(\begin{matrix}0.5&0\\0&0.5\end{matrix}\right)\] \(z=0\)

are no longer feasible. We are left with only:

\[\lambda=\left(\begin{matrix}0&0.5\\0.5&0\end{matrix}\right)\] \(z=0.5\)

Unfortunately this is not the best approximation. Note that the point \((x,y)=(0.5,0.5)\) is actually part of two possible triangles so we can see two possible configurations in cases like this.

\[\lambda=\left(\begin{matrix}0&0.5\\0.5&\end{matrix}\right)\] \(z=0.5\)
\[\lambda=\left(\begin{matrix}&0.5\\0.5&0\end{matrix}\right)\] \(z=0.5\)

These configurations have the same approximation. If a point is precisely on a diagonal it is part of two triangles, but as we have seen this is not an issue as only two \(\lambda_{i,j}\)’s are nonzero, making the interpolation unique again.

Modeling in GAMS

Equation (1) is not directly implementable in GAMS. If variable \(\beta_t\) is indexed by \(t\) and variable \(\lambda_{i,j}\) by \((i,j)\), we need to map between \(t\) and \((i,j)\). One possible way to do this is as follows:

abort$(card(t)<>card(i)+card(j)-1) "set t has wrong size";
ord(j)=ord(i)+ord(t)-card(j)) = yes

sos2 variable
defbeta(t).. beta(t) =e=
(map(i,j,t), lambda(i,j));

The advantage of using an auxiliary set map is that we can develop this independently of the equation and debug this before running actual the model. The resulting equation is now very simple.

For our small 2x2 example, the mapping set looks like:

----     50 SET map 

               t1          t2          t3

i1.j1                     YES
i1.j2                                 YES
i2.j1         YES
i2.j2                     YES

This corresponds to:


Downward sloping diagonals

Instead of upward sloping diagonals, we can use downward sloping ones instead.


The gams code to populate our mapping set ‘map’ for this case is not very complicated:

map(i,j,t)$(ord(i)+ord(j)=ord(t)-1) = yes;


If we can make the function \(f(x,y)\) separable, we may be able to use a different scheme. E.g. assume \(f(x,y)=x\cdot y\) with \(x,y>0\). We can make this separable by applying a log transformation: \(\log(z)=\log(x)+\log(y)\). We can implement each of the functions \(z’=\log(z),x’=\log(x),y’=\log(y)\) using standard 1D piecewise linear formulations. Then we just add the constraint: \(z’=x’+y’\).

Of course another alternative is using a nonlinear solver.

  1. H.P.Williams, “Model Building in Mathematical Programming”, 5th edition, Wiley, 2013.
  2. Misener, R. & Floudas, C.A., “Piecewise-Linear Approximations of Multidimensional Functions”, J. Optim. Theory Appl. (2010), pp.145-120.
    In this paper SOS1 sets are used for the \(\delta\) and \(\gamma\) variables and SOS2 sets for the \(\beta\) variables. Sadly, Chris Floudas passed away last month while vacationing in Greece.
  3. GAMS: Piecewise linear functions with SOS2 variables. Basic GAMS syntax and rules for SOS2 sets.
  4. 2D Interpolation With SOS2 variables. Example GAMS model.

Wednesday, August 31, 2016

Semi-continuous variables

A semi-continuous (or semicontinuous) variable \(x\) is a continuous variable between bounds \([\ell,u]\) that also can assume the value zero. This is sometimes written as:
\[x \in \{0\} \cup [\ell,u]\]

I also see sometimes \(x \in 0 \cup [\ell,u]\). I am not sure which version is more correct. We assume \(0 \le \ell \le u\) (negative bounds are not very well defined in conjunction with semi-continuous variables).

Semi-continuous variables are often used in situations where we don’t like to see very small values. A standard example is a portfolio optimization problem where many small allocations are not very attractive (e.g. because of a transaction costs). Instead we prefer fewer instruments in the portfolio with larger allocations. This situation can be handled using semi-continuous variables.

There are a number of ways we can implement this. Some useful, some not so much.

\[\boxed{\begin{align}&\textbf{semi-continuous variable}\> x\\&x\in[\ell,u]\end{align}}\] Many MIP solvers have a built-in type for semi-continuous variables. The values \(\ell, u\) need to be specified as bounds on \(x\). (Don’t specify them in equations, as in this case \(x\ge\ell\) has a different meaning than \(x^{lo}:=\ell\)).
\[\boxed{\begin{align}&\ell\cdot \delta \le x \le u\cdot \delta\\&\delta \in \{0,1\}\\&x\in[0,u]\end{align}}\] This is a standard way to model semi-continuous behavior using an extra binary variable \(\delta\). In most cases the “sandwich” equation needs to be implemented as two separate constraints. The bounds on \(x\) are not so relevant (e.g. we could have used a free variable).
\[\boxed{\begin{align}&x=\delta \cdot y\\&\delta\in \{0,1\}\\&y\in [\ell,u]\\&x\in [0,u]\end{align}}\] A simple, intuitive non-linear (quadratic) formulation. Unfortunately many MIQP/MIQCP solvers will not accept this construct. We see many interesting and rather scary error messages. My favorite:
Return code - 5010 [MSK_RES_ERR_MIO_INTERNAL]: A fatal error occurred in the mixed 
integer optimizer. Please contact MOSEK support.
(May be you are asked to contact the Mosek people to apologize for doing this). The reason for all this havoc: a nonlinear equality constraint makes the model non-convex. In some cases we can replace \(x=\delta\cdot y\) by an inequality to make things digestible for convex solvers.
\[\boxed{\begin{align}&x^2\ge \ell\cdot x\\&x\in [0,u]\end{align}}\] Very compact and looks harmless. So nice try, but unfortunately also non-convex. There is just no way to create a convex formulation without the aid of some form of binary (or at least discrete) variables.

Every post should have preferably some graph, so here we go. This plots \(x^2\ge \ell x\) for \(\ell=10\).