Inequality constraint sitting on the bound

Hi Joe,

Further to your comments on the thread “Bad_alloc exception on cpp usage”, in which you write:

“Alright, so why are we getting NaNs here? Good question, and one that is difficult to answer with the available information here. There are a number of places where the adjoint of the inequality, h'(x)*, could be fed a NaN. It could mean that the Lagrange multiplier contains NaNs. It could mean that the output from the forward operator, h'(x), contains NaNs. The essential assumption that we make in Optizelle is that the starting point provided by the user ensures that f(x), g(x), h(x), and all of their derivatives can be evaluated and do not contain any infs or NaNs. We also assume that h(x) > 0. If not, we need to reformulate the problem to add a slack variable, since this is not done automatically.”

I have noticed, as you mention, that placing an initial parameter value on its bound, for instance x = 1 combined with the inequality constraint x >= 1, leads to inf values and consequently NaNs.

The inf value arises in the Jordan product inverse function linv, “z ← inv(L(x)) y where L(x) y = x ∘ y”, when it is called with a zero entry in the vector x. This is the case when a parameter sits on its associated bound.
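To illustrate what I am seeing (a minimal sketch in Python, assuming the nonnegative-orthant cone where the Jordan product is elementwise; this linv is my own toy version, not Optizelle's code):

```python
import numpy as np

# For the nonnegative orthant, the Jordan product is elementwise,
# (x o y)_i = x_i * y_i, so the Jordan product inverse
# z <- inv(L(x)) y reduces to elementwise division, z_i = y_i / x_i.
def linv(x, y):
    return y / x

x = np.array([1.0, 0.0])   # second parameter sits exactly on its bound
y = np.array([1.0, 1.0])

with np.errstate(divide="ignore"):
    print(linv(x, y))      # [ 1. inf]: the inf that later becomes NaNs
```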

I was not aware that the initial x needs to satisfy h(x) > 0, as both the documentation and the examples show h(x) >= 0.

Is this requirement only for the initial value, or does it hold in general?

I ask because, in my test examples where an optimal parameter lies on a bound, the iterates tend to converge slowly towards the bound without ever landing on it.

To be frank, I would like to avoid introducing slack parameters, since quite a few of my parameters can hit the bounds.

Otherwise, thank you very much for making Optizelle available to the public,

Mikker

Yes, that’s correct. The function h(x) needs to be strictly feasible and that should be better documented if it’s not clear. Unfortunately, that’s the interior part of an interior point method. Very specifically, a primal-dual interior point method uses a log barrier and we can’t take the log of zero.
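To make that concrete, here's a minimal sketch (toy code, not Optizelle internals) of why the barrier needs h(x) > 0:

```python
import math

# A primal-dual interior point method replaces h(x) >= 0 with a
# log-barrier term added to the objective,
#   f(x) - mu * sum_i log(h_i(x)),
# which is only defined on the strict interior h(x) > 0.
def barrier(f, h, x, mu=0.1):
    return f(x) - mu * sum(math.log(hi) for hi in h(x))

f = lambda x: (x[0] - 2.0) ** 2
h = lambda x: [x[0] - 1.0]            # encodes the bound x >= 1

print(barrier(f, h, [1.5]))           # strictly interior: fine
print(barrier(f, h, [1.0]))           # on the bound: ValueError, log(0)
```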

As far as converging slowly when close to the boundary, that can also happen. It depends on a few factors, but let me first ask: do you have any equality constraints, or only bounds?

Hi Joe,

Thank you for your reply, and for clarifying that the function h(x) needs to be strictly feasible, for the reason you explain.

I have a mix of linear and non-linear equality as well as inequality constraints. In accordance with the documentation, I introduce slack parameters to convert each non-linear inequality constraint into a pair consisting of a non-linear equality constraint and an associated affine inequality constraint. These additional slack parameters are often also a convenient means of monitoring the non-linear expressions.
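Concretely, the reformulation I use looks like this (a minimal sketch; the names are my own):

```python
import numpy as np

# Original non-linear inequality:  h(x) >= 0, e.g. x1*x2 - 1 >= 0.
def h(x):
    return np.array([x[0] * x[1] - 1.0])

# Reformulated as a non-linear equality plus an affine inequality on
# the slack s:   h(x) - s = 0,   s >= 0.
def equality_residual(x, s):
    return h(x) - s

# The slack doubles as a monitor of the non-linear expression itself.
x = np.array([2.0, 1.0])
s = h(x)                               # strictly feasible start: s > 0
print(equality_residual(x, s), s)      # residual [0.], slack [1.]
```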

Similar to some of the other contributors in this forum, I have long used the SQP solver in NLopt with good success. However, I need to expand my model to more than 1000 parameters, at which point NLopt slows down noticeably. Therefore, I am intrigued by the matrix-free structure of Optizelle, which should allow me to exploit the sparsity of my system. I have also considered IPOPT, but would like to give Optizelle a first try,

Michael

Hi Michael,
Thanks for your interest. In short, Optizelle may work, but there’s a flaw in the algorithm that may cause slow convergence, or a lack of convergence, for that kind of problem.

Specifically, for general nonlinear optimization there’s a variety of algorithms, but second-order Newton-type algorithms tend to be a powerful, general tool. In these algorithms, Newton’s method is applied to the nonlinear equations that require the gradient of the Lagrangian to be zero (optimality) and the residual of the constraints to be zero (feasibility). Now, the problem with this approach is that Newton’s method may not converge when far from the solution, so we need to globalize it. In a line-search method, this can be a bit tricky because solving the full Newton system does not guarantee a descent direction without modifying the system. In a trust-region method, solving the full system doesn’t guarantee a solution within the trust region. In response to this, there are two approaches: either we modify the full Newton system, or we divide the solve into one step toward feasibility and another step toward optimality, which is called a composite step method. Optizelle uses a composite step method. An algorithm like IPOPT works with the overall system directly.
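To sketch the composite step idea (heavily simplified: a minimum-norm quasi-normal step, a projected steepest-descent tangential step, and no trust region or globalization; illustrative only, not Optizelle's code):

```python
import numpy as np

# One composite step for: min f(x) s.t. g(x) = 0, with A = g'(x).
def composite_step(A, g_val, grad_lagrangian):
    # Quasi-normal step: move toward feasibility, g(x) + A dn ~ 0,
    # here via the minimum-norm least-squares solution.
    Apinv = np.linalg.pinv(A)
    dn = -Apinv @ g_val
    # Tangential step: move toward optimality inside the null space of A,
    # so it does not undo the feasibility gained by dn.
    P = np.eye(A.shape[1]) - Apinv @ A      # projector onto null(A)
    dt = -P @ grad_lagrangian               # projected gradient step
    return dn + dt

A = np.array([[1.0, 1.0]])                  # linearized constraint x1 + x2 = c
g_val = np.array([0.5])                     # current constraint residual
grad = np.array([1.0, -1.0])
print(composite_step(A, g_val, grad))       # [-1.25  0.75]
```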

Which method is better is a matter of debate. Personally, I strongly prefer composite step methods because they separate the linear systems for feasibility and optimality, which often have different physics, structure, and scale. This separation allows some very specialized solution methods and preconditioners for the constraints, which is important for very large mechanics- and physics-based problems.

This brings us to the interior point method. There are two general approaches to integrating it with a Newton method. In the first approach, the modified complementary slackness condition is solved out of the Newton system, which creates a modification of the gradient and Hessian of the optimality system. This is what both Optizelle and IPOPT do. In the second approach, the complementary slackness condition is left in the Newton system, but a slack variable is introduced and the step in the slack variable is scaled by the slack itself. This is what the NITRO algorithm does.
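For the simplest cone, x >= 0, the first approach looks roughly like this (a toy sketch, not the actual Optizelle internals):

```python
import numpy as np

# Perturbed complementary slackness for x >= 0 with multiplier z:
#   x_i * z_i = mu.
# Solving it out of the Newton system, z = mu / x, folds the barrier
# into the optimality system as a modified gradient and Hessian:
def barrier_gradient(grad_f, x, mu):
    return grad_f - mu / x                     # d/dx of f(x) - mu*sum(log x_i)

def barrier_hessian(hess_f, x, mu):
    return hess_f + mu * np.diag(1.0 / x**2)

x = np.array([2.0, 0.5])
print(barrier_gradient(np.array([1.0, -1.0]), x, mu=0.1))
print(barrier_hessian(np.zeros((2, 2)), x, mu=0.1))
```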

As it turns out, the NITRO approach is really what you want to use with a composite step method, not what Optizelle does. The first approach works well for IPOPT because IPOPT is not a composite step method and solves the overall system, so the step is handled differently. The reason that Optizelle struggles here is that the step toward feasibility (the quasi-normal step) does not have any information about where the bound is. This means that it can repeatedly run into the boundary during the feasibility step, as well as during the null-space projection in the optimality step. This is bad and slows down convergence. This does not happen with the NITRO algorithm because the scaling by the slack variable is very clever: when the slack variable gets small, the scaling shrinks the step, which makes it less likely that the inequality becomes violated.
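A tiny numerical illustration of that scaling (my own toy numbers, not NITRO's actual update):

```python
# NITRO-style scaling: the actual slack step is ds = s * dt, where dt is
# the step computed in the scaled variable. The update
#   s_new = s + s*dt = s * (1 + dt)
# stays strictly positive whenever dt > -1: the smaller the slack, the
# smaller the absolute step, so the iterate cannot crash through the bound.
s = 1e-6
dt = -0.5                  # aggressive scaled step toward the bound
print(s + s * dt)          # 5e-07: still strictly feasible

unscaled = -1e-3           # an unscaled step of otherwise reasonable size...
print(s + unscaled)        # -0.000999: crashes through the boundary
```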

When I wrote Optizelle, I didn’t know this and implemented the first approach, which is what the linear SDP solvers did.

Now, algorithmically, this is fixable, but it actually changes how you abstract the algorithm. At the moment, Optizelle combines the algorithms for unconstrained and inequality constrained problems, and it likewise combines the algorithms for equality constrained and fully constrained problems. If one wants to implement the NITRO algorithm, the unconstrained code basically sits separately from the inequality, equality, and fully constrained codes. Also, since there’s now always a linear system to solve, especially one that the user did not construct, it makes sense to handle solving that linear system in the default case, which Optizelle doesn’t do.

We have a code that does all of this and more, but it has not been released. I’d love to release it and I’ve promised one for years. I’d still like to, but practically it’s been difficult. Essentially, there are two issues. One, there’s a bunch of algorithmic improvements that should be published first. Two, our clients don’t exactly pay for us to publish general papers, so I need to expense that cost in overhead. I’m fine with that, but it’s hard to balance all of this with ongoing work. Honestly, if you know of a friendly editor who wants to publish a sequence of papers detailing this algorithm, please let me know.

Alright, so where this leaves you is: Optizelle is mostly fine for unconstrained, purely equality constrained, and purely inequality constrained problems. It may or may not work for problems with a combination of equality and inequality constraints. If you try your problem and it keeps hitting the boundary during the feasibility step or the null-space projection, it’s probably going to struggle.

Anyway, hopefully this helps to clarify things, and sorry there isn’t an easier answer.

Joe

Hi Joe,

Thank you for your detailed and candid reply - much appreciated!

It has helped me to better understand why my simpler test cases at times struggle to converge towards a bound, until the quadratic convergence finally kicks in.

At present, my larger-scale model converges more slowly than is acceptable. However, this may well be rooted on my side in my providing improper 2nd-order gradients. I need to verify these gradients before deciding which course to take.

I also wish that I knew an editor for your publications, as I would be keen to try your new code. But not being in academia, I rarely (read: never) publish anything myself.

So for now, I will proceed with checking the computed gradients in order to judge Optizelle fairly,

Michael

Hi Michael,
Not a problem! An accurate gradient and Hessian will help no matter what solver you end up using. Best of luck!

Joe

Hi Joe,
I fully agree regarding accurate gradients, especially in multi-dimensional parameter space.
The 1st-order gradients have passed all comparisons with their finite-difference-generated counterparts, whereas the comparisons of the 2nd-order gradients show some discrepancies. Hopefully I can get the latter ironed out - knock on wood.
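For reference, the kind of check I run looks like this (a minimal sketch on a toy function; my real model is of course larger):

```python
import numpy as np

# Compare an analytic gradient and Hessian-vector product against
# central finite differences.
def fd_gradient(f, x, eps=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def fd_hessvec(grad, x, v, eps=1e-6):
    # A directional difference of the gradient approximates H(x) v.
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

f    = lambda x: x[0] ** 2 + x[0] * x[1]            # toy stand-in
grad = lambda x: np.array([2 * x[0] + x[1], x[0]])
hessvec = lambda x, v: np.array([2 * v[0] + v[1], v[0]])

x = np.array([1.0, 2.0])
v = np.array([1.0, -1.0])
print(np.linalg.norm(fd_gradient(f, x) - grad(x)))             # ~1e-10: 1st order
print(np.linalg.norm(fd_hessvec(grad, x, v) - hessvec(x, v)))  # ~0: 2nd order
```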
All the best and once again thank you for your helpful posts,
Michael