Techniques for caching intermediate computations

I’ve just written my first small test program, modelled after the simple_equality example. After adding logging I found that Optizelle recomputes the gradient many times. Because my real-life program will have hundreds of variables, a key-value cache keyed on the full vector is impractical. As an alternative, it would be nice to assign a unique (sequential) index to every iterate x. This index would then be used as the cache key instead of x itself.

One way to achieve this might be to wrap the index alongside the data in Optizelle::Rm <Real>'s Vector, although that means I would have to create a new vector space.

Is this an approach you would recommend? Or would you recommend an alternative approach?

BTW, I am working in C++.

I have created said vector space and can now track the number of modifications to an object. It strikes me that Optizelle creates a LOT of objects even for a small, two-variable problem, and I am concerned about how this would scale to high-dimensional problems. I am aware that the large number of “trials” could be because algorithm_class defaults to TrustRegion, which is an “exploratory” algorithm. However, changing the algorithm to line search in code, like so

state.algorithm_class = Optizelle::AlgorithmClass::LineSearch;

did nothing to reduce the number of objects that Optizelle creates. It could be that I have to set a few more parameters to get the line search working properly, but this is the limit of my knowledge at this stage. I can post my code if required.

Some context. I am looking to migrate my code from NLOPT to Optizelle because NLOPT's SLSQP's performance is dominated (>99%) by the low-level Fortran routines h12_ and ddot_sl_. However, at the moment NLOPT's SLSQP is still faster on my initial small (two-variable) test problem :-(. I am certain I am doing something wrong, because SLSQP is very old code (from 1983) that does not leverage modern hardware and ought to be outperformed by Optizelle.

Update: posting an older version of the two-variable problem, without the custom vector space.

#include <optizelle/optizelle.h>
#include <optizelle/vspaces.h>

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>    // for pow

#define LOG_FUNC_CALLS
#define PRN_X   "x = [" << x[0] << "," << x[1] << "]"
#define PRN_dX "dx = [" << dx[0] << "," << dx[1] << "]"
#define PRN_dY "dy = [" << dy[0] << "]"

template <typename Real>
Real sq(Real const & x){
    return x*x;
}

struct MyObj
    : public Optizelle::ScalarValuedFunction <double,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;
    
    // Evaluation
    double eval(X::Vector const & x) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::eval(" << PRN_X << ")" << std::endl;
#endif
        double x0_2 = pow(x[0],2);
        double x0_3 = pow(x[0],3);
        double x1_2 = pow(x[1],2);
        double x1_3 = pow(x[1],3);

        return (10*(5824*pow(x[0],4) - 7696*x0_3 + 11648*x0_2*x1_2 - 7696*x0_2*x[1] + 2822*x0_2 - 7696*x[0]*x1_2 - 6313*x[0]*x[1] + 253*x[0] + 5824*pow(x[1],4) - 7696*x1_3 + 2822*x1_2 + 253*x[1] + 7651))/9;
    }

    // Gradient
    void grad(
        X::Vector const & x,
        X::Vector & grad
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::grad(" << PRN_X << ")" << std::endl;
#endif
        double x0_2 = pow(x[0],2);
        double x0_3 = pow(x[0],3);
        double x1_2 = pow(x[1],2);
        double x1_3 = pow(x[1],3);

        grad[0] = (10*(23296*x0_3 - 23088*x0_2 + 23296*x[0]*x1_2 - 15392*x[0]*x[1] + 5644*x[0] - 7696*x1_2 - 6313*x[1] + 253))/9;

        grad[1] = (10*(23296*x0_2*x[1] - 7696*x0_2 - 15392*x[0]*x[1] - 6313*x[0] + 23296*x1_3 - 23088*x1_2 + 5644*x[1] + 253))/9;
    }
    
    // Hessian-vector product
    void hessvec(
        X::Vector const & x,
        X::Vector const & dx,
        X::Vector & H_dx
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::hessvec(" << PRN_X << " " << PRN_dX << ")" << std::endl;
#endif
        double g11 = 153920*x[0];
        double g9 = 232960*x[0]*x[0] - g11;
        double g12 = 153920*x[1];
        double g10 = 232960*x[1]*x[1] - g12;
        double d2dx2 = (g10 + 3*g9 + 56440)/9;
        double d2dy2 = (3*g10 + g9 + 56440)/9;
        double d2dxy = (465920*x[1]*x[0] - g12 - g11 - 63130)/9;
        
        H_dx[0] = d2dx2*dx[0] + d2dxy*dx[1];
        H_dx[1] = d2dxy*dx[0] + d2dy2*dx[1];
    }
};

// c(x,y)=(x - 1)**2 + y**2 - 0.81
struct MyEq
    :public Optizelle::VectorValuedFunction<double,Optizelle::Rm,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;
    typedef Optizelle::Rm <double> Y;
    
    // y=g(x)
    void eval(
        X::Vector const & x,
        Y::Vector & y
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::eval(" << PRN_X << ")" << std::endl;
#endif
        y[0] = sq(x[0] - 1) + sq(x[1]) - 0.81;
    }
    
    // y=g'(x)dx
    void p(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector & y
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::p(" << PRN_X << " " << PRN_dX << ")" << std::endl;
#endif
        y[0] = 2*(x[0] - 1)*dx[0] + 2*x[1]*dx[1];
    }
    
    // xhat=g'(x)*dy
    void ps(
        X::Vector const & x,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::ps(" << PRN_X << " " << PRN_dY << ")" << std::endl;
#endif
        xhat[0] = 2*(x[0] - 1)*dy[0];
        xhat[1] = 2*x[1]*dy[0];
    }
    
    // xhat=(g''(x)dx)*dy
    void pps(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::pps(" << PRN_X << " " << PRN_dY << " " << PRN_dY << ")" << std::endl;
#endif
        xhat[0] = 2.0*dx[0]*dy[0];
        xhat[1] = 2.0*dx[1]*dy[0];
    }
};

// Define a Schur preconditioner for the equality constraints
struct MyPrecon:
    public Optizelle::Operator <double,Optizelle::Rm,Optizelle::Rm>
{
public:
    typedef Optizelle::Rm <double> X;
    typedef X::Vector X_Vector;
    typedef Optizelle::Rm <double> Y;
    typedef Y::Vector Y_Vector;
private:
    X_Vector& x;
public:
    MyPrecon(X::Vector& x_) : x(x_) {}
    void eval(Y_Vector const & dy,Y_Vector & result) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyPrecon::eval(" << x[0] << "," << x[1] << ")" << std::endl;
#endif
        result[0] = dy[0]/(4*sq(x[0] - 1) + 4*sq(x[1]));
    }
};


int main()
{
    // Create a type shortcut
    using Optizelle::Rm;
    
    auto x = std::vector <double> {1.0, 1.0};
    
    // Allocate memory for dual
    auto y = std::vector <double> (1);
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::State::t state(x,y);
    state.algorithm_class = Optizelle::AlgorithmClass::LineSearch;
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Functions::t fns;
    fns.f.reset(new MyObj);
    fns.g.reset(new MyEq);
    fns.PSchur_left.reset(new MyPrecon(state.x));
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Algorithms::getMin(
        Optizelle::Messaging::stdout,fns,state);

    std::cout << "The algorithm converged due to: " <<
        Optizelle::OptimizationStop::to_string(state.opt_stop) <<
        std::endl;
    std::cout << std::scientific << std::setprecision(16)
        << "The optimal point is: (" << state.x[0] << ','
	<< state.x[1] << ')' << std::endl;
    
    return EXIT_SUCCESS;
}

I’m really sorry that I missed this message earlier. It appears as though my hosting provider ran an upgrade that required my own account to be reauthenticated, so I was missing notices from the forum.

Alright, long story short, here’s a piece of code that mostly accomplishes what you want:

#include <optizelle/optizelle.h>
#include <optizelle/vspaces.h>

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>    // for pow
#include <memory>   // for std::unique_ptr / std::make_unique

#define LOG_FUNC_CALLS
#define PRN_X   "x = [" << x[0] << "," << x[1] << "]"
#define PRN_dX "dx = [" << dx[0] << "," << dx[1] << "]"
#define PRN_dY "dy = [" << dy[0] << "]"

template <typename Real>
Real sq(Real const & x){
    return x*x;
}

struct MyObj
    : public Optizelle::ScalarValuedFunction <double,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed gradient
    mutable std::unique_ptr <X::Vector> grad_cache;

    // Evaluation
    double eval(X::Vector const & x) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::eval(" << PRN_X << ")" << std::endl;
#endif
        double x0_2 = pow(x[0],2);
        double x0_3 = pow(x[0],3);
        double x1_2 = pow(x[1],2);
        double x1_3 = pow(x[1],3);

        return (10*(5824*pow(x[0],4) - 7696*x0_3 + 11648*x0_2*x1_2 - 7696*x0_2*x[1] + 2822*x0_2 - 7696*x[0]*x1_2 - 6313*x[0]*x[1] + 253*x[0] + 5824*pow(x[1],4) - 7696*x1_3 + 2822*x1_2 + 253*x[1] + 7651))/9;
    }

    // Gradient
    void grad(
        X::Vector const & x,
        X::Vector & grad
    ) const {

        // Use a cached value
        if(x_cache && *x_cache == x) {
            grad = *grad_cache;

        // Compute a new value and cache it
        } else {

            // Allocate the initial memory for the cached value and gradient
            if(!x_cache) {
                x_cache = std::make_unique <X::Vector> (X::init(x));
                grad_cache = std::make_unique <X::Vector> (X::init(x));
            }

            // Compute the gradient
#ifdef LOG_FUNC_CALLS
            std::cout << "MyObj::grad(" << PRN_X << ")" << std::endl;
#endif
            double x0_2 = pow(x[0],2);
            double x0_3 = pow(x[0],3);
            double x1_2 = pow(x[1],2);
            double x1_3 = pow(x[1],3);

            grad[0] = (10*(23296*x0_3 - 23088*x0_2 + 23296*x[0]*x1_2 - 15392*x[0]*x[1] + 5644*x[0] - 7696*x1_2 - 6313*x[1] + 253))/9;

            grad[1] = (10*(23296*x0_2*x[1] - 7696*x0_2 - 15392*x[0]*x[1] - 6313*x[0] + 23296*x1_3 - 23088*x1_2 + 5644*x[1] + 253))/9;

            // Cache the iterate and gradient
            X::copy(x,*x_cache);
            X::copy(grad,*grad_cache);
        }
    }

    // Hessian-vector product
    void hessvec(
        X::Vector const & x,
        X::Vector const & dx,
        X::Vector & H_dx
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::hessvec(" << PRN_X << " " << PRN_dX << ")" << std::endl;
#endif
        double g11 = 153920*x[0];
        double g9 = 232960*x[0]*x[0] - g11;
        double g12 = 153920*x[1];
        double g10 = 232960*x[1]*x[1] - g12;
        double d2dx2 = (g10 + 3*g9 + 56440)/9;
        double d2dy2 = (3*g10 + g9 + 56440)/9;
        double d2dxy = (465920*x[1]*x[0] - g12 - g11 - 63130)/9;
        
        H_dx[0] = d2dx2*dx[0] + d2dxy*dx[1];
        H_dx[1] = d2dxy*dx[0] + d2dy2*dx[1];
    }
};

// c(x,y)=(x - 1)**2 + y**2 - 0.81
struct MyEq
    :public Optizelle::VectorValuedFunction<double,Optizelle::Rm,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;
    typedef Optizelle::Rm <double> Y;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed derivative
    mutable std::unique_ptr <X::Vector> gp_cache;
    
    // y=g(x)
    void eval(
        X::Vector const & x,
        Y::Vector & y
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::eval(" << PRN_X << ")" << std::endl;
#endif
        y[0] = sq(x[0] - 1) + sq(x[1]) - 0.81;
    }

    // g'(x)
    void gp(
        X::Vector const & x
    ) const {
        // Allocate the initial memory for the cached value and derivative
        if(!x_cache) {
            x_cache = std::make_unique <X::Vector> (X::init(x));
            gp_cache = std::make_unique <std::vector <double>>(2);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::gp(" << PRN_X << ")" << std::endl;
#endif

        // Compute the derivative
        (*gp_cache)[0] = 2*(x[0] - 1);
        (*gp_cache)[1] = 2*x[1];

        // Save the iterate
        X::copy(x,*x_cache);
    }

    // y=g'(x)dx
    void p(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector & y
    ) const {

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::p(" << PRN_X << " " << PRN_dX << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        y[0] = (*gp_cache)[0]*dx[0] + (*gp_cache)[1]*dx[1];
    }

    // xhat=g'(x)*dy
    void ps(
        X::Vector const & x,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::ps(" << PRN_X << " " << PRN_dY << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        xhat[0] = (*gp_cache)[0] * dy[0];
        xhat[1] = (*gp_cache)[1] * dy[0];
    }

    // xhat=(g''(x)dx)*dy
    void pps(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::pps(" << PRN_X << " " << PRN_dY << " " << PRN_dY << ")" << std::endl;
#endif
        xhat[0] = 2.0*dx[0]*dy[0];
        xhat[1] = 2.0*dx[1]*dy[0];
    }
};

// Define a Schur preconditioner for the equality constraints
struct MyPrecon:
    public Optizelle::Operator <double,Optizelle::Rm,Optizelle::Rm>
{
public:
    typedef Optizelle::Rm <double> X;
    typedef X::Vector X_Vector;
    typedef Optizelle::Rm <double> Y;
    typedef Y::Vector Y_Vector;
private:
    X_Vector& x;
public:
    MyPrecon(X::Vector& x_) : x(x_) {}
    void eval(Y_Vector const & dy,Y_Vector & result) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyPrecon::eval(" << x[0] << "," << x[1] << ")" << std::endl;
#endif
        result[0] = dy[0]/(4*sq(x[0] - 1) + 4*sq(x[1]));
    }
};

int main()
{
    // Create a type shortcut
    using Optizelle::Rm;
    
    auto x = std::vector <double> {1.0, 1.0};
    
    // Allocate memory for dual
    auto y = std::vector <double> (1);
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::State::t state(x,y);
    state.algorithm_class = Optizelle::AlgorithmClass::LineSearch;
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Functions::t fns;
    fns.f.reset(new MyObj);
    fns.g.reset(new MyEq);
    fns.PSchur_left.reset(new MyPrecon(state.x));
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Algorithms::getMin(
        Optizelle::Messaging::stdout,fns,state);

    std::cout << "The algorithm converged due to: " <<
        Optizelle::OptimizationStop::to_string(state.opt_stop) <<
        std::endl;
    std::cout << std::scientific << std::setprecision(16)
        << "The optimal point is: (" << state.x[0] << ','
	<< state.x[1] << ')' << std::endl;
    
    return EXIT_SUCCESS;
}

Essentially, look at the incoming value and see if it’s cached. If so, use it. Otherwise, recompute.

Now, that said, I’ll admit that the design is not good, but there are some historical reasons for why things are this way. Over a decade ago, when this code was first written, I was working on some seismic inversion problems. Those problems involve a large number of PDE solves, so the approach at the time was to randomly sum the data together and compute a single PDE solve. That mostly worked for the client’s use case, but it complicated a normal design for caching computations. Essentially, the objective function kept changing, so it wasn’t safe to cache the results. Now, what we learned over time is that we really don’t want to randomize the function evaluations all of the time, but only at some very particular instances. Those lessons are not reflected in this code.

In fact, it turns out to be moderately tricky to decide when to cache the objective and constraint derivatives, as well as the preconditioner, in order to accommodate a variety of useful algorithms and models. There is a good answer for this, but it involves changing the interface of the optimizer.

As to whether or not those changes will be ported to this code: probably not. A huge number of lessons were learned over the last decade about the optimization algorithm itself as well as its implementation. It really needs a new code. I do have one. It works well. I’d love to release it, but the legal side is complicated. There’s a long conversation to be had about intellectual property, patents, copyright, public domain, and publications, which means that a big release is not quite as simple as putting the code on GitHub. Our goal is to release an open-source code, and getting those details correct, so that we’re not locked out of our own algorithm, is important. That likely involves a publication of the new algorithm, so if you know a friendly journal, please let me know.

Till then, just cache internally using mutable variables.

Hi Joe,

Thanks for your reply. Unfortunately, caching the last value does not do much for reducing the number of function calls. I’ve made the following minor modification to the logging, replacing the PRN macros with a print helper:

#include <optizelle/optizelle.h>
#include <optizelle/vspaces.h>

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>    // for pow
#include <memory>   // for std::unique_ptr / std::make_unique

// Streams "symbol = [v0, v1, ...]" as a side effect and returns "" so that it
// can be spliced into an ostream expression
template<typename V>
const char* print(const char* symbol, V const & vec)
{
	std::cout << symbol << " = [" << std::setprecision(16) << vec[0];
	for(std::size_t i = 1; i < vec.size(); ++i) std::cout << ", " << std::setprecision(16) << vec[i];
	std::cout << "]";
	return "";
}

#define LOG_FUNC_CALLS
#define PRN_X   "x = [" << x[0] << "," << x[1] << "]"
#define PRN_dX "dx = [" << dx[0] << "," << dx[1] << "]"
#define PRN_dY "dy = [" << dy[0] << "]"

template <typename Real>
Real sq(Real const & x){
    return x*x;
}

struct MyObj
    : public Optizelle::ScalarValuedFunction <double,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed gradient
    mutable std::unique_ptr <X::Vector> grad_cache;

    // Evaluation
    double eval(X::Vector const & x) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::eval(" << print("x", x) << ")" << std::endl;
#endif
        double x0_2 = pow(x[0],2);
        double x0_3 = pow(x[0],3);
        double x1_2 = pow(x[1],2);
        double x1_3 = pow(x[1],3);

        return (10*(5824*pow(x[0],4) - 7696*x0_3 + 11648*x0_2*x1_2 - 7696*x0_2*x[1] + 2822*x0_2 - 7696*x[0]*x1_2 - 6313*x[0]*x[1] + 253*x[0] + 5824*pow(x[1],4) - 7696*x1_3 + 2822*x1_2 + 253*x[1] + 7651))/9;
    }

    // Gradient
    void grad(
        X::Vector const & x,
        X::Vector & grad
    ) const {

        // Use a cached value
        if(x_cache && *x_cache == x) {
            grad = *grad_cache;

        // Compute a new value and cache it
        } else {

            // Allocate the initial memory for the cached value and gradient
            if(!x_cache) {
                x_cache = std::make_unique <X::Vector> (X::init(x));
                grad_cache = std::make_unique <X::Vector> (X::init(x));
            }

            // Compute the gradient
#ifdef LOG_FUNC_CALLS
            std::cout << "MyObj::grad(" << print("x", x) << ")" << std::endl;
#endif
            double x0_2 = pow(x[0],2);
            double x0_3 = pow(x[0],3);
            double x1_2 = pow(x[1],2);
            double x1_3 = pow(x[1],3);

            grad[0] = (10*(23296*x0_3 - 23088*x0_2 + 23296*x[0]*x1_2 - 15392*x[0]*x[1] + 5644*x[0] - 7696*x1_2 - 6313*x[1] + 253))/9;

            grad[1] = (10*(23296*x0_2*x[1] - 7696*x0_2 - 15392*x[0]*x[1] - 6313*x[0] + 23296*x1_3 - 23088*x1_2 + 5644*x[1] + 253))/9;

            // Cache the iterate and gradient
            X::copy(x,*x_cache);
            X::copy(grad,*grad_cache);
        }
    }

    // Hessian-vector product
    void hessvec(
        X::Vector const & x,
        X::Vector const & dx,
        X::Vector & H_dx
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::hessvec(" << print("x", x) << " " << print("dx", dx) << ")" << std::endl;
#endif
        double g11 = 153920*x[0];
        double g9 = 232960*x[0]*x[0] - g11;
        double g12 = 153920*x[1];
        double g10 = 232960*x[1]*x[1] - g12;
        double d2dx2 = (g10 + 3*g9 + 56440)/9;
        double d2dy2 = (3*g10 + g9 + 56440)/9;
        double d2dxy = (465920*x[1]*x[0] - g12 - g11 - 63130)/9;
        
        H_dx[0] = d2dx2*dx[0] + d2dxy*dx[1];
        H_dx[1] = d2dxy*dx[0] + d2dy2*dx[1];
    }
};

// c(x,y)=(x - 1)**2 + y**2 - 0.81
struct MyEq
    :public Optizelle::VectorValuedFunction<double,Optizelle::Rm,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;
    typedef Optizelle::Rm <double> Y;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed derivative
    mutable std::unique_ptr <X::Vector> gp_cache;
    
    // y=g(x)
    void eval(
        X::Vector const & x,
        Y::Vector & y
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::eval(" << print("x", x) << ")" << std::endl;
#endif
        y[0] = sq(x[0] - 1) + sq(x[1]) - 0.81;
    }

    // g'(x)
    void gp(
        X::Vector const & x
    ) const {
        // Allocate the initial memory for the cached value and derivative
        if(!x_cache) {
            x_cache = std::make_unique <X::Vector> (X::init(x));
            gp_cache = std::make_unique <std::vector <double>>(2);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::gp(" << print("x", x) << ")" << std::endl;
#endif

        // Compute the derivative
        (*gp_cache)[0] = 2*(x[0] - 1);
        (*gp_cache)[1] = 2*x[1];

        // Save the iterate
        X::copy(x,*x_cache);
    }

    // y=g'(x)dx
    void p(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector & y
    ) const {

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::p(" << print("x", x) << " " << print("dx", dx) << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        y[0] = (*gp_cache)[0]*dx[0] + (*gp_cache)[1]*dx[1];
    }

    // xhat=g'(x)*dy
    void ps(
        X::Vector const & x,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::ps(" << print("x", x) << " " << print("dy", dy) << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        xhat[0] = (*gp_cache)[0] * dy[0];
        xhat[1] = (*gp_cache)[1] * dy[0];
    }

    // xhat=(g''(x)dx)*dy
    void pps(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::pps(" << print("x", x) << " " << print("dx", dx) << " " << print("dy", dy) << ")" << std::endl;
#endif
        xhat[0] = 2.0*dx[0]*dy[0];
        xhat[1] = 2.0*dx[1]*dy[0];
    }
};

// Define a Schur preconditioner for the equality constraints
struct MyPrecon:
    public Optizelle::Operator <double,Optizelle::Rm,Optizelle::Rm>
{
public:
    typedef Optizelle::Rm <double> X;
    typedef X::Vector X_Vector;
    typedef Optizelle::Rm <double> Y;
    typedef Y::Vector Y_Vector;
private:
    X_Vector& x;
public:
    MyPrecon(X::Vector& x_) : x(x_) {}
    void eval(Y_Vector const & dy,Y_Vector & result) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyPrecon::eval(" << print("x", x) << ")" << std::endl;
#endif
        result[0] = dy[0]/(4*sq(x[0] - 1) + 4*sq(x[1]));
    }
};

int main()
{
    // Create a type shortcut
    using Optizelle::Rm;
    
    auto x = std::vector <double> {1.0, 1.0};
    
    // Allocate memory for dual
    auto y = std::vector <double> (1);
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::State::t state(x,y);
    state.algorithm_class = Optizelle::AlgorithmClass::LineSearch;
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Functions::t fns;
    fns.f.reset(new MyObj);
    fns.g.reset(new MyEq);
    fns.PSchur_left.reset(new MyPrecon(state.x));
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Algorithms::getMin(
        Optizelle::Messaging::stdout,fns,state);

    std::cout << "The algorithm converged due to: " <<
        Optizelle::OptimizationStop::to_string(state.opt_stop) <<
        std::endl;
    std::cout << std::scientific << std::setprecision(16)
        << "The optimal point is: (" << state.x[0] << ','
	<< state.x[1] << ')' << std::endl;
    
    return EXIT_SUCCESS;
}

Counts:
Number of lines in output: 470 (method: wc -l output.txt)
Number of unique lines in output: 218 (method: sort -u output.txt | wc -l)

Basically, more than half of the computations are duplicates. And that’s besides the dynamic memory allocations. In comparison, I suspect that the dialect of Fortran that SLSQP was written in had/has no dynamic memory allocation. Based on my experiment with the vector space, I can now assign unique integer identifiers to unique iterates of x, and those could be used to cache all computations, in combination with copy-on-write (COW) and an arena allocator. What do you think?

Update: I’ve been trawling the source and I see some patterns of memory allocation that could be optimized away. They all revolve around X::init(), e.g. X::init() followed by X::copy(), or X::init() followed by X::zero(). Both patterns are illustrated below.

Your assessment is correct and you’re running up against what turned out to be a poor design decision, made for historical reasons. Originally, the code was designed to work with what are called reduced-space formulations for PDE-constrained optimization problems. Despite the name PDE-constrained, these formulations are unconstrained, since the PDE solve is conducted implicitly inside the objective function. There are a fair number of oddities related to these formulations, but the big one is that the number of variables is very large, often somewhere on the order of hundreds of millions. As a result, there was a great deal of sensitivity about exactly what memory was allocated where. In fact, the really early versions had essentially no memory allocation other than what was originally allocated. I think at some point I also coded some kind of vector pool so as not to reallocate new vectors.

In retrospect, this was all very silly. Memory is not that valuable at the vector level when compared to the truly horrific computations being done to solve the PDE. As such, allocators were put into the code, which is how we ended up with init. Now, part of the problem here was that the code I was hooked into at the time implemented soft copies of all of the vectors internally. As such, it was really important to have a hard init function to ensure we didn’t accidentally get a soft copy of some vector.

Anyway, due to some exotic processing of the solution during the course of the optimization, I added the concept of a StateManipulator, which basically allowed an arbitrary function to be applied to the state. This can break an optimization solve pretty badly when done poorly, but it allowed a lot of odd tricks, which we thought necessary at the time. When the code was later broadened to allow for inequalities and equalities, these state manipulators were hijacked to insert the necessary code. On one hand, that allowed rapid progress in the development of the code. On the other, it’s a terrible way to design an optimization solver because it’s not an easily debuggable design.

Now, this affects your current issue because, clearly, the caching is poor. Some of those issues are related to a less-than-optimally implemented GMRES algorithm that applies the linear operator to 0. Your code can be made robust to this with the following:

#include <optizelle/optizelle.h>
#include <optizelle/vspaces.h>

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <cmath>      // for pow
#include <memory>     // for std::unique_ptr / std::make_unique
#include <algorithm>  // for std::all_of / std::fill

// Streams "symbol = [v0, v1, ...]" as a side effect and returns "" so that it
// can be spliced into an ostream expression
template<typename V>
const char* print(const char* symbol, V const & vec)
{
	std::cout << symbol << " = [" << std::setprecision(16) << vec[0];
	for(std::size_t i = 1; i < vec.size(); ++i) std::cout << ", " << std::setprecision(16) << vec[i];
	std::cout << "]";
	return "";
}

#define LOG_FUNC_CALLS
#define PRN_X   "x = [" << x[0] << "," << x[1] << "]"
#define PRN_dX "dx = [" << dx[0] << "," << dx[1] << "]"
#define PRN_dY "dy = [" << dy[0] << "]"

template <typename Real>
Real sq(Real const & x){
    return x*x;
}

struct MyObj
    : public Optizelle::ScalarValuedFunction <double,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed gradient
    mutable std::unique_ptr <X::Vector> grad_cache;

    // Evaluation
    double eval(X::Vector const & x) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::eval(" << print("x", x) << ")" << std::endl;
#endif
        double x0_2 = pow(x[0],2);
        double x0_3 = pow(x[0],3);
        double x1_2 = pow(x[1],2);
        double x1_3 = pow(x[1],3);

        return (10*(5824*pow(x[0],4) - 7696*x0_3 + 11648*x0_2*x1_2 - 7696*x0_2*x[1] + 2822*x0_2 - 7696*x[0]*x1_2 - 6313*x[0]*x[1] + 253*x[0] + 5824*pow(x[1],4) - 7696*x1_3 + 2822*x1_2 + 253*x[1] + 7651))/9;
    }

    // Gradient
    void grad(
        X::Vector const & x,
        X::Vector & grad
    ) const {

        // Use a cached value
        if(x_cache && *x_cache == x) {
            grad = *grad_cache;

        // Compute a new value and cache it
        } else {

            // Allocate the initial memory for the cached value and gradient
            if(!x_cache) {
                x_cache = std::make_unique <X::Vector> (X::init(x));
                grad_cache = std::make_unique <X::Vector> (X::init(x));
            }

            // Compute the gradient
#ifdef LOG_FUNC_CALLS
            std::cout << "MyObj::grad(" << print("x", x) << ")" << std::endl;
#endif
            double x0_2 = pow(x[0],2);
            double x0_3 = pow(x[0],3);
            double x1_2 = pow(x[1],2);
            double x1_3 = pow(x[1],3);

            grad[0] = (10*(23296*x0_3 - 23088*x0_2 + 23296*x[0]*x1_2 - 15392*x[0]*x[1] + 5644*x[0] - 7696*x1_2 - 6313*x[1] + 253))/9;

            grad[1] = (10*(23296*x0_2*x[1] - 7696*x0_2 - 15392*x[0]*x[1] - 6313*x[0] + 23296*x1_3 - 23088*x1_2 + 5644*x[1] + 253))/9;

            // Cache the iterate and gradient
            X::copy(x,*x_cache);
            X::copy(grad,*grad_cache);
        }
    }

    // Hessian-vector product
    void hessvec(
        X::Vector const & x,
        X::Vector const & dx,
        X::Vector & H_dx
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyObj::hessvec(" << print("x", x) << " " << print("dx", dx) << ")" << std::endl;
#endif
        double g11 = 153920*x[0];
        double g9 = 232960*x[0]*x[0] - g11;
        double g12 = 153920*x[1];
        double g10 = 232960*x[1]*x[1] - g12;
        double d2dx2 = (g10 + 3*g9 + 56440)/9;
        double d2dy2 = (3*g10 + g9 + 56440)/9;
        double d2dxy = (465920*x[1]*x[0] - g12 - g11 - 63130)/9;
        
        H_dx[0] = d2dx2*dx[0] + d2dxy*dx[1];
        H_dx[1] = d2dxy*dx[0] + d2dy2*dx[1];
    }
};

// c(x,y)=(x - 1)**2 + y**2 - 0.81
struct MyEq
    :public Optizelle::VectorValuedFunction<double,Optizelle::Rm,Optizelle::Rm>
{
    typedef Optizelle::Rm <double> X;
    typedef Optizelle::Rm <double> Y;

    // Location of the last computed iterate
    mutable std::unique_ptr <X::Vector> x_cache;

    // Cache of the last computed derivative
    mutable std::unique_ptr <X::Vector> gp_cache;
    
    // y=g(x)
    void eval(
        X::Vector const & x,
        Y::Vector & y
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::eval(" << print("x", x) << ")" << std::endl;
#endif
        y[0] = sq(x[0] - 1) + sq(x[1]) - 0.81;
    }

    // g'(x)
    void gp(
        X::Vector const & x
    ) const {
        // Allocate the initial memory for the cached value and derivative
        if(!x_cache) {
            x_cache = std::make_unique <X::Vector> (X::init(x));
            gp_cache = std::make_unique <std::vector <double>>(2);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::gp(" << print("x", x) << ")" << std::endl;
#endif

        // Compute the derivative
        (*gp_cache)[0] = 2*(x[0] - 1);
        (*gp_cache)[1] = 2*x[1];

        // Save the iterate
        X::copy(x,*x_cache);
    }

    // y=g'(x)dx
    void p(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector & y
    ) const {

        // Return immediately if given a 0 direction
        if(std::all_of(dx.begin(),dx.end(),[](double const & dxi){
            return dxi==0;})
        ){
            std::fill(y.begin(),y.end(),0.);
            return;
        }

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::p(" << print("x", x) << " " << print("dx", dx) << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        y[0] = (*gp_cache)[0]*dx[0] + (*gp_cache)[1]*dx[1];
    }

    // xhat=g'(x)*dy
    void ps(
        X::Vector const & x,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {

        // Return immediately if given a 0 direction
        if(std::all_of(dy.begin(),dy.end(),[](double const & dyi){
            return dyi==0;})
        ){
            std::fill(xhat.begin(),xhat.end(),0.);
            return;
        }

        // Compute a new value and cache it
        if(!x_cache || *x_cache != x) {
            gp(x);
        }
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::ps(" << print("x", x) << " " << print("dy", dy) << ")" << std::endl;
#endif

        // Compute the matrix-vector product
        xhat[0] = (*gp_cache)[0] * dy[0];
        xhat[1] = (*gp_cache)[1] * dy[0];
    }

    // xhat=(g''(x)dx)*dy
    void pps(
        X::Vector const & x,
        X::Vector const & dx,
        Y::Vector const & dy,
        X::Vector & xhat
    ) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyEq::pps(" << print("x", x) << " " << print("dx", dx) << " " << print("dy", dy) << ")" << std::endl;
#endif
        xhat[0] = 2.0*dx[0]*dy[0];
        xhat[1] = 2.0*dx[1]*dy[0];
    }
};

// Define a Schur preconditioner for the equality constraints
struct MyPrecon:
    public Optizelle::Operator <double,Optizelle::Rm,Optizelle::Rm>
{
public:
    typedef Optizelle::Rm <double> X;
    typedef X::Vector X_Vector;
    typedef Optizelle::Rm <double> Y;
    typedef Y::Vector Y_Vector;
private:
    X_Vector& x;
public:
    MyPrecon(X::Vector& x_) : x(x_) {}
    void eval(Y_Vector const & dy,Y_Vector & result) const {
#ifdef LOG_FUNC_CALLS
        std::cout << "MyPrecon::eval(" << print("x", x) << ")" << std::endl;
#endif
        result[0] = dy[0]/(4*sq(x[0] - 1) + 4*sq(x[1]));
    }
};

int main()
{
    // Create a type shortcut
    using Optizelle::Rm;
    
    auto x = std::vector <double> {1.0, 1.0};
    
    // Allocate memory for dual
    auto y = std::vector <double> (1);
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::State::t state(x,y);
    state.algorithm_class = Optizelle::AlgorithmClass::LineSearch;
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Functions::t fns;
    fns.f.reset(new MyObj);
    fns.g.reset(new MyEq);
    fns.PSchur_left.reset(new MyPrecon(state.x));
    
    Optizelle::EqualityConstrained <double,Rm,Rm>::Algorithms::getMin(
        Optizelle::Messaging::stdout,fns,state);

    std::cout << "The algorithm converged due to: " <<
        Optizelle::OptimizationStop::to_string(state.opt_stop) <<
        std::endl;
    std::cout << std::scientific << std::setprecision(16)
        << "The optimal point is: (" << state.x[0] << ','
	<< state.x[1] << ')' << std::endl;
    
    return EXIT_SUCCESS;
}

More than that requires a deeper modification of the code, and it’s not clear to me that it’s worth the labor. For example, I noticed redundant computations at lines 8103 and 9422 in optizelle.h. The first relates to computing how well we solved the linear feasibility with the Cauchy step. The second relates to caching linearized feasibility with respect to the trust-region globalization. Could this be cached? Sure. With the current design, it’s not particularly easy to do so.

Broadly speaking, it’s also a question of how much we should care. That really depends on the application. There are absolutely some applications where this is critical. I’ll contend, though, that most of the time it doesn’t really affect the overall run time. Generally speaking, the derivative calculations should be far more expensive. In the case of the equality constraints g, for a problem with a sizable number of constraints, the computation of the derivative g'(x) should dominate the cost. Now, even though this is a matrix-free code, realistically we need a matrix to cache g'(x), because we need a preconditioner for g'(x)g'(x)*, where * here means the adjoint. That factorization is likely vastly more costly than computing a matrix-vector product. And, recall, computing g'(x) itself is likely vastly more costly than computing a matrix-vector product.

Now, does that mean we should be sloppy? Of course not. There are a variety of issues related to redundant computation in the current code that have been removed in our new code, which is not released. Part of what made this possible was a fresh rewrite based on what we learned from the last code.

As to whether or not it would be worthwhile to just cache all of these combinations with integer identifiers, I’ll still contend that in the general case, probably not. The derivative and preconditioner computations should really be far more expensive once we get to more than a handful of variables. Though, certainly, there are cases where it could pay off. Generally speaking, I think there’s better value in a clean rewrite, which is what we ultimately pursued internally.