Hi,

I have to minimise a function whose gradient can be computed almost for free alongside the function value itself, but I don’t know how to take advantage of this. If a call to the gradient is always preceded by a function call at the same point, I can just store the gradient in the class; is this behaviour guaranteed?
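To make the pattern concrete, here is a minimal sketch of what I mean by "store it in the class", assuming the solver calls the function before the gradient at the same point. The objective `f(x) = sum(x**2)` and all names are hypothetical, purely for illustration:

```python
import numpy as np

class CachedObjective:
    """Sketch: compute f and its gradient together, cache the gradient,
    and hand it back if the solver asks for the gradient at the point
    most recently passed to the function call."""

    def __init__(self):
        self._last_x = None
        self._last_grad = None

    def f(self, x):
        x = np.asarray(x, dtype=float)
        # Example objective: f(x) = sum(x**2); its gradient 2*x
        # falls out "for free" while evaluating the function.
        value = float(np.dot(x, x))
        self._last_x = x.copy()
        self._last_grad = 2.0 * x
        return value

    def grad(self, x):
        x = np.asarray(x, dtype=float)
        if self._last_x is not None and np.array_equal(x, self._last_x):
            return self._last_grad
        # Fallback: the solver asked for the gradient at a fresh point,
        # so pay for one extra function evaluation.
        self.f(x)
        return self._last_grad
```

The fallback branch is what I would like to avoid relying on, hence the question about the guaranteed call order.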

Otherwise, I could implement a memoize pattern and keep the last `n` gradients, but that would obviously be more work.
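For reference, the memoize fallback I have in mind would look roughly like this: an LRU-style cache of the last `n` gradients keyed by the evaluation point. All names here are hypothetical:

```python
from collections import OrderedDict
import numpy as np

class GradientMemo:
    """Sketch: keep the gradients for the last n evaluation points,
    evicting the oldest entry once the cache is full."""

    def __init__(self, n=4):
        self.n = n
        self._cache = OrderedDict()

    def store(self, x, grad):
        key = np.asarray(x, dtype=float).tobytes()
        self._cache[key] = np.array(grad, dtype=float)
        self._cache.move_to_end(key)  # mark as most recently used
        while len(self._cache) > self.n:
            self._cache.popitem(last=False)  # drop the oldest entry

    def lookup(self, x):
        key = np.asarray(x, dtype=float).tobytes()
        return self._cache.get(key)  # None on a cache miss
```

This works, but it is exactly the extra bookkeeping (and the extra copies of large arrays) I was hoping to avoid.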

Also, I would like to compute the gradient in place. Is this a problem? Do any of the algorithms require knowledge of the gradient at more than one point at the same time?
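By "in place" I mean writing the gradient into a buffer allocated once up front, rather than returning a fresh array on every call. A sketch, again with a hypothetical objective `f(x) = sum(x**2)`:

```python
import numpy as np

def grad_inplace(x, out):
    """Sketch: write the gradient of f(x) = sum(x**2) into a
    caller-supplied buffer instead of allocating a new array."""
    np.multiply(x, 2.0, out=out)  # out <- 2*x, no temporary allocated
    return out

# Usage: allocate the buffer once, reuse it on every iteration.
x = np.array([1.0, 3.0])
buf = np.empty_like(x)
grad_inplace(x, buf)
```

This is only safe if the solver never holds on to a previously returned gradient, which is what the question above is really asking.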

I am doing convex optimisations in tens of thousands of dimensions, so I am trying to keep the allocations to a minimum.