I have to minimise a function whose gradient can be computed almost for free alongside the function itself, but I don’t know how to take advantage of this. If a call to the gradient is always preceded by a function call at the same point, I can just cache the gradient in the class; is this behaviour guaranteed?
Otherwise, I could implement a memoization pattern and keep the last n gradients, but that would obviously be more work.
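For what it's worth, some libraries sidestep the caching question entirely by letting the objective return the value and the gradient together. As one concrete target (an assumption; the post doesn't name a library), SciPy's `minimize` accepts `jac=True`, which tells it the callable returns the pair `(f, grad)`. A minimal sketch, using the Rosenbrock function purely as a stand-in for an objective whose gradient shares intermediate terms with the function value:

```python
import numpy as np
from scipy.optimize import minimize

def rosen_with_grad(x):
    """Return (value, gradient) of the Rosenbrock function,
    reusing the shared intermediate `d` for both."""
    d = x[1:] - x[:-1] ** 2                       # shared intermediate
    f = np.sum(100.0 * d ** 2) + np.sum((1.0 - x[:-1]) ** 2)
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * d - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * d
    return f, g

# jac=True: the solver unpacks (f, g) from a single call,
# so the gradient is never requested at a point where the
# function was not just evaluated.
res = minimize(rosen_with_grad, np.zeros(5), jac=True, method="L-BFGS-B")
```

With this interface the question of call ordering disappears, since each evaluation produces both quantities at once.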
Also, I would like to compute the gradient in place. Is this a problem? Do any of the algorithms require knowledge of the gradient at more than one point at the same time?
I am doing convex optimisation in tens of thousands of dimensions, so I am trying to keep allocations to a minimum.
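If the value/gradient pair has to stay as two separate callbacks, the "store it in the class" idea from above can be sketched as follows. This is only an illustration under my own assumptions (NumPy arrays, Rosenbrock again as a placeholder objective); the class names are hypothetical. It caches the point of the last function call and writes the gradient into a single preallocated buffer:

```python
import numpy as np

class CachedObjective:
    """Cache the gradient computed as a by-product of the last
    function evaluation, reusing one preallocated buffer."""

    def __init__(self, n):
        self._x_last = None
        self._grad = np.empty(n)   # single reusable gradient buffer

    def value(self, x):
        d = x[1:] - x[:-1] ** 2                       # shared work
        f = np.sum(100.0 * d ** 2) + np.sum((1.0 - x[:-1]) ** 2)
        # Fill the gradient buffer in place while the terms are hot.
        g = self._grad
        g[:] = 0.0
        g[:-1] = -400.0 * x[:-1] * d - 2.0 * (1.0 - x[:-1])
        g[1:] += 200.0 * d
        self._x_last = x.copy()    # remember where we evaluated
        return f

    def gradient(self, x):
        # Cache miss (gradient requested at a new point): recompute.
        if self._x_last is None or not np.array_equal(x, self._x_last):
            self.value(x)
        return self._grad
```

One caveat that bears directly on the in-place question: because `gradient` always returns the same buffer, this only works if the solver never holds on to gradients from two different points simultaneously; a quasi-Newton method that keeps the previous gradient for its update must copy it, or the shared buffer will silently corrupt the history.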