From 29c1ca8200fce3c0d535dcfb6a0309639b00fa86 Mon Sep 17 00:00:00 2001 From: Vaibhav Dixit Date: Thu, 6 Apr 2023 15:17:25 +0530 Subject: [PATCH 1/9] Update `OptimizationFunction`, `OptimizationProblem` and `solve` docstrings --- src/problems/basic_problems.jl | 62 +++++++++++------------ src/scimlfunctions.jl | 90 +++++++++++++--------------------- src/solve.jl | 35 ++++++++----- 3 files changed, 88 insertions(+), 99 deletions(-) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index 5ab9d4cd6..83802c839 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -547,25 +547,25 @@ Documentation Page: https://docs.sciml.ai/Optimization/stable/API/optimization_p ## Mathematical Specification of an Optimization Problem -To define an Optimization Problem, you simply need to give the function ``f`` -which defines the cost function to minimize: +To define an optimization problem, you need the objective function ``f`` +which is minimized over the domain of ``u``, the collection of optimization variables: ```math min_u f(u,p) ``` -``u₀`` is an initial guess of the minimum. `f` should be specified as `f(u,p)` -and `u₀` should be an AbstractArray (or number) whose geometry matches the -desired geometry of `u`. Note that we are not limited to numbers or vectors +Say ``u₀`` is an initial guess for the minimizer. `f` should be specified as `f(u,p)` +and `u₀` should be an `AbstractArray` whose geometry matches the +desired geometry of `u`. Note that we are not limited to vectors for `u₀`; one is allowed to provide `u₀` as arbitrary matrices / -higher-dimension tensors as well. +higher-dimension tensors as well if the optimization solver supports it. ## Problem Type ### Constructors ```julia -OptimizationProblem{iip}(f, u0, p = SciMLBase.NullParameters(),; +OptimizationProblem{iip}(f, x, p = SciMLBase.NullParameters(),; lb = nothing, ub = nothing, lcons = nothing, @@ -574,27 +574,27 @@ OptimizationProblem{iip}(f, u0, p = SciMLBase.NullParameters(),; kwargs...) ``` -`isinplace` optionally sets whether the function is in-place or not. This is -determined automatically, but not inferred. Note that for OptimizationProblem, -in-place only refers to the Jacobian and Hessian functions, and thus by default -if the `OptimizationFunction` is not defined directly then `iip = true` is -done by default. +`isinplace` optionally sets whether the function is in-place or not. +This is determined automatically, but not inferred. Note that for OptimizationProblem, +in-place refers to the objective's derivative functions, the constraint function +and its derivatives. `OptimizationProblem` currently only supports in-place. -Parameters are optional, and if not given, then a `NullParameters()` singleton +Parameters `p` are optional, and if not given, then a `NullParameters()` singleton will be used, which will throw nice errors if you try to index non-existent -parameters. Any extra keyword arguments are passed on to the solvers. For example, -if you set a `callback` in the problem, then that `callback` will be added in -every solve call. +parameters. `lb` and `ub` are the upper and lower bounds for box constraints on the -optimization. They should be an `AbstractArray` matching the geometry of `u`, -where `(lb[I],ub[I])` is the box constraint (lower and upper bounds) -for `u[I]`. +optimization variables. They should be an `AbstractArray` matching the geometry of `u`, +where `(lb[i],ub[i])` is the box constraint (lower and upper bounds) for `u[i]`. 
+ +`lcons` and `ucons` are the upper and lower bounds incase of inequality constraints on the +optimization and if they are set to be equal then it represents an equality constraint. +They should be an `AbstractArray` matching the geometry of `u`, where `(lcons[i],ucons[i])` +are the lower and upper bounds for `cons[i]`. -`lcons` and `ucons` are the upper and lower bounds for equality constraints on the -optimization. They should be an `AbstractArray` matching the geometry of `u`, -where `(lcons[I],ucons[I])` is the constraint (lower and upper bounds) -for `cons[I]`. +The `f` in the `OptimizationProblem` should typically be an instance of [`OptimizationFunction`](@ref) +to specify the objective function and its derivatives either by passing +predefined functions for them or automatically generated using the [`ADType`](@ref). If `f` is a standard Julia function, it is automatically transformed into an `OptimizationFunction` with `NoAD()`, meaning the derivative functions are not @@ -605,21 +605,21 @@ Any extra keyword arguments are captured to be sent to the optimizers. ### Fields * `f`: the function in the problem. -* `u0`: the initial guess for the optima. -* `p`: the parameters for the problem. Defaults to `NullParameters`. -* `lb`: the lower bounds for the optimization of `u`. -* `ub`: the upper bounds for the optimization of `u`. -* `int`: integrality indicator for `u`. -* `lcons`: the vector of lower bounds for the constraints passed to [`OptimizationFunction`](@ref). +* `u0`: the initial guess for the optimization variables. +* `p`: the constant parameters used for defining the problem. Defaults to `NullParameters`. +* `lb`: the lower bounds for the optimization variables `u`. +* `ub`: the upper bounds for the optimization variables `u`. +* `int`: integrality indicator for `u`. If `int[i] == 1`, then `u[i]` is an integer variable. + Defaults to `nothing`, implying no integrality constraints. +* `lcons`: the vector of lower bounds for the constraints passed to [OptimizationFunction](@ref). Defaults to `nothing`, implying no lower bounds for the constraints (i.e. the constraint bound is `-Inf`) * `ucons`: the vector of upper bounds for the constraints passed to [`OptimizationFunction`](@ref). Defaults to `nothing`, implying no upper bounds for the constraints (i.e. the constraint bound is `Inf`) * `sense`: the objective sense, can take `MaxSense` or `MinSense` from Optimization.jl. -* `kwargs`: the keyword arguments passed on to the solvers. ## Inequality and Equality Constraints -Both inequality and equality constraints are defined by the `f.cons` function in the `OptimizationFunction` +Both inequality and equality constraints are defined by the `f.cons` function in the [`OptimizationFunction`](@ref) description of the problem structure. This `f.cons` is given as a function `f.cons(u,p)` which computes the value of the constraints at `u`. For example, take `f.cons(u,p) = u[1] - u[2]`. With these definitions, `lcons` and `ucons` define the bounds on the constraint that the solvers try to satisfy. 
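For concreteness, the box-constraint (`lb`/`ub`) and constraint-bound (`lcons`/`ucons`) semantics described in the hunk above can be sketched as follows. The packages (`Optimization`, `OptimizationOptimJL`) and the `IPNewton` solver are illustrative assumptions and are not part of this patch:

```julia
using Optimization, OptimizationOptimJL  # illustrative packages, not part of this patch

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

# One constraint value per entry of res: lcons[1] <= u[1]^2 + u[2]^2 <= ucons[1].
# Setting lcons[1] == ucons[1] would instead make this an equality constraint.
cons!(res, u, p) = (res .= [u[1]^2 + u[2]^2])

optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff(); cons = cons!)
prob = OptimizationProblem(optf, [0.5, 0.5], [1.0, 100.0];
                           lb = [-1.0, -1.0], ub = [1.0, 1.0],  # box constraints on u
                           lcons = [-Inf], ucons = [0.8])       # bounds on cons!'s output
sol = solve(prob, IPNewton())
```

Here `lb`/`ub` bound each optimization variable directly, while `lcons`/`ucons` bound the value computed by `cons!`; the solver choice must be one that supports constraints.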
diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index 391859e20..dfb9d4514 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -1997,9 +1997,9 @@ end TruncatedStacktraces.@truncate_stacktrace IntervalNonlinearFunction 1 2 """ - OptimizationFunction{iip,AD,F,G,H,HV,C,CJ,CH,HP,CJP,CHP,S,S2,HCV,CJCV,CHCV} <: AbstractOptimizationFunction{iip,specialize} + OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, EX, CEX, SYS} <: AbstractOptimizationFunction{iip,specialize} -A representation of an optimization of an objective function `f`, defined by: +A representation of an objective function `f`, defined by: ```math \\min_{u} f(u,p) @@ -2015,47 +2015,43 @@ OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); grad = nothing, hess = nothing, hv = nothing, cons = nothing, cons_j = nothing, cons_h = nothing, lag_h = nothing, - hess_prototype = nothing, cons_jac_prototype = __has_jac_prototype(f) ? f.jac_prototype : nothing, + hess_prototype = nothing, + cons_jac_prototype = nothing, cons_hess_prototype = nothing, - lag_hess_prototype = nothing, syms = __has_syms(f) ? f.syms : nothing, - paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing, - observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME, - hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, - cons_jac_colorvec = __has_colorvec(f) ? f.colorvec : nothing, - cons_hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, - lag_hess_colorvec = nothing, - sys = __has_sys(f) ? f.sys : nothing) + paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing) ``` ## Positional Arguments -- `f(u,p)`: the function to optimize. `u` are the state variables and `p` are the hyperparameters of the optimization. - This function should return a scalar. -- `adtype`: see the section "Defining Optimization Functions via AD" +- `f(u,p,args...)`: the function to optimize. `u` are the optimization variables and `p` are parameters used in definition of +the objective, even if no such parameters are used in the objective it should be an argument in the function. This can also take +any additonal arguments that are relevant to the objective function, for example minibatches used in machine learning, +take a look at the minibatching tutorial [here](https://docs.sciml.ai/Optimization/stable/tutorials/minibatch/). This should return +a scalar, the loss value, as the first return output and if any additional outputs are returned, they will be passed to the `callback` +function described in [Callback Functions](@ref). +- `adtype`: see the section [Defining Optimization Functions via AD](@ref) ## Keyword Arguments -- `grad(G,u,p)` or `G=grad(u,p)`: the gradient of `f` with respect to `u` -- `hess(H,u,p)` or `H=hess(u,p)`: the Hessian of `f` with respect to `u` -- `hv(Hv,u,v,p)` or `Hv=hv(u,v,p)`: the Hessian-vector product ``(d^2 f / du^2) v``. -- `cons(res,x,p)` or `cons(x,p)` : the constraints function, should mutate or return a vector +- `grad(G,u,p)`: the gradient of `f` with respect to `u`. If `f` takes additional arguments + then `grad(G,u,p,args...)` should be used. +- `hess(H,u,p)`: the Hessian of `f` with respect to `u`. If `f` takes additional arguments + then `hess(H,u,p,args...)` should be used. +- `hv(Hv,u,v,p)`: the Hessian-vector product ``(d^2 f / du^2) v``. If `f` takes additional arguments + then `hv(Hv,u,v,p,args...)` should be used. 
+- `cons(res,x,p)`: the constraints function, should mutate the passed `res` array with value of the `i`th constraint, evaluated at the current values of variables inside the optimization routine. This takes just the function evaluations and the equality or inequality assertion is applied by the solver based on the constraint bounds passed as `lcons` and `ucons` to [`OptimizationProblem`](@ref), in case of equality constraints `lcons` and `ucons` should be passed equal values. -- `cons_j(res,x,p)` or `res=cons_j(x,p)`: the Jacobian of the constraints. -- `cons_h(res,x,p)` or `res=cons_h(x,p)`: the Hessian of the constraints, provided as - an array of Hessians, with `res[i]` being the Hessian with respect to the `i`th output on `cons`. -- `lag_h(res,x,sigma,mu,p)` or `res=lag_h(x,sigma,mu,p)`: the Hessian of the Lagrangian, - where `sigma` is a multiplier of the cost function and `mu` are the Lagrange multipliers - multiplying the constraints. This can be provided instead of `hess` and `cons_h` - to solvers that directly use the Hessian of the Lagrangian. -- `paramjac(pJ,u,p)`: returns the parameter Jacobian ``df/dp``. +- `cons_j(res,x,p)`: the Jacobian of the constraints. +- `cons_h(res,x,p)`: the Hessian of the constraints, provided as + an array of Hessians with `res[i]` being the Hessian with respect to the `i`th output on `cons`. - `hess_prototype`: a prototype matrix matching the type that matches the Hessian. For example, if the Hessian is tridiagonal, then an appropriately sized `Hessian` matrix can be used - as the prototype and integrators will specialize on this structure where possible. Non-structured + as the prototype and optimization solvers will specialize on this structure where possible. Non-structured sparsity patterns should use a `SparseMatrixCSC` with a correct sparsity pattern for the Hessian. The default is `nothing`, which means a dense Hessian. - `cons_jac_prototype`: a prototype matrix matching the type that matches the constraint Jacobian. @@ -2070,38 +2066,30 @@ OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); - `paramsyms`: the symbol names for the parameters of the equation. This should match `p` in size. For example, if `p = [0.0, 1.0]` and `paramsyms = [:a, :b]`, this will apply a canonical naming to the values, allowing `sol[:a]` in the solution. -- `hess_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity - pattern of the `hess_prototype`. This specializes the Hessian construction when using - finite differences and automatic differentiation to be computed in an accelerated manner - based on the sparsity pattern. Defaults to `nothing`, which means a color vector will be - internally computed on demand when required. The cost of this operation is highly dependent - on the sparsity pattern. -- `cons_jac_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity - pattern of the `cons_jac_prototype`. -- `cons_hess_colorvec`: an array of color vector according to the SparseDiffTools.jl definition for - the sparsity pattern of the `cons_hess_prototype`. -## Defining Optimization Functions Via AD +## Defining Optimization Functions via AD While using the keyword arguments gives the user control over defining -all the possible functions, the simplest way to handle the generation -of an `OptimizationFunction` is by specifying an AD type. By doing so, -this will automatically fill in all the extra functions. 
For example, +all of the possible functions, the simplest way to handle the generation +of an `OptimizationFunction` is by specifying the `ADtype` which lets the user choose the +Automatic Differentiation backend to use for automatically filling in all of the extra functions. +For example, ```julia -OptimizationFunction(f,AutoZygote()) +OptimizationFunction(f,AutoForwardDiff()) ``` -will use [Zygote.jl](https://docs.sciml.ai/Zygote.jl/stable/) to define +will use [ForwardDiff.jl](https://github.com/JuliaDiff/ForwardDiff.jl) to define all of the necessary functions. Note that if any functions are defined directly, the auto-AD definition does not overwrite the user's choice. Each of the AD-based constructors are documented separately via their -own dispatches. +own dispatches below in the [Automatic Differentiation Construction Choice Recommendations](@ref) section. ## iip: In-Place vs Out-Of-Place For more details on this argument, see the ODEFunction documentation. +Note that currently `OptimizationFunction` only supports in-place. ## specialize: Controlling Compilation and Specialization @@ -2111,10 +2099,8 @@ For more details on this argument, see the ODEFunction documentation. The fields of the OptimizationFunction type directly match the names of the inputs. """ -struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, LH, HP, CJP, CHP, LHP, S, S2, - O, HCV, - CJCV, - CHCV, LHCV, EX, CEX, SYS} <: AbstractOptimizationFunction{iip} +struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, + EX, CEX} <: AbstractOptimizationFunction{iip} f::F adtype::AD grad::G @@ -2123,21 +2109,13 @@ struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, LH, HP, CJP, CHP, L cons::C cons_j::CJ cons_h::CH - lag_h::LH hess_prototype::HP cons_jac_prototype::CJP cons_hess_prototype::CHP - lag_hess_prototype::LHP syms::S paramsyms::S2 - observed::O - hess_colorvec::HCV - cons_jac_colorvec::CJCV - cons_hess_colorvec::CHCV - lag_hess_colorvec::LHCV expr::EX cons_expr::CEX - sys::SYS end TruncatedStacktraces.@truncate_stacktrace OptimizationFunction 1 2 diff --git a/src/solve.jl b/src/solve.jl index eb8de1dbe..35c833901 100644 --- a/src/solve.jl +++ b/src/solve.jl @@ -51,29 +51,40 @@ The callback function `callback` is a function which is called after every optim step. Its signature is: ```julia -callback = (params, loss_val, other_args) -> false +callback = (u, loss_val, other_args) -> false ``` -where `params` and `loss_val` are the current parameters and loss/objective value -in the optimization loop and `other_args` are the extra return arguments of -the optimization `f`. This allows for saving values from the optimization and -using them for plotting and display without recalculating. The callback should +where `u` and `loss_val` are the current optimization variables and loss/objective value +in the optimization loop and `other_args` can be the extra things returned from +the optimization `f`. This allows for saving values from the optimization and +using them for plotting and display without recalculating. The callback should return a Boolean value, and the default should be `false`, such that the optimization gets stopped if it returns `true`. ### Callback Example +Here we show an example a callback function that plots the prediction at the current value of the optimization variables. +The loss function here returns the loss and the prediction i.e. the solution of the `ODEProblem` `prob`, so we can use the prediction in the callback. 
+ ```julia -function loss(p) - # Some calculations - lossval,x,y,z +function predict(u) + Array(solve(prob, Tsit5(), p = u)) end -function callback(p,lossval,x,y,z) - # Do some analysis +function loss(u, p) + pred = predict(u) + sum(abs2, batch .- pred), pred +end - # When lossval < 0.01, stop the optimization - lossval < 0.01 +callback = function (p, l, pred; doplot = false) #callback function to observe training + display(l) + # plot current prediction against data + if doplot + pl = scatter(t, ode_data[1, :], label = "data") + scatter!(pl, t, pred[1, :], label = "prediction") + display(plot(pl)) + end + return false end ``` """ From 2d3603e62edc3e657026a860c0195e0f729154a9 Mon Sep 17 00:00:00 2001 From: Vaibhav Dixit Date: Sat, 8 Apr 2023 20:13:03 +0530 Subject: [PATCH 2/9] Add back observed and sys --- src/scimlfunctions.jl | 81 ++++++++++++++++++++----------------------- 1 file changed, 37 insertions(+), 44 deletions(-) diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index dfb9d4514..6092fc0d7 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -1997,7 +1997,7 @@ end TruncatedStacktraces.@truncate_stacktrace IntervalNonlinearFunction 1 2 """ - OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, EX, CEX, SYS} <: AbstractOptimizationFunction{iip,specialize} + OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, O, EX, CEX, SYS} <: AbstractOptimizationFunction{iip,specialize} A representation of an objective function `f`, defined by: @@ -2014,12 +2014,12 @@ and more. For all cases, `u` is the state and `p` are the parameters. OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); grad = nothing, hess = nothing, hv = nothing, cons = nothing, cons_j = nothing, cons_h = nothing, - lag_h = nothing, hess_prototype = nothing, cons_jac_prototype = nothing, cons_hess_prototype = nothing, syms = __has_syms(f) ? f.syms : nothing, - paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing) + paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing, + observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME) ``` ## Positional Arguments @@ -2060,12 +2060,18 @@ function described in [Callback Functions](@ref). This is defined as an array of matrices, where `hess[i]` is the Hessian w.r.t. the `i`th output. For example, if the Hessian is sparse, then `hess` is a `Vector{SparseMatrixCSC}`. The default is `nothing`, which means a dense constraint Hessian. + +When [Symbolic Problem Building with ModelingToolkit](@ref) interface is used the following arguments are also relevant: + - `syms`: the symbol names for the elements of the equation. This should match `u0` in size. For example, if `u = [0.0,1.0]` and `syms = [:x, :y]`, this will apply a canonical naming to the values, allowing `sol[:x]` in the solution and automatically naming values in plots. - `paramsyms`: the symbol names for the parameters of the equation. This should match `p` in size. For example, if `p = [0.0, 1.0]` and `paramsyms = [:a, :b]`, this will apply a canonical naming to the values, allowing `sol[:a]` in the solution. +- `observed`: an algebraic combination of optimization variables that is of interest to the user + which will be available in the solution. This can be single or multiple expressions. +- `sys`: field that stores the `OptimizationSystem`. ## Defining Optimization Functions via AD @@ -2099,8 +2105,8 @@ For more details on this argument, see the ODEFunction documentation. 
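The keyword arguments documented above (`grad(G,u,p)`, `hess(H,u,p)`, and so on) can also be supplied by hand instead of being generated from the `adtype`. A minimal sketch, with `Optimization`/`OptimizationOptimJL` and Optim's `Newton` as purely illustrative choices of packages and solver:

```julia
using Optimization, OptimizationOptimJL  # illustrative packages, not part of this patch

f(u, p) = u[1]^2 + p[1] * u[2]^2

# In-place derivatives matching the documented grad(G,u,p) and hess(H,u,p) signatures
function grad!(G, u, p)
    G[1] = 2 * u[1]
    G[2] = 2 * p[1] * u[2]
    return nothing
end

function hess!(H, u, p)
    fill!(H, 0)
    H[1, 1] = 2
    H[2, 2] = 2 * p[1]
    return nothing
end

optf = OptimizationFunction(f; grad = grad!, hess = hess!)  # adtype defaults to NoAD()
prob = OptimizationProblem(optf, [1.0, 1.0], [3.0])
sol = solve(prob, Newton())  # a solver that consumes the hand-written Hessian
```

Because the `adtype` defaults to `NoAD()`, nothing is filled in automatically; any derivative the chosen solver needs but is not supplied must be provided explicitly.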
The fields of the OptimizationFunction type directly match the names of the inputs. """ -struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, - EX, CEX} <: AbstractOptimizationFunction{iip} +struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, O, + EX, CEX, SYS} <: AbstractOptimizationFunction{iip} f::F adtype::AD grad::G @@ -2114,8 +2120,10 @@ struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2 cons_hess_prototype::CHP syms::S paramsyms::S2 + observed::O expr::EX cons_expr::CEX + sys::SYS end TruncatedStacktraces.@truncate_stacktrace OptimizationFunction 1 2 @@ -3900,45 +3908,30 @@ OptimizationFunction(args...; kwargs...) = OptimizationFunction{true}(args...; k function OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); grad = nothing, hess = nothing, hv = nothing, - cons = nothing, cons_j = nothing, cons_h = nothing, - lag_h = nothing, - hess_prototype = nothing, - cons_jac_prototype = __has_jac_prototype(f) ? - f.jac_prototype : nothing, - cons_hess_prototype = nothing, - lag_hess_prototype = nothing, - syms = __has_syms(f) ? f.syms : nothing, - paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing, - observed = __has_observed(f) ? f.observed : - DEFAULT_OBSERVED_NO_TIME, - hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, - cons_jac_colorvec = __has_colorvec(f) ? f.colorvec : - nothing, - cons_hess_colorvec = __has_colorvec(f) ? f.colorvec : - nothing, - lag_hess_colorvec = nothing, - expr = nothing, cons_expr = nothing, - sys = __has_sys(f) ? f.sys : nothing) where {iip} - _f = prepare_function(f) - isinplace(_f, 2; has_two_dispatches = false, isoptimization = true) - OptimizationFunction{iip, typeof(adtype), typeof(_f), typeof(grad), typeof(hess), - typeof(hv), - typeof(cons), typeof(cons_j), typeof(cons_h), typeof(lag_h), - typeof(hess_prototype), - typeof(cons_jac_prototype), typeof(cons_hess_prototype), - typeof(lag_hess_prototype), - typeof(syms), typeof(paramsyms), typeof(observed), - typeof(hess_colorvec), typeof(cons_jac_colorvec), - typeof(cons_hess_colorvec), typeof(lag_hess_colorvec), - typeof(expr), typeof(cons_expr), - typeof(sys)}(_f, adtype, grad, hess, - hv, cons, cons_j, cons_h, lag_h, - hess_prototype, cons_jac_prototype, - cons_hess_prototype, lag_hess_prototype, syms, - paramsyms, observed, hess_colorvec, - cons_jac_colorvec, cons_hess_colorvec, - lag_hess_colorvec, expr, - cons_expr, sys) + cons = nothing, cons_j = nothing, cons_h = nothing, + hess_prototype = nothing, + cons_jac_prototype = __has_jac_prototype(f) ? + f.jac_prototype : nothing, + cons_hess_prototype = nothing, + syms = __has_syms(f) ? f.syms : nothing, + paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing, + observed = __has_observed(f) ? f.observed : + DEFAULT_OBSERVED_NO_TIME, + expr = nothing, cons_expr = nothing, + sys = __has_sys(f) ? 
f.sys : nothing) where {iip} + isinplace(f, 2; has_two_dispatches = false, isoptimization = true) + OptimizationFunction{iip, typeof(adtype), typeof(f), typeof(grad), typeof(hess), + typeof(hv), + typeof(cons), typeof(cons_j), typeof(cons_h), + typeof(hess_prototype), + typeof(cons_jac_prototype), typeof(cons_hess_prototype), + typeof(syms), typeof(paramsyms), typeof(observed), + typeof(expr), typeof(cons_expr), typeof(sys) + }(f, adtype, grad, hess, + hv, cons, cons_j, cons_h, + hess_prototype, cons_jac_prototype, + cons_hess_prototype, syms, + paramsyms, observed, expr, cons_expr, sys) end function BVPFunction{iip, specialize, twopoint}(f, bc; From 76bc2ecc1fae64d900f4c2ee16ba3c5464863766 Mon Sep 17 00:00:00 2001 From: Vaibhav Dixit Date: Sun, 9 Apr 2023 00:57:12 +0530 Subject: [PATCH 3/9] Add back deleted unsed fields --- src/problems/basic_problems.jl | 2 +- src/scimlfunctions.jl | 27 +++++++++++++++++++++++---- 2 files changed, 24 insertions(+), 5 deletions(-) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index 83802c839..6419d3bab 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -565,7 +565,7 @@ higher-dimension tensors as well if the optimization solver supports it. ### Constructors ```julia -OptimizationProblem{iip}(f, x, p = SciMLBase.NullParameters(),; +OptimizationProblem{iip}(f, u0, p = SciMLBase.NullParameters(),; lb = nothing, ub = nothing, lcons = nothing, diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index 6092fc0d7..73947bcb8 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -2106,7 +2106,8 @@ For more details on this argument, see the ODEFunction documentation. The fields of the OptimizationFunction type directly match the names of the inputs. """ struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2, O, - EX, CEX, SYS} <: AbstractOptimizationFunction{iip} + EX, CEX, SYS, LH, LHP, HCV, CJCV, CHCV, LHCV} <: + AbstractOptimizationFunction{iip} f::F adtype::AD grad::G @@ -2124,6 +2125,12 @@ struct OptimizationFunction{iip, AD, F, G, H, HV, C, CJ, CH, HP, CJP, CHP, S, S2 expr::EX cons_expr::CEX sys::SYS + lag_h::LH + lag_hess_prototype::LHP + hess_colorvec::HCV + cons_jac_colorvec::CJCV + cons_hess_colorvec::CHCV + lag_hess_colorvec::LHCV end TruncatedStacktraces.@truncate_stacktrace OptimizationFunction 1 2 @@ -3918,7 +3925,14 @@ function OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME, expr = nothing, cons_expr = nothing, - sys = __has_sys(f) ? f.sys : nothing) where {iip} + sys = __has_sys(f) ? f.sys : nothing, + lag_h = nothing, lag_hess_prototype = nothing, + hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, + cons_jac_colorvec = __has_colorvec(f) ? f.colorvec : + nothing, + cons_hess_colorvec = __has_colorvec(f) ? 
f.colorvec : + nothing, + lag_hess_colorvec = nothing) where {iip} isinplace(f, 2; has_two_dispatches = false, isoptimization = true) OptimizationFunction{iip, typeof(adtype), typeof(f), typeof(grad), typeof(hess), typeof(hv), @@ -3926,12 +3940,17 @@ function OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); typeof(hess_prototype), typeof(cons_jac_prototype), typeof(cons_hess_prototype), typeof(syms), typeof(paramsyms), typeof(observed), - typeof(expr), typeof(cons_expr), typeof(sys) + typeof(expr), typeof(cons_expr), typeof(sys), typeof(lag_h), + typeof(lag_hess_prototype), typeof(hess_colorvec), + typeof(cons_jac_colorvec), typeof(cons_hess_colorvec), + typeof(lag_hess_colorvec) }(f, adtype, grad, hess, hv, cons, cons_j, cons_h, hess_prototype, cons_jac_prototype, cons_hess_prototype, syms, - paramsyms, observed, expr, cons_expr, sys) + paramsyms, observed, expr, cons_expr, sys, + lag_h, lag_hess_prototype, hess_colorvec, cons_jac_colorvec, + cons_hess_colorvec, lag_hess_colorvec) end function BVPFunction{iip, specialize, twopoint}(f, bc; From 10db4ef7adf41e76d6f2210e09ca7f4e310b375d Mon Sep 17 00:00:00 2001 From: Vaibhav Kumar Dixit Date: Sun, 9 Apr 2023 13:52:07 +0530 Subject: [PATCH 4/9] Update src/problems/basic_problems.jl Co-authored-by: Qingyu Qu <52615090+ErikQQY@users.noreply.github.com> --- src/problems/basic_problems.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index 6419d3bab..7317566a6 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -587,7 +587,7 @@ parameters. optimization variables. They should be an `AbstractArray` matching the geometry of `u`, where `(lb[i],ub[i])` is the box constraint (lower and upper bounds) for `u[i]`. -`lcons` and `ucons` are the upper and lower bounds incase of inequality constraints on the +`lcons` and `ucons` are the upper and lower bounds in case of inequality constraints on the optimization and if they are set to be equal then it represents an equality constraint. They should be an `AbstractArray` matching the geometry of `u`, where `(lcons[i],ucons[i])` are the lower and upper bounds for `cons[i]`. From f50062cb101b609a25a7e73ff195fe2bcf8af139 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Sun, 9 Apr 2023 06:45:02 -0400 Subject: [PATCH 5/9] Update src/problems/basic_problems.jl --- src/problems/basic_problems.jl | 1 + 1 file changed, 1 insertion(+) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index 7317566a6..de5330b5d 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -616,6 +616,7 @@ Any extra keyword arguments are captured to be sent to the optimizers. * `ucons`: the vector of upper bounds for the constraints passed to [`OptimizationFunction`](@ref). Defaults to `nothing`, implying no upper bounds for the constraints (i.e. the constraint bound is `Inf`) * `sense`: the objective sense, can take `MaxSense` or `MinSense` from Optimization.jl. +* `kwargs`: the keyword arguments passed on to the solvers. 
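As a small illustration of the `sense` field and of extra keyword arguments being captured on the problem (both listed in the Fields section above), one might write the following; `MaxSense` is taken from Optimization.jl as the docstring states, and `maxiters` stands in for any solver keyword, so this is a sketch rather than a guaranteed-supported combination for every solver:

```julia
using Optimization  # assumed; provides MaxSense per the docstring above

f(u, p) = -sum(abs2, u .- p)  # maximized at u == p

optf = OptimizationFunction(f, Optimization.AutoForwardDiff())

# sense flags a maximization problem; maxiters is captured in prob.kwargs and
# forwarded to every solve call made on this problem.
prob = OptimizationProblem(optf, zeros(2), [1.0, 2.0];
                           sense = Optimization.MaxSense, maxiters = 100)
```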
## Inequality and Equality Constraints From ff932783fa8cb5c90cb9c912a2729f0b5a3d7d07 Mon Sep 17 00:00:00 2001 From: Christopher Rackauckas Date: Sun, 9 Apr 2023 06:56:36 -0400 Subject: [PATCH 6/9] Update src/problems/basic_problems.jl --- src/problems/basic_problems.jl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index de5330b5d..cd788c164 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -554,7 +554,7 @@ which is minimized over the domain of ``u``, the collection of optimization vari min_u f(u,p) ``` -Say ``u₀`` is an initial guess for the minimizer. `f` should be specified as `f(u,p)` +``u₀`` is an initial guess for the minimizer. `f` should be specified as `f(u,p)` and `u₀` should be an `AbstractArray` whose geometry matches the desired geometry of `u`. Note that we are not limited to vectors for `u₀`; one is allowed to provide `u₀` as arbitrary matrices / From a2ab868cfbaa1cfe027e52ef51fbede2ec997a2b Mon Sep 17 00:00:00 2001 From: Vaibhav Kumar Dixit Date: Mon, 10 Apr 2023 13:24:56 +0530 Subject: [PATCH 7/9] Apply suggestions from code review Co-authored-by: Christopher Rackauckas --- src/problems/basic_problems.jl | 4 ++-- src/scimlfunctions.jl | 5 +---- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/src/problems/basic_problems.jl b/src/problems/basic_problems.jl index cd788c164..5bfafe888 100644 --- a/src/problems/basic_problems.jl +++ b/src/problems/basic_problems.jl @@ -558,7 +558,7 @@ min_u f(u,p) and `u₀` should be an `AbstractArray` whose geometry matches the desired geometry of `u`. Note that we are not limited to vectors for `u₀`; one is allowed to provide `u₀` as arbitrary matrices / -higher-dimension tensors as well if the optimization solver supports it. +higher-dimension tensors as well. ## Problem Type @@ -609,7 +609,7 @@ Any extra keyword arguments are captured to be sent to the optimizers. * `p`: the constant parameters used for defining the problem. Defaults to `NullParameters`. * `lb`: the lower bounds for the optimization variables `u`. * `ub`: the upper bounds for the optimization variables `u`. -* `int`: integrality indicator for `u`. If `int[i] == 1`, then `u[i]` is an integer variable. +* `int`: integrality indicator for `u`. If `int[i] == true`, then `u[i]` is an integer variable. Defaults to `nothing`, implying no integrality constraints. * `lcons`: the vector of lower bounds for the constraints passed to [OptimizationFunction](@ref). Defaults to `nothing`, implying no lower bounds for the constraints (i.e. the constraint bound is `-Inf`) diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index 73947bcb8..e8f822f1b 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -2060,9 +2060,6 @@ function described in [Callback Functions](@ref). This is defined as an array of matrices, where `hess[i]` is the Hessian w.r.t. the `i`th output. For example, if the Hessian is sparse, then `hess` is a `Vector{SparseMatrixCSC}`. The default is `nothing`, which means a dense constraint Hessian. - -When [Symbolic Problem Building with ModelingToolkit](@ref) interface is used the following arguments are also relevant: - - `syms`: the symbol names for the elements of the equation. This should match `u0` in size. For example, if `u = [0.0,1.0]` and `syms = [:x, :y]`, this will apply a canonical naming to the values, allowing `sol[:x]` in the solution and automatically naming values in plots. 
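The `syms`/`paramsyms` arguments referenced at the end of the hunk above enable symbolic indexing of the solution. A minimal sketch, with the symbol names and values chosen purely for illustration:

```julia
using SciMLBase

f(u, p) = p[1] * (u[2] - u[1]^2)^2 + (1 - u[1])^2

# Symbol names are illustrative; they allow e.g. sol[:x] and sol[:a] on the solution
optf = OptimizationFunction(f; syms = [:x, :y], paramsyms = [:a])
prob = OptimizationProblem(optf, [0.0, 0.0], [100.0])
```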
@@ -2095,7 +2092,7 @@ own dispatches below in the [Automatic Differentiation Construction Choice Recom ## iip: In-Place vs Out-Of-Place For more details on this argument, see the ODEFunction documentation. -Note that currently `OptimizationFunction` only supports in-place. +Note that currently Optimization.jl only supports in-place. ## specialize: Controlling Compilation and Specialization From b229033422929a425981cc8f4fa6b833e0e968cd Mon Sep 17 00:00:00 2001 From: Vaibhav Dixit Date: Fri, 5 May 2023 19:22:25 +0530 Subject: [PATCH 8/9] Add back removed docstring parts --- src/scimlfunctions.jl | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index e8f822f1b..c975f4d15 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -2019,7 +2019,13 @@ OptimizationFunction{iip}(f, adtype::AbstractADType = NoAD(); cons_hess_prototype = nothing, syms = __has_syms(f) ? f.syms : nothing, paramsyms = __has_paramsyms(f) ? f.paramsyms : nothing, - observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME) + observed = __has_observed(f) ? f.observed : DEFAULT_OBSERVED_NO_TIME, + lag_h = nothing, + hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, + cons_jac_colorvec = __has_colorvec(f) ? f.colorvec : nothing, + cons_hess_colorvec = __has_colorvec(f) ? f.colorvec : nothing, + lag_hess_colorvec = nothing, + sys = __has_sys(f) ? f.sys : nothing) ``` ## Positional Arguments @@ -2060,6 +2066,23 @@ function described in [Callback Functions](@ref). This is defined as an array of matrices, where `hess[i]` is the Hessian w.r.t. the `i`th output. For example, if the Hessian is sparse, then `hess` is a `Vector{SparseMatrixCSC}`. The default is `nothing`, which means a dense constraint Hessian. +- `lag_h(res,x,sigma,mu,p)` or `res=lag_h(x,sigma,mu,p)`: the Hessian of the Lagrangian, + where `sigma` is a multiplier of the cost function and `mu` are the Lagrange multipliers + multiplying the constraints. This can be provided instead of `hess` and `cons_h` + to solvers that directly use the Hessian of the Lagrangian. +- `hess_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity + pattern of the `hess_prototype`. This specializes the Hessian construction when using + finite differences and automatic differentiation to be computed in an accelerated manner + based on the sparsity pattern. Defaults to `nothing`, which means a color vector will be + internally computed on demand when required. The cost of this operation is highly dependent + on the sparsity pattern. +- `cons_jac_colorvec`: a color vector according to the SparseDiffTools.jl definition for the sparsity + pattern of the `cons_jac_prototype`. +- `cons_hess_colorvec`: an array of color vector according to the SparseDiffTools.jl definition for + the sparsity pattern of the `cons_hess_prototype`. + +When [Symbolic Problem Building with ModelingToolkit](@ref) interface is used the following arguments are also relevant: + - `syms`: the symbol names for the elements of the equation. This should match `u0` in size. For example, if `u = [0.0,1.0]` and `syms = [:x, :y]`, this will apply a canonical naming to the values, allowing `sol[:x]` in the solution and automatically naming values in plots. 
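Patch 8 above restores the `lag_h` documentation. The following is a minimal sketch of a Lagrangian-Hessian callback with the documented `lag_h(res,x,sigma,mu,p)` signature for a two-variable problem with a single constraint; a dense `res` matrix is assumed here, and individual solvers may expect a different storage convention:

```julia
# Objective f(x) = x[1]^2 + x[2]^2 with one constraint c(x) = x[1] * x[2].
# res accumulates sigma * ∇²f(x) + mu[1] * ∇²c(x); dense res is an assumption.
function lag_h!(res, x, sigma, mu, p)
    fill!(res, 0)
    res[1, 1] = 2 * sigma
    res[2, 2] = 2 * sigma
    res[1, 2] = mu[1]
    res[2, 1] = mu[1]
    return nothing
end

# It would be passed alongside the constraint function when building the function object,
# e.g. OptimizationFunction(f, AutoForwardDiff(); cons = cons!, lag_h = lag_h!)
```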
From 7b733bd515312c2b8b084bb8eb7628af72a03035 Mon Sep 17 00:00:00 2001 From: Vaibhav Dixit Date: Fri, 27 Oct 2023 17:05:48 -0400 Subject: [PATCH 9/9] Oop funcs --- src/scimlfunctions.jl | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/src/scimlfunctions.jl b/src/scimlfunctions.jl index c975f4d15..2be746949 100644 --- a/src/scimlfunctions.jl +++ b/src/scimlfunctions.jl @@ -2040,20 +2040,20 @@ function described in [Callback Functions](@ref). ## Keyword Arguments -- `grad(G,u,p)`: the gradient of `f` with respect to `u`. If `f` takes additional arguments - then `grad(G,u,p,args...)` should be used. -- `hess(H,u,p)`: the Hessian of `f` with respect to `u`. If `f` takes additional arguments - then `hess(H,u,p,args...)` should be used. -- `hv(Hv,u,v,p)`: the Hessian-vector product ``(d^2 f / du^2) v``. If `f` takes additional arguments - then `hv(Hv,u,v,p,args...)` should be used. -- `cons(res,x,p)`: the constraints function, should mutate the passed `res` array +- `grad(G,u,p)` or `G=grad(u,p)`: the gradient of `f` with respect to `u`. If `f` takes additional arguments + then `grad(G,u,p,args...)` or `G=grad(u,p,args...)` should be used. +- `hess(H,u,p)` or `H=hess(u,p)`: the Hessian of `f` with respect to `u`. If `f` takes additional arguments + then `hess(H,u,p,args...)` or `H=hess(u,p,args...)` should be used. +- `hv(Hv,u,v,p)` or `Hv=hv(u,v,p)`: the Hessian-vector product ``(d^2 f / du^2) v``. If `f` takes additional arguments + then `hv(Hv,u,v,p,args...)` or `Hv=hv(u,v,p, args...)` should be used. +- `cons(res,x,p)` or `res=cons(x,p)` : the constraints function, should mutate the passed `res` array with value of the `i`th constraint, evaluated at the current values of variables inside the optimization routine. This takes just the function evaluations and the equality or inequality assertion is applied by the solver based on the constraint bounds passed as `lcons` and `ucons` to [`OptimizationProblem`](@ref), in case of equality constraints `lcons` and `ucons` should be passed equal values. -- `cons_j(res,x,p)`: the Jacobian of the constraints. -- `cons_h(res,x,p)`: the Hessian of the constraints, provided as +- `cons_j(J,x,p)` or `J=cons_j(x,p)`: the Jacobian of the constraints. +- `cons_h(H,x,p)` or `H=cons_h(x,p)`: the Hessian of the constraints, provided as an array of Hessians with `res[i]` being the Hessian with respect to the `i`th output on `cons`. - `hess_prototype`: a prototype matrix matching the type that matches the Hessian. For example, if the Hessian is tridiagonal, then an appropriately sized `Hessian` matrix can be used @@ -2115,7 +2115,6 @@ own dispatches below in the [Automatic Differentiation Construction Choice Recom ## iip: In-Place vs Out-Of-Place For more details on this argument, see the ODEFunction documentation. -Note that currently Optimization.jl only supports in-place. ## specialize: Controlling Compilation and Specialization
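Patch 9 documents out-of-place counterparts for the derivative and constraint functions. A sketch of both forms side by side, assuming only SciMLBase itself; whether a given downstream solver accepts the out-of-place definitions is solver-dependent:

```julia
using SciMLBase

f(u, p) = sum(abs2, u)

# In-place forms: grad(G,u,p) and cons(res,x,p)
grad_iip!(G, u, p) = (G .= 2 .* u; nothing)
cons_iip!(res, u, p) = (res[1] = u[1] + u[2]; nothing)

# Out-of-place forms: G = grad(u,p) and res = cons(x,p)
grad_oop(u, p) = 2 .* u
cons_oop(u, p) = [u[1] + u[2]]

# Construction only; downstream solver support for the out-of-place form may vary.
optf_iip = OptimizationFunction{true}(f; grad = grad_iip!, cons = cons_iip!)
optf_oop = OptimizationFunction{false}(f; grad = grad_oop, cons = cons_oop)
```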