{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# The general case"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Problem statement"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Suppose we have a parametric model $M\\left( \\theta \\right)$, where $\\theta$ denotes the parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The likelihood function $L\\left( X | M\\left( \\theta \\right) \\right)$ quantifies how plausible it is to obtain the data set $X$ for the given model and a particular set of parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Problem**: find the set of parameters $\\theta$ for which the likelihood function attains its largest value."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Classification\n",
    "\n",
    "By the order of derivatives used:\n",
    "\n",
    "* Methods that use no derivatives of $L$\n",
    "\n",
    "* Methods that use the first derivatives $\\frac{\\partial L}{\\partial \\theta_i}$ (the gradient)\n",
    "\n",
    "* Methods that use the second derivatives $\\frac{\\partial^2 L}{\\partial \\theta_i \\partial \\theta_j}$ (the Hessian)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Derivative-free methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Exhaustive search\n",
    "(brute force)\n",
    "\n",
    "* Build a grid over the parameter space and look for the maximum on it (see the sketch below).\n",
    "* Feasible only for one-dimensional, at most two-dimensional, problems.\n",
    "* The accuracy is limited by the size of the grid cell."
   ]
  },
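  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A minimal sketch of such a grid search with `scipy.optimize.brute`; the 2-D quadratic test function and the grid limits are illustrative assumptions, not part of the lecture."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "from scipy.optimize import brute\n",
    "\n",
    "\n",
    "# Illustrative 2-D objective with a single minimum at (1, 2).\n",
    "def f(p):\n",
    "    x, y = p\n",
    "    return (x - 1)**2 + (y - 2)**2\n",
    "\n",
    "\n",
    "# Evaluate f on a 50x50 grid over [-5, 5] x [-5, 5] and keep the best node;\n",
    "# finish=None switches off the default local polishing step.\n",
    "brute(f, ranges=((-5, 5), (-5, 5)), Ns=50, finish=None)"
   ]
  },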
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Simplex methods\n",
    "\n",
    "1. Build a simplex (a polytope with $n+1$ vertices) in the parameter space, where $n$ is the dimension of the space.\n",
    "2. Evaluate the function at every vertex.\n",
    "3. Find the vertex with the smallest (worst) value and move it towards the centre of mass of the polytope."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Nelder-mead](images/Nelder_Mead1.gif)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      " Current function value: 0.000000\n",
      " Iterations: 339\n",
      " Function evaluations: 571\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       " final_simplex: (array([[1. , 1. , 1. , 1. , 1. ],\n",
       " [1. , 1. , 1. , 1. , 1. ],\n",
       " [1. , 1. , 1. , 1.00000001, 1.00000001],\n",
       " [1. , 1. , 1. , 1. , 1. ],\n",
       " [1. , 1. , 1. , 1. , 1. ],\n",
       " [1. , 1. , 1. , 1. , 0.99999999]]), array([4.86115343e-17, 7.65182843e-17, 8.11395684e-17, 8.63263255e-17,\n",
       " 8.64080682e-17, 2.17927418e-16]))\n",
       " fun: 4.861153433422115e-17\n",
       " message: 'Optimization terminated successfully.'\n",
       " nfev: 571\n",
       " nit: 339\n",
       " status: 0\n",
       " success: True\n",
       " x: array([1., 1., 1., 1., 1.])"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import minimize\n",
    "\n",
    "\n",
    "def rosen(x):\n",
    "    \"\"\"The Rosenbrock function\"\"\"\n",
    "    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)\n",
    "\n",
    "\n",
    "# Derivative-free Nelder-Mead minimization from a crude starting point;\n",
    "# xatol is the absolute tolerance on the simplex coordinates.\n",
    "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n",
    "minimize(rosen, x0, method='nelder-mead', options={'xatol': 1e-8, 'disp': True})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function minimize in module scipy.optimize._minimize:\n",
      "\n",
      "minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)\n",
      " Minimization of scalar function of one or more variables.\n",
      " \n",
      " Parameters\n",
      " ----------\n",
      " fun : callable\n",
      " The objective function to be minimized.\n",
      " \n",
      " ``fun(x, *args) -> float``\n",
      " \n",
      " where x is an 1-D array with shape (n,) and `args`\n",
      " is a tuple of the fixed parameters needed to completely\n",
      " specify the function.\n",
      " x0 : ndarray, shape (n,)\n",
      " Initial guess. Array of real elements of size (n,),\n",
      " where 'n' is the number of independent variables.\n",
      " args : tuple, optional\n",
      " Extra arguments passed to the objective function and its\n",
      " derivatives (`fun`, `jac` and `hess` functions).\n",
      " method : str or callable, optional\n",
      " Type of solver. Should be one of\n",
      " \n",
      " - 'Nelder-Mead' :ref:`(see here) <optimize.minimize-neldermead>`\n",
      " - 'Powell' :ref:`(see here) <optimize.minimize-powell>`\n",
      " - 'CG' :ref:`(see here) <optimize.minimize-cg>`\n",
      " - 'BFGS' :ref:`(see here) <optimize.minimize-bfgs>`\n",
      " - 'Newton-CG' :ref:`(see here) <optimize.minimize-newtoncg>`\n",
      " - 'L-BFGS-B' :ref:`(see here) <optimize.minimize-lbfgsb>`\n",
      " - 'TNC' :ref:`(see here) <optimize.minimize-tnc>`\n",
      " - 'COBYLA' :ref:`(see here) <optimize.minimize-cobyla>`\n",
      " - 'SLSQP' :ref:`(see here) <optimize.minimize-slsqp>`\n",
      " - 'trust-constr':ref:`(see here) <optimize.minimize-trustconstr>`\n",
      " - 'dogleg' :ref:`(see here) <optimize.minimize-dogleg>`\n",
      " - 'trust-ncg' :ref:`(see here) <optimize.minimize-trustncg>`\n",
      " - 'trust-exact' :ref:`(see here) <optimize.minimize-trustexact>`\n",
      " - 'trust-krylov' :ref:`(see here) <optimize.minimize-trustkrylov>`\n",
      " - custom - a callable object (added in version 0.14.0),\n",
      " see below for description.\n",
      " \n",
      " If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,\n",
      " depending if the problem has constraints or bounds.\n",
      " jac : {callable, '2-point', '3-point', 'cs', bool}, optional\n",
      " Method for computing the gradient vector. Only for CG, BFGS,\n",
      " Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov,\n",
      " trust-exact and trust-constr. If it is a callable, it should be a\n",
      " function that returns the gradient vector:\n",
      " \n",
      " ``jac(x, *args) -> array_like, shape (n,)``\n",
      " \n",
      " where x is an array with shape (n,) and `args` is a tuple with\n",
      " the fixed parameters. Alternatively, the keywords\n",
      " {'2-point', '3-point', 'cs'} select a finite\n",
      " difference scheme for numerical estimation of the gradient. Options\n",
      " '3-point' and 'cs' are available only to 'trust-constr'.\n",
      " If `jac` is a Boolean and is True, `fun` is assumed to return the\n",
      " gradient along with the objective function. If False, the gradient\n",
      " will be estimated using '2-point' finite difference estimation.\n",
      " hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy}, optional\n",
      " Method for computing the Hessian matrix. Only for Newton-CG, dogleg,\n",
      " trust-ncg, trust-krylov, trust-exact and trust-constr. If it is\n",
      " callable, it should return the Hessian matrix:\n",
      " \n",
      " ``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``\n",
      " \n",
      " where x is a (n,) ndarray and `args` is a tuple with the fixed\n",
      " parameters. LinearOperator and sparse matrix returns are\n",
      " allowed only for 'trust-constr' method. Alternatively, the keywords\n",
      " {'2-point', '3-point', 'cs'} select a finite difference scheme\n",
      " for numerical estimation. Or, objects implementing\n",
      " `HessianUpdateStrategy` interface can be used to approximate\n",
      " the Hessian. Available quasi-Newton methods implementing\n",
      " this interface are:\n",
      " \n",
      " - `BFGS`;\n",
      " - `SR1`.\n",
      " \n",
      " Whenever the gradient is estimated via finite-differences,\n",
      " the Hessian cannot be estimated with options\n",
      " {'2-point', '3-point', 'cs'} and needs to be\n",
      " estimated using one of the quasi-Newton strategies.\n",
      " Finite-difference options {'2-point', '3-point', 'cs'} and\n",
      " `HessianUpdateStrategy` are available only for 'trust-constr' method.\n",
      " hessp : callable, optional\n",
      " Hessian of objective function times an arbitrary vector p. Only for\n",
      " Newton-CG, trust-ncg, trust-krylov, trust-constr.\n",
      " Only one of `hessp` or `hess` needs to be given. If `hess` is\n",
      " provided, then `hessp` will be ignored. `hessp` must compute the\n",
      " Hessian times an arbitrary vector:\n",
      " \n",
      " ``hessp(x, p, *args) -> ndarray shape (n,)``\n",
      " \n",
      " where x is a (n,) ndarray, p is an arbitrary vector with\n",
      " dimension (n,) and `args` is a tuple with the fixed\n",
      " parameters.\n",
      " bounds : sequence or `Bounds`, optional\n",
      " Bounds on variables for L-BFGS-B, TNC, SLSQP and\n",
      " trust-constr methods. There are two ways to specify the bounds:\n",
      " \n",
      " 1. Instance of `Bounds` class.\n",
      " 2. Sequence of ``(min, max)`` pairs for each element in `x`. None\n",
      " is used to specify no bound.\n",
      " \n",
      " constraints : {Constraint, dict} or List of {Constraint, dict}, optional\n",
      " Constraints definition (only for COBYLA, SLSQP and trust-constr).\n",
      " Constraints for 'trust-constr' are defined as a single object or a\n",
      " list of objects specifying constraints to the optimization problem.\n",
      " Available constraints are:\n",
      " \n",
      " - `LinearConstraint`\n",
      " - `NonlinearConstraint`\n",
      " \n",
      " Constraints for COBYLA, SLSQP are defined as a list of dictionaries.\n",
      " Each dictionary with fields:\n",
      " \n",
      " type : str\n",
      " Constraint type: 'eq' for equality, 'ineq' for inequality.\n",
      " fun : callable\n",
      " The function defining the constraint.\n",
      " jac : callable, optional\n",
      " The Jacobian of `fun` (only for SLSQP).\n",
      " args : sequence, optional\n",
      " Extra arguments to be passed to the function and Jacobian.\n",
      " \n",
      " Equality constraint means that the constraint function result is to\n",
      " be zero whereas inequality means that it is to be non-negative.\n",
      " Note that COBYLA only supports inequality constraints.\n",
      " tol : float, optional\n",
      " Tolerance for termination. For detailed control, use solver-specific\n",
      " options.\n",
      " options : dict, optional\n",
      " A dictionary of solver options. All methods accept the following\n",
      " generic options:\n",
      " \n",
      " maxiter : int\n",
      " Maximum number of iterations to perform.\n",
      " disp : bool\n",
      " Set to True to print convergence messages.\n",
      " \n",
      " For method-specific options, see :func:`show_options()`.\n",
      " callback : callable, optional\n",
      " Called after each iteration. For 'trust-constr' it is a callable with\n",
      " the signature:\n",
      " \n",
      " ``callback(xk, OptimizeResult state) -> bool``\n",
      " \n",
      " where ``xk`` is the current parameter vector. and ``state``\n",
      " is an `OptimizeResult` object, with the same fields\n",
      " as the ones from the return. If callback returns True\n",
      " the algorithm execution is terminated.\n",
      " For all the other methods, the signature is:\n",
      " \n",
      " ``callback(xk)``\n",
      " \n",
      " where ``xk`` is the current parameter vector.\n",
      " \n",
      " Returns\n",
      " -------\n",
      " res : OptimizeResult\n",
      " The optimization result represented as a ``OptimizeResult`` object.\n",
      " Important attributes are: ``x`` the solution array, ``success`` a\n",
      " Boolean flag indicating if the optimizer exited successfully and\n",
      " ``message`` which describes the cause of the termination. See\n",
      " `OptimizeResult` for a description of other attributes.\n",
      " \n",
      " \n",
      " See also\n",
      " --------\n",
      " minimize_scalar : Interface to minimization algorithms for scalar\n",
      " univariate functions\n",
      " show_options : Additional options accepted by the solvers\n",
      " \n",
      " Notes\n",
      " -----\n",
      " This section describes the available solvers that can be selected by the\n",
      " 'method' parameter. The default method is *BFGS*.\n",
      " \n",
      " **Unconstrained minimization**\n",
      " \n",
      " Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the\n",
      " Simplex algorithm [1]_, [2]_. This algorithm is robust in many\n",
      " applications. However, if numerical computation of derivative can be\n",
      " trusted, other algorithms using the first and/or second derivatives\n",
      " information might be preferred for their better performance in\n",
      " general.\n",
      " \n",
      " Method :ref:`Powell <optimize.minimize-powell>` is a modification\n",
      " of Powell's method [3]_, [4]_ which is a conjugate direction\n",
      " method. It performs sequential one-dimensional minimizations along\n",
      " each vector of the directions set (`direc` field in `options` and\n",
      " `info`), which is updated at each iteration of the main\n",
      " minimization loop. The function need not be differentiable, and no\n",
      " derivatives are taken.\n",
      " \n",
      " Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate\n",
      " gradient algorithm by Polak and Ribiere, a variant of the\n",
      " Fletcher-Reeves method described in [5]_ pp. 120-122. Only the\n",
      " first derivatives are used.\n",
      " \n",
      " Method :ref:`BFGS <optimize.minimize-bfgs>` uses the quasi-Newton\n",
      " method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [5]_\n",
      " pp. 136. It uses the first derivatives only. BFGS has proven good\n",
      " performance even for non-smooth optimizations. This method also\n",
      " returns an approximation of the Hessian inverse, stored as\n",
      " `hess_inv` in the OptimizeResult object.\n",
      " \n",
      " Method :ref:`Newton-CG <optimize.minimize-newtoncg>` uses a\n",
      " Newton-CG algorithm [5]_ pp. 168 (also known as the truncated\n",
      " Newton method). It uses a CG method to the compute the search\n",
      " direction. See also *TNC* method for a box-constrained\n",
      " minimization with a similar algorithm. Suitable for large-scale\n",
      " problems.\n",
      " \n",
      " Method :ref:`dogleg <optimize.minimize-dogleg>` uses the dog-leg\n",
      " trust-region algorithm [5]_ for unconstrained minimization. This\n",
      " algorithm requires the gradient and Hessian; furthermore the\n",
      " Hessian is required to be positive definite.\n",
      " \n",
      " Method :ref:`trust-ncg <optimize.minimize-trustncg>` uses the\n",
      " Newton conjugate gradient trust-region algorithm [5]_ for\n",
      " unconstrained minimization. This algorithm requires the gradient\n",
      " and either the Hessian or a function that computes the product of\n",
      " the Hessian with a given vector. Suitable for large-scale problems.\n",
      " \n",
      " Method :ref:`trust-krylov <optimize.minimize-trustkrylov>` uses\n",
      " the Newton GLTR trust-region algorithm [14]_, [15]_ for unconstrained\n",
      " minimization. This algorithm requires the gradient\n",
      " and either the Hessian or a function that computes the product of\n",
      " the Hessian with a given vector. Suitable for large-scale problems.\n",
      " On indefinite problems it requires usually less iterations than the\n",
      " `trust-ncg` method and is recommended for medium and large-scale problems.\n",
      " \n",
      " Method :ref:`trust-exact <optimize.minimize-trustexact>`\n",
      " is a trust-region method for unconstrained minimization in which\n",
      " quadratic subproblems are solved almost exactly [13]_. This\n",
      " algorithm requires the gradient and the Hessian (which is\n",
      " *not* required to be positive definite). It is, in many\n",
      " situations, the Newton method to converge in fewer iteraction\n",
      " and the most recommended for small and medium-size problems.\n",
      " \n",
      " **Bound-Constrained minimization**\n",
      " \n",
      " Method :ref:`L-BFGS-B <optimize.minimize-lbfgsb>` uses the L-BFGS-B\n",
      " algorithm [6]_, [7]_ for bound constrained minimization.\n",
      " \n",
      " Method :ref:`TNC <optimize.minimize-tnc>` uses a truncated Newton\n",
      " algorithm [5]_, [8]_ to minimize a function with variables subject\n",
      " to bounds. This algorithm uses gradient information; it is also\n",
      " called Newton Conjugate-Gradient. It differs from the *Newton-CG*\n",
      " method described above as it wraps a C implementation and allows\n",
      " each variable to be given upper and lower bounds.\n",
      " \n",
      " **Constrained Minimization**\n",
      " \n",
      " Method :ref:`COBYLA <optimize.minimize-cobyla>` uses the\n",
      " Constrained Optimization BY Linear Approximation (COBYLA) method\n",
      " [9]_, [10]_, [11]_. The algorithm is based on linear\n",
      " approximations to the objective function and each constraint. The\n",
      " method wraps a FORTRAN implementation of the algorithm. The\n",
      " constraints functions 'fun' may return either a single number\n",
      " or an array or list of numbers.\n",
      " \n",
      " Method :ref:`SLSQP <optimize.minimize-slsqp>` uses Sequential\n",
      " Least SQuares Programming to minimize a function of several\n",
      " variables with any combination of bounds, equality and inequality\n",
      " constraints. The method wraps the SLSQP Optimization subroutine\n",
      " originally implemented by Dieter Kraft [12]_. Note that the\n",
      " wrapper handles infinite values in bounds by converting them into\n",
      " large floating values.\n",
      " \n",
      " Method :ref:`trust-constr <optimize.minimize-trustconstr>` is a\n",
      " trust-region algorithm for constrained optimization. It swiches\n",
      " between two implementations depending on the problem definition.\n",
      " It is the most versatile constrained minimization algorithm\n",
      " implemented in SciPy and the most appropriate for large-scale problems.\n",
      " For equality constrained problems it is an implementation of Byrd-Omojokun\n",
      " Trust-Region SQP method described in [17]_ and in [5]_, p. 549. When\n",
      " inequality constraints are imposed as well, it swiches to the trust-region\n",
      " interior point method described in [16]_. This interior point algorithm,\n",
      " in turn, solves inequality constraints by introducing slack variables\n",
      " and solving a sequence of equality-constrained barrier problems\n",
      " for progressively smaller values of the barrier parameter.\n",
      " The previously described equality constrained SQP method is\n",
      " used to solve the subproblems with increasing levels of accuracy\n",
      " as the iterate gets closer to a solution.\n",
      " \n",
      " **Finite-Difference Options**\n",
      " \n",
      " For Method :ref:`trust-constr <optimize.minimize-trustconstr>`\n",
      " the gradient and the Hessian may be approximated using\n",
      " three finite-difference schemes: {'2-point', '3-point', 'cs'}.\n",
      " The scheme 'cs' is, potentially, the most accurate but it\n",
      " requires the function to correctly handles complex inputs and to\n",
      " be differentiable in the complex plane. The scheme '3-point' is more\n",
      " accurate than '2-point' but requires twice as much operations.\n",
      " \n",
      " **Custom minimizers**\n",
      " \n",
      " It may be useful to pass a custom minimization method, for example\n",
      " when using a frontend to this method such as `scipy.optimize.basinhopping`\n",
      " or a different library. You can simply pass a callable as the ``method``\n",
      " parameter.\n",
      " \n",
      " The callable is called as ``method(fun, x0, args, **kwargs, **options)``\n",
      " where ``kwargs`` corresponds to any other parameters passed to `minimize`\n",
      " (such as `callback`, `hess`, etc.), except the `options` dict, which has\n",
      " its contents also passed as `method` parameters pair by pair. Also, if\n",
      " `jac` has been passed as a bool type, `jac` and `fun` are mangled so that\n",
      " `fun` returns just the function values and `jac` is converted to a function\n",
      " returning the Jacobian. The method shall return an ``OptimizeResult``\n",
      " object.\n",
      " \n",
      " The provided `method` callable must be able to accept (and possibly ignore)\n",
      " arbitrary parameters; the set of parameters accepted by `minimize` may\n",
      " expand in future versions and then these parameters will be passed to\n",
      " the method. You can find an example in the scipy.optimize tutorial.\n",
      " \n",
      " .. versionadded:: 0.11.0\n",
      " \n",
      " References\n",
      " ----------\n",
      " .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function\n",
      " Minimization. The Computer Journal 7: 308-13.\n",
      " .. [2] Wright M H. 1996. Direct search methods: Once scorned, now\n",
      " respectable, in Numerical Analysis 1995: Proceedings of the 1995\n",
      " Dundee Biennial Conference in Numerical Analysis (Eds. D F\n",
      " Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.\n",
      " 191-208.\n",
      " .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of\n",
      " a function of several variables without calculating derivatives. The\n",
      " Computer Journal 7: 155-162.\n",
      " .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.\n",
      " Numerical Recipes (any edition), Cambridge University Press.\n",
      " .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.\n",
      " Springer New York.\n",
      " .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory\n",
      " Algorithm for Bound Constrained Optimization. SIAM Journal on\n",
      " Scientific and Statistical Computing 16 (5): 1190-1208.\n",
      " .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm\n",
      " 778: L-BFGS-B, FORTRAN routines for large scale bound constrained\n",
      " optimization. ACM Transactions on Mathematical Software 23 (4):\n",
      " 550-560.\n",
      " .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.\n",
      " 1984. SIAM Journal of Numerical Analysis 21: 770-778.\n",
      " .. [9] Powell, M J D. A direct search optimization method that models\n",
      " the objective and constraint functions by linear interpolation.\n",
      " 1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez\n",
      " and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.\n",
      " .. [10] Powell M J D. Direct search algorithms for optimization\n",
      " calculations. 1998. Acta Numerica 7: 287-336.\n",
      " .. [11] Powell M J D. A view of algorithms for optimization without\n",
      " derivatives. 2007.Cambridge University Technical Report DAMTP\n",
      " 2007/NA03\n",
      " .. [12] Kraft, D. A software package for sequential quadratic\n",
      " programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace\n",
      " Center -- Institute for Flight Mechanics, Koln, Germany.\n",
      " .. [13] Conn, A. R., Gould, N. I., and Toint, P. L.\n",
      " Trust region methods. 2000. Siam. pp. 169-200.\n",
      " .. [14] F. Lenders, C. Kirches, A. Potschka: \"trlib: A vector-free\n",
      " implementation of the GLTR method for iterative solution of\n",
      " the trust region problem\", https://arxiv.org/abs/1611.04718\n",
      " .. [15] N. Gould, S. Lucidi, M. Roma, P. Toint: \"Solving the\n",
      " Trust-Region Subproblem using the Lanczos Method\",\n",
      " SIAM J. Optim., 9(2), 504--525, (1999).\n",
      " .. [16] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal. 1999.\n",
      " An interior point algorithm for large-scale nonlinear programming.\n",
      " SIAM Journal on Optimization 9.4: 877-900.\n",
      " .. [17] Lalee, Marucha, Jorge Nocedal, and Todd Plantega. 1998. On the\n",
      " implementation of an algorithm for large-scale equality constrained\n",
      " optimization. SIAM Journal on Optimization 8.3: 682-706.\n",
      " \n",
      " Examples\n",
      " --------\n",
      " Let us consider the problem of minimizing the Rosenbrock function. This\n",
      " function (and its respective derivatives) is implemented in `rosen`\n",
      " (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.\n",
      " \n",
      " >>> from scipy.optimize import minimize, rosen, rosen_der\n",
      " \n",
      " A simple application of the *Nelder-Mead* method is:\n",
      " \n",
      " >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]\n",
      " >>> res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)\n",
      " >>> res.x\n",
      " array([ 1., 1., 1., 1., 1.])\n",
      " \n",
      " Now using the *BFGS* algorithm, using the first derivative and a few\n",
      " options:\n",
      " \n",
      " >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,\n",
      " ... options={'gtol': 1e-6, 'disp': True})\n",
      " Optimization terminated successfully.\n",
      " Current function value: 0.000000\n",
      " Iterations: 26\n",
      " Function evaluations: 31\n",
      " Gradient evaluations: 31\n",
      " >>> res.x\n",
      " array([ 1., 1., 1., 1., 1.])\n",
      " >>> print(res.message)\n",
      " Optimization terminated successfully.\n",
      " >>> res.hess_inv\n",
      " array([[ 0.00749589, 0.01255155, 0.02396251, 0.04750988, 0.09495377], # may vary\n",
      " [ 0.01255155, 0.02510441, 0.04794055, 0.09502834, 0.18996269],\n",
      " [ 0.02396251, 0.04794055, 0.09631614, 0.19092151, 0.38165151],\n",
      " [ 0.04750988, 0.09502834, 0.19092151, 0.38341252, 0.7664427 ],\n",
      " [ 0.09495377, 0.18996269, 0.38165151, 0.7664427, 1.53713523]])\n",
      " \n",
      " \n",
      " Next, consider a minimization problem with several constraints (namely\n",
      " Example 16.4 from [5]_). The objective function is:\n",
      " \n",
      " >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2\n",
      " \n",
      " There are three constraints defined as:\n",
      " \n",
      " >>> cons = ({'type': 'ineq', 'fun': lambda x: x[0] - 2 * x[1] + 2},\n",
      " ... {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},\n",
      " ... {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})\n",
      " \n",
      " And variables must be positive, hence the following bounds:\n",
      " \n",
      " >>> bnds = ((0, None), (0, None))\n",
      " \n",
      " The optimization problem is solved using the SLSQP method as:\n",
      " \n",
      " >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,\n",
      " ... constraints=cons)\n",
      " \n",
      " It should converge to the theoretical solution (1.4 ,1.7).\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(minimize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## First derivatives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Steepest ascent (descent)\n",
    "The direction towards the maximum always points along the gradient of the function:\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k + \\beta_k \\nabla L $$\n",
    "\n",
    "* It is not obvious how to choose $\\beta_k$\n",
    "* It is not obvious when to stop."
   ]
  },
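  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A minimal sketch of steepest ascent with a fixed step size; the quadratic log-likelihood, the starting point and the iteration count are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "# Gradient of an illustrative log-likelihood L(theta) = -|theta - (1, 2)|^2,\n",
    "# whose maximum sits at theta = (1, 2).\n",
    "def grad_L(theta):\n",
    "    return -2.0 * (theta - np.array([1.0, 2.0]))\n",
    "\n",
    "\n",
    "theta = np.zeros(2)   # starting point\n",
    "beta = 0.1            # fixed step size: choosing it well is the hard part\n",
    "for _ in range(200):  # fixed iteration count instead of a stopping rule\n",
    "    theta = theta + beta * grad_L(theta)\n",
    "\n",
    "theta"
   ]
  },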
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A modification of this approach, the conjugate-gradient method, in fact requires second-derivative information."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Second derivatives"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The key formula is the second-order Taylor expansion:\n",
    "\n",
    "$$ L(\\theta) = L(\\theta_0) + \\nabla L(\\theta_0)^T ( \\theta - \\theta_0) + \\frac{1}{2} (\\theta-\\theta_0)^T H (\\theta-\\theta_0) + o\\left(\\|\\theta-\\theta_0\\|^2\\right)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Newton's method\n",
    "\n",
    "Setting the gradient of the quadratic model to zero,\n",
    "\n",
    "$$\\nabla L(\\theta_k) + H(\\theta_k)(\\theta_{k+1} - \\theta_k) = 0,$$\n",
    "\n",
    "gives the update\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k - H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
   ]
  },
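  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A minimal sketch of the Newton iteration, assuming the gradient and Hessian can be evaluated directly; the 2-D quadratic objective (minimum at (1, 2)) is an illustrative assumption."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def grad(theta):\n",
    "    return 2.0 * (theta - np.array([1.0, 2.0]))\n",
    "\n",
    "\n",
    "def hess(theta):\n",
    "    return 2.0 * np.eye(2)\n",
    "\n",
    "\n",
    "theta = np.array([10.0, -3.0])\n",
    "for _ in range(5):\n",
    "    # theta_{k+1} = theta_k - H^{-1}(theta_k) grad(theta_k)\n",
    "    theta = theta - np.linalg.solve(hess(theta), grad(theta))\n",
    "\n",
    "theta"
   ]
  },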
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A step-size choice can be added on top of this:\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k - \\lambda_k H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0000000000000016"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy import optimize\n",
    "\n",
    "# Root of x**3 - 1 = 0 via a Newton-type (secant) iteration, starting from 1.5\n",
    "optimize.newton(lambda x: x**3 - 1, 1.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Variable-metric methods\n",
    "\n",
    "* Computing $\\nabla L$ and $H$ exactly is very expensive\n",
    "* So build them up iteratively instead.\n",
    "\n",
    "Examples:\n",
    "* MINUIT\n",
    "* scipy `minimize(method='L-BFGS-B')` (see the sketch below)"
   ]
  },
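  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal quasi-Newton sketch: the Rosenbrock problem from above solved with `minimize(method='L-BFGS-B')`, using the analytic gradient `rosen_der` that ships with `scipy.optimize`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import minimize, rosen, rosen_der\n",
    "\n",
    "# L-BFGS-B builds up its own approximation of the (inverse) Hessian from\n",
    "# successive gradient evaluations.\n",
    "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n",
    "res = minimize(rosen, x0, method='L-BFGS-B', jac=rosen_der)\n",
    "res.x"
   ]
  },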
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# The least-squares case"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "When analysing spectra we have\n",
    "\n",
    "$$ L(X | \\theta) = \\prod_i p_i (x_i | \\theta),$$\n",
    "\n",
    "or, equivalently,\n",
    "\n",
    "$$\\ln{ L(X | \\theta)} = \\sum_i \\ln{ p_i (x_i | \\theta)}.$$\n",
    "\n",
    "For normally distributed errors\n",
    "\n",
    "$$\\ln{ L(X | \\theta)} \\sim \\sum_i{ \\left( \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i} \\right)^2 },$$\n",
    "\n",
    "so maximizing the likelihood is equivalent to minimizing the sum of squared weighted residuals."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The Gauss-Newton method\n",
    "Let\n",
    "$$r_i = \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i},$$ \n",
    "$$J_{ij} = \\frac{\\partial r_i}{\\partial \\theta_j} = - \\frac{1}{\\sigma_i} \\frac{\\partial \\mu(x_i, \\theta)}{\\partial \\theta_j}.$$\n",
    "\n",
    "Then the update step is\n",
    "\n",
    "$$ \\theta_{(k+1)} = \\theta_{(k)} - \\left( J^TJ \\right)^{-1}J^T r(\\theta_{(k)}).$$"
   ]
  },
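  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal Gauss-Newton sketch in plain numpy; the straight-line model $\\mu(x, \\theta) = \\theta_0 x + \\theta_1$, the noise level and the random seed are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Synthetic data for an illustrative straight-line model.\n",
    "rng = np.random.default_rng(1)\n",
    "x = np.linspace(0.0, 1.0, 20)\n",
    "sigma = 0.1 * np.ones_like(x)\n",
    "y = 3.0 * x + 0.5 + rng.normal(0.0, 0.1, x.size)\n",
    "\n",
    "\n",
    "def residuals(theta):\n",
    "    return (y - (theta[0] * x + theta[1])) / sigma\n",
    "\n",
    "\n",
    "def jacobian(theta):\n",
    "    # J_ij = d r_i / d theta_j = -(1/sigma_i) d mu(x_i, theta) / d theta_j\n",
    "    return np.column_stack((-x / sigma, -1.0 / sigma))\n",
    "\n",
    "\n",
    "theta = np.zeros(2)\n",
    "for _ in range(10):\n",
    "    r, J = residuals(theta), jacobian(theta)\n",
    "    # Gauss-Newton step: theta <- theta - (J^T J)^{-1} J^T r\n",
    "    theta = theta - np.linalg.solve(J.T @ J, J.T @ r)\n",
    "\n",
    "theta"
   ]
  },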
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The Levenberg-Marquardt algorithm\n",
    "\n",
    "$$ \\theta_{(k+1)} = \\theta_{(k)} + \\delta,$$\n",
    "\n",
    "$$ (J^TJ + \\lambda I)\\delta = -J^T r(\\theta_{(k)}),$$\n",
    "\n",
    "where $\\lambda$ is a regularization (damping) factor, chosen rather arbitrarily."
   ]
  },
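  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of the same kind of weighted least-squares fit handed to `scipy.optimize.least_squares` with `method='lm'` (the MINPACK Levenberg-Marquardt implementation); the exponential model and the synthetic data are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import least_squares\n",
    "\n",
    "# Synthetic data for an illustrative exponential model y = a * exp(b * x).\n",
    "rng = np.random.default_rng(2)\n",
    "x = np.linspace(0.0, 1.0, 50)\n",
    "a_true, b_true, sigma = 2.0, -1.5, 0.05\n",
    "y = a_true * np.exp(b_true * x) + rng.normal(0.0, sigma, x.size)\n",
    "\n",
    "\n",
    "# Weighted residuals r_i = (y_i - mu(x_i, theta)) / sigma_i\n",
    "def residuals(theta):\n",
    "    a, b = theta\n",
    "    return (y - a * np.exp(b * x)) / sigma\n",
    "\n",
    "\n",
    "res = least_squares(residuals, x0=[1.0, -1.0], method='lm')\n",
    "res.x"
   ]
  },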
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### The method of quasi-optimal weights\n",
    "Idea:\n",
    "take some statistic (a function of the data) $f(x)$. For the optimal solution the mean of this function over the experimental data must coincide with its expectation under the model:\n",
    "$$ E_\\theta(f(x)) = \\frac{1}{N}\\sum_i{f(x_i)} $$\n",
    "\n",
    "One can show that the optimal efficiency is achieved for\n",
    "\n",
    "$$ f = \\frac{\\partial \\ln L}{\\partial \\theta} $$\n",
    "\n",
    "In this case, and if the errors are Gaussian or Poisson distributed, the solution for the optimal $\\theta$ can be obtained from (see the sketch below):\n",
    "\n",
    "$$ \n",
    "\\sum_{i}{\\frac{\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right) - x_{i}}{\\sigma_{i}^{2}}\\left. \\ \\frac{\\partial\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right)}{\\partial\\mathbf{\\theta}} \\right|_{\\mathbf{\\theta}_{\\mathbf{0}}}} = 0. \n",
    "$$"
   ]
  },
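  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "A minimal sketch of solving the estimating equation above for a single parameter with `scipy.optimize.brentq`; the linear model $\\mu_i(\\theta, E_i) = \\theta E_i$, the synthetic data and the bracketing interval are illustrative assumptions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import brentq\n",
    "\n",
    "# Synthetic data generated with theta = 1.7 for an illustrative linear model.\n",
    "rng = np.random.default_rng(3)\n",
    "E = np.linspace(1.0, 2.0, 30)\n",
    "sigma = 0.1 * np.ones_like(E)\n",
    "x = 1.7 * E + rng.normal(0.0, 0.1, E.size)\n",
    "\n",
    "\n",
    "# Left-hand side of the estimating equation: sum_i (mu_i - x_i)/sigma_i^2 * dmu_i/dtheta.\n",
    "def estimating_eq(theta):\n",
    "    mu = theta * E\n",
    "    dmu_dtheta = E\n",
    "    return np.sum((mu - x) / sigma**2 * dmu_dtheta)\n",
    "\n",
    "\n",
    "# The left-hand side is monotone in theta for this model, so bracket and solve.\n",
    "brentq(estimating_eq, 0.0, 5.0)"
   ]
  }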
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": false,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": false,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}