{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# General case"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Problem statement"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Consider a parametric model $M\\left( \\theta \\right)$, where $\\theta$ denotes the parameters."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The likelihood function $L\\left( X | M\\left( \\theta \\right) \\right)$ gives the probability (or probability density) of obtaining the data set $X$ for a given model and a given set of parameters."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**Task**: find the set of parameters $\\theta$ for which the likelihood function takes its largest value."
]
},
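{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice one usually minimizes $-\\ln L$ instead of maximizing $L$ directly. The cell below is a minimal sketch of that idea, assuming a toy exponential sample with a single parameter $\\tau$; the data, the search bounds and the helper name `neg_log_likelihood` are illustrative assumptions and not part of the original material."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.optimize import minimize_scalar\n",
"\n",
"rng = np.random.default_rng(3)\n",
"data = rng.exponential(scale=2.0, size=500)  # toy sample, true tau = 2 (assumption)\n",
"\n",
"\n",
"def neg_log_likelihood(tau):\n",
"    # p(x | tau) = exp(-x / tau) / tau, so -ln L = sum(x / tau + ln(tau))\n",
"    return np.sum(data / tau + np.log(tau))\n",
"\n",
"\n",
"res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method='bounded')\n",
"# For this model the analytic maximum-likelihood estimate is the sample mean\n",
"print(res.x, data.mean())"
]
},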
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Classification\n",
"\n",
"By the order of the derivatives used:\n",
"\n",
"* Uses no derivatives of $L$\n",
"\n",
"* Uses the first derivatives $\\frac{\\partial L}{\\partial \\theta_i}$ (gradient)\n",
"\n",
"* Uses the second derivatives $\\frac{\\partial^2 L}{\\partial \\theta_i \\partial \\theta_j}$ (Hessian)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Derivative-free methods"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Direct search\n",
"(brute force)\n",
"* Build a grid and look for the maximum on it. \n",
"* Feasible only for one-dimensional, at most two-dimensional, problems. \n",
"* The accuracy is limited by the grid cell size."
]
},
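{
"cell_type": "markdown",
"metadata": {},
"source": [
"The grid search above can be sketched in a few lines. This is an illustration only, assuming a toy Gaussian sample with unknown mean and known unit width; the data, the grid range and the helper name `log_likelihood` are hypothetical choices. The estimate cannot be more precise than the grid step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"data = rng.normal(loc=2.0, scale=1.0, size=200)  # toy sample, true mean = 2 (assumption)\n",
"\n",
"\n",
"def log_likelihood(mu):\n",
"    # Gaussian log-likelihood with sigma = 1, additive constants dropped\n",
"    return -0.5 * np.sum((data - mu) ** 2)\n",
"\n",
"\n",
"grid = np.linspace(0.0, 4.0, 401)  # grid step 0.01\n",
"values = np.array([log_likelihood(mu) for mu in grid])\n",
"mu_best = grid[np.argmax(values)]\n",
"# The grid maximum should agree with the sample mean to within the grid step\n",
"print(mu_best, data.mean())"
]
},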
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"### Simplex methods \n",
"1. Build a simplex in parameter space: a polytope with $n+1$ vertices, where $n$ is the dimension of the space. \n",
"2. Evaluate the function at each vertex. \n",
"3. Find the vertex with the smallest value (the worst one when maximizing) and move it toward the centroid of the polytope."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"![Nelder-mead](images/Nelder_Mead1.gif)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Optimization terminated successfully.\n",
" Current function value: 0.000000\n",
" Iterations: 339\n",
" Function evaluations: 571\n"
]
},
{
"data": {
"text/plain": [
" final_simplex: (array([[1. , 1. , 1. , 1. , 1. ],\n",
" [1. , 1. , 1. , 1. , 1. ],\n",
" [1. , 1. , 1. , 1.00000001, 1.00000001],\n",
" [1. , 1. , 1. , 1. , 1. ],\n",
" [1. , 1. , 1. , 1. , 1. ],\n",
" [1. , 1. , 1. , 1. , 0.99999999]]), array([4.86115343e-17, 7.65182843e-17, 8.11395684e-17, 8.63263255e-17,\n",
" 8.64080682e-17, 2.17927418e-16]))\n",
" fun: 4.861153433422115e-17\n",
" message: 'Optimization terminated successfully.'\n",
" nfev: 571\n",
" nit: 339\n",
" status: 0\n",
" success: True\n",
" x: array([1., 1., 1., 1., 1.])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"from scipy.optimize import minimize\n",
"\n",
"\n",
"def rosen(x):\n",
"    \"\"\"The Rosenbrock function\"\"\"\n",
"    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)\n",
"\n",
"x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n",
"# 'xatol' is the current name of the Nelder-Mead tolerance on x (formerly 'xtol')\n",
"minimize(rosen, x0, method='Nelder-Mead', options={'xatol': 1e-8, 'disp': True})"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on function minimize in module scipy.optimize._minimize:\n",
"\n",
"minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)\n",
" Minimization of scalar function of one or more variables.\n",
" \n",
" Parameters\n",
" ----------\n",
" fun : callable\n",
" The objective function to be minimized.\n",
" \n",
" ``fun(x, *args) -> float``\n",
" \n",
" where x is an 1-D array with shape (n,) and `args`\n",
" is a tuple of the fixed parameters needed to completely\n",
" specify the function.\n",
" x0 : ndarray, shape (n,)\n",
" Initial guess. Array of real elements of size (n,),\n",
" where 'n' is the number of independent variables.\n",
" args : tuple, optional\n",
" Extra arguments passed to the objective function and its\n",
" derivatives (`fun`, `jac` and `hess` functions).\n",
" method : str or callable, optional\n",
" Type of solver. Should be one of\n",
" \n",
" - 'Nelder-Mead' :ref:`(see here) <optimize.minimize-neldermead>`\n",
" - 'Powell' :ref:`(see here) <optimize.minimize-powell>`\n",
" - 'CG' :ref:`(see here) <optimize.minimize-cg>`\n",
" - 'BFGS' :ref:`(see here) <optimize.minimize-bfgs>`\n",
" - 'Newton-CG' :ref:`(see here) <optimize.minimize-newtoncg>`\n",
" - 'L-BFGS-B' :ref:`(see here) <optimize.minimize-lbfgsb>`\n",
" - 'TNC' :ref:`(see here) <optimize.minimize-tnc>`\n",
" - 'COBYLA' :ref:`(see here) <optimize.minimize-cobyla>`\n",
" - 'SLSQP' :ref:`(see here) <optimize.minimize-slsqp>`\n",
" - 'trust-constr':ref:`(see here) <optimize.minimize-trustconstr>`\n",
" - 'dogleg' :ref:`(see here) <optimize.minimize-dogleg>`\n",
" - 'trust-ncg' :ref:`(see here) <optimize.minimize-trustncg>`\n",
" - 'trust-exact' :ref:`(see here) <optimize.minimize-trustexact>`\n",
" - 'trust-krylov' :ref:`(see here) <optimize.minimize-trustkrylov>`\n",
" - custom - a callable object (added in version 0.14.0),\n",
" see below for description.\n",
" \n",
" If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,\n",
" depending if the problem has constraints or bounds.\n",
" jac : {callable, '2-point', '3-point', 'cs', bool}, optional\n",
" Method for computing the gradient vector. Only for CG, BFGS,\n",
" Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov,\n",
" trust-exact and trust-constr. If it is a callable, it should be a\n",
" function that returns the gradient vector:\n",
" \n",
" ``jac(x, *args) -> array_like, shape (n,)``\n",
" \n",
" where x is an array with shape (n,) and `args` is a tuple with\n",
" the fixed parameters. Alternatively, the keywords\n",
" {'2-point', '3-point', 'cs'} select a finite\n",
" difference scheme for numerical estimation of the gradient. Options\n",
" '3-point' and 'cs' are available only to 'trust-constr'.\n",
" If `jac` is a Boolean and is True, `fun` is assumed to return the\n",
" gradient along with the objective function. If False, the gradient\n",
" will be estimated using '2-point' finite difference estimation.\n",
" hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy}, optional\n",
" Method for computing the Hessian matrix. Only for Newton-CG, dogleg,\n",
" trust-ncg, trust-krylov, trust-exact and trust-constr. If it is\n",
" callable, it should return the Hessian matrix:\n",
" \n",
" ``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``\n",
" \n",
" where x is a (n,) ndarray and `args` is a tuple with the fixed\n",
" parameters. LinearOperator and sparse matrix returns are\n",
" allowed only for 'trust-constr' method. Alternatively, the keywords\n",
" {'2-point', '3-point', 'cs'} select a finite difference scheme\n",
" for numerical estimation. Or, objects implementing\n",
" `HessianUpdateStrategy` interface can be used to approximate\n",
" the Hessian. Available quasi-Newton methods implementing\n",
" this interface are:\n",
" \n",
" - `BFGS`;\n",
" - `SR1`.\n",
" \n",
" Whenever the gradient is estimated via finite-differences,\n",
" the Hessian cannot be estimated with options\n",
" {'2-point', '3-point', 'cs'} and needs to be\n",
" estimated using one of the quasi-Newton strategies.\n",
" Finite-difference options {'2-point', '3-point', 'cs'} and\n",
" `HessianUpdateStrategy` are available only for 'trust-constr' method.\n",
" hessp : callable, optional\n",
" Hessian of objective function times an arbitrary vector p. Only for\n",
" Newton-CG, trust-ncg, trust-krylov, trust-constr.\n",
" Only one of `hessp` or `hess` needs to be given. If `hess` is\n",
" provided, then `hessp` will be ignored. `hessp` must compute the\n",
" Hessian times an arbitrary vector:\n",
" \n",
" ``hessp(x, p, *args) -> ndarray shape (n,)``\n",
" \n",
" where x is a (n,) ndarray, p is an arbitrary vector with\n",
" dimension (n,) and `args` is a tuple with the fixed\n",
" parameters.\n",
" bounds : sequence or `Bounds`, optional\n",
" Bounds on variables for L-BFGS-B, TNC, SLSQP and\n",
" trust-constr methods. There are two ways to specify the bounds:\n",
" \n",
" 1. Instance of `Bounds` class.\n",
" 2. Sequence of ``(min, max)`` pairs for each element in `x`. None\n",
" is used to specify no bound.\n",
" \n",
" constraints : {Constraint, dict} or List of {Constraint, dict}, optional\n",
" Constraints definition (only for COBYLA, SLSQP and trust-constr).\n",
" Constraints for 'trust-constr' are defined as a single object or a\n",
" list of objects specifying constraints to the optimization problem.\n",
" Available constraints are:\n",
" \n",
" - `LinearConstraint`\n",
" - `NonlinearConstraint`\n",
" \n",
" Constraints for COBYLA, SLSQP are defined as a list of dictionaries.\n",
" Each dictionary with fields:\n",
" \n",
" type : str\n",
" Constraint type: 'eq' for equality, 'ineq' for inequality.\n",
" fun : callable\n",
" The function defining the constraint.\n",
" jac : callable, optional\n",
" The Jacobian of `fun` (only for SLSQP).\n",
" args : sequence, optional\n",
" Extra arguments to be passed to the function and Jacobian.\n",
" \n",
" Equality constraint means that the constraint function result is to\n",
" be zero whereas inequality means that it is to be non-negative.\n",
" Note that COBYLA only supports inequality constraints.\n",
" tol : float, optional\n",
" Tolerance for termination. For detailed control, use solver-specific\n",
" options.\n",
" options : dict, optional\n",
" A dictionary of solver options. All methods accept the following\n",
" generic options:\n",
" \n",
" maxiter : int\n",
" Maximum number of iterations to perform.\n",
" disp : bool\n",
" Set to True to print convergence messages.\n",
" \n",
" For method-specific options, see :func:`show_options()`.\n",
" callback : callable, optional\n",
" Called after each iteration. For 'trust-constr' it is a callable with\n",
" the signature:\n",
" \n",
" ``callback(xk, OptimizeResult state) -> bool``\n",
" \n",
" where ``xk`` is the current parameter vector. and ``state``\n",
" is an `OptimizeResult` object, with the same fields\n",
" as the ones from the return. If callback returns True\n",
" the algorithm execution is terminated.\n",
" For all the other methods, the signature is:\n",
" \n",
" ``callback(xk)``\n",
" \n",
" where ``xk`` is the current parameter vector.\n",
" \n",
" Returns\n",
" -------\n",
" res : OptimizeResult\n",
" The optimization result represented as a ``OptimizeResult`` object.\n",
" Important attributes are: ``x`` the solution array, ``success`` a\n",
" Boolean flag indicating if the optimizer exited successfully and\n",
" ``message`` which describes the cause of the termination. See\n",
" `OptimizeResult` for a description of other attributes.\n",
" \n",
" \n",
" See also\n",
" --------\n",
" minimize_scalar : Interface to minimization algorithms for scalar\n",
" univariate functions\n",
" show_options : Additional options accepted by the solvers\n",
" \n",
" Notes\n",
" -----\n",
" This section describes the available solvers that can be selected by the\n",
" 'method' parameter. The default method is *BFGS*.\n",
" \n",
" **Unconstrained minimization**\n",
" \n",
" Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the\n",
" Simplex algorithm [1]_, [2]_. This algorithm is robust in many\n",
" applications. However, if numerical computation of derivative can be\n",
" trusted, other algorithms using the first and/or second derivatives\n",
" information might be preferred for their better performance in\n",
" general.\n",
" \n",
" Method :ref:`Powell <optimize.minimize-powell>` is a modification\n",
" of Powell's method [3]_, [4]_ which is a conjugate direction\n",
" method. It performs sequential one-dimensional minimizations along\n",
" each vector of the directions set (`direc` field in `options` and\n",
" `info`), which is updated at each iteration of the main\n",
" minimization loop. The function need not be differentiable, and no\n",
" derivatives are taken.\n",
" \n",
" Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate\n",
" gradient algorithm by Polak and Ribiere, a variant of the\n",
" Fletcher-Reeves method described in [5]_ pp. 120-122. Only the\n",
" first derivatives are used.\n",
" \n",
" Method :ref:`BFGS <optimize.minimize-bfgs>` uses the quasi-Newton\n",
" method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [5]_\n",
" pp. 136. It uses the first derivatives only. BFGS has proven good\n",
" performance even for non-smooth optimizations. This method also\n",
" returns an approximation of the Hessian inverse, stored as\n",
" `hess_inv` in the OptimizeResult object.\n",
" \n",
" Method :ref:`Newton-CG <optimize.minimize-newtoncg>` uses a\n",
" Newton-CG algorithm [5]_ pp. 168 (also known as the truncated\n",
" Newton method). It uses a CG method to the compute the search\n",
" direction. See also *TNC* method for a box-constrained\n",
" minimization with a similar algorithm. Suitable for large-scale\n",
" problems.\n",
" \n",
" Method :ref:`dogleg <optimize.minimize-dogleg>` uses the dog-leg\n",
" trust-region algorithm [5]_ for unconstrained minimization. This\n",
" algorithm requires the gradient and Hessian; furthermore the\n",
" Hessian is required to be positive definite.\n",
" \n",
" Method :ref:`trust-ncg <optimize.minimize-trustncg>` uses the\n",
" Newton conjugate gradient trust-region algorithm [5]_ for\n",
" unconstrained minimization. This algorithm requires the gradient\n",
" and either the Hessian or a function that computes the product of\n",
" the Hessian with a given vector. Suitable for large-scale problems.\n",
" \n",
" Method :ref:`trust-krylov <optimize.minimize-trustkrylov>` uses\n",
" the Newton GLTR trust-region algorithm [14]_, [15]_ for unconstrained\n",
" minimization. This algorithm requires the gradient\n",
" and either the Hessian or a function that computes the product of\n",
" the Hessian with a given vector. Suitable for large-scale problems.\n",
" On indefinite problems it requires usually less iterations than the\n",
" `trust-ncg` method and is recommended for medium and large-scale problems.\n",
" \n",
" Method :ref:`trust-exact <optimize.minimize-trustexact>`\n",
" is a trust-region method for unconstrained minimization in which\n",
" quadratic subproblems are solved almost exactly [13]_. This\n",
" algorithm requires the gradient and the Hessian (which is\n",
" *not* required to be positive definite). It is, in many\n",
" situations, the Newton method to converge in fewer iteraction\n",
" and the most recommended for small and medium-size problems.\n",
" \n",
" **Bound-Constrained minimization**\n",
" \n",
" Method :ref:`L-BFGS-B <optimize.minimize-lbfgsb>` uses the L-BFGS-B\n",
" algorithm [6]_, [7]_ for bound constrained minimization.\n",
" \n",
" Method :ref:`TNC <optimize.minimize-tnc>` uses a truncated Newton\n",
" algorithm [5]_, [8]_ to minimize a function with variables subject\n",
" to bounds. This algorithm uses gradient information; it is also\n",
" called Newton Conjugate-Gradient. It differs from the *Newton-CG*\n",
" method described above as it wraps a C implementation and allows\n",
" each variable to be given upper and lower bounds.\n",
" \n",
" **Constrained Minimization**\n",
" \n",
" Method :ref:`COBYLA <optimize.minimize-cobyla>` uses the\n",
" Constrained Optimization BY Linear Approximation (COBYLA) method\n",
" [9]_, [10]_, [11]_. The algorithm is based on linear\n",
" approximations to the objective function and each constraint. The\n",
" method wraps a FORTRAN implementation of the algorithm. The\n",
" constraints functions 'fun' may return either a single number\n",
" or an array or list of numbers.\n",
" \n",
" Method :ref:`SLSQP <optimize.minimize-slsqp>` uses Sequential\n",
" Least SQuares Programming to minimize a function of several\n",
" variables with any combination of bounds, equality and inequality\n",
" constraints. The method wraps the SLSQP Optimization subroutine\n",
" originally implemented by Dieter Kraft [12]_. Note that the\n",
" wrapper handles infinite values in bounds by converting them into\n",
" large floating values.\n",
" \n",
" Method :ref:`trust-constr <optimize.minimize-trustconstr>` is a\n",
" trust-region algorithm for constrained optimization. It swiches\n",
" between two implementations depending on the problem definition.\n",
" It is the most versatile constrained minimization algorithm\n",
" implemented in SciPy and the most appropriate for large-scale problems.\n",
" For equality constrained problems it is an implementation of Byrd-Omojokun\n",
" Trust-Region SQP method described in [17]_ and in [5]_, p. 549. When\n",
" inequality constraints are imposed as well, it swiches to the trust-region\n",
" interior point method described in [16]_. This interior point algorithm,\n",
" in turn, solves inequality constraints by introducing slack variables\n",
" and solving a sequence of equality-constrained barrier problems\n",
" for progressively smaller values of the barrier parameter.\n",
" The previously described equality constrained SQP method is\n",
" used to solve the subproblems with increasing levels of accuracy\n",
" as the iterate gets closer to a solution.\n",
" \n",
" **Finite-Difference Options**\n",
" \n",
" For Method :ref:`trust-constr <optimize.minimize-trustconstr>`\n",
" the gradient and the Hessian may be approximated using\n",
" three finite-difference schemes: {'2-point', '3-point', 'cs'}.\n",
" The scheme 'cs' is, potentially, the most accurate but it\n",
" requires the function to correctly handles complex inputs and to\n",
" be differentiable in the complex plane. The scheme '3-point' is more\n",
" accurate than '2-point' but requires twice as much operations.\n",
" \n",
" **Custom minimizers**\n",
" \n",
" It may be useful to pass a custom minimization method, for example\n",
" when using a frontend to this method such as `scipy.optimize.basinhopping`\n",
" or a different library. You can simply pass a callable as the ``method``\n",
" parameter.\n",
" \n",
" The callable is called as ``method(fun, x0, args, **kwargs, **options)``\n",
" where ``kwargs`` corresponds to any other parameters passed to `minimize`\n",
" (such as `callback`, `hess`, etc.), except the `options` dict, which has\n",
" its contents also passed as `method` parameters pair by pair. Also, if\n",
" `jac` has been passed as a bool type, `jac` and `fun` are mangled so that\n",
" `fun` returns just the function values and `jac` is converted to a function\n",
" returning the Jacobian. The method shall return an ``OptimizeResult``\n",
" object.\n",
" \n",
" The provided `method` callable must be able to accept (and possibly ignore)\n",
" arbitrary parameters; the set of parameters accepted by `minimize` may\n",
" expand in future versions and then these parameters will be passed to\n",
" the method. You can find an example in the scipy.optimize tutorial.\n",
" \n",
" .. versionadded:: 0.11.0\n",
" \n",
" References\n",
" ----------\n",
" .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function\n",
" Minimization. The Computer Journal 7: 308-13.\n",
" .. [2] Wright M H. 1996. Direct search methods: Once scorned, now\n",
" respectable, in Numerical Analysis 1995: Proceedings of the 1995\n",
" Dundee Biennial Conference in Numerical Analysis (Eds. D F\n",
" Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.\n",
" 191-208.\n",
" .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of\n",
" a function of several variables without calculating derivatives. The\n",
" Computer Journal 7: 155-162.\n",
" .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.\n",
" Numerical Recipes (any edition), Cambridge University Press.\n",
" .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.\n",
" Springer New York.\n",
" .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory\n",
" Algorithm for Bound Constrained Optimization. SIAM Journal on\n",
" Scientific and Statistical Computing 16 (5): 1190-1208.\n",
" .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm\n",
" 778: L-BFGS-B, FORTRAN routines for large scale bound constrained\n",
" optimization. ACM Transactions on Mathematical Software 23 (4):\n",
" 550-560.\n",
" .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.\n",
" 1984. SIAM Journal of Numerical Analysis 21: 770-778.\n",
" .. [9] Powell, M J D. A direct search optimization method that models\n",
" the objective and constraint functions by linear interpolation.\n",
" 1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez\n",
" and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.\n",
" .. [10] Powell M J D. Direct search algorithms for optimization\n",
" calculations. 1998. Acta Numerica 7: 287-336.\n",
" .. [11] Powell M J D. A view of algorithms for optimization without\n",
" derivatives. 2007.Cambridge University Technical Report DAMTP\n",
" 2007/NA03\n",
" .. [12] Kraft, D. A software package for sequential quadratic\n",
" programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace\n",
" Center -- Institute for Flight Mechanics, Koln, Germany.\n",
" .. [13] Conn, A. R., Gould, N. I., and Toint, P. L.\n",
" Trust region methods. 2000. Siam. pp. 169-200.\n",
" .. [14] F. Lenders, C. Kirches, A. Potschka: \"trlib: A vector-free\n",
" implementation of the GLTR method for iterative solution of\n",
" the trust region problem\", https://arxiv.org/abs/1611.04718\n",
" .. [15] N. Gould, S. Lucidi, M. Roma, P. Toint: \"Solving the\n",
" Trust-Region Subproblem using the Lanczos Method\",\n",
" SIAM J. Optim., 9(2), 504--525, (1999).\n",
" .. [16] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal. 1999.\n",
" An interior point algorithm for large-scale nonlinear programming.\n",
" SIAM Journal on Optimization 9.4: 877-900.\n",
" .. [17] Lalee, Marucha, Jorge Nocedal, and Todd Plantega. 1998. On the\n",
" implementation of an algorithm for large-scale equality constrained\n",
" optimization. SIAM Journal on Optimization 8.3: 682-706.\n",
" \n",
" Examples\n",
" --------\n",
" Let us consider the problem of minimizing the Rosenbrock function. This\n",
" function (and its respective derivatives) is implemented in `rosen`\n",
" (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.\n",
" \n",
" >>> from scipy.optimize import minimize, rosen, rosen_der\n",
" \n",
" A simple application of the *Nelder-Mead* method is:\n",
" \n",
" >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]\n",
" >>> res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)\n",
" >>> res.x\n",
" array([ 1., 1., 1., 1., 1.])\n",
" \n",
" Now using the *BFGS* algorithm, using the first derivative and a few\n",
" options:\n",
" \n",
" >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,\n",
" ... options={'gtol': 1e-6, 'disp': True})\n",
" Optimization terminated successfully.\n",
" Current function value: 0.000000\n",
" Iterations: 26\n",
" Function evaluations: 31\n",
" Gradient evaluations: 31\n",
" >>> res.x\n",
" array([ 1., 1., 1., 1., 1.])\n",
" >>> print(res.message)\n",
" Optimization terminated successfully.\n",
" >>> res.hess_inv\n",
" array([[ 0.00749589, 0.01255155, 0.02396251, 0.04750988, 0.09495377], # may vary\n",
" [ 0.01255155, 0.02510441, 0.04794055, 0.09502834, 0.18996269],\n",
" [ 0.02396251, 0.04794055, 0.09631614, 0.19092151, 0.38165151],\n",
" [ 0.04750988, 0.09502834, 0.19092151, 0.38341252, 0.7664427 ],\n",
" [ 0.09495377, 0.18996269, 0.38165151, 0.7664427, 1.53713523]])\n",
" \n",
" \n",
" Next, consider a minimization problem with several constraints (namely\n",
" Example 16.4 from [5]_). The objective function is:\n",
" \n",
" >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2\n",
" \n",
" There are three constraints defined as:\n",
" \n",
" >>> cons = ({'type': 'ineq', 'fun': lambda x: x[0] - 2 * x[1] + 2},\n",
" ... {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},\n",
" ... {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})\n",
" \n",
" And variables must be positive, hence the following bounds:\n",
" \n",
" >>> bnds = ((0, None), (0, None))\n",
" \n",
" The optimization problem is solved using the SLSQP method as:\n",
" \n",
" >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,\n",
" ... constraints=cons)\n",
" \n",
" It should converge to the theoretical solution (1.4 ,1.7).\n",
"\n"
]
}
],
"source": [
"help(minimize)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## First derivatives"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Steepest ascent (descent)\n",
"The gradient points in the direction of the fastest local increase of the function, so each step is taken along it:\n",
"\n",
"$$ \\theta_{k+1} = \\theta_k + \\beta_k \\nabla L(\\theta_k) $$\n",
"\n",
"* It is not clear how to choose $\\beta_k$.\n",
"* It is not clear when to stop."
]
},
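{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the steepest-ascent update, assuming a toy concave function with a known maximum. The hand-picked fixed step `beta` and the stopping rule on the size of the update are illustrative answers to the two open questions above, not recommendations from the original material."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"target = np.array([1.0, -2.0])  # location of the maximum (assumption)\n",
"\n",
"\n",
"def grad_L(theta):\n",
"    # Gradient of the toy function L(theta) = -|theta - target|^2\n",
"    return -2.0 * (theta - target)\n",
"\n",
"\n",
"theta = np.zeros(2)\n",
"beta = 0.1  # fixed step size, chosen by hand\n",
"for k in range(1000):\n",
"    step = beta * grad_L(theta)\n",
"    theta = theta + step\n",
"    if np.linalg.norm(step) < 1e-10:  # stop once the update becomes negligible\n",
"        break\n",
"print(k, theta)"
]
},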
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
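},
"source": [
"A modification of this method, the conjugate gradient method, in its exact form requires second-derivative information (for a quadratic function the Hessian enters the step length); practical nonlinear variants replace it with a line search and use only first derivatives."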
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Second derivatives"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The key formula is the second-order Taylor expansion:\n",
"\n",
"$$ L(\\theta) = L(\\theta_0) + \\nabla L(\\theta_0)^T (\\theta - \\theta_0) + \\frac{1}{2} (\\theta-\\theta_0)^T H(\\theta_0) (\\theta-\\theta_0) + o\\left(\\lVert\\theta-\\theta_0\\rVert^2\\right)$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Newton's method\n",
"\n",
"Setting the gradient of the quadratic approximation to zero gives:\n",
"\n",
"$$\\nabla L(\\theta_k) + H(\\theta_k)(\\theta_{k+1} - \\theta_k) = 0$$\n",
"\n",
"$$ \\theta_{k+1} = \\theta_k - H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"A step-size choice can be added:\n",
"\n",
"$$ \\theta_{k+1} = \\theta_k - \\lambda_k H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1.0000000000000016"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
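],
"source": [
"from scipy import optimize\n",
"# Find a root of x**3 - 1 starting from x0 = 1.5; without `fprime`, SciPy uses the secant method\n",
"optimize.newton(lambda x: x**3 - 1, 1.5)"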
]
},
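{
"cell_type": "markdown",
"metadata": {},
"source": [
"The call above solves a scalar root-finding problem. For optimization, the Newton update is a linear solve with the Hessian. A minimal sketch, assuming a toy quadratic log-likelihood; the matrix `A`, the vector `m` and the helper names are illustrative assumptions. Because the second-order expansion is exact for a quadratic function, a single Newton step reaches the maximum."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Toy log-likelihood L = -1/2 (theta - m)^T A (theta - m), with A positive definite\n",
"m = np.array([1.0, -2.0])\n",
"A = np.array([[2.0, 0.5], [0.5, 1.0]])\n",
"\n",
"\n",
"def grad_L(theta):\n",
"    return -A @ (theta - m)\n",
"\n",
"\n",
"def hess_L(theta):\n",
"    return -A\n",
"\n",
"\n",
"theta = np.array([5.0, 5.0])\n",
"# Newton step: solve H * delta = -grad instead of inverting H explicitly\n",
"delta = np.linalg.solve(hess_L(theta), -grad_L(theta))\n",
"theta = theta + delta\n",
"print(theta)  # equals m up to rounding, because L is exactly quadratic"
]
},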
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Variable metric methods\n",
"\n",
"* Computing $\\nabla L$ and $H$ is very expensive.\n",
"* Instead, build up approximations to them iteratively.\n",
"\n",
"Examples: \n",
"* MINUIT\n",
"* scipy `minimize(method='L-BFGS-B')`"
]
},
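{
"cell_type": "markdown",
"metadata": {},
"source": [
"A short sketch of a quasi-Newton fit with SciPy, following the Rosenbrock example from the `minimize` documentation. It uses plain `BFGS` rather than `L-BFGS-B` (a deliberate substitution) because `BFGS` returns the accumulated inverse-Hessian approximation as a dense array in `hess_inv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.optimize import minimize, rosen, rosen_der\n",
"\n",
"x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n",
"# Only the gradient is supplied; the Hessian information is built up iteratively\n",
"res = minimize(rosen, x0, method='BFGS', jac=rosen_der)\n",
"print(res.x)  # expected to be close to [1, 1, 1, 1, 1]\n",
"print(res.hess_inv)  # approximation of the inverse Hessian at the solution"
]
},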
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# The least-squares case"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"In spectrum analysis we have:\n",
"\n",
"$$ L(X | \\theta) = \\prod_i p_i (x_i | \\theta)$$\n",
"\n",
"Or:\n",
"\n",
"$$\\ln{ L(X | \\theta)} = \\sum_i \\ln{ p_i (x_i | \\theta)}$$\n",
"\n",
"For normally distributed measurements $y_i$ with known $\\sigma_i$, maximizing the likelihood is equivalent to minimizing the weighted sum of squares:\n",
"\n",
"$$\\ln{ L(X | \\theta)} = -\\frac{1}{2} \\sum_i{ \\left( \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i} \\right)^2 } + \\mathrm{const}$$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Gauss-Newton method\n",
"Let:\n",
"$$r_i = \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i}$$ \n",
"$$J_{ij} = \\frac{\\partial r_i}{\\partial \\theta_j} = - \\frac{1}{\\sigma_i} \\frac{\\partial \\mu(x_i, \\theta)}{\\partial \\theta_j}$$\n",
"\n",
"Then:\n",
"\n",
"$$ \\theta_{(k+1)} = \\theta_{(k)} - \\left( J^TJ \\right)^{-1}J^Tr(\\theta_{(k)})$$"
]
},
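{
"cell_type": "markdown",
"metadata": {},
"source": [
"The update above can be sketched directly with NumPy. This is an illustration under stated assumptions: a toy exponential model $\\mu(x, \\theta) = \\theta_0 e^{-\\theta_1 x}$, synthetic data with known $\\sigma_i$, and a starting point chosen reasonably close to the solution, since the plain Gauss-Newton iteration has no safeguard against divergence. All names (`mu`, `residuals`, `jacobian`) are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(1)\n",
"x = np.linspace(0.0, 4.0, 20)\n",
"sigma = np.full_like(x, 0.05)\n",
"y = 5.0 * np.exp(-1.0 * x) + rng.normal(0.0, sigma)  # toy data, true theta = (5, 1)\n",
"\n",
"\n",
"def mu(theta):\n",
"    return theta[0] * np.exp(-theta[1] * x)\n",
"\n",
"\n",
"def residuals(theta):\n",
"    return (y - mu(theta)) / sigma\n",
"\n",
"\n",
"def jacobian(theta):\n",
"    # J_ij = d r_i / d theta_j = -(1 / sigma_i) * d mu_i / d theta_j\n",
"    e = np.exp(-theta[1] * x)\n",
"    return np.column_stack([-e / sigma, theta[0] * x * e / sigma])\n",
"\n",
"\n",
"theta = np.array([4.0, 0.8])  # starting point near the solution (assumption)\n",
"for k in range(20):\n",
"    J, r = jacobian(theta), residuals(theta)\n",
"    delta = -np.linalg.solve(J.T @ J, J.T @ r)  # Gauss-Newton step\n",
"    theta = theta + delta\n",
"    if np.linalg.norm(delta) < 1e-10:\n",
"        break\n",
"print(k, theta)"
]
},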
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Levenberg-Marquardt algorithm\n",
"\n",
"$$ \\theta_{(k+1)} = \\theta_{(k)} + \\delta$$\n",
"\n",
"$$ (J^TJ + \\lambda I)\\delta = -J^Tr(\\theta_{(k)})$$\n",
"\n",
"Here $\\lambda$ is the regularization (damping) factor. Its choice is not fixed by the method; for $\\lambda = 0$ the step reduces to the Gauss-Newton step."
]
},
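{
"cell_type": "markdown",
"metadata": {},
"source": [
"In practice one rarely codes this algorithm by hand. A minimal sketch using `scipy.optimize.least_squares` with `method='lm'`, which wraps the MINPACK implementation and adjusts the damping internally. The exponential model, the synthetic data and the starting point are illustrative assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.optimize import least_squares\n",
"\n",
"rng = np.random.default_rng(2)\n",
"x = np.linspace(0.0, 4.0, 20)\n",
"sigma = np.full_like(x, 0.05)\n",
"y = 5.0 * np.exp(-x) + rng.normal(0.0, sigma)  # toy data, true theta = (5, 1)\n",
"\n",
"\n",
"def residuals(theta):\n",
"    # Normalized residuals r_i = (y_i - mu(x_i, theta)) / sigma_i\n",
"    return (y - theta[0] * np.exp(-theta[1] * x)) / sigma\n",
"\n",
"\n",
"res = least_squares(residuals, x0=[1.0, 0.1], method='lm')\n",
"print(res.x, res.success)"
]
},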
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Quasi-optimal weights method\n",
"Idea:\n",
"There is some statistic (a function of the data) $f(x)$. For the optimal solution, the mean of this function over the $N$ experimental data points and its expectation under the model must coincide:\n",
"$$ E_\\theta(f(x)) = \\frac{1}{N}\\sum_i{f(x_i)} $$\n",
"\n",
"It can be shown that the best efficiency is achieved when\n",
"\n",
"$$ f = \\frac{\\partial \\ln L}{\\partial \\theta} $$\n",
"\n",
"In this case, and if the errors follow a Gaussian or Poisson distribution, the optimal $\\theta$ can be found by solving:\n",
"\n",
"$$ \n",
"\\sum_{i}{\\frac{\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right) - x_{i}}{\\sigma_{i}^{2}}\\left. \\frac{\\partial\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right)}{\\partial\\mathbf{\\theta}} \\right|_{\\mathbf{\\theta}_{\\mathbf{0}}}} = 0. \n",
"$$"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"hide_input": false,
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": false,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": false,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 4
}