{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Общий случай" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Постановка задачи" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Пусть есть параметрическая модель $M\\left( \\theta \\right)$, где $\\theta$ - параметры." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Функция правдоподобия $L\\left( X | M\\left( \\theta \\right) \\right)$ определят достоверность получения набора данных $X$ при заданном наборе параметров и данной модели." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**Задача**: определить такой набор параметров $\\theta$, для которого функция принимает наибольшее значение." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Классификация\n", "\n", "По порядку производной:\n", "\n", "* Не использует производных $L$\n", "\n", "* Использует первую производную $\\frac{\\partial L}{\\partial \\theta_i}$ (градиент)\n", "\n", "* Использует вторые прозиводные $\\frac{\\partial^2 L}{\\partial \\theta_i \\partial \\theta_j}$ (гессиан)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Без производных" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Прямой перебор\n", "(brute force)\n", "* Строим сетку и ищем на ней максимум. \n", "* Возможен только для одномерных, максимум двумерных задач. \n", "* Точность ограничена размером ячкйки сетки." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "### Симплекс методы \n", "1. Строим многоугольник в пространстве параметров с $n+1$ вершинами, где $n$ - размерность пространства. \n", "2. Орпделеляем значения функции в каждой вершине. \n", "3. Находим вершину с наименьшим значением и двигаем ее к центру масс многоугольника." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![Nelder-mead](images/Nelder_Mead1.gif)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 0.000000\n", " Iterations: 339\n", " Function evaluations: 571\n" ] }, { "data": { "text/plain": [ " final_simplex: (array([[1. , 1. , 1. , 1. , 1. ],\n", " [1. , 1. , 1. , 1. , 1. ],\n", " [1. , 1. , 1. , 1.00000001, 1.00000001],\n", " [1. , 1. , 1. , 1. , 1. ],\n", " [1. , 1. , 1. , 1. , 1. ],\n", " [1. , 1. , 1. , 1. 
, 0.99999999]]), array([4.86115343e-17, 7.65182843e-17, 8.11395684e-17, 8.63263255e-17,\n", " 8.64080682e-17, 2.17927418e-16]))\n", " fun: 4.861153433422115e-17\n", " message: 'Optimization terminated successfully.'\n", " nfev: 571\n", " nit: 339\n", " status: 0\n", " success: True\n", " x: array([1., 1., 1., 1., 1.])" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "from scipy.optimize import minimize\n", "\n", "\n", "def rosen(x):\n", " \"\"\"The Rosenbrock function\"\"\"\n", " return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)\n", "\n", "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n", "minimize(rosen, x0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True})" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on function minimize in module scipy.optimize._minimize:\n", "\n", "minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)\n", " Minimization of scalar function of one or more variables.\n", " \n", " Parameters\n", " ----------\n", " fun : callable\n", " The objective function to be minimized.\n", " \n", " ``fun(x, *args) -> float``\n", " \n", " where x is an 1-D array with shape (n,) and `args`\n", " is a tuple of the fixed parameters needed to completely\n", " specify the function.\n", " x0 : ndarray, shape (n,)\n", " Initial guess. Array of real elements of size (n,),\n", " where 'n' is the number of independent variables.\n", " args : tuple, optional\n", " Extra arguments passed to the objective function and its\n", " derivatives (`fun`, `jac` and `hess` functions).\n", " method : str or callable, optional\n", " Type of solver. Should be one of\n", " \n", " - 'Nelder-Mead' :ref:`(see here) `\n", " - 'Powell' :ref:`(see here) `\n", " - 'CG' :ref:`(see here) `\n", " - 'BFGS' :ref:`(see here) `\n", " - 'Newton-CG' :ref:`(see here) `\n", " - 'L-BFGS-B' :ref:`(see here) `\n", " - 'TNC' :ref:`(see here) `\n", " - 'COBYLA' :ref:`(see here) `\n", " - 'SLSQP' :ref:`(see here) `\n", " - 'trust-constr':ref:`(see here) `\n", " - 'dogleg' :ref:`(see here) `\n", " - 'trust-ncg' :ref:`(see here) `\n", " - 'trust-exact' :ref:`(see here) `\n", " - 'trust-krylov' :ref:`(see here) `\n", " - custom - a callable object (added in version 0.14.0),\n", " see below for description.\n", " \n", " If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,\n", " depending if the problem has constraints or bounds.\n", " jac : {callable, '2-point', '3-point', 'cs', bool}, optional\n", " Method for computing the gradient vector. Only for CG, BFGS,\n", " Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov,\n", " trust-exact and trust-constr. If it is a callable, it should be a\n", " function that returns the gradient vector:\n", " \n", " ``jac(x, *args) -> array_like, shape (n,)``\n", " \n", " where x is an array with shape (n,) and `args` is a tuple with\n", " the fixed parameters. Alternatively, the keywords\n", " {'2-point', '3-point', 'cs'} select a finite\n", " difference scheme for numerical estimation of the gradient. Options\n", " '3-point' and 'cs' are available only to 'trust-constr'.\n", " If `jac` is a Boolean and is True, `fun` is assumed to return the\n", " gradient along with the objective function. 
If False, the gradient\n", " will be estimated using '2-point' finite difference estimation.\n", " hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy}, optional\n", " Method for computing the Hessian matrix. Only for Newton-CG, dogleg,\n", " trust-ncg, trust-krylov, trust-exact and trust-constr. If it is\n", " callable, it should return the Hessian matrix:\n", " \n", " ``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``\n", " \n", " where x is a (n,) ndarray and `args` is a tuple with the fixed\n", " parameters. LinearOperator and sparse matrix returns are\n", " allowed only for 'trust-constr' method. Alternatively, the keywords\n", " {'2-point', '3-point', 'cs'} select a finite difference scheme\n", " for numerical estimation. Or, objects implementing\n", " `HessianUpdateStrategy` interface can be used to approximate\n", " the Hessian. Available quasi-Newton methods implementing\n", " this interface are:\n", " \n", " - `BFGS`;\n", " - `SR1`.\n", " \n", " Whenever the gradient is estimated via finite-differences,\n", " the Hessian cannot be estimated with options\n", " {'2-point', '3-point', 'cs'} and needs to be\n", " estimated using one of the quasi-Newton strategies.\n", " Finite-difference options {'2-point', '3-point', 'cs'} and\n", " `HessianUpdateStrategy` are available only for 'trust-constr' method.\n", " hessp : callable, optional\n", " Hessian of objective function times an arbitrary vector p. Only for\n", " Newton-CG, trust-ncg, trust-krylov, trust-constr.\n", " Only one of `hessp` or `hess` needs to be given. If `hess` is\n", " provided, then `hessp` will be ignored. `hessp` must compute the\n", " Hessian times an arbitrary vector:\n", " \n", " ``hessp(x, p, *args) -> ndarray shape (n,)``\n", " \n", " where x is a (n,) ndarray, p is an arbitrary vector with\n", " dimension (n,) and `args` is a tuple with the fixed\n", " parameters.\n", " bounds : sequence or `Bounds`, optional\n", " Bounds on variables for L-BFGS-B, TNC, SLSQP and\n", " trust-constr methods. There are two ways to specify the bounds:\n", " \n", " 1. Instance of `Bounds` class.\n", " 2. Sequence of ``(min, max)`` pairs for each element in `x`. None\n", " is used to specify no bound.\n", " \n", " constraints : {Constraint, dict} or List of {Constraint, dict}, optional\n", " Constraints definition (only for COBYLA, SLSQP and trust-constr).\n", " Constraints for 'trust-constr' are defined as a single object or a\n", " list of objects specifying constraints to the optimization problem.\n", " Available constraints are:\n", " \n", " - `LinearConstraint`\n", " - `NonlinearConstraint`\n", " \n", " Constraints for COBYLA, SLSQP are defined as a list of dictionaries.\n", " Each dictionary with fields:\n", " \n", " type : str\n", " Constraint type: 'eq' for equality, 'ineq' for inequality.\n", " fun : callable\n", " The function defining the constraint.\n", " jac : callable, optional\n", " The Jacobian of `fun` (only for SLSQP).\n", " args : sequence, optional\n", " Extra arguments to be passed to the function and Jacobian.\n", " \n", " Equality constraint means that the constraint function result is to\n", " be zero whereas inequality means that it is to be non-negative.\n", " Note that COBYLA only supports inequality constraints.\n", " tol : float, optional\n", " Tolerance for termination. For detailed control, use solver-specific\n", " options.\n", " options : dict, optional\n", " A dictionary of solver options. 
All methods accept the following\n", " generic options:\n", " \n", " maxiter : int\n", " Maximum number of iterations to perform.\n", " disp : bool\n", " Set to True to print convergence messages.\n", " \n", " For method-specific options, see :func:`show_options()`.\n", " callback : callable, optional\n", " Called after each iteration. For 'trust-constr' it is a callable with\n", " the signature:\n", " \n", " ``callback(xk, OptimizeResult state) -> bool``\n", " \n", " where ``xk`` is the current parameter vector. and ``state``\n", " is an `OptimizeResult` object, with the same fields\n", " as the ones from the return. If callback returns True\n", " the algorithm execution is terminated.\n", " For all the other methods, the signature is:\n", " \n", " ``callback(xk)``\n", " \n", " where ``xk`` is the current parameter vector.\n", " \n", " Returns\n", " -------\n", " res : OptimizeResult\n", " The optimization result represented as a ``OptimizeResult`` object.\n", " Important attributes are: ``x`` the solution array, ``success`` a\n", " Boolean flag indicating if the optimizer exited successfully and\n", " ``message`` which describes the cause of the termination. See\n", " `OptimizeResult` for a description of other attributes.\n", " \n", " \n", " See also\n", " --------\n", " minimize_scalar : Interface to minimization algorithms for scalar\n", " univariate functions\n", " show_options : Additional options accepted by the solvers\n", " \n", " Notes\n", " -----\n", " This section describes the available solvers that can be selected by the\n", " 'method' parameter. The default method is *BFGS*.\n", " \n", " **Unconstrained minimization**\n", " \n", " Method :ref:`Nelder-Mead ` uses the\n", " Simplex algorithm [1]_, [2]_. This algorithm is robust in many\n", " applications. However, if numerical computation of derivative can be\n", " trusted, other algorithms using the first and/or second derivatives\n", " information might be preferred for their better performance in\n", " general.\n", " \n", " Method :ref:`Powell ` is a modification\n", " of Powell's method [3]_, [4]_ which is a conjugate direction\n", " method. It performs sequential one-dimensional minimizations along\n", " each vector of the directions set (`direc` field in `options` and\n", " `info`), which is updated at each iteration of the main\n", " minimization loop. The function need not be differentiable, and no\n", " derivatives are taken.\n", " \n", " Method :ref:`CG ` uses a nonlinear conjugate\n", " gradient algorithm by Polak and Ribiere, a variant of the\n", " Fletcher-Reeves method described in [5]_ pp. 120-122. Only the\n", " first derivatives are used.\n", " \n", " Method :ref:`BFGS ` uses the quasi-Newton\n", " method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [5]_\n", " pp. 136. It uses the first derivatives only. BFGS has proven good\n", " performance even for non-smooth optimizations. This method also\n", " returns an approximation of the Hessian inverse, stored as\n", " `hess_inv` in the OptimizeResult object.\n", " \n", " Method :ref:`Newton-CG ` uses a\n", " Newton-CG algorithm [5]_ pp. 168 (also known as the truncated\n", " Newton method). It uses a CG method to the compute the search\n", " direction. See also *TNC* method for a box-constrained\n", " minimization with a similar algorithm. Suitable for large-scale\n", " problems.\n", " \n", " Method :ref:`dogleg ` uses the dog-leg\n", " trust-region algorithm [5]_ for unconstrained minimization. 
This\n", " algorithm requires the gradient and Hessian; furthermore the\n", " Hessian is required to be positive definite.\n", " \n", " Method :ref:`trust-ncg ` uses the\n", " Newton conjugate gradient trust-region algorithm [5]_ for\n", " unconstrained minimization. This algorithm requires the gradient\n", " and either the Hessian or a function that computes the product of\n", " the Hessian with a given vector. Suitable for large-scale problems.\n", " \n", " Method :ref:`trust-krylov ` uses\n", " the Newton GLTR trust-region algorithm [14]_, [15]_ for unconstrained\n", " minimization. This algorithm requires the gradient\n", " and either the Hessian or a function that computes the product of\n", " the Hessian with a given vector. Suitable for large-scale problems.\n", " On indefinite problems it requires usually less iterations than the\n", " `trust-ncg` method and is recommended for medium and large-scale problems.\n", " \n", " Method :ref:`trust-exact `\n", " is a trust-region method for unconstrained minimization in which\n", " quadratic subproblems are solved almost exactly [13]_. This\n", " algorithm requires the gradient and the Hessian (which is\n", " *not* required to be positive definite). It is, in many\n", " situations, the Newton method to converge in fewer iteraction\n", " and the most recommended for small and medium-size problems.\n", " \n", " **Bound-Constrained minimization**\n", " \n", " Method :ref:`L-BFGS-B ` uses the L-BFGS-B\n", " algorithm [6]_, [7]_ for bound constrained minimization.\n", " \n", " Method :ref:`TNC ` uses a truncated Newton\n", " algorithm [5]_, [8]_ to minimize a function with variables subject\n", " to bounds. This algorithm uses gradient information; it is also\n", " called Newton Conjugate-Gradient. It differs from the *Newton-CG*\n", " method described above as it wraps a C implementation and allows\n", " each variable to be given upper and lower bounds.\n", " \n", " **Constrained Minimization**\n", " \n", " Method :ref:`COBYLA ` uses the\n", " Constrained Optimization BY Linear Approximation (COBYLA) method\n", " [9]_, [10]_, [11]_. The algorithm is based on linear\n", " approximations to the objective function and each constraint. The\n", " method wraps a FORTRAN implementation of the algorithm. The\n", " constraints functions 'fun' may return either a single number\n", " or an array or list of numbers.\n", " \n", " Method :ref:`SLSQP ` uses Sequential\n", " Least SQuares Programming to minimize a function of several\n", " variables with any combination of bounds, equality and inequality\n", " constraints. The method wraps the SLSQP Optimization subroutine\n", " originally implemented by Dieter Kraft [12]_. Note that the\n", " wrapper handles infinite values in bounds by converting them into\n", " large floating values.\n", " \n", " Method :ref:`trust-constr ` is a\n", " trust-region algorithm for constrained optimization. It swiches\n", " between two implementations depending on the problem definition.\n", " It is the most versatile constrained minimization algorithm\n", " implemented in SciPy and the most appropriate for large-scale problems.\n", " For equality constrained problems it is an implementation of Byrd-Omojokun\n", " Trust-Region SQP method described in [17]_ and in [5]_, p. 549. When\n", " inequality constraints are imposed as well, it swiches to the trust-region\n", " interior point method described in [16]_. 
This interior point algorithm,\n", " in turn, solves inequality constraints by introducing slack variables\n", " and solving a sequence of equality-constrained barrier problems\n", " for progressively smaller values of the barrier parameter.\n", " The previously described equality constrained SQP method is\n", " used to solve the subproblems with increasing levels of accuracy\n", " as the iterate gets closer to a solution.\n", " \n", " **Finite-Difference Options**\n", " \n", " For Method :ref:`trust-constr `\n", " the gradient and the Hessian may be approximated using\n", " three finite-difference schemes: {'2-point', '3-point', 'cs'}.\n", " The scheme 'cs' is, potentially, the most accurate but it\n", " requires the function to correctly handles complex inputs and to\n", " be differentiable in the complex plane. The scheme '3-point' is more\n", " accurate than '2-point' but requires twice as much operations.\n", " \n", " **Custom minimizers**\n", " \n", " It may be useful to pass a custom minimization method, for example\n", " when using a frontend to this method such as `scipy.optimize.basinhopping`\n", " or a different library. You can simply pass a callable as the ``method``\n", " parameter.\n", " \n", " The callable is called as ``method(fun, x0, args, **kwargs, **options)``\n", " where ``kwargs`` corresponds to any other parameters passed to `minimize`\n", " (such as `callback`, `hess`, etc.), except the `options` dict, which has\n", " its contents also passed as `method` parameters pair by pair. Also, if\n", " `jac` has been passed as a bool type, `jac` and `fun` are mangled so that\n", " `fun` returns just the function values and `jac` is converted to a function\n", " returning the Jacobian. The method shall return an ``OptimizeResult``\n", " object.\n", " \n", " The provided `method` callable must be able to accept (and possibly ignore)\n", " arbitrary parameters; the set of parameters accepted by `minimize` may\n", " expand in future versions and then these parameters will be passed to\n", " the method. You can find an example in the scipy.optimize tutorial.\n", " \n", " .. versionadded:: 0.11.0\n", " \n", " References\n", " ----------\n", " .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function\n", " Minimization. The Computer Journal 7: 308-13.\n", " .. [2] Wright M H. 1996. Direct search methods: Once scorned, now\n", " respectable, in Numerical Analysis 1995: Proceedings of the 1995\n", " Dundee Biennial Conference in Numerical Analysis (Eds. D F\n", " Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.\n", " 191-208.\n", " .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of\n", " a function of several variables without calculating derivatives. The\n", " Computer Journal 7: 155-162.\n", " .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.\n", " Numerical Recipes (any edition), Cambridge University Press.\n", " .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.\n", " Springer New York.\n", " .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory\n", " Algorithm for Bound Constrained Optimization. SIAM Journal on\n", " Scientific and Statistical Computing 16 (5): 1190-1208.\n", " .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm\n", " 778: L-BFGS-B, FORTRAN routines for large scale bound constrained\n", " optimization. ACM Transactions on Mathematical Software 23 (4):\n", " 550-560.\n", " .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.\n", " 1984. 
SIAM Journal of Numerical Analysis 21: 770-778.\n", " .. [9] Powell, M J D. A direct search optimization method that models\n", " the objective and constraint functions by linear interpolation.\n", " 1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez\n", " and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.\n", " .. [10] Powell M J D. Direct search algorithms for optimization\n", " calculations. 1998. Acta Numerica 7: 287-336.\n", " .. [11] Powell M J D. A view of algorithms for optimization without\n", " derivatives. 2007.Cambridge University Technical Report DAMTP\n", " 2007/NA03\n", " .. [12] Kraft, D. A software package for sequential quadratic\n", " programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace\n", " Center -- Institute for Flight Mechanics, Koln, Germany.\n", " .. [13] Conn, A. R., Gould, N. I., and Toint, P. L.\n", " Trust region methods. 2000. Siam. pp. 169-200.\n", " .. [14] F. Lenders, C. Kirches, A. Potschka: \"trlib: A vector-free\n", " implementation of the GLTR method for iterative solution of\n", " the trust region problem\", https://arxiv.org/abs/1611.04718\n", " .. [15] N. Gould, S. Lucidi, M. Roma, P. Toint: \"Solving the\n", " Trust-Region Subproblem using the Lanczos Method\",\n", " SIAM J. Optim., 9(2), 504--525, (1999).\n", " .. [16] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal. 1999.\n", " An interior point algorithm for large-scale nonlinear programming.\n", " SIAM Journal on Optimization 9.4: 877-900.\n", " .. [17] Lalee, Marucha, Jorge Nocedal, and Todd Plantega. 1998. On the\n", " implementation of an algorithm for large-scale equality constrained\n", " optimization. SIAM Journal on Optimization 8.3: 682-706.\n", " \n", " Examples\n", " --------\n", " Let us consider the problem of minimizing the Rosenbrock function. This\n", " function (and its respective derivatives) is implemented in `rosen`\n", " (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.\n", " \n", " >>> from scipy.optimize import minimize, rosen, rosen_der\n", " \n", " A simple application of the *Nelder-Mead* method is:\n", " \n", " >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]\n", " >>> res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)\n", " >>> res.x\n", " array([ 1., 1., 1., 1., 1.])\n", " \n", " Now using the *BFGS* algorithm, using the first derivative and a few\n", " options:\n", " \n", " >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,\n", " ... options={'gtol': 1e-6, 'disp': True})\n", " Optimization terminated successfully.\n", " Current function value: 0.000000\n", " Iterations: 26\n", " Function evaluations: 31\n", " Gradient evaluations: 31\n", " >>> res.x\n", " array([ 1., 1., 1., 1., 1.])\n", " >>> print(res.message)\n", " Optimization terminated successfully.\n", " >>> res.hess_inv\n", " array([[ 0.00749589, 0.01255155, 0.02396251, 0.04750988, 0.09495377], # may vary\n", " [ 0.01255155, 0.02510441, 0.04794055, 0.09502834, 0.18996269],\n", " [ 0.02396251, 0.04794055, 0.09631614, 0.19092151, 0.38165151],\n", " [ 0.04750988, 0.09502834, 0.19092151, 0.38341252, 0.7664427 ],\n", " [ 0.09495377, 0.18996269, 0.38165151, 0.7664427, 1.53713523]])\n", " \n", " \n", " Next, consider a minimization problem with several constraints (namely\n", " Example 16.4 from [5]_). The objective function is:\n", " \n", " >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2\n", " \n", " There are three constraints defined as:\n", " \n", " >>> cons = ({'type': 'ineq', 'fun': lambda x: x[0] - 2 * x[1] + 2},\n", " ... 
{'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},\n", " ... {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})\n", " \n", " And variables must be positive, hence the following bounds:\n", " \n", " >>> bnds = ((0, None), (0, None))\n", " \n", " The optimization problem is solved using the SLSQP method as:\n", " \n", " >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,\n", " ... constraints=cons)\n", " \n", " It should converge to the theoretical solution (1.4 ,1.7).\n", "\n" ] } ], "source": [ "help(minimize)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## First derivatives" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Steepest ascent (descent)\n", "The direction towards the maximum is always the direction of the gradient of the function:\n", "\n", "$$ \\theta_{k+1} = \\theta_k + \\beta_k \\nabla L $$\n", "\n", "* It is not obvious how to choose $\\beta_k$.\n", "* It is not obvious when to stop." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A modification of this method, the conjugate gradient method, in fact relies on second-derivative information." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Second derivatives" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The key formula (second-order Taylor expansion around $\\theta_0$):\n", "\n", "$$ L(\\theta) = L(\\theta_0) + \\nabla L(\\theta_0)^T (\\theta - \\theta_0) + \\frac{1}{2} (\\theta-\\theta_0)^T H(\\theta_0) (\\theta-\\theta_0) + o\\left( \\|\\theta-\\theta_0\\|^2 \\right)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Newton's method\n", "\n", "$$\\nabla L(\\theta_k) + H(\\theta_k)(\\theta_{k+1} - \\theta_k) = 0$$\n", "\n", "$$ \\theta_{k+1} = \\theta_k - H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A step-size choice can be added:\n", "\n", "$$ \\theta_{k+1} = \\theta_k - \\lambda_k H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1.0000000000000016" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy import optimize\n", "optimize.newton(lambda x: x**3 - 1, 1.5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Variable metric methods\n", "\n", "* Computing $\\nabla L$ and $H$ directly is very expensive.\n", "* Instead, build them up iteratively from successive evaluations (see the example below).\n", "\n", "Examples: \n", "* MINUIT\n", "* scipy `minimize(method='L-BFGS-B')`" ] },
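{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For illustration, the same Rosenbrock problem as above can also be solved with quasi-Newton (variable metric) solvers that use the analytic gradient `rosen_der` from `scipy.optimize`: curvature information is accumulated from successive gradients instead of computing $H$ directly. This is only a sketch of the idea, not part of the fitting recipes discussed later." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy.optimize import minimize, rosen, rosen_der\n", "\n", "# Quasi-Newton (variable-metric) minimization of the Rosenbrock function:\n", "# the inverse Hessian is approximated iteratively from gradient evaluations.\n", "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n", "res_bfgs = minimize(rosen, x0, method='BFGS', jac=rosen_der)\n", "res_lbfgsb = minimize(rosen, x0, method='L-BFGS-B', jac=rosen_der)\n", "res_bfgs.x, res_lbfgsb.x" ] },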
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The least-squares case" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "When analysing spectra we have:\n", "\n", "$$ L(X | \\theta) = \\prod_i p_i (x_i | \\theta)$$\n", "\n", "or, taking the logarithm:\n", "\n", "$$\\ln{ L(X | \\theta)} = \\sum_i \\ln{ p_i (x_i | \\theta)}$$\n", "\n", "For normally distributed errors, up to an additive constant:\n", "\n", "$$-2\\ln{ L(X | \\theta)} = \\sum_i{ \\left( \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i} \\right)^2 }$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The Gauss-Newton method\n", "Let:\n", "$$r_i = \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i}$$ \n", "$$J_{ij} = \\frac{\\partial r_i}{\\partial \\theta_j} = - \\frac{1}{\\sigma_i} \\frac{\\partial \\mu(x_i, \\theta)}{\\partial \\theta_j}$$\n", "\n", "Then:\n", "\n", "$$ \\theta_{(k+1)} = \\theta_{(k)} - \\left( J^TJ \\right)^{-1}J^T r(\\theta_{(k)})$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The Levenberg-Marquardt algorithm\n", "\n", "$$ \\theta_{(k+1)} = \\theta_{(k)} + \\delta$$\n", "\n", "$$ (J^TJ + \\lambda I)\\delta = -J^T r(\\theta_{(k)})$$\n", "\n", "Here $\\lambda$ is a damping (regularization) parameter; it is adjusted during the iterations: increased when a step fails to improve the fit and decreased when it succeeds." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The method of quasi-optimal weights\n", "Idea:\n", " take some statistic (a function of the data) $f(x)$. For the optimal solution, the average of this function over the experimental data must coincide with its expectation under the model:\n", "$$ E_{\\theta}\\left( f(x) \\right) = \\frac{1}{N}\\sum_i{f(x_i)} $$\n", "\n", "One can show that the best efficiency is obtained when\n", "\n", "$$ f = \\frac{\\partial \\ln L}{\\partial \\theta} $$\n", "\n", "In this case, and if the errors are Gaussian or Poisson distributed, the optimal $\\theta$ can be found by solving\n", "\n", "$$ \n", "\\sum_{i}{\\frac{\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right) - x_{i}}{\\sigma_{i}^{2}}\\left. \\ \\frac{\\partial\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right)}{\\partial\\mathbf{\\theta}} \\right|_{\\mathbf{\\theta}_{\\mathbf{0}}}} = 0. \n", "$$" ] } ], "metadata": { "celltoolbar": "Slideshow", "hide_input": false, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": false, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": false, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }