stat-methods/notebooks/python/fitting.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Общий случай"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Постановка задачи"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Пусть есть параметрическая модель $M\\left( \\theta \\right)$, где $\\theta$ - параметры."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Функция правдоподобия $L\\left( X | M\\left( \\theta \\right) \\right)$ определят достоверность получения набора данных $X$ при заданном наборе параметров и данной модели."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**Задача**: определить такой набор параметров $\\theta$, для которого функция принимает наибольшее значение."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Классификация\n",
    "\n",
    "По порядку производной:\n",
    "\n",
    "* Не использует производных $L$\n",
    "\n",
    "* Использует первую производную $\\frac{\\partial L}{\\partial \\theta_i}$ (градиент)\n",
    "\n",
    "* Использует вторые прозиводные $\\frac{\\partial^2 L}{\\partial \\theta_i \\partial \\theta_j}$ (гессиан)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Без производных"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Прямой перебор\n",
    "(brute force)\n",
    "* Строим сетку и ищем на ней максимум. \n",
    "* Возможен только для одномерных, максимум двумерных задач. \n",
    "* Точность ограничена размером ячкйки сетки."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Симплекс методы \n",
    "1. Строим многоугольник в пространстве параметров с $n+1$ вершинами, где $n$ - размерность пространства. \n",
    "2. Орпделеляем значения функции в каждой вершине. \n",
    "3. Находим вершину с наименьшим значением и двигаем ее к центру масс многоугольника."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![Nelder-mead](images/Nelder_Mead1.gif)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Optimization terminated successfully.\n",
      "         Current function value: 0.000000\n",
      "         Iterations: 339\n",
      "         Function evaluations: 571\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       " final_simplex: (array([[1.        , 1.        , 1.        , 1.        , 1.        ],\n",
       "       [1.        , 1.        , 1.        , 1.        , 1.        ],\n",
       "       [1.        , 1.        , 1.        , 1.00000001, 1.00000001],\n",
       "       [1.        , 1.        , 1.        , 1.        , 1.        ],\n",
       "       [1.        , 1.        , 1.        , 1.        , 1.        ],\n",
       "       [1.        , 1.        , 1.        , 1.        , 0.99999999]]), array([4.86115343e-17, 7.65182843e-17, 8.11395684e-17, 8.63263255e-17,\n",
       "       8.64080682e-17, 2.17927418e-16]))\n",
       "           fun: 4.861153433422115e-17\n",
       "       message: 'Optimization terminated successfully.'\n",
       "          nfev: 571\n",
       "           nit: 339\n",
       "        status: 0\n",
       "       success: True\n",
       "             x: array([1., 1., 1., 1., 1.])"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "from scipy.optimize import minimize\n",
    "\n",
    "\n",
    "def rosen(x):\n",
    "    \"\"\"The Rosenbrock function\"\"\"\n",
    "    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)\n",
    "\n",
    "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])\n",
    "minimize(rosen, x0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Help on function minimize in module scipy.optimize._minimize:\n",
      "\n",
      "minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)\n",
      "    Minimization of scalar function of one or more variables.\n",
      "    \n",
      "    Parameters\n",
      "    ----------\n",
      "    fun : callable\n",
      "        The objective function to be minimized.\n",
      "    \n",
      "            ``fun(x, *args) -> float``\n",
      "    \n",
      "        where x is an 1-D array with shape (n,) and `args`\n",
      "        is a tuple of the fixed parameters needed to completely\n",
      "        specify the function.\n",
      "    x0 : ndarray, shape (n,)\n",
      "        Initial guess. Array of real elements of size (n,),\n",
      "        where 'n' is the number of independent variables.\n",
      "    args : tuple, optional\n",
      "        Extra arguments passed to the objective function and its\n",
      "        derivatives (`fun`, `jac` and `hess` functions).\n",
      "    method : str or callable, optional\n",
      "        Type of solver.  Should be one of\n",
      "    \n",
      "            - 'Nelder-Mead' :ref:`(see here) <optimize.minimize-neldermead>`\n",
      "            - 'Powell'      :ref:`(see here) <optimize.minimize-powell>`\n",
      "            - 'CG'          :ref:`(see here) <optimize.minimize-cg>`\n",
      "            - 'BFGS'        :ref:`(see here) <optimize.minimize-bfgs>`\n",
      "            - 'Newton-CG'   :ref:`(see here) <optimize.minimize-newtoncg>`\n",
      "            - 'L-BFGS-B'    :ref:`(see here) <optimize.minimize-lbfgsb>`\n",
      "            - 'TNC'         :ref:`(see here) <optimize.minimize-tnc>`\n",
      "            - 'COBYLA'      :ref:`(see here) <optimize.minimize-cobyla>`\n",
      "            - 'SLSQP'       :ref:`(see here) <optimize.minimize-slsqp>`\n",
      "            - 'trust-constr':ref:`(see here) <optimize.minimize-trustconstr>`\n",
      "            - 'dogleg'      :ref:`(see here) <optimize.minimize-dogleg>`\n",
      "            - 'trust-ncg'   :ref:`(see here) <optimize.minimize-trustncg>`\n",
      "            - 'trust-exact' :ref:`(see here) <optimize.minimize-trustexact>`\n",
      "            - 'trust-krylov' :ref:`(see here) <optimize.minimize-trustkrylov>`\n",
      "            - custom - a callable object (added in version 0.14.0),\n",
      "              see below for description.\n",
      "    \n",
      "        If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,\n",
      "        depending if the problem has constraints or bounds.\n",
      "    jac : {callable,  '2-point', '3-point', 'cs', bool}, optional\n",
      "        Method for computing the gradient vector. Only for CG, BFGS,\n",
      "        Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov,\n",
      "        trust-exact and trust-constr. If it is a callable, it should be a\n",
      "        function that returns the gradient vector:\n",
      "    \n",
      "            ``jac(x, *args) -> array_like, shape (n,)``\n",
      "    \n",
      "        where x is an array with shape (n,) and `args` is a tuple with\n",
      "        the fixed parameters. Alternatively, the keywords\n",
      "        {'2-point', '3-point', 'cs'} select a finite\n",
      "        difference scheme for numerical estimation of the gradient. Options\n",
      "        '3-point' and 'cs' are available only to 'trust-constr'.\n",
      "        If `jac` is a Boolean and is True, `fun` is assumed to return the\n",
      "        gradient along with the objective function. If False, the gradient\n",
      "        will be estimated using '2-point' finite difference estimation.\n",
      "    hess : {callable, '2-point', '3-point', 'cs', HessianUpdateStrategy},  optional\n",
      "        Method for computing the Hessian matrix. Only for Newton-CG, dogleg,\n",
      "        trust-ncg,  trust-krylov, trust-exact and trust-constr. If it is\n",
      "        callable, it should return the  Hessian matrix:\n",
      "    \n",
      "            ``hess(x, *args) -> {LinearOperator, spmatrix, array}, (n, n)``\n",
      "    \n",
      "        where x is a (n,) ndarray and `args` is a tuple with the fixed\n",
      "        parameters. LinearOperator and sparse matrix returns are\n",
      "        allowed only for 'trust-constr' method. Alternatively, the keywords\n",
      "        {'2-point', '3-point', 'cs'} select a finite difference scheme\n",
      "        for numerical estimation. Or, objects implementing\n",
      "        `HessianUpdateStrategy` interface can be used to approximate\n",
      "        the Hessian. Available quasi-Newton methods implementing\n",
      "        this interface are:\n",
      "    \n",
      "            - `BFGS`;\n",
      "            - `SR1`.\n",
      "    \n",
      "        Whenever the gradient is estimated via finite-differences,\n",
      "        the Hessian cannot be estimated with options\n",
      "        {'2-point', '3-point', 'cs'} and needs to be\n",
      "        estimated using one of the quasi-Newton strategies.\n",
      "        Finite-difference options {'2-point', '3-point', 'cs'} and\n",
      "        `HessianUpdateStrategy` are available only for 'trust-constr' method.\n",
      "    hessp : callable, optional\n",
      "        Hessian of objective function times an arbitrary vector p. Only for\n",
      "        Newton-CG, trust-ncg, trust-krylov, trust-constr.\n",
      "        Only one of `hessp` or `hess` needs to be given.  If `hess` is\n",
      "        provided, then `hessp` will be ignored.  `hessp` must compute the\n",
      "        Hessian times an arbitrary vector:\n",
      "    \n",
      "            ``hessp(x, p, *args) ->  ndarray shape (n,)``\n",
      "    \n",
      "        where x is a (n,) ndarray, p is an arbitrary vector with\n",
      "        dimension (n,) and `args` is a tuple with the fixed\n",
      "        parameters.\n",
      "    bounds : sequence or `Bounds`, optional\n",
      "        Bounds on variables for L-BFGS-B, TNC, SLSQP and\n",
      "        trust-constr methods. There are two ways to specify the bounds:\n",
      "    \n",
      "            1. Instance of `Bounds` class.\n",
      "            2. Sequence of ``(min, max)`` pairs for each element in `x`. None\n",
      "               is used to specify no bound.\n",
      "    \n",
      "    constraints : {Constraint, dict} or List of {Constraint, dict}, optional\n",
      "        Constraints definition (only for COBYLA, SLSQP and trust-constr).\n",
      "        Constraints for 'trust-constr' are defined as a single object or a\n",
      "        list of objects specifying constraints to the optimization problem.\n",
      "        Available constraints are:\n",
      "    \n",
      "            - `LinearConstraint`\n",
      "            - `NonlinearConstraint`\n",
      "    \n",
      "        Constraints for COBYLA, SLSQP are defined as a list of dictionaries.\n",
      "        Each dictionary with fields:\n",
      "    \n",
      "            type : str\n",
      "                Constraint type: 'eq' for equality, 'ineq' for inequality.\n",
      "            fun : callable\n",
      "                The function defining the constraint.\n",
      "            jac : callable, optional\n",
      "                The Jacobian of `fun` (only for SLSQP).\n",
      "            args : sequence, optional\n",
      "                Extra arguments to be passed to the function and Jacobian.\n",
      "    \n",
      "        Equality constraint means that the constraint function result is to\n",
      "        be zero whereas inequality means that it is to be non-negative.\n",
      "        Note that COBYLA only supports inequality constraints.\n",
      "    tol : float, optional\n",
      "        Tolerance for termination. For detailed control, use solver-specific\n",
      "        options.\n",
      "    options : dict, optional\n",
      "        A dictionary of solver options. All methods accept the following\n",
      "        generic options:\n",
      "    \n",
      "            maxiter : int\n",
      "                Maximum number of iterations to perform.\n",
      "            disp : bool\n",
      "                Set to True to print convergence messages.\n",
      "    \n",
      "        For method-specific options, see :func:`show_options()`.\n",
      "    callback : callable, optional\n",
      "        Called after each iteration. For 'trust-constr' it is a callable with\n",
      "        the signature:\n",
      "    \n",
      "            ``callback(xk, OptimizeResult state) -> bool``\n",
      "    \n",
      "        where ``xk`` is the current parameter vector. and ``state``\n",
      "        is an `OptimizeResult` object, with the same fields\n",
      "        as the ones from the return.  If callback returns True\n",
      "        the algorithm execution is terminated.\n",
      "        For all the other methods, the signature is:\n",
      "    \n",
      "            ``callback(xk)``\n",
      "    \n",
      "        where ``xk`` is the current parameter vector.\n",
      "    \n",
      "    Returns\n",
      "    -------\n",
      "    res : OptimizeResult\n",
      "        The optimization result represented as a ``OptimizeResult`` object.\n",
      "        Important attributes are: ``x`` the solution array, ``success`` a\n",
      "        Boolean flag indicating if the optimizer exited successfully and\n",
      "        ``message`` which describes the cause of the termination. See\n",
      "        `OptimizeResult` for a description of other attributes.\n",
      "    \n",
      "    \n",
      "    See also\n",
      "    --------\n",
      "    minimize_scalar : Interface to minimization algorithms for scalar\n",
      "        univariate functions\n",
      "    show_options : Additional options accepted by the solvers\n",
      "    \n",
      "    Notes\n",
      "    -----\n",
      "    This section describes the available solvers that can be selected by the\n",
      "    'method' parameter. The default method is *BFGS*.\n",
      "    \n",
      "    **Unconstrained minimization**\n",
      "    \n",
      "    Method :ref:`Nelder-Mead <optimize.minimize-neldermead>` uses the\n",
      "    Simplex algorithm [1]_, [2]_. This algorithm is robust in many\n",
      "    applications. However, if numerical computation of derivative can be\n",
      "    trusted, other algorithms using the first and/or second derivatives\n",
      "    information might be preferred for their better performance in\n",
      "    general.\n",
      "    \n",
      "    Method :ref:`Powell <optimize.minimize-powell>` is a modification\n",
      "    of Powell's method [3]_, [4]_ which is a conjugate direction\n",
      "    method. It performs sequential one-dimensional minimizations along\n",
      "    each vector of the directions set (`direc` field in `options` and\n",
      "    `info`), which is updated at each iteration of the main\n",
      "    minimization loop. The function need not be differentiable, and no\n",
      "    derivatives are taken.\n",
      "    \n",
      "    Method :ref:`CG <optimize.minimize-cg>` uses a nonlinear conjugate\n",
      "    gradient algorithm by Polak and Ribiere, a variant of the\n",
      "    Fletcher-Reeves method described in [5]_ pp.  120-122. Only the\n",
      "    first derivatives are used.\n",
      "    \n",
      "    Method :ref:`BFGS <optimize.minimize-bfgs>` uses the quasi-Newton\n",
      "    method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [5]_\n",
      "    pp. 136. It uses the first derivatives only. BFGS has proven good\n",
      "    performance even for non-smooth optimizations. This method also\n",
      "    returns an approximation of the Hessian inverse, stored as\n",
      "    `hess_inv` in the OptimizeResult object.\n",
      "    \n",
      "    Method :ref:`Newton-CG <optimize.minimize-newtoncg>` uses a\n",
      "    Newton-CG algorithm [5]_ pp. 168 (also known as the truncated\n",
      "    Newton method). It uses a CG method to the compute the search\n",
      "    direction. See also *TNC* method for a box-constrained\n",
      "    minimization with a similar algorithm. Suitable for large-scale\n",
      "    problems.\n",
      "    \n",
      "    Method :ref:`dogleg <optimize.minimize-dogleg>` uses the dog-leg\n",
      "    trust-region algorithm [5]_ for unconstrained minimization. This\n",
      "    algorithm requires the gradient and Hessian; furthermore the\n",
      "    Hessian is required to be positive definite.\n",
      "    \n",
      "    Method :ref:`trust-ncg <optimize.minimize-trustncg>` uses the\n",
      "    Newton conjugate gradient trust-region algorithm [5]_ for\n",
      "    unconstrained minimization. This algorithm requires the gradient\n",
      "    and either the Hessian or a function that computes the product of\n",
      "    the Hessian with a given vector. Suitable for large-scale problems.\n",
      "    \n",
      "    Method :ref:`trust-krylov <optimize.minimize-trustkrylov>` uses\n",
      "    the Newton GLTR trust-region algorithm [14]_, [15]_ for unconstrained\n",
      "    minimization. This algorithm requires the gradient\n",
      "    and either the Hessian or a function that computes the product of\n",
      "    the Hessian with a given vector. Suitable for large-scale problems.\n",
      "    On indefinite problems it requires usually less iterations than the\n",
      "    `trust-ncg` method and is recommended for medium and large-scale problems.\n",
      "    \n",
      "    Method :ref:`trust-exact <optimize.minimize-trustexact>`\n",
      "    is a trust-region method for unconstrained minimization in which\n",
      "    quadratic subproblems are solved almost exactly [13]_. This\n",
      "    algorithm requires the gradient and the Hessian (which is\n",
      "    *not* required to be positive definite). It is, in many\n",
      "    situations, the Newton method to converge in fewer iteraction\n",
      "    and the most recommended for small and medium-size problems.\n",
      "    \n",
      "    **Bound-Constrained minimization**\n",
      "    \n",
      "    Method :ref:`L-BFGS-B <optimize.minimize-lbfgsb>` uses the L-BFGS-B\n",
      "    algorithm [6]_, [7]_ for bound constrained minimization.\n",
      "    \n",
      "    Method :ref:`TNC <optimize.minimize-tnc>` uses a truncated Newton\n",
      "    algorithm [5]_, [8]_ to minimize a function with variables subject\n",
      "    to bounds. This algorithm uses gradient information; it is also\n",
      "    called Newton Conjugate-Gradient. It differs from the *Newton-CG*\n",
      "    method described above as it wraps a C implementation and allows\n",
      "    each variable to be given upper and lower bounds.\n",
      "    \n",
      "    **Constrained Minimization**\n",
      "    \n",
      "    Method :ref:`COBYLA <optimize.minimize-cobyla>` uses the\n",
      "    Constrained Optimization BY Linear Approximation (COBYLA) method\n",
      "    [9]_, [10]_, [11]_. The algorithm is based on linear\n",
      "    approximations to the objective function and each constraint. The\n",
      "    method wraps a FORTRAN implementation of the algorithm. The\n",
      "    constraints functions 'fun' may return either a single number\n",
      "    or an array or list of numbers.\n",
      "    \n",
      "    Method :ref:`SLSQP <optimize.minimize-slsqp>` uses Sequential\n",
      "    Least SQuares Programming to minimize a function of several\n",
      "    variables with any combination of bounds, equality and inequality\n",
      "    constraints. The method wraps the SLSQP Optimization subroutine\n",
      "    originally implemented by Dieter Kraft [12]_. Note that the\n",
      "    wrapper handles infinite values in bounds by converting them into\n",
      "    large floating values.\n",
      "    \n",
      "    Method :ref:`trust-constr <optimize.minimize-trustconstr>` is a\n",
      "    trust-region algorithm for constrained optimization. It swiches\n",
      "    between two implementations depending on the problem definition.\n",
      "    It is the most versatile constrained minimization algorithm\n",
      "    implemented in SciPy and the most appropriate for large-scale problems.\n",
      "    For equality constrained problems it is an implementation of Byrd-Omojokun\n",
      "    Trust-Region SQP method described in [17]_ and in [5]_, p. 549. When\n",
      "    inequality constraints  are imposed as well, it swiches to the trust-region\n",
      "    interior point  method described in [16]_. This interior point algorithm,\n",
      "    in turn, solves inequality constraints by introducing slack variables\n",
      "    and solving a sequence of equality-constrained barrier problems\n",
      "    for progressively smaller values of the barrier parameter.\n",
      "    The previously described equality constrained SQP method is\n",
      "    used to solve the subproblems with increasing levels of accuracy\n",
      "    as the iterate gets closer to a solution.\n",
      "    \n",
      "    **Finite-Difference Options**\n",
      "    \n",
      "    For Method :ref:`trust-constr <optimize.minimize-trustconstr>`\n",
      "    the gradient and the Hessian may be approximated using\n",
      "    three finite-difference schemes: {'2-point', '3-point', 'cs'}.\n",
      "    The scheme 'cs' is, potentially, the most accurate but it\n",
      "    requires the function to correctly handles complex inputs and to\n",
      "    be differentiable in the complex plane. The scheme '3-point' is more\n",
      "    accurate than '2-point' but requires twice as much operations.\n",
      "    \n",
      "    **Custom minimizers**\n",
      "    \n",
      "    It may be useful to pass a custom minimization method, for example\n",
      "    when using a frontend to this method such as `scipy.optimize.basinhopping`\n",
      "    or a different library.  You can simply pass a callable as the ``method``\n",
      "    parameter.\n",
      "    \n",
      "    The callable is called as ``method(fun, x0, args, **kwargs, **options)``\n",
      "    where ``kwargs`` corresponds to any other parameters passed to `minimize`\n",
      "    (such as `callback`, `hess`, etc.), except the `options` dict, which has\n",
      "    its contents also passed as `method` parameters pair by pair.  Also, if\n",
      "    `jac` has been passed as a bool type, `jac` and `fun` are mangled so that\n",
      "    `fun` returns just the function values and `jac` is converted to a function\n",
      "    returning the Jacobian.  The method shall return an ``OptimizeResult``\n",
      "    object.\n",
      "    \n",
      "    The provided `method` callable must be able to accept (and possibly ignore)\n",
      "    arbitrary parameters; the set of parameters accepted by `minimize` may\n",
      "    expand in future versions and then these parameters will be passed to\n",
      "    the method.  You can find an example in the scipy.optimize tutorial.\n",
      "    \n",
      "    .. versionadded:: 0.11.0\n",
      "    \n",
      "    References\n",
      "    ----------\n",
      "    .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function\n",
      "        Minimization. The Computer Journal 7: 308-13.\n",
      "    .. [2] Wright M H. 1996. Direct search methods: Once scorned, now\n",
      "        respectable, in Numerical Analysis 1995: Proceedings of the 1995\n",
      "        Dundee Biennial Conference in Numerical Analysis (Eds. D F\n",
      "        Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.\n",
      "        191-208.\n",
      "    .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of\n",
      "       a function of several variables without calculating derivatives. The\n",
      "       Computer Journal 7: 155-162.\n",
      "    .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.\n",
      "       Numerical Recipes (any edition), Cambridge University Press.\n",
      "    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.\n",
      "       Springer New York.\n",
      "    .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory\n",
      "       Algorithm for Bound Constrained Optimization. SIAM Journal on\n",
      "       Scientific and Statistical Computing 16 (5): 1190-1208.\n",
      "    .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm\n",
      "       778: L-BFGS-B, FORTRAN routines for large scale bound constrained\n",
      "       optimization. ACM Transactions on Mathematical Software 23 (4):\n",
      "       550-560.\n",
      "    .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.\n",
      "       1984. SIAM Journal of Numerical Analysis 21: 770-778.\n",
      "    .. [9] Powell, M J D. A direct search optimization method that models\n",
      "       the objective and constraint functions by linear interpolation.\n",
      "       1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez\n",
      "       and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.\n",
      "    .. [10] Powell M J D. Direct search algorithms for optimization\n",
      "       calculations. 1998. Acta Numerica 7: 287-336.\n",
      "    .. [11] Powell M J D. A view of algorithms for optimization without\n",
      "       derivatives. 2007.Cambridge University Technical Report DAMTP\n",
      "       2007/NA03\n",
      "    .. [12] Kraft, D. A software package for sequential quadratic\n",
      "       programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace\n",
      "       Center -- Institute for Flight Mechanics, Koln, Germany.\n",
      "    .. [13] Conn, A. R., Gould, N. I., and Toint, P. L.\n",
      "       Trust region methods. 2000. Siam. pp. 169-200.\n",
      "    .. [14] F. Lenders, C. Kirches, A. Potschka: \"trlib: A vector-free\n",
      "       implementation of the GLTR method for iterative solution of\n",
      "       the trust region problem\", https://arxiv.org/abs/1611.04718\n",
      "    .. [15] N. Gould, S. Lucidi, M. Roma, P. Toint: \"Solving the\n",
      "       Trust-Region Subproblem using the Lanczos Method\",\n",
      "       SIAM J. Optim., 9(2), 504--525, (1999).\n",
      "    .. [16] Byrd, Richard H., Mary E. Hribar, and Jorge Nocedal. 1999.\n",
      "        An interior point algorithm for large-scale nonlinear  programming.\n",
      "        SIAM Journal on Optimization 9.4: 877-900.\n",
      "    .. [17] Lalee, Marucha, Jorge Nocedal, and Todd Plantega. 1998. On the\n",
      "        implementation of an algorithm for large-scale equality constrained\n",
      "        optimization. SIAM Journal on Optimization 8.3: 682-706.\n",
      "    \n",
      "    Examples\n",
      "    --------\n",
      "    Let us consider the problem of minimizing the Rosenbrock function. This\n",
      "    function (and its respective derivatives) is implemented in `rosen`\n",
      "    (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.\n",
      "    \n",
      "    >>> from scipy.optimize import minimize, rosen, rosen_der\n",
      "    \n",
      "    A simple application of the *Nelder-Mead* method is:\n",
      "    \n",
      "    >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]\n",
      "    >>> res = minimize(rosen, x0, method='Nelder-Mead', tol=1e-6)\n",
      "    >>> res.x\n",
      "    array([ 1.,  1.,  1.,  1.,  1.])\n",
      "    \n",
      "    Now using the *BFGS* algorithm, using the first derivative and a few\n",
      "    options:\n",
      "    \n",
      "    >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,\n",
      "    ...                options={'gtol': 1e-6, 'disp': True})\n",
      "    Optimization terminated successfully.\n",
      "             Current function value: 0.000000\n",
      "             Iterations: 26\n",
      "             Function evaluations: 31\n",
      "             Gradient evaluations: 31\n",
      "    >>> res.x\n",
      "    array([ 1.,  1.,  1.,  1.,  1.])\n",
      "    >>> print(res.message)\n",
      "    Optimization terminated successfully.\n",
      "    >>> res.hess_inv\n",
      "    array([[ 0.00749589,  0.01255155,  0.02396251,  0.04750988,  0.09495377],  # may vary\n",
      "           [ 0.01255155,  0.02510441,  0.04794055,  0.09502834,  0.18996269],\n",
      "           [ 0.02396251,  0.04794055,  0.09631614,  0.19092151,  0.38165151],\n",
      "           [ 0.04750988,  0.09502834,  0.19092151,  0.38341252,  0.7664427 ],\n",
      "           [ 0.09495377,  0.18996269,  0.38165151,  0.7664427,   1.53713523]])\n",
      "    \n",
      "    \n",
      "    Next, consider a minimization problem with several constraints (namely\n",
      "    Example 16.4 from [5]_). The objective function is:\n",
      "    \n",
      "    >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2\n",
      "    \n",
      "    There are three constraints defined as:\n",
      "    \n",
      "    >>> cons = ({'type': 'ineq', 'fun': lambda x:  x[0] - 2 * x[1] + 2},\n",
      "    ...         {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},\n",
      "    ...         {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})\n",
      "    \n",
      "    And variables must be positive, hence the following bounds:\n",
      "    \n",
      "    >>> bnds = ((0, None), (0, None))\n",
      "    \n",
      "    The optimization problem is solved using the SLSQP method as:\n",
      "    \n",
      "    >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,\n",
      "    ...                constraints=cons)\n",
      "    \n",
      "    It should converge to the theoretical solution (1.4 ,1.7).\n",
      "\n"
     ]
    }
   ],
   "source": [
    "help(minimize)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Первые производные"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Наискорейший подъем (спуск)\n",
    "Направление на максимум всегда в направлении градиента функции:\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k + \\beta_k \\nabla L $$\n",
    "\n",
    "* Не понятно, как определять $\\beta$\n",
    "* Не понятно, когда останавливаться."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Модификация метода - метод сопряженных градиентов на самом деле требует второй производной."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Вторые производные"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Главная формула:\n",
    "\n",
    "$$ L(\\theta) = L(\\theta_0) + \\nabla L( \\theta - \\theta_0) + \\frac{1}{2} (\\theta-\\theta_0)^T H (\\theta-\\theta_0)  + o(\\theta-\\theta_0)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Метод Ньютона\n",
    "\n",
    "$$\\nabla f(\\theta_k) + H(\\theta_k)(\\theta_{k+1} - \\theta_k)  = 0$$\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k - H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Можно добавить выбор шага:\n",
    "\n",
    "$$ \\theta_{k+1} = \\theta_k - \\lambda_i H^{-1}(\\theta_k)\\nabla L(\\theta_k) $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.0000000000000016"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from scipy import optimize\n",
    "optimize.newton(lambda x: x**3 - 1, 1.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Методы с переменной метрикой\n",
    "\n",
    "* Вычислять $\\nabla L$ и $H$ очень дорого\n",
    "* Давайте вычислять их итеративно.\n",
    "\n",
    "Примеры: \n",
    "* MINUIT\n",
    "* scipy `minimize(method=’L-BFGS-B’)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Случай наименьших квадратов"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "В случае анализа спектров имеем:\n",
    "\n",
    "$$ L(X | \\theta) = \\prod p_i (x_i | \\theta)$$\n",
    "\n",
    "Или:\n",
    "\n",
    "$$\\ln{ L(X | \\theta)} = \\sum \\ln{ p_i (x_i | \\theta)}$$\n",
    "\n",
    "В случае нормальных распределений:\n",
    "\n",
    "$$\\ln{ L(X | \\theta)} \\sim \\sum{ \\left( \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i} \\right)^2 }$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Метод Гаусса-Ньютона\n",
    "Пусть:\n",
    "$$r_i = \\frac{y_i - \\mu(x_i, \\theta)}{\\sigma_i}$$ \n",
    "$$J_{ij} = \\frac{\\partial r_i}{\\partial \\theta_j} = - \\frac{\\partial \\mu(x_i, \\theta)}{\\sigma_i \\partial \\theta_j}$$\n",
    "\n",
    "Тогда:\n",
    "\n",
    "$$ \\theta_{(k+1)} = \\theta_{(k)} - \\left( J^TJ \\right)^{-1}J^Tr(\\theta)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Алгоритм Левенберга — Марквардта\n",
    "\n",
    "$$ \\theta_{(k+1)} = \\theta_{(k)} + \\delta$$\n",
    "\n",
    "$$ (J^TJ + \\lambda I)\\delta = J^Tr(\\theta)$$\n",
    "\n",
    "При этом $\\lambda$ - фактор регуляризации, выбирается произвольным образом."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Метод квазиоптимальных весов\n",
    "Идея:\n",
    "    Есть некоторая статистика (функция данных) $f(x)$. Для оптимального решения среднее от этой функции по экспериментальным данным и по модели должны совпадать:\n",
    "$$ E_\\theta(f(x)) = \\sum_i{f(x_i)} $$\n",
    "\n",
    "Можно показать, что оптимальная эффективность получается когда\n",
    "\n",
    "$$ f = \\frac{\\partial \\ln L}{\\partial \\theta} $$\n",
    "\n",
    "В этом случае и если ошибки распределены по Гауссу или Пуассону, решение для оптмального $\\theta$ можно получить как:\n",
    "\n",
    "$$ \n",
    "\\sum_{i}{\\frac{\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right) - x_{i}}{\\sigma_{i}^{2}}\\left. \\ \\frac{\\partial\\mu_{i}\\left( \\mathbf{\\theta},E_{i} \\right)}{\\partial\\mathbf{\\theta}} \\right|_{\\mathbf{\\theta}_{\\mathbf{0}}}} = 0. \n",
    "$$"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": false,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": false,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}