_/numerics-2022

forked from SPC-education/numerics-2022

2022-11-04 23:55:03 +03:00

52 KiB

Raw Permalink Blame History

In [2]:

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

Lecture 4: Conditioning. Floating point arithmetic and stability. Systems of linear equations.¶

Introductory example¶

Consider solving a linear systems of equations $$Ax = f$$ with

$$A_{ij} = \frac{1}{i + j + 1}, \quad i,j = 0, \ldots, n-1.$$

For example, at $n=5$:

$$ A=\begin{bmatrix}1&{\frac {1}{2}}&{\frac {1}{3}}&{\frac {1}{4}}&{\frac {1}{5}}\\{\frac {1}{2}}&{\frac {1}{3}}&{\frac {1}{4}}&{\frac {1}{5}}&{\frac {1}{6}}\\{\frac {1}{3}}&{\frac {1}{4}}&{\frac {1}{5}}&{\frac {1}{6}}&{\frac {1}{7}}\\{\frac {1}{4}}&{\frac {1}{5}}&{\frac {1}{6}}&{\frac {1}{7}}&{\frac {1}{8}}\\{\frac {1}{5}}&{\frac {1}{6}}&{\frac {1}{7}}&{\frac {1}{8}}&{\frac {1}{9}}\end{bmatrix}. $$

In [3]:

def hilbert_fl(n):
    return np.array([[1.0/(i + j + 1) for i in range(n)] for j in range(n)])

Lets try to solve a small system numerically¶

In [4]:

import numpy as np
n = 3
A = hilbert_fl(n)
f = np.random.randn(n)
x = np.linalg.solve(A, f) # we have not discussed yet what this function is doing but apparently it solves a system of equations

In [6]:

err = np.linalg.norm(A @ x - f) / np.linalg.norm(f)
print(err)

2.7277615519878704e-15

So far so good, what about a larger system?¶

In [45]:

n = 15
A = hilbert_fl(n)
f = np.random.randint(-3,3,n)
x = np.linalg.solve(A, f)

In [46]:

err = np.linalg.norm(A @ x - f) / np.linalg.norm(f)
print(err)

0.5838742081211422

The error grows with increase of $n$, and we have to find out why. It is not a problem with the algorithm!

Lets solve the system exactly symbolically¶

In [47]:

from sympy import Rational
from sympy.matrices import Matrix
def hilbert_sym(n):
    return Matrix([[Rational(1, i+j+1) for i in range(n)] for j in range(n)])

In [48]:

hilbert_fl(2)

Out[48]:

array([[1.        , 0.5       ],
       [0.5       , 0.33333333]])

In [49]:

hilbert_sym(2)

Out[49]:

$\displaystyle \left[\begin{matrix}1 & \frac{1}{2}\\\frac{1}{2} & \frac{1}{3}\end{matrix}\right]$

In [26]:

n = 12

In [50]:

A_sym = hilbert_sym(n)
A_sym_inv = A_sym.inv(method="LU")

In [31]:

A_sym_inv

Out[31]:

$\displaystyle \left[\begin{array}{cccccccccccc}144 & -10296 & 240240 & -2702700 & 17297280 & -68612544 & 176432256 & -299304720 & 332560800 & -232792560 & 93117024 & -16224936\\-10296 & 981552 & -25765740 & 309188880 & -2061259200 & 8409937536 & -22076086032 & 38044955520 & -42800574960 & 30263032800 & -12206089896 & 2141691552\\240240 & -25765740 & 721440720 & -9018009000 & 61837776000 & -257554337040 & 686811565440 & -1198416098880 & 1361836476000 & -970938969000 & 394350596640 & -69604975440\\-2702700 & 309188880 & -9018009000 & 115945830000 & -811620810000 & 3434057827200 & -9271956133440 & 16342037712000 & -18725251545000 & 13443770340000 & -5492740453200 & 974469656160\\17297280 & -2061259200 & 61837776000 & -811620810000 & 5771525760000 & -24725216355840 & 67432408243200 & -119841609888000 & 138278780640000 & -99868008240000 & 41012462050560 & -7308522421200\\-68612544 & 8409937536 & -257554337040 & 3434057827200 & -24725216355840 & 106992754412544 & -294230074634496 & 526565596677120 & -611192210428800 & 443680271274240 & -183018111900624 & 32742180446976\\176432256 & -22076086032 & 686811565440 & -9271956133440 & 67432408243200 & -294230074634496 & 814790975910912 & -1466861305029120 & 1711338189200640 & -1247850762958800 & 516757021837056 & -92769511266432\\-299304720 & 38044955520 & -1198416098880 & 16342037712000 & -119841609888000 & 526565596677120 & -1466861305029120 & 2654320456719360 & -3110531785218000 & 2276990587872000 & -946216088737920 & 170392979877120\\332560800 & -42800574960 & 1361836476000 & -18725251545000 & 138278780640000 & -611192210428800 & 1711338189200640 & -3110531785218000 & 3659449159080000 & -2688113888460000 & 1120519052452800 & -202341663604080\\-232792560 & 30263032800 & -970938969000 & 13443770340000 & -99868008240000 & 443680271274240 & -1247850762958800 & 2276990587872000 & -2688113888460000 & 1980715496760000 & -827939077645680 & 149882713780800\\93117024 & -12206089896 & 394350596640 & -5492740453200 & 41012462050560 & -183018111900624 & 516757021837056 & -946216088737920 & 1120519052452800 & -827939077645680 & 346945899203904 & -62950739787936\\-16224936 & 2141691552 & -69604975440 & 974469656160 & -7308522421200 & 32742180446976 & -92769511266432 & 170392979877120 & -202341663604080 & 149882713780800 & -62950739787936 & 11445589052352\end{array}\right]$

In [51]:

A_sym_inv[:3,:3]

Out[51]:

$\displaystyle \left[\begin{matrix}225 & -25200 & 928200\\-25200 & 3763200 & -155937600\\928200 & -155937600 & 6892441920\end{matrix}\right]$

In [52]:

x_sym = A_sym_inv @ f

In [53]:

x_sym[:3]

Out[53]:

array([178587533700, -36343113031680, 1840126802426640], dtype=object)

In [54]:

A_sym @ x_sym

Out[54]:

array([2, -2, -1, 2, 2, -2, 1, -2, -1, -2, 0, -2, -2, -1, -2],
      dtype=object)

In [55]:

Out[55]:

array([ 2, -2, -1,  2,  2, -2,  1, -2, -1, -2,  0, -2, -2, -1, -2])

Thats cool: we found an exact solution. How does it relate to the one we found numerically above?

In [56]:

Out[56]:

array([ 1.68481673e+09, -2.49810169e+11,  9.23256364e+12, -1.47865831e+14,
        1.26548177e+15, -6.35346650e+15,  1.90893727e+16, -3.10903988e+16,
        8.47002490e+15,  7.74591773e+16, -1.83635048e+17,  2.14030465e+17,
       -1.44209897e+17,  5.37157771e+16, -8.60260839e+15])

In [57]:

x_sym

Out[57]:

array([178587533700, -36343113031680, 1840126802426640,
       -40763720391095520, 494478649301036400, -3692652241725326880,
       18178212566190156480, -61465382245798399440, 146065938596015157720,
       -246060765797279312880, 292261743908214880800,
       -239289867888435655200, 128509153948247763600,
       -40739658302071890000, 5777758685388081600], dtype=object)

In [59]:

np.linalg.norm(x - x_sym.astype(np.float64))/np.linalg.norm(x)

Out[59]:

1494.0414789748738

So we have two errors: error on the solution, $|x-x_0|/|x_0|$ and error on the residual, $|A x-f|/|f|$. Note that we can easily estimate the second type of error but estimating the first one is tricky: you may think you have to know the true solution $x_0$.

Lets try to further increase the system size $n$.

Whats going on? Why dows np.linalg.solve fail so miserably?

In [60]:

A_sym

Out[60]:

$\displaystyle \left[\begin{array}{ccccccccccccccc}1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15}\\\frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16}\\\frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17}\\\frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18}\\\frac{1}{5} & \frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19}\\\frac{1}{6} & \frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20}\\\frac{1}{7} & \frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21}\\\frac{1}{8} & \frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22}\\\frac{1}{9} & \frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23}\\\frac{1}{10} & \frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24}\\\frac{1}{11} & \frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24} & \frac{1}{25}\\\frac{1}{12} & \frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24} & \frac{1}{25} & \frac{1}{26}\\\frac{1}{13} & \frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24} & \frac{1}{25} & \frac{1}{26} & \frac{1}{27}\\\frac{1}{14} & \frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24} & \frac{1}{25} & \frac{1}{26} & \frac{1}{27} & \frac{1}{28}\\\frac{1}{15} & \frac{1}{16} & \frac{1}{17} & \frac{1}{18} & \frac{1}{19} & \frac{1}{20} & \frac{1}{21} & \frac{1}{22} & \frac{1}{23} & \frac{1}{24} & \frac{1}{25} & \frac{1}{26} & \frac{1}{27} & \frac{1}{28} & \frac{1}{29}\end{array}\right]$

In [62]:

Out[62]:

array([[1.        , 0.5       , 0.33333333, 0.25      , 0.2       ,
        0.16666667, 0.14285714, 0.125     , 0.11111111, 0.1       ,
        0.09090909, 0.08333333, 0.07692308, 0.07142857, 0.06666667],
       [0.5       , 0.33333333, 0.25      , 0.2       , 0.16666667,
        0.14285714, 0.125     , 0.11111111, 0.1       , 0.09090909,
        0.08333333, 0.07692308, 0.07142857, 0.06666667, 0.0625    ],
       [0.33333333, 0.25      , 0.2       , 0.16666667, 0.14285714,
        0.125     , 0.11111111, 0.1       , 0.09090909, 0.08333333,
        0.07692308, 0.07142857, 0.06666667, 0.0625    , 0.05882353],
       [0.25      , 0.2       , 0.16666667, 0.14285714, 0.125     ,
        0.11111111, 0.1       , 0.09090909, 0.08333333, 0.07692308,
        0.07142857, 0.06666667, 0.0625    , 0.05882353, 0.05555556],
       [0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111,
        0.1       , 0.09090909, 0.08333333, 0.07692308, 0.07142857,
        0.06666667, 0.0625    , 0.05882353, 0.05555556, 0.05263158],
       [0.16666667, 0.14285714, 0.125     , 0.11111111, 0.1       ,
        0.09090909, 0.08333333, 0.07692308, 0.07142857, 0.06666667,
        0.0625    , 0.05882353, 0.05555556, 0.05263158, 0.05      ],
       [0.14285714, 0.125     , 0.11111111, 0.1       , 0.09090909,
        0.08333333, 0.07692308, 0.07142857, 0.06666667, 0.0625    ,
        0.05882353, 0.05555556, 0.05263158, 0.05      , 0.04761905],
       [0.125     , 0.11111111, 0.1       , 0.09090909, 0.08333333,
        0.07692308, 0.07142857, 0.06666667, 0.0625    , 0.05882353,
        0.05555556, 0.05263158, 0.05      , 0.04761905, 0.04545455],
       [0.11111111, 0.1       , 0.09090909, 0.08333333, 0.07692308,
        0.07142857, 0.06666667, 0.0625    , 0.05882353, 0.05555556,
        0.05263158, 0.05      , 0.04761905, 0.04545455, 0.04347826],
       [0.1       , 0.09090909, 0.08333333, 0.07692308, 0.07142857,
        0.06666667, 0.0625    , 0.05882353, 0.05555556, 0.05263158,
        0.05      , 0.04761905, 0.04545455, 0.04347826, 0.04166667],
       [0.09090909, 0.08333333, 0.07692308, 0.07142857, 0.06666667,
        0.0625    , 0.05882353, 0.05555556, 0.05263158, 0.05      ,
        0.04761905, 0.04545455, 0.04347826, 0.04166667, 0.04      ],
       [0.08333333, 0.07692308, 0.07142857, 0.06666667, 0.0625    ,
        0.05882353, 0.05555556, 0.05263158, 0.05      , 0.04761905,
        0.04545455, 0.04347826, 0.04166667, 0.04      , 0.03846154],
       [0.07692308, 0.07142857, 0.06666667, 0.0625    , 0.05882353,
        0.05555556, 0.05263158, 0.05      , 0.04761905, 0.04545455,
        0.04347826, 0.04166667, 0.04      , 0.03846154, 0.03703704],
       [0.07142857, 0.06666667, 0.0625    , 0.05882353, 0.05555556,
        0.05263158, 0.05      , 0.04761905, 0.04545455, 0.04347826,
        0.04166667, 0.04      , 0.03846154, 0.03703704, 0.03571429],
       [0.06666667, 0.0625    , 0.05882353, 0.05555556, 0.05263158,
        0.05      , 0.04761905, 0.04545455, 0.04347826, 0.04166667,
        0.04      , 0.03846154, 0.03703704, 0.03571429, 0.03448276]])

Condition number of a problem¶

Conditioning of a problem¶

Conditioning of a problem is defined without reference to any particular algorithm.

A well-conditioned problem $f(x)$: all small perturbations of $x$ lead to small changes in $f(x)$.

An ill-conditioned problem $f(x)$: some small perturbation of $x$ leads to large changes in $f(x)$.

The absolute condition number of a problem $f(x)$ is:

$$\kappa = \lim _{\varepsilon \rightarrow 0}\,\sup _{\|\delta x\|\,\leq \,\varepsilon }{\frac {\|\delta f(x)\|}{\|\delta x\|}}$$

and the relative condition number is:

$$\kappa = \lim _{\varepsilon \rightarrow 0}\,\sup _{\|\delta x\|\,\leq \,\varepsilon }{\frac {\|\delta f(x)\|/\|f(x)\|}{\|\delta x\|/\|x\|}}.$$

Relative $\kappa$ is normally more important (its at least dimensionless).

If the function $f(x)$ is differentiable: $$ \kappa = \frac{\Vert J(x) \Vert}{\Vert f(x)\Vert / \Vert x \Vert}, $$ where $J$ is Jacobian matrix: $J _{ij}={\frac {\partial f_{i}}{\partial x_{j}}}$.

Well- and ill-conditioned problems¶

A problem is well-conditioned if $\kappa$ is small ($\lesssim 10^2$) and ill-conditioned if $\kappa$ is large ($\gtrsim 10^{6}$)

Consider a problem of computing $f(x)=\sqrt{x}$: $$\kappa = \frac{\Vert J(x) \Vert}{\Vert f(x)\Vert / \Vert x \Vert} = \frac{1/(2\sqrt{x})}{\sqrt{x}/x}=\frac12.$$ This problem is well-conditioned by any standard.

Consider computing $f(x)=\tan x$ for $x=10^{100}$. The problem is ill-conditioned. Why?

Surely You're Joking, Mr. Feynman

Condition numbers in linear algebra computations¶

In matrix computations, the basic operations are computing $f(x) = A x$ and computing $g(x) = A^{-1}x$ (that is, solving a linear system)

They have the condition numbers $\kappa_f(x) = \Vert A \Vert \frac{\Vert x \Vert}{\Vert A x \Vert}$ and $\kappa_g(x)=\Vert A^{-1} \Vert \frac{\Vert A x \Vert}{\Vert x \Vert}$

The worst-case bounds (independent on $x$): $$ \kappa_{f,g}(x) \leq \kappa(A), $$ where the right-hand-side is known as a condition number of a matrix $A$: $$\kappa(A) = \Vert A \Vert \Vert A^{-1} \Vert.$$

Note, that the condition number is different for different norms. Which are worst-case vectors $x$ for $f(x)$ and $g(x)$ and $L_2$ norm?

Condition number of a linear system¶

Now consider the perturbation of a system of linear equations: from $$ A x = f$$ to $$ (A + \Delta A) (x + \Delta x) = f + \Delta f.$$

Perturbation to the solution:

$$ \frac{\Vert \Delta x \Vert}{\Vert x \Vert} \leq \frac{\mathrm{cond}(A)}{1 - \mathrm{cond}(A)\frac{\|\Delta A\|}{\|A\|}} \Big(\frac{\Vert\Delta A\Vert}{\Vert A \Vert} + \frac{\Vert \Delta f \Vert}{ \Vert f \Vert}\Big) $$

The crucial role is played by the condition number $\kappa(A) = \Vert A \Vert \Vert A^{-1} \Vert$.

The larger the condition number, the less number of digits we can recover.

Note, that if $\Delta A = 0$, then

$$ \frac{\Vert \Delta x \Vert}{\Vert x \Vert} \leq \mathrm{cond}(A) \frac{\|\Delta f\|}{\|f\|} $$

The condition number in spectral norm is equal to the ratio of the largest singular value and the smallest singular value.

$$ \mathrm{cond}_2 (A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_{\max}}{\sigma_{\min}} $$

Hilbert matrix (again)¶

In [87]:

n = 15
A = hilbert_fl(n)
np.linalg.cond(A)

Out[87]:

2.495951750009794e+17

The Hilbert matrix is very poorly conditioned, indeed. This implies that solution of the equation $Ax = b$ for x is very sensitive to the small changes in $A$ and $b$. Here it is time to recall how exactly are real numbers represented in a CPU.

Floating point¶

The numbers in computer memory are typically represented as floating point numbers

A floating point number is represented as

$$\textrm{number} = \textrm{significand} \times \textrm{base}^{\textrm{exponent}},$$

where significand is integer, base is positive integer and exponent is integer (can be negative), i.e.

$$ 1.2 = 12 \cdot 10^{-1}.$$

IEEE 754¶

In modern computers, the floating point representation is controlled by IEEE 754 standard which was published in 1985. Before that point different computers behaved differently with floating point numbers.

IEEE 754 has:

Floating point representation (as described above), $(-1)^s \times c \times b^q$.
Two infinities, $+\infty$ and $-\infty$
Rules for assigning kinds of NaN
Rules for rounding
Rules for $\frac{0}{0}, \frac{1}{-0}, \ldots$

The two most common format, single & double¶

The two most common formats, called binary32 and binary64 (called also single and double formats). Recently, the format binary16 plays important role in learning deep neural networks.

Name	Common Name	Base	Digits	Emin	Emax
binary16	half precision	2	11	-14	+ 15
binary32	single precision	2	24	-126	+ 127
binary64	double precision	2	53	-1022	+1023

For all real $x$ (in an admissible range, such as $[1.1\cdot 10^{-38}, 3.4\cdot 10^{38}]$ for single precision) there exists floating point number $\tilde x$ such that $|x-\tilde x| \leq |x|\epsilon_{machine}$

For single precision: $$\epsilon_{machine} = 2^{-24}\approx 5.96\cdot 10^{-8}$$
For double precision: $$\epsilon_{machine} = 2^{-53}\approx 1.11\cdot 10^{-16}$$

For single precision, we have $8388608$ numbers in $(1,2)$ which is quite dense. But we still have the same $8388608$ numbers in $(2^{126}, 2^{127})$ -- much larger interval! At each arithmetic operation, the result will typically not fit into the grid and will be replaced by some proxy from the grid. This can be disastrous for the final result if accumulates as a result of careless programming.

Accuracy and memory¶

The relative accuracy of single precision is $10^{-7}-10^{-8}$, while for double precision is $10^{-14}-10^{-16}$.

A float16 takes 2 bytes, float32 takes 4 bytes, float64, or double precision, takes 8 bytes
These are the only two floating point-types supported in hardware (float32 and float64).
Normally, double precision is for Scientific Computing and float is for GPU/Data Science.

Accuracy demos¶

In [92]:

z = np.arange(0.5, 0.9, 0.1)
print(z)
print(z[-1])

[0.5 0.6 0.7 0.8]
0.7999999999999999

In [101]:

a = np.float32(6.0)
b = np.sqrt(a)
print(b ** 2 - a)

4.3670121385730454e-07

In [102]:

a = np.array(0.4147369269524216, dtype=np.float32)
b = np.exp(a)
print(np.log(b) - a)

0.0

Loss of significance¶

Many operations lead to the loss of significance
For example, it is a bad idea to subtract two big numbers that are close, the difference will have fewer correct digits
This is related to algorithms and their properties (stability)

In [103]:

import math
test_list = [1, 1e20, 1, -1e20]
print(np.sum(test_list))
print(1 + 1e20 + 1 - 1e20)
print(math.fsum(test_list))

0.0
0.0
2.0

In [104]:

print(1e20 - 1e20 + 1 + 1)

2.0

Some disasters attibuted to bad numerical computing¶

Vancouver Stock Exchange Index computation error: in January 1982 the index was initialized at 1000 and subsequently updated and truncated to three decimal places on each trade. Such a thing was done about 3000 times each day. The accumulated truncations led to an erroneous loss of around 25 points per month. On 25 Nov 1983, the error was corrected, raising the value of the index from its Friday closing figure of 524 to 1098.
The Patriot Missile failure, happened In Dharan, Saudi Arabia, on February 25, 1991, resulted in 28 deaths, is ultimately attributable to poor handling of rounding errors.
The explosion of the Ariane 5 rocket just after lift-off on its maiden voyage off French Guiana, on June 4, 1996, was ultimately the consequence of a simple overflow.
The sinking of the Sleipner A offshore platform in Gandsfjorden near Stavanger, Norway, on August 23, 1991, resulted in a loss of nearly one billion dollars. It was found to be the result of inaccurate finite element analysis.

Stability of algorithms¶

Let $x$ be an object (for example, a vector)
Let $f(x)$ be the function (functional) you want to evaluate
You have a numerical algorithm working in finite precision arithmetics will deliver an approximation $alg(x)$ to the correct result $f(x)$

The algorithm is called forward stable, if $$\Vert alg(x) - f(x) \Vert \leq \varepsilon $$

The algorithm is called backward stable, if for any $x$ there is a close vector $x + \delta x$ such that

$$alg(x) = f(x + \delta x)$$

and $\Vert \delta x \Vert$ is small.

The algorithm is called stable if it gives approximately correct answers for approximately correct problems: for each $x$ there exists $\tilde x$ such that $\Vert x-\tilde x\Vert = \mathcal{O}(\epsilon_{machine})$ such that $$ \Vert alg(x) - f(\tilde x) \Vert = \mathcal{O}(\epsilon_{machine}) $$

The best algorithms usually compute exact answers for slightly perturbed data. If the output is not too sentitive to input data (characterized by the condition number), then this is sufficient in practice.

Examples of instability - 1¶

How to compute the following function in numerically stable manner?

$$\log(1 - \tanh^2(u))$$

In [107]:

u = 30
print("Original function:", np.log(1 - np.tanh(u)**2))

Original function: -inf

C:\Users\Zenon\AppData\Local\Temp\ipykernel_21092\1730504921.py:2: RuntimeWarning: divide by zero encountered in log
  print("Original function:", np.log(1 - np.tanh(u)**2))

In [109]:

print("Use more numerically stable form:", np.log(4) - 2 * np.log(np.exp(-u) + np.exp(u)))

Use more numerically stable form: -58.61370563888011

Examples of instability - 2¶

How to compute the following function in numerically stable manner? $$SoftMax(x)_j = \dfrac{e^{x_j}}{\sum\limits_{i=1}^n e^{x_i}}$$

In [118]:

n = 5
x = np.random.randn(n)
x[0] = 10

In [119]:

print(np.exp(x) / np.sum(np.exp(x)))

[9.99869717e-01 4.25877087e-05 1.71219662e-05 4.14169156e-05
 2.91559416e-05]

In [120]:

Out[120]:

array([10.        , -0.06381458, -0.97501805, -0.09169088, -0.44272154])

In [121]:

print(np.exp(x - np.max(x)) / np.sum(np.exp(x - np.max(x))))

[9.99869717e-01 4.25877087e-05 1.71219662e-05 4.14169156e-05
 2.91559416e-05]

Linear systems¶

Scales of linear systems¶

In different applications, the typical size of the linear systems can be different.

Small: $n \leq 10^4$ (full matrix can be stored in memory, dense matrix)
Medium: $n = 10^4 - 10^6$ (typically, sparse or structured matrix)
Large: $n = 10^8 - 10^9$ (typically sparse matrix + parallel computations)

We will talk only about approaches to small systems today.

Main questions about linear systems¶

What is the accuracy we get from the solution (due to rounding errors)?
How we compute the solution? (using the Cramer's rule with dets is good only for $2 \times 2$ matrices)
What is the complexity of the solution of linear systems?

Solving $Ax=b$ via QR factorization¶

Factor $A$ into $QR$: $A = QR$
$Ax=b$ is equivalent to $QRx=b$ that is $Rx = y$ where $y = Q^*b$
Solve triangular system $Rx=y$

The algorithm is stable. It means that the solution $\tilde x$ computed by this algorithm satisfies: $$ \frac{\Vert\tilde x - x\Vert}{\Vert x\Vert} = \mathcal{O}(\kappa(A)\epsilon_{machine}) $$
Solving linear system with this method takes $2n^3$ flops [dominated by complexity of QR]

Solving $Ax=b$ via Gaussian elimination¶

Gaussian elimination proceeds via the computation of one of the most important matrix decompositions: LU-decomposition.
LU-decomposition of the square matrix $A$ is the representation

$$A = LU,$$

where

$L$ is lower triangular (elements strictly above the diagonal are zero)
$U$ is upper triangular matrix (elements strictly below the diagonal are zero)

$$ \begin{bmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{bmatrix}=\begin{bmatrix}\ell _{11}&0&0\\\ell _{21}&\ell _{22}&0\\\ell _{31}&\ell _{32}&\ell _{33}\end{bmatrix}\begin{bmatrix}u_{11}&u_{12}&u_{13}\\0&u_{22}&u_{23}\\0&0&u_{33}\end{bmatrix} $$

This factorization is non-unique, so it is typical to require that the matrix $L$ has ones on the diagonal.

LU decomposition is useful to solve a linear system since

$$ A^{-1} f = (L U)^{-1} f = U^{-1} L^{-1} f, $$

and this reduces to the solution of two linear systems forward step

$$ L y = f, $$

and backward step

$$ U x = y. $$

Complexity of the Gaussian elimination/LU decomposition¶

The complexity is dominated by LU decomposition, which takes $\frac23 n^3$: several times faster than QR
In many cases, when solution with multiple right hand sides is required, computing LU-decomposition once is a good idea
Once the decomposition is found then solving linear systems with $L$ and $U$ costs only $\mathcal{O}(n^2)$ operations.

Computing LU decomposition¶

When LU fails¶

What happens, if pivots in the Gaussian elimination are really small or zero?.
There is classical $2 \times 2$ example of a matrix with a bad LU decomposition.
The matrix we look at is

$$ A = \begin{pmatrix} \varepsilon & 1 \\ 1 & 1 \end{pmatrix} $$

If $\varepsilon$ is sufficiently small, we might fail.

In [170]:

import numpy as np
eps = 1e-18
A = np.array([[eps, 1],[1.0,  1]])
A0 = A.copy()
n = A.shape[0]
L = np.zeros((n, n))
U = np.zeros((n, n))
for k in range(n):   
    L[k, k] = 1
    for i in range(k+1, n):
        L[i, k] = A[i, k] / A[k, k]
        for j in range(k+1, n):
            A[i, j] = A[i, j] - L[i, k] * A[k, j]
    for j in range(k, n):
        U[k, j] = A[k, j]
print('L:\n', L)
print('U:\n', U)
print('A:\n', A0)
print('L * U - A:\n', np.dot(L, U) - A0)

L:
 [[1.e+00 0.e+00]
 [1.e+18 1.e+00]]
U:
 [[ 1.e-18  1.e+00]
 [ 0.e+00 -1.e+18]]
A:
 [[1.e-18 1.e+00]
 [1.e+00 1.e+00]]
L * U - A:
 [[ 0.00000000e+00  0.00000000e+00]
 [-1.11022302e-16 -1.00000000e+00]]

The concept of pivoting¶

We can do pivoting, i.e. permute rows and columns to maximize $A_{kk}$ that we divide over.
The simplest but effective strategy is the row pivoting: at each step, select the index that is maximal in modulus, and put it onto the diagonal.
It gives us the decomposition

$$A = P L U,$$

where $P$ is a permutation matrix.

Pivoting ensures $| L_{ij}|<1,$ but the elements of $U$ can grow, up to $2^n$. Theoretically, gaussian elimination is unstable, though (in practice, this is almost never encountered).

Linear equations: Summary¶

Linear systems can be solved by Gaussian elimination, complexity is $\mathcal{O}(n^3)$.
Linear systems can be solved by LU-decomposition, complexity is $\mathcal{O}(n^3)$ for the decomposition, $\mathcal{O}(n^2)$ for each solve
Linear least squares can be solved by normal equations (bad)
Linear least squares can be solved by QR-decomposition (good)
Without structure, we can solve up to $10^4$ linear systems on a laptop (memory restrictions)

We will discuss iterative methods for linear systems in what follows.

52 KiB Raw Permalink Blame History