# ND-structure generation and operations

**TODO**

# Performance for n-dimensional structures operations

One of the most sought-after features of mathematical libraries is high-performance operations on n-dimensional
structures. In `kmath`, performance depends on which particular context is used for an operation.

Let us consider the following contexts:

```kotlin
// sizes used in the test case
val dim = 1000
val n = 1000

// automatically build the context best suited for the given type.
val autoField = NDField.auto(DoubleField, dim, dim)
// specialized nd-field for Double. It works as a generic Double field as well.
val specializedField = NDField.real(dim, dim)
// a generic boxing field. It should be used for objects, not primitives.
val genericField = NDField.buffered(DoubleField, dim, dim)
```
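To make the difference between these contexts more concrete, here is a stdlib-only sketch (not `kmath` code) of the two storage strategies the names suggest: a specialized 2D structure backed by a primitive `DoubleArray`, and a generic one backed by `List<T>`, whose elements are boxed objects on the JVM when `T` is `Double`. The class names `DoubleGrid` and `BoxedGrid` are hypothetical.

```kotlin
// Hypothetical specialized 2D structure: values live in a primitive DoubleArray,
// so reading an element involves no boxing.
class DoubleGrid(val rows: Int, val cols: Int, val data: DoubleArray) {
    init { require(data.size == rows * cols) { "size mismatch" } }

    // Row-major offset of element (i, j) in the flat buffer.
    operator fun get(i: Int, j: Int): Double = data[i * cols + j]
}

// Hypothetical generic 2D structure: works for any T, but for T = Double
// every element is stored as a boxed java.lang.Double object on the JVM.
class BoxedGrid<T>(val rows: Int, val cols: Int, val data: List<T>) {
    init { require(data.size == rows * cols) { "size mismatch" } }

    operator fun get(i: Int, j: Int): T = data[i * cols + j]
}

fun main() {
    val dim = 3
    val specialized = DoubleGrid(dim, dim, DoubleArray(dim * dim) { 1.0 })
    val boxed = BoxedGrid(dim, dim, List(dim * dim) { 1.0 })
    println(specialized[1, 2] + boxed[1, 2]) // 2.0
}
```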

Now let us run several tests and see which implementation is best suited for each case.

## Test case

To test performance, we take a 2D structure of `dim` by `dim` elements with `dim = 1000`, start from the structure
filled with `1.0` (`one`), and add `1.0` to it `n = 1000` times.
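The timings in the following sections are expressed relative to a baseline. A minimal sketch of such a harness in plain Kotlin could look like this; it is not the harness used to obtain the numbers in this document, and `relativeTimes` is a hypothetical name. A single cold run is only indicative, since JIT warm-up and garbage collection can shift the ratios considerably; dedicated tools such as JMH exist for serious measurements.

```kotlin
import kotlin.system.measureNanoTime

// Hypothetical harness: runs every case once and reports its time relative to
// the baseline case, so the baseline is `1x` by definition.
fun relativeTimes(baseline: String, cases: Map<String, () -> Unit>): Map<String, Double> {
    val nanos = cases.mapValues { (_, block) -> measureNanoTime(block) }
    // Guard against a zero reading for a trivially fast baseline.
    val base = nanos.getValue(baseline).coerceAtLeast(1L).toDouble()
    return nanos.mapValues { (_, time) -> time / base }
}

fun main() {
    val dim = 1000
    val n = 10 // the test case uses n = 1000; reduced here so the boxed case finishes quickly
    val times = relativeTimes(
        baseline = "DoubleArray",
        cases = mapOf(
            // Primitive storage: a new DoubleArray per step, like `res += 1.0` on an immutable structure.
            "DoubleArray" to {
                var res = DoubleArray(dim * dim) { 1.0 }
                repeat(n) {
                    val prev = res
                    res = DoubleArray(prev.size) { i -> prev[i] + 1.0 }
                }
            },
            // Boxed storage: every element is a java.lang.Double object on the JVM.
            "List<Double>" to {
                var res = List(dim * dim) { 1.0 }
                repeat(n) { res = res.map { it + 1.0 } }
            }
        )
    )
    times.forEach { (name, ratio) -> println("$name: ${"%.1f".format(ratio)}x") }
}
```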

## Specialized

The code looks like this:

```kotlin
specializedField.run {
    var res: NDBuffer<Double> = one
    repeat(n) {
        res += 1.0
    }
}
```

This code shows the best performance of all tests, since it inlines all operations and is specialized for `Double`.
We measure everything else relative to it, so the time for this test is taken as `1x` (the real time on my computer
is about 4.5 seconds). The only drawback of this approach is that it requires specifying the type from the beginning.
Since the type is usually specified anyway, this is the recommended approach.
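As a rough illustration of why inlining matters, here is a hypothetical specialized context (`SpecializedDoubleContext`, not a `kmath` class) whose element-wise `map` is an `inline` function over a `DoubleArray`. Because the function and its lambda are copied into the call site, the loop adds primitive doubles directly.

```kotlin
// Hypothetical specialized context for Double data stored in a DoubleArray.
object SpecializedDoubleContext {
    // `inline` copies this body and the `transform` lambda into the caller,
    // so no lambda object is allocated and no Double is boxed.
    inline fun map(source: DoubleArray, transform: (Double) -> Double): DoubleArray {
        val result = DoubleArray(source.size)
        for (i in source.indices) result[i] = transform(source[i])
        return result
    }

    fun plus(source: DoubleArray, value: Double): DoubleArray = map(source) { it + value }
}

fun main() {
    var res = DoubleArray(4) { 1.0 }
    repeat(1000) { res = SpecializedDoubleContext.plus(res, 1.0) }
    println(res.joinToString()) // 1001.0, 1001.0, 1001.0, 1001.0
}
```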

## Automatic

Let's do the same with automatic field inference:

```kotlin
autoField.run {
    var res = one
    repeat(n) {
        res += 1.0
    }
}
```

The speed of this operation is approximately the same as in the specialized case, since `NDField.auto` simply
returns the same `RealNDField` in this case. Still, it is usually better to use the specialized method to be sure.
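The following stdlib-only sketch shows one way an `auto`-style factory can hand back the specialized implementation whenever the element type is `Double`, which is why both cases perform the same. `Storage`, `DoubleStorage`, `BoxedStorage`, and `autoStorage` are hypothetical names, and this is not necessarily how `NDField.auto` is implemented.

```kotlin
// Hypothetical read-only storage with a primitive and a boxing implementation.
interface Storage<T> {
    val size: Int
    operator fun get(index: Int): T
}

class DoubleStorage(private val data: DoubleArray) : Storage<Double> {
    override val size: Int get() = data.size
    override fun get(index: Int): Double = data[index]
}

class BoxedStorage<T>(private val data: List<T>) : Storage<T> {
    override val size: Int get() = data.size
    override fun get(index: Int): T = data[index]
}

// Hypothetical "auto" factory: the reified type parameter makes T's class
// available at run time, so Double gets primitive storage and every other
// type falls back to boxing.
@Suppress("UNCHECKED_CAST")
inline fun <reified T : Any> autoStorage(size: Int, noinline init: (Int) -> T): Storage<T> =
    if (T::class == Double::class) {
        DoubleStorage(DoubleArray(size) { init(it) as Double }) as Storage<T>
    } else {
        BoxedStorage(List(size) { init(it) })
    }

fun main() {
    val doubles = autoStorage(3) { it * 2.0 }
    val strings = autoStorage(2) { "item $it" }
    println(doubles is DoubleStorage)   // true
    println(strings is BoxedStorage<*>) // true
}
```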

## Lazy

A lazy field does not produce a structure when asked; instead, it generates an empty structure and fills it on demand,
using coroutines to parallelize computations.
When one calls

```kotlin
// lazyField is a lazy nd-field; its construction is not shown in this document
lazyField.run {
    var res = one
    repeat(n) {
        res += 1.0
    }
}
```

the result is returned almost immediately, but it is empty. To get the full result structure, one needs to access
all of its elements, and in that case the computation overhead is huge. So this field should never be used if one
expects to use the full result structure. However, if only a small fraction of the elements is needed, it can save
a lot of time.
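The cost of the chained version can be illustrated with a stdlib-only sketch that counts element computations. It uses Kotlin's `lazy` instead of coroutines, and `LazyStructure` and `countEvaluations` are hypothetical names. Building the chain of `n` additions is free, but reading one element evaluates all `n + 1` deferred layers for that element, so reading a few elements is cheap while reading all of them does the full work plus the bookkeeping.

```kotlin
// Hypothetical lazy 1D structure (stdlib only; the real lazy field uses coroutines).
// Each element is computed on first access and cached afterwards.
class LazyStructure(val size: Int, private val compute: (Int) -> Double) {
    private val cells: Array<Lazy<Double>> = Array(size) { i -> lazy { compute(i) } }

    operator fun get(index: Int): Double = cells[index].value
}

// Builds the chain `res += 1.0` n times on top of a structure of ones, reads the
// first `accessed` elements, and returns how many element computations ran.
fun countEvaluations(size: Int, n: Int, accessed: Int): Int {
    var evaluations = 0
    var res = LazyStructure(size) { evaluations++; 1.0 }
    repeat(n) {
        val prev = res
        res = LazyStructure(size) { i -> evaluations++; prev[i] + 1.0 }
    }
    // Nothing has been computed so far; each read walks down all n + 1 layers.
    for (i in 0 until accessed) res[i]
    return evaluations
}

fun main() {
    println(countEvaluations(size = 1000, n = 100, accessed = 1))    // 101
    println(countEvaluations(size = 1000, n = 100, accessed = 1000)) // 101000
}
```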

The field can still be used with reasonable performance if the calling code is changed:

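```kotlin
lazyField.run {
    val res = one.map {
        // start from the element value (1.0) and perform all n additions at once
        var c = it
        repeat(n) {
            c += 1.0
        }
        c
    }

    // access every element to force the computation
    res.elements().forEach { it.second }
}
```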

In this case, it completes in about `4x-5x` time due to boxing.
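For comparison, here is a sketch of the fused approach under the same assumptions (Kotlin's `lazy` instead of coroutines, hypothetical `fusedLazy` name): there is a single lazy cell per element, and the cell runs the whole loop when it is first read. In this sketch each result is also stored as a boxed `Double` inside its `Lazy` object.

```kotlin
// Hypothetical fused lazy computation: one lazy cell per element, and each cell
// runs the whole loop itself, mirroring the `one.map { ... }` call.
fun fusedLazy(size: Int, n: Int): Array<Lazy<Double>> =
    Array(size) {
        lazy {
            var c = 1.0 // start from the element of `one`
            repeat(n) { c += 1.0 }
            c
        }
    }

fun main() {
    val res = fusedLazy(size = 16, n = 1000)
    println(res[3].isInitialized()) // false: nothing computed yet
    println(res[3].value)           // 1001.0
    println(res[4].isInitialized()) // false: only element 3 was forced
}
```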

## Boxing

The boxing field produced by

```kotlin
genericField.run {
    var res: NDBuffer<Double> = one
    repeat(n) {
        res += 1.0
    }
}
```

is the slowest one, because it requires boxing and unboxing a `Double` on each operation. It takes about
`15x` time (**TODO: there seems to be a problem here; it should be slow, but not that slow**). This field should
never be used for primitives.
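A stdlib-only sketch of where the boxing cost comes from, assuming a hypothetical generic interface `AdditiveOps<T>` in place of a real `kmath` field: when `T` is `Double`, values passed through the generic interface are boxed on the JVM, while the primitive path works directly on a `DoubleArray`. All names here are illustrative.

```kotlin
// Hypothetical minimal algebra: generic over T, so calls through this interface
// pass every Double argument and result as a boxed java.lang.Double on the JVM.
interface AdditiveOps<T> {
    fun add(a: T, b: T): T
}

object DoubleAdditiveOps : AdditiveOps<Double> {
    override fun add(a: Double, b: Double): Double = a + b
}

// Boxing path: elements live in a List<T> and are combined through the interface.
fun <T> addBoxed(ops: AdditiveOps<T>, data: List<T>, value: T): List<T> =
    data.map { ops.add(it, value) }

// Primitive path: a DoubleArray and a direct `+`, with no boxing at all.
fun addPrimitive(data: DoubleArray, value: Double): DoubleArray =
    DoubleArray(data.size) { i -> data[i] + value }

fun main() {
    println(addBoxed(DoubleAdditiveOps, listOf(1.0, 2.0), 1.0))  // [2.0, 3.0]
    println(addPrimitive(doubleArrayOf(1.0, 2.0), 1.0).toList()) // [2.0, 3.0]
}
```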

## Element operation

Let us also check the speed of direct operations on elements:

```kotlin
var res = genericField.one
repeat(n) {
    res += 1.0
}
```

One would expect this to be at least as slow as the field operation, but in fact it takes only `2x` time to complete.
This happens because in this particular case the actual `NDField` is not used; instead, the result is calculated
directly via an extension function.
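The following sketch, with hypothetical `Structure`, `GenericContext`, and `plus` names, shows how an extension function declared for `Double` elements can bypass a generic context: its `+` is ordinary `Double` arithmetic instead of a call through the context. It only illustrates the idea and is not the actual `kmath` extension; the storage here stays boxed, so only the dispatch overhead is removed.

```kotlin
// Hypothetical structure with boxed storage.
class Structure<T>(val data: List<T>)

// Hypothetical generic context: each addition goes through the `add` function value.
class GenericContext<T>(private val add: (T, T) -> T) {
    fun Structure<T>.plusScalar(value: T): Structure<T> = Structure(data.map { add(it, value) })
}

// Extension available only for Double elements: the `+` below is concrete
// Double arithmetic, so no context or function value is involved.
operator fun Structure<Double>.plus(value: Double): Structure<Double> =
    Structure(data.map { it + value })

fun main() {
    var viaExtension = Structure(List(4) { 1.0 })
    repeat(3) { viaExtension += 1.0 }

    val context = GenericContext<Double> { a, b -> a + b }
    val viaContext = with(context) { Structure(List(4) { 1.0 }).plusScalar(3.0) }

    println(viaExtension.data == viaContext.data) // true
}
```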

## What about python?

It is usually a bad idea to compare raw numerical performance across languages, but it is hard to work completely
without a frame of reference. In this case, the following simple numpy code:

```python
import numpy as np

res = np.ones((1000, 1000))
for _ in range(1000):
    res = res + 1.0
```

gives a completion time of about `1.1x`, which means that the specialized Kotlin code is in fact faster (I think this
is because of better memory management). Of course, if one writes `res += 1.0`, the performance will be different,
but that is a different case, because numpy overrides `+=` with an in-place operation. In-place operations are
available in `kmath` with `MutableNDStructure`, but there is no field for them (one can still work with mapping
functions).
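The difference between allocating and in-place updates can be sketched in plain Kotlin on a `DoubleArray` rather than on `MutableNDStructure`; `addAllocating` and `addInPlace` are hypothetical names.

```kotlin
// Allocating update, like `res = res + 1.0`: returns a fresh array every time.
fun addAllocating(data: DoubleArray, value: Double): DoubleArray =
    DoubleArray(data.size) { i -> data[i] + value }

// In-place update, like numpy's `res += 1.0`: mutates the existing buffer.
fun DoubleArray.addInPlace(value: Double) {
    for (i in indices) this[i] += value
}

fun main() {
    var allocated = DoubleArray(4) { 1.0 }
    repeat(1000) { allocated = addAllocating(allocated, 1.0) } // 1000 new arrays

    val mutated = DoubleArray(4) { 1.0 }
    repeat(1000) { mutated.addInPlace(1.0) } // one array for the whole loop

    println(allocated.contentEquals(mutated)) // true
}
```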