Implement much faster dot product algorithm for tensors #460

Merged
ivandev0 merged 1 commit from kylchik/fast-dot into master 2022-02-17 20:13:23 +03:00
ivandev0 commented 2022-02-17 11:47:49 +03:00 (Migrated from github.com)

This is a very simple optimization. Using async-profiler, I found that the `get` method of `MutableStructure` was the bottleneck because of excess work recomputing the flat array index on every access. So I replaced it with direct access to the underlying buffer.
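The idea behind the change can be sketched as follows. This is a minimal illustration of the indexing trick, not kmath's actual code (the real classes are `MutableStructure2D` and `DoubleTensor` backed by a flat buffer; the names below are hypothetical): instead of calling a structure-style `get(i, j)` that recomputes the strided offset on every call, the inner loop reads the row-major backing array directly, with the offsets hoisted out of the loop.

```java
public class FastDot {
    // Multiply an n×m matrix a by an m×p matrix b, both stored as
    // row-major flat arrays, reading the buffers directly.
    static double[] dot(double[] a, double[] b, int n, int m, int p) {
        double[] c = new double[n * p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < m; k++) {
                double aik = a[i * m + k];   // direct buffer access
                int cRow = i * p;            // offsets hoisted out of
                int bRow = k * p;            // the innermost loop
                for (int j = 0; j < p; j++) {
                    c[cRow + j] += aik * b[bRow + j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};  // 2×2: [[1, 2], [3, 4]]
        double[] b = {5, 6, 7, 8};  // 2×2: [[5, 6], [7, 8]]
        double[] c = dot(a, b, 2, 2, 2);
        System.out.println(c[0] + " " + c[1] + " " + c[2] + " " + c[3]);
        // → 19.0 22.0 43.0 50.0
    }
}
```

The i-k-j loop order also keeps the innermost loop walking both `b` and `c` sequentially, which is cache-friendly on top of avoiding the per-element index computation.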
Before optimization:

```
Warm-up 1: 9.578 ops/s
Iteration 1: 11.363 ops/s
Iteration 2: 13.047 ops/s
Iteration 3: 13.005 ops/s
Iteration 4: 13.100 ops/s
Iteration 5: 13.213 ops/s

12.746 ±(99.9%) 2.992 ops/s [Average]
  (min, avg, max) = (11.363, 12.746, 13.213), stdev = 0.777
  CI (99.9%): [9.754, 15.738] (assumes normal distribution)

jvm summary:
Benchmark                      Mode  Cnt   Score   Error  Units
DotBenchmark.doubleTensorDot  thrpt    5  12.746 ± 2.992  ops/s
```

With optimization:

```
Warm-up 1: 1011.034 ops/s
Iteration 1: 1118.790 ops/s
Iteration 2: 1125.854 ops/s
Iteration 3: 1130.888 ops/s
Iteration 4: 1148.842 ops/s
Iteration 5: 1149.498 ops/s

1134.774 ±(99.9%) 53.246 ops/s [Average]
  (min, avg, max) = (1118.790, 1134.774, 1149.498), stdev = 13.828
  CI (99.9%): [1081.528, 1188.021] (assumes normal distribution)

jvm summary:
Benchmark                      Mode  Cnt     Score    Error  Units
DotBenchmark.doubleTensorDot  thrpt    5  1134.774 ± 53.246  ops/s
```

The total performance increase is around 100x.
Correctness of the dot product was verified with the space.kscience.kmath.tensors.core.TestDoubleTensorAlgebra#testDot test.

altavir (Migrated from github.com) reviewed 2022-02-17 11:47:49 +03:00
grinisrit (Migrated from github.com) approved these changes 2022-02-17 19:52:58 +03:00
CommanderTvis (Migrated from github.com) approved these changes 2022-02-17 20:13:17 +03:00