WIP: feature/emd #521
Draft
teldufalsari
wants to merge 11 commits from
teldufalsari/kmath:feature/emd into dev
pull from: teldufalsari/kmath:feature/emd
merge into: kscience:dev
kscience:main
kscience:dev
kscience:add-contracts-to-operators
kscience:bug/KT-81618
kscience:beta/kotlin-2.3.0
kscience:lounres/polytopic-constructions
kscience:feature/forking-test
kscience:beta/kotlin-2.1.20
kscience:feature/random-fork
kscience:feature/metropolis
kscience:beta/kotlin-2.0.20
kscience:beta/kotlin2.0.0
kscience:dev-0.4
kscience:kotlin/1.9.20
kscience:feature/big-decimal
kscience:mrFendel/ejeny_branch_
kscience:kotlin/1.9.0-Beta
kscience:gh-pages
kscience:feature/mercator
kscience:apomytkina/hse_fix
kscience:feature/tensors-performance
kscience:feature/noa
kscience:commandertvis/kotlin-1.6.20
kscience:refactor/tensor-prefix
kscience:commandertvis/units-poc
kscience:ojalgo
kscience:commandertvis/nd4j
kscience:commandertvis/reduce
kscience:commandertvis/js-benchmark
kscience:feature/tensorflow
kscience:feature/series
kscience:feature/advanced-optimization
kscience:foreign-memory
kscience:commandertvis/generic-complex
kscience:commandertvis/torchscript
kscience:bug/suspend-inline
kscience:feature/torch
kscience:commandertvis/bignums
kscience:split-random
kscience:zelenyy
Labels
Clear labels
bug
dependencies
discussion
documentation
duplicate
feature
good first issue
misc
performance
question
test
use case
wontfix
Something isn't working
Pull requests that update a dependency file
Improvements or additions to documentation
This issue or pull request already exists
A new feature request
The issue awaits its hero. Contributions are welcome
Infrastructure tasks, cosmetic changes and miscellaneous tasks
Performance optimization issue
Further information is requested
Add tests for some functionality
A use case for library or feature
This will not be worked on
No Label
Milestone
No items
No Milestone
Projects
Clear projects
No project
Assignees
InsanusMokrassar
VMarkov
altavir
chernov
kolpakov.mm
lounres
pklimai
teldufalsari
zelenyy
zhig02
Clear assignees
No Assignees
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: kscience/kmath#521
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "teldufalsari/kmath:feature/emd"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Empirical Mode Decomposition
Implement the empirical mode decomposition algorithm. Supports several criteria for process termination, more may be added in future. See
emdDemo.ktfile (examples/series) for an example on how to use the algorithm.For now the algorithm work with
Doubleseries only, making it generic over a series algebra is the top priority.Description
The main algorithm is implemented in the
EmpiricalModeDEcompositionclass, which can be treated as a factory that produces decompositions. Parameters of a decomposition, such as termination criteria parameters and the number of modes to extract, are passed as parameters when creating an instance of the class. The user can then call thedecomposemethod to produce a decomposition result, which as of now consists of a decomposition itself (asList<Series<T>>) and a reason for termination (as anenumelement).Each empirical mode is extracted in an iterative process called sifting. The algorithm allows to determine the reason for sifting to terminate using a sealed interface (see
SiftingResultinterface documentation). There is no use for this other than determining when it is impossible to extract another mode from the signal (the only case when sifting fails completely with no valid value (mode) to be returned), so that place can ultimately be simplified to a simple null check. I decided to leave it hoping that use cases for this may arise, e.g. returning verbose statistics together with the decomposition itself.Issues encountered
SeriesAlgebra:mapWithLabelfunction was failing on series with a non-zero offset.PolynomialInterpolatortreats the right boundary exclusively. For that reason,SplineInterpolatordoes not work as its counterpart from Apache Commons, which returns the right boundary value properly. I don not know if this is intended behaviour. Also, many works that explain the empirical decomposition do not use the leftmost and the rightmost points of the series as knot point for constructing the envelopes.Instead, they use points that are extrapolated using the first or the last polynomials of the spline interpolation. This version seems to work fine too, but probably it could be beneficial to modify theActually, an intricate extrema padding algorithm is usually involved, soSplineInterpolatorclass to yield the points outside its boundaries if a special parameter is passed when creating an interpolator.SplineInterpolatorshould only properly return the value at its right boundary.Further work
Double);Seriesextensions: envelopes, extrema, median values; nonparametricstatistical tests like Lepage test or Cucconi test.
Acknowledgements
Most of the algorithm details are taken from the Python EMD library reference and the sources cited there.
feature/emdto WIP: feature/emdGood work! But there is a bit of work to do.
Also, I would like to see several (at least, 2 or 3) tests for the feature. And it will be cool to add short description of the features to docs (you can ask help about it here).
@@ -0,0 +10,4 @@import space.kscience.plotly.models.Scatterimport kotlin.math.sinfun main(): Unit = with(Double.seriesAlgebra()) {is enough if you import
import space.kscience.kmath.operations.invoke.@@ -0,0 +36,4 @@private val maxSiftIterations: Int = 20,private val siftingDelta: Double = 1e-2,private val nModes: Int = 6) where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L : Number {Please, move the only type argument
L's bound toL's declaration cite:By the way, in general you'll have to use
TasL. Otherwise, eitherinterpolatecan not be resolved in snippetor there is no function to map
Lvalues toT.@@ -0,0 +39,4 @@) where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L : Number {/*** Take a signal, construct an upper and a lower envelope, find the mean value of two,Possible typos fix suggestion:
@@ -0,0 +43,4 @@* represented as a series.* @param signal Signal to compute on** @return null is returned if the signal has not enough extrema to construct envelopes.I suggest adding a sentence that describes what is actually returned. And possible typos fix as well.
@@ -0,0 +45,4 @@** @return null is returned if the signal has not enough extrema to construct envelopes.*/private fun findMean(signal: Series<Double>): Series<Double>? = with(seriesAlgebra) {Just this is enough. Because there is
invokeextension operator implemented that is imported here.@@ -0,0 +64,4 @@return if (maxima.size < 3 || minima.size < 3) null else {val upperEnvelope = generateEnvelope(maxima)val lowerEnvelope = generateEnvelope(minima)return (upperEnvelope + lowerEnvelope).map { it * 0.5 }@@ -0,0 +76,4 @@* @return [SiftingResult.NotEnoughExtrema] is returned if the signal has too few extrema to extract a mode.* Success of an appropriate type (See [SiftingResult.Success] class) is returned otherwise.*/private fun sift(signal: Series<Double>): SiftingResult = with(seriesAlgebra) {As well as I understand, whole body of the function can be replaced with just
Also
siftInnershould be markedtailrec, shouldn't it?Yeah, it is better if the function is marked
tailrec. But I am not sure if compiler understands the case. So I need a bit of time for a small test.It works. With
tailrecit does not produce stack overflow when I run sifting with 5000 iterations per mode@@ -0,0 +98,4 @@val mean = findMean(prevMode) ?: return SiftingResult.SignalFlattened(prevMode)val mode = prevMode.zip(mean) { p, m -> p - m }val newSNumber = if (mode.sCondition()) sNumber + 1 else sNumberreturn if (iterationNumber >= maxSiftIterations) SiftingResult.MaxIterationsReached(mode)Gosh. Use
wheninstead of long if-else sequence, please:It's idiom, and it's clearer to read.
@@ -0,0 +100,4 @@val newSNumber = if (mode.sCondition()) sNumber + 1 else sNumberreturn if (iterationNumber >= maxSiftIterations) SiftingResult.MaxIterationsReached(mode)else if (sNumber >= sConditionThreshold) SiftingResult.SNumberReached(mode)else if (relativeDifference(prevMode, mode) < siftingDelta * mode.size) SiftingResult.DeltaReached(mode)Is it intended that previous mode is assigned to
currentparameter ofrelativeDifferenceand current (new) mode is assigned topreviousparameter ofrelativeDifference? The parameters names say that there is a swap of parameters.I would use explicit parameters' assignation:
@@ -0,0 +112,4 @@* Modes returned in a list which contains as many modes as it was possible* to extract before triggering one of the termination conditions.*/public fun decompose(signal: Series<Double>): EMDecompositionResult = with(seriesAlgebra) {The whole function can be rewritten in such way:
It's shorter but as readable as the previous version.
@@ -0,0 +121,4 @@}modes.add(mode)var residual = signal.zip(mode) { l, r -> l - r }You should use
SeriesAlgebra'selementAlgebrain getting difference of twoDoublevalues instead ofl - r. And in 8 strings below too.@@ -0,0 +168,4 @@*/private fun Series<Double>.countZeros(): Int {require(size >= 2) { "Expected series with at least 2 elements, but got $size elements" }return fold(Pair(get(0), 0)) { acc: Pair<Double, Int>, it: Double ->Do not use
Pair! It's non-idiomatic in Kotlin. It is really hard to always keep in mind whatfirstandsecondfields hold. Instead, define and use custom data class with understandable name and understandable parameters' names.Is making a one-line data class declaration above considered idiomatic?
Yeah, that's the idiomatic way. But place it outside the function. Otherwise, you won't able to access it :)
@@ -0,0 +169,4 @@private fun Series<Double>.countZeros(): Int {require(size >= 2) { "Expected series with at least 2 elements, but got $size elements" }return fold(Pair(get(0), 0)) { acc: Pair<Double, Int>, it: Double ->if (sign(acc.first) != sign(it)) Pair(it, acc.second + 1) else Pair(it, acc.second)It's better to store sign instead of the value which sign is compared, to not compute it each iteration.
@@ -0,0 +179,4 @@private fun <L, BA> SeriesAlgebra<Double, *, BA, L>.relativeDifference(current: Series<Double>,previous: Series<Double>):Double where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L: Number {Use
=notation for inline bodies, please:Also:
Lis not used, so I removed it.SeriesAlgebra'selementAlgebrais needed. So I replaced it. It will also help with refactoring fromDouble"algebras" to general algebras.foldusesDoubleas a type parameter, so boxing is not avoided. I would replace.fold(0.0) { acc, d -> acc + d }with.sum(elementAlgebra)but there is no such operation for some reason.I also wondered why there is no
.sum()method. I could implement it for a 1-d series, but doing it for a general buffer seems a bit too much if there is need for arbitrary axis like in NumPy@altavir There is a need for a function
Buffer<T>.sum(elementAlgebra: Group<T>). Where should we place it?@@ -0,0 +193,4 @@*/private fun Series<Double>.countExtrema(): Int {require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }return slice(1..< size - 1).asIterable().foldIndexed(0) { index, acc, it ->I would recommend writing
@@ -0,0 +203,4 @@* Retrieve indices of knot points used to construct an upper envelope,* namely maxima together with the first last point in a series.*/private fun<T> Series<T>.maxima(): List<Int> where T: Comparable<T> {@@ -0,0 +206,4 @@private fun<T> Series<T>.maxima(): List<Int> where T: Comparable<T> {require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }val maxima = mutableListOf(0)slice(1..< size - 1).asIterable().forEachIndexed { index, it ->I would recommend rewriting it with old good plain loop on indices:
or using
Also, is it intended that the spline will ignore double extrema?
I mean for series
[0.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 0.0]there will be no maxima and minima points but the end points.No, I'm planning on improving this function and making it
publicplaced inSeriesExtensions.kt@@ -0,0 +207,4 @@require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }val maxima = mutableListOf(0)slice(1..< size - 1).asIterable().forEachIndexed { index, it ->if (it > get(index) && it > get(index + 2)) { // weird offset, is there a way to do it better?Could you comment what question
// weird offset, is there a way to do it better?means in more detail?@@ -0,0 +219,4 @@* Retrieve indices of knot points used to construct a lower envelope,* namely minima together with the first last point in a series.*/private fun<T> Series<T>.minima(): List<Int> where T: Comparable<T> {@@ -0,0 +222,4 @@private fun<T> Series<T>.minima(): List<Int> where T: Comparable<T> {require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }val minima = mutableListOf(0)slice(1..< size - 1).asIterable().forEachIndexed { index, it ->I would recommend rewriting it with old good plain loop on indices:
or using
Also, similar question to this one.
@@ -0,0 +243,4 @@* Represents a condition when a mode has been successfully* extracted in a sifting process.*/open class Success(val result: Series<Double>): SiftingResultI don't understand what the
Successinheritors are for. I mean, they are all instantiated but never distinguished. All of them can be used only internally, but are cast toSuccessanyway.Thank you kindly for the review. I have submitted corrections for the majority of issues. I'll notify you when I make significant improvements to the algorithm or the code in general.
View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.