WIP: feature/emd #521
Labels
No Label
bug
dependencies
discussion
documentation
duplicate
feature
good first issue
misc
performance
question
test
use case
wontfix
No Milestone
No project
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: kscience/kmath#521
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "teldufalsari/kmath:feature/emd"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Empirical Mode Decomposition
Implement the empirical mode decomposition algorithm. Supports several criteria for process termination, more may be added in future. See
emdDemo.kt
file (examples/series
) for an example on how to use the algorithm.For now the algorithm work with
Double
series only, making it generic over a series algebra is the top priority.Description
The main algorithm is implemented in the
EmpiricalModeDEcomposition
class, which can be treated as a factory that produces decompositions. Parameters of a decomposition, such as termination criteria parameters and the number of modes to extract, are passed as parameters when creating an instance of the class. The user can then call thedecompose
method to produce a decomposition result, which as of now consists of a decomposition itself (asList<Series<T>>
) and a reason for termination (as anenum
element).Each empirical mode is extracted in an iterative process called sifting. The algorithm allows to determine the reason for sifting to terminate using a sealed interface (see
SiftingResult
interface documentation). There is no use for this other than determining when it is impossible to extract another mode from the signal (the only case when sifting fails completely with no valid value (mode) to be returned), so that place can ultimately be simplified to a simple null check. I decided to leave it hoping that use cases for this may arise, e.g. returning verbose statistics together with the decomposition itself.Issues encountered
SeriesAlgebra
:mapWithLabel
function was failing on series with a non-zero offset.PolynomialInterpolator
treats the right boundary exclusively. For that reason,SplineInterpolator
does not work as its counterpart from Apache Commons, which returns the right boundary value properly. I don not know if this is intended behaviour. Also, many works that explain the empirical decomposition do not use the leftmost and the rightmost points of the series as knot point for constructing the envelopes.Instead, they use points that are extrapolated using the first or the last polynomials of the spline interpolation. This version seems to work fine too, but probably it could be beneficial to modify theActually, an intricate extrema padding algorithm is usually involved, soSplineInterpolator
class to yield the points outside its boundaries if a special parameter is passed when creating an interpolator.SplineInterpolator
should only properly return the value at its right boundary.Further work
Double
);Series
extensions: envelopes, extrema, median values; nonparametricstatistical tests like Lepage test or Cucconi test.
Acknowledgements
Most of the algorithm details are taken from the Python EMD library reference and the sources cited there.
feature/emdto WIP: feature/emdGood work! But there is a bit of work to do.
Also, I would like to see several (at least, 2 or 3) tests for the feature. And it will be cool to add short description of the features to docs (you can ask help about it here).
@ -0,0 +10,4 @@
import space.kscience.plotly.models.Scatter
import kotlin.math.sin
fun main(): Unit = with(Double.seriesAlgebra()) {
is enough if you import
import space.kscience.kmath.operations.invoke
.@ -0,0 +36,4 @@
private val maxSiftIterations: Int = 20,
private val siftingDelta: Double = 1e-2,
private val nModes: Int = 6
) where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L : Number {
Please, move the only type argument
L
's bound toL
's declaration cite:By the way, in general you'll have to use
T
asL
. Otherwise, eitherinterpolate
can not be resolved in snippetor there is no function to map
L
values toT
.@ -0,0 +39,4 @@
) where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L : Number {
/**
* Take a signal, construct an upper and a lower envelope, find the mean value of two,
Possible typos fix suggestion:
@ -0,0 +43,4 @@
* represented as a series.
* @param signal Signal to compute on
*
* @return null is returned if the signal has not enough extrema to construct envelopes.
I suggest adding a sentence that describes what is actually returned. And possible typos fix as well.
@ -0,0 +45,4 @@
*
* @return null is returned if the signal has not enough extrema to construct envelopes.
*/
private fun findMean(signal: Series<Double>): Series<Double>? = with(seriesAlgebra) {
Just this is enough. Because there is
invoke
extension operator implemented that is imported here.@ -0,0 +64,4 @@
return if (maxima.size < 3 || minima.size < 3) null else {
val upperEnvelope = generateEnvelope(maxima)
val lowerEnvelope = generateEnvelope(minima)
return (upperEnvelope + lowerEnvelope).map { it * 0.5 }
@ -0,0 +76,4 @@
* @return [SiftingResult.NotEnoughExtrema] is returned if the signal has too few extrema to extract a mode.
* Success of an appropriate type (See [SiftingResult.Success] class) is returned otherwise.
*/
private fun sift(signal: Series<Double>): SiftingResult = with(seriesAlgebra) {
As well as I understand, whole body of the function can be replaced with just
Also
siftInner
should be markedtailrec
, shouldn't it?Yeah, it is better if the function is marked
tailrec
. But I am not sure if compiler understands the case. So I need a bit of time for a small test.It works. With
tailrec
it does not produce stack overflow when I run sifting with 5000 iterations per mode@ -0,0 +98,4 @@
val mean = findMean(prevMode) ?: return SiftingResult.SignalFlattened(prevMode)
val mode = prevMode.zip(mean) { p, m -> p - m }
val newSNumber = if (mode.sCondition()) sNumber + 1 else sNumber
return if (iterationNumber >= maxSiftIterations) SiftingResult.MaxIterationsReached(mode)
Gosh. Use
when
instead of long if-else sequence, please:It's idiom, and it's clearer to read.
@ -0,0 +100,4 @@
val newSNumber = if (mode.sCondition()) sNumber + 1 else sNumber
return if (iterationNumber >= maxSiftIterations) SiftingResult.MaxIterationsReached(mode)
else if (sNumber >= sConditionThreshold) SiftingResult.SNumberReached(mode)
else if (relativeDifference(prevMode, mode) < siftingDelta * mode.size) SiftingResult.DeltaReached(mode)
Is it intended that previous mode is assigned to
current
parameter ofrelativeDifference
and current (new) mode is assigned toprevious
parameter ofrelativeDifference
? The parameters names say that there is a swap of parameters.I would use explicit parameters' assignation:
@ -0,0 +112,4 @@
* Modes returned in a list which contains as many modes as it was possible
* to extract before triggering one of the termination conditions.
*/
public fun decompose(signal: Series<Double>): EMDecompositionResult = with(seriesAlgebra) {
The whole function can be rewritten in such way:
It's shorter but as readable as the previous version.
@ -0,0 +121,4 @@
}
modes.add(mode)
var residual = signal.zip(mode) { l, r -> l - r }
You should use
SeriesAlgebra
'selementAlgebra
in getting difference of twoDouble
values instead ofl - r
. And in 8 strings below too.@ -0,0 +168,4 @@
*/
private fun Series<Double>.countZeros(): Int {
require(size >= 2) { "Expected series with at least 2 elements, but got $size elements" }
return fold(Pair(get(0), 0)) { acc: Pair<Double, Int>, it: Double ->
Do not use
Pair
! It's non-idiomatic in Kotlin. It is really hard to always keep in mind whatfirst
andsecond
fields hold. Instead, define and use custom data class with understandable name and understandable parameters' names.Is making a one-line data class declaration above considered idiomatic?
Yeah, that's the idiomatic way. But place it outside the function. Otherwise, you won't able to access it :)
@ -0,0 +169,4 @@
private fun Series<Double>.countZeros(): Int {
require(size >= 2) { "Expected series with at least 2 elements, but got $size elements" }
return fold(Pair(get(0), 0)) { acc: Pair<Double, Int>, it: Double ->
if (sign(acc.first) != sign(it)) Pair(it, acc.second + 1) else Pair(it, acc.second)
It's better to store sign instead of the value which sign is compared, to not compute it each iteration.
@ -0,0 +179,4 @@
private fun <L, BA> SeriesAlgebra<Double, *, BA, L>.relativeDifference(
current: Series<Double>,
previous: Series<Double>
):Double where BA: BufferAlgebra<Double, *>, BA: RingOps<Buffer<Double>>, L: Number {
Use
=
notation for inline bodies, please:Also:
L
is not used, so I removed it.SeriesAlgebra
'selementAlgebra
is needed. So I replaced it. It will also help with refactoring fromDouble
"algebras" to general algebras.fold
usesDouble
as a type parameter, so boxing is not avoided. I would replace.fold(0.0) { acc, d -> acc + d }
with.sum(elementAlgebra)
but there is no such operation for some reason.I also wondered why there is no
.sum()
method. I could implement it for a 1-d series, but doing it for a general buffer seems a bit too much if there is need for arbitrary axis like in NumPy@altavir There is a need for a function
Buffer<T>.sum(elementAlgebra: Group<T>)
. Where should we place it?@ -0,0 +193,4 @@
*/
private fun Series<Double>.countExtrema(): Int {
require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }
return slice(1..< size - 1).asIterable().foldIndexed(0) { index, acc, it ->
I would recommend writing
@ -0,0 +203,4 @@
* Retrieve indices of knot points used to construct an upper envelope,
* namely maxima together with the first last point in a series.
*/
private fun<T> Series<T>.maxima(): List<Int> where T: Comparable<T> {
@ -0,0 +206,4 @@
private fun<T> Series<T>.maxima(): List<Int> where T: Comparable<T> {
require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }
val maxima = mutableListOf(0)
slice(1..< size - 1).asIterable().forEachIndexed { index, it ->
I would recommend rewriting it with old good plain loop on indices:
or using
Also, is it intended that the spline will ignore double extrema?
I mean for series
[0.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 0.0]
there will be no maxima and minima points but the end points.No, I'm planning on improving this function and making it
public
placed inSeriesExtensions.kt
@ -0,0 +207,4 @@
require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }
val maxima = mutableListOf(0)
slice(1..< size - 1).asIterable().forEachIndexed { index, it ->
if (it > get(index) && it > get(index + 2)) { // weird offset, is there a way to do it better?
Could you comment what question
// weird offset, is there a way to do it better?
means in more detail?@ -0,0 +219,4 @@
* Retrieve indices of knot points used to construct a lower envelope,
* namely minima together with the first last point in a series.
*/
private fun<T> Series<T>.minima(): List<Int> where T: Comparable<T> {
@ -0,0 +222,4 @@
private fun<T> Series<T>.minima(): List<Int> where T: Comparable<T> {
require(size >= 3) { "Expected series with at least 3 elements, but got $size elements" }
val minima = mutableListOf(0)
slice(1..< size - 1).asIterable().forEachIndexed { index, it ->
I would recommend rewriting it with old good plain loop on indices:
or using
Also, similar question to this one.
@ -0,0 +243,4 @@
* Represents a condition when a mode has been successfully
* extracted in a sifting process.
*/
open class Success(val result: Series<Double>): SiftingResult
I don't understand what the
Success
inheritors are for. I mean, they are all instantiated but never distinguished. All of them can be used only internally, but are cast toSuccess
anyway.Thank you kindly for the review. I have submitted corrections for the majority of issues. I'll notify you when I make significant improvements to the algorithm or the code in general.
Checkout
From your project repository, check out a new branch and test the changes.