
Module kmath-noa

A general-purpose differentiable programming library over NOA together with relevant functionality from LibTorch.

Our aim is to cover a wide set of applications, from Bayesian computation and deep learning to particle physics simulations. In fact, we support any differentiable program written on top of AutoGrad & ATen.

Installation from source

Currently, we support only the Linux platform for the native artifacts. For GPU kernels, we require a compatible CUDA installation. If you are on Windows, we recommend setting up everything on WSL.

To install the library, you can simply publish KMath to the local Maven repository:

$ ./gradlew -Dorg.gradle.java.home=/path/to/local/jdk -q publishToMavenLocal

This will fetch and build the JNI wrapper jnoa.

The library has been tested with graalvm-ce-java11-linux-amd64-22.0.0.2.

In your own application, add the local dependency:

repositories {
    mavenCentral()
    mavenLocal()
}

dependencies {
    implementation("space.kscience:kmath-noa:0.3.0-dev-17")
}

To load the native library, you will need to add the following VM option:

-Djava.library.path=${HOME}/.kmath/third-party/noa-v0.0.1/cpp-build/jnoa
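
If you run the code through Gradle, the same option can be passed as a JVM argument for the test or run tasks. A minimal Kotlin DSL sketch, assuming the default installation path above:

tasks.withType<Test> {
    // Hypothetical location of the native artifacts; adjust to your installation
    jvmArgs("-Djava.library.path=${System.getProperty("user.home")}/.kmath/third-party/noa-v0.0.1/cpp-build/jnoa")
}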

Usage

The library is under active development. Many more features will be available soon.

Tensors and Linear Algebra

We implement the tensor algebra interfaces from kmath-tensors:

NoaFloat {
    val tensor = 
        randNormal(
            shape = intArrayOf(7, 5, 3), 
            device = Device.CPU) // or Device.CUDA(0) for GPU
    
    // Compute SVD
    val (tensorU, tensorS, tensorV) = tensor.svd()
    
    // Reconstruct tensor
    val tensorReg =
        tensorU dot (diagonalEmbedding(tensorS) dot tensorV.transpose(-2, -1))
    
    // Serialise tensor for later
    tensorReg.save("tensorReg.pt")
}

The saved tensor can be loaded in C++ or in Python:

import torch
tensor_reg = list(torch.jit.load('tensorReg.pt').parameters())[0]

The most efficient way of passing data between the JVM and the native backend is to rely on primitive arrays:

val array = (1..8).map { 100f * it }.toFloatArray()
val updateArray = floatArrayOf(15f, 20f)
val resArray = NoaFloat {
    val tensor = copyFromArray(array, intArrayOf(2, 2, 2))
    NoaFloat {
        // The call `tensor[0]` creates a native tensor instance pointing to a slice of `tensor`
        // The second call `[1]` is a setter call and does not create any new instances
        tensor[0][1] = updateArray
        // The instance `tensor[0]` is destroyed as we move out of the scope
    }!! // if the computation fails the result will be null
    tensor.copyToArray()
    // the instance `tensor` is destroyed here
}!!
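
Since the algebra block returns null when the native computation fails, the result can also be handled with Kotlin's usual null-safety operators. A minimal sketch reusing the array defined above:

// Fall back to the original data if the native computation fails
val safeArray = NoaFloat {
    copyFromArray(array, intArrayOf(2, 2, 2)).copyToArray()
} ?: array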

Automatic Differentiation

The AutoGrad engine is exposed:

NoaFloat {
    // Create a quadratic function
    val dim = 3
    val tensorX = randNormal(shape = intArrayOf(dim))
    val randFeatures = randNormal(shape = intArrayOf(dim, dim))
    val tensorSigma = randFeatures + randFeatures.transpose(0, 1)
    val tensorMu = randNormal(shape = intArrayOf(dim))

    // Create a differentiable expression
    val expressionAtX = withGradAt(tensorX) { x ->
        0.5f * (x dot (tensorSigma dot x)) + (tensorMu dot x) + 25.9f
    }

    // Evaluate the gradient at tensorX
    // retaining the graph for the hessian computation
    val gradientAtX = expressionAtX.autoGradient(tensorX, retainGraph = true)
    
    // Compute the hessian at tensorX
    val hessianAtX = expressionAtX.autoHessian(tensorX)
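    // Sanity check: since tensorSigma is symmetric, the closed-form results are
    // gradientAtX = (tensorSigma dot tensorX) + tensorMu and hessianAtX = tensorSigma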
}

Deep Learning

You can train any TorchScript model. For example, you can build the following neural network in Python and prepare the training data:

import torch

n_tr = 7
n_val = 300
x_val = torch.linspace(-5, 5, n_val).view(-1, 1)
y_val = torch.sin(x_val)
x_train = torch.linspace(-3.14, 3.14, n_tr).view(-1, 1)
y_train = torch.sin(x_train) + torch.randn_like(x_train) * 0.1

class Data(torch.nn.Module):
    def __init__(self):
        super(Data, self).__init__()
        self.register_buffer('x_val', x_val)
        self.register_buffer('y_val', y_val)
        self.register_buffer('x_train', x_train)
        self.register_buffer('y_train', y_train)

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(1, 10, bias = True)
        self.l2 = torch.nn.Linear(10, 10, bias = True)
        self.l3 = torch.nn.Linear(10, 1, bias = True)

    def forward(self, x):
        x = self.l1(x)
        x = torch.relu(x)
        x = self.l2(x)
        x = torch.relu(x)
        x = self.l3(x)
        return x

class Loss(torch.nn.Module):
    def __init__(self, target):
        super(Loss, self).__init__()
        self.register_buffer('target', target)
        self.loss = torch.nn.MSELoss()
        
    def forward(self, x):
        return self.loss(x, self.target)

# Generate TorchScript modules and serialise them
torch.jit.script(Data()).save('data.pt')
torch.jit.script(Net()).save('net.pt')
torch.jit.script(Loss(y_train)).save('loss.pt')

You can then load the modules into Kotlin and train them:

NoaFloat { 
    
    // Load the serialised JIT modules
    // The training data
    val dataModule = loadJitModule("data.pt")
    // The DL model
    val netModule = loadJitModule("net.pt")
    // The loss function
    val lossModule = loadJitModule("loss.pt")

    // Get the tensors from the module
    val xTrain = dataModule.getBuffer("x_train")
    val yTrain = dataModule.getBuffer("y_train")
    val xVal = dataModule.getBuffer("x_val")
    val yVal = dataModule.getBuffer("y_val")

    // Set the model in training mode
    netModule.train(true)
    // Loss function for training 
    lossModule.setBuffer("target", yTrain)

    // Compute the predictions
    val yPred = netModule.forward(xTrain)
    // Compute the training loss
    val loss = lossModule.forward(yPred)
    println(loss)
    
    // Set up the Adam optimiser with learning rate 0.005
    val optimiser = netModule.adamOptimiser(0.005)

    // Train for 250 epochs
    repeat(250){
        // Clean gradients
        optimiser.zeroGrad()
        // Use forwardAssign for better memory management
        netModule.forwardAssign(xTrain, yPred)
        lossModule.forwardAssign(yPred, loss)
        // Backward pass
        loss.backward()
        // Update model parameters 
        optimiser.step()
        if(it % 50 == 0)
            println("Training loss: $loss")
    }

    // Finally validate the model 
    // Compute the predictions for the validation features
    netModule.forwardAssign(xVal, yPred)
    // Set the loss for validation 
    lossModule.setBuffer("target", yVal)
    // Compute the loss on validation dataset
    lossModule.forwardAssign(yPred, loss)
    println("Validation loss: $loss")
    
    // The model can be serialised in its current state
    netModule.save("trained_net.pt")
}
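
The trained module can later be reloaded for inference. A minimal sketch built only from the calls shown above, assuming trained_net.pt and data.pt from the previous steps:

NoaFloat {
    // Reload the serialised model and switch off training mode
    val trainedNet = loadJitModule("trained_net.pt")
    trainedNet.train(false)

    // Run predictions on the validation features
    val xVal = loadJitModule("data.pt").getBuffer("x_val")
    val predictions = trainedNet.forward(xVal)
    println(predictions)
}!!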

Custom memory management

Native memory management relies on scoping with NoaScope which is readily available within an algebra context. Manual management is also possible:

// Create a scope
val scope = NoaScope()

val tensor = NoaFloat(scope){
    full(5f, intArrayOf(1))
}!! // the result might be null

// If the computation fails resources will be freed automatically
// Otherwise it's your responsibility:
scope.disposeAll()

// Attempting to use tensor here is undefined behaviour

For more examples, have a look at the NOA docs.

Contributed by Roland Grinis