migrating to common toolset

This commit is contained in:
Alexander Nozik 2019-06-28 16:28:54 +03:00
parent 3bf49d3fae
commit b13d980c02
9 changed files with 104 additions and 363 deletions

View File

@ -0,0 +1,96 @@
# Questions and Answers #
In this section we will try to cover DataForge main ideas in the form of questions and answers.
## General ##
**Q:** I have a lot of data to analyze. The analysis process is complicated, requires a lot of stages and data flow is not always obvious. To top it the data size is huge, so I don't want to perform operation I don't need (calculate something I won't need or calculate something twice). And yes, I need it to be performed in parallel and probably on remote computer. By the way, I am sick and tired of scripts that modify other scripts that control scripts. Could you help me?
**A:** Yes, that is the precisely the problem DataForge was made to solve. It allows to perform some automated data manipulations with automatic optimization and parallelization. The important thing that data processing recipes are made in the declarative way, so it is quite easy to perform computations on a remote station. Also DataForge guarantees reproducibility of analysis results.
<hr>
**Q:** How does it work?
**A:** At the core of DataForge lies the idea of **metadata processor**. It utilizes the statement that in order to analyze something you need data itself and some additional information about what does that data represent and what does user want as a result. This additional information is called metadata and could be organized in a regular structure (a tree of values not unlike XML or JSON). The important thing is that this distinction leaves no place for user instructions (or scripts). Indeed, the idea of DataForge logic is that one do not need imperative commands. The framework configures itself according to input meta-data and decides what operations should be performed in the most efficient way.
<hr>
**Q:** But where does it take algorithms to use?
**A:** Of course algorithms must be written somewhere. No magic here. The logic is written in specialized modules. Some modules are provided out of the box at the system core, some need to be developed for specific problem.
<hr>
**Q:** So I still need to write the code? What is the difference then?
**A:** Yes, someone still need to write the code. But not necessary you. Simple operations could be performed using provided core logic. Also your group can have one programmer writing the logic and all other using it without any real programming expertise. Also the framework organized in a such way that one writes some additional logic, he do not need to thing about complicated thing like parallel computing, resource handling, logging, caching etc. Most of the things are done by the DataForge.
<hr>
## Platform ##
**Q:** Which platform does DataForge use? Which operation system is it working on?
**A:** The DataForge is mostly written in Java and utilizes JVM as a platform. It works on any system that supports JVM (meaning almost any modern system excluding some mobile platforms).
<hr>
**Q:** But Java... it is slow!
**A:** [It is not](https://stackoverflow.com/questions/2163411/is-java-really-slow/2163570#2163570). It lacks some hardware specific optimizations and requires some additional time to start (due to JIT nature), but otherwise it is at least as fast as other languages traditionally used in science. More importantly, the memory safety, tooling support and vast ecosystem makes it №1 candidate for data analysis framework.
<hr>
**Q:** Can I use my C++/Fortran/Python code in DataForge?
**A:** Yes, as long as the code could be called from Java. Most of common languages have a bridge for Java access. There are completely no problems with compiled C/Fortran libraries. Python code could be called via one of existing python-java interfaces. It is also planned to implement remote method invocation for common languages, so your Python, or, say, Julia, code could run in its native environment. The metadata processor paradigm makes it much easier to do so.
<hr>
## Features ##
**Q:** What other features does DataForge provide?
**A:** Alongside metadata processing (and a lot of tools for metadata manipulation and layering), DataForge has two additional important concepts:
* **Modularisation**. Contrary to lot other frameworks, DataForge is intrinsically modular. The mandatory part is a rather tiny core module. Everything else could be customized.
* **Context encapsulation**. Every DataForge task is executed in some context. The context isolates environment for the task and also works as dependency injection base and specifies interaction of the task with the external world.
<hr>
**Q:** OK, but now I want to work directly with my measuring devices. How can I do that?
**A:** The [dataforge-control](${site.url}/docs.html#control) module provides interfaces to interact with the hardware. Out of the box it supports safe communication with TCP/IP or COM/tty based devices. Specific device declaration could be done via additional modules. It is also possible to maintain data storage with [datforge-storage](${site.url}/docs.htm#storage) module.
<hr>
**Q:** Declarations and metadata are good, but I want my scripts back!
**A:** We can do that. [GRIND](${site.url}/docs.html#grind) provides a shell-like environment called GrindShell. It allows to run imperative scripts with full access to all of the DataForge functionality. Grind scripts are basically context-encapsulated. Also there are convenient feature wrappers called helpers that could be loaded into the shell when new features modules are added.
<hr>
## Misc ##
**Q:** So everything looks great, can I replace my ROOT / other data analysis framework with DataForge?
**A:** One must note, that DataForge is made for analysis, not for visualisation. The visualisation and user interaction capabilities of DataForge are rather limited compared to frameworks like ROOT, JAS3 or DataMelt. The idea is to provide reliable API and core functionality. In fact JAS3 and DataMelt could be used as a frontend for DataForge mechanics. It is planned to add an interface to ROOT via JFreeHep AIDA.
<hr>
**Q:** How does DataForge compare to cluster computation frameworks like Hadoop or Spark?
**A:** Again, it is not the purpose of DataForge to replace cluster software. DataForge has some internal parallelism mechanics and implementations, but they are most certainly worse then specially developed programs. Still, DataForge is not fixed on one single implementation. Your favourite parallel processing tool could be still used as a back-end for the DataForge. With full benefit of configuration tools, integrations and no performance overhead.
<hr>
**Q:** Is it possible to use DataForge in notebook mode?
**A:** Yes, it is. DataForge can be used as is from [beaker/beakerx](http://beakernotebook.com/) groovy kernel with minor additional adjustments. It is planned to provide separate DataForge kernel to `beakerx` which will automatically call a specific GRIND shell.
<hr>
**Q:** Can I use DataForge on a mobile platform?
**A:** DataForge is modular. Core and the most of api are pretty compact, so it could be used in Android applications. Some modules are designed for PC and could not be used on other platforms. IPhone does not support Java and therefore could use only client-side DataForge applications.

View File

@ -1,3 +1,8 @@
plugins {
id("scientifik.mpp") apply false
id("scientifik.publish") apply false
}
val dataforgeVersion by extra("0.1.3-dev-7")
val bintrayRepo by extra("dataforge")

View File

@ -1,33 +0,0 @@
plugins {
`kotlin-dsl`
}
repositories {
gradlePluginPortal()
jcenter()
}
val kotlinVersion = "1.3.40"
// Add plugins used in buildSrc as dependencies, also we should specify version only here
dependencies {
implementation("org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlinVersion")
implementation("org.jfrog.buildinfo:build-info-extractor-gradle:4.9.6")
implementation("com.jfrog.bintray.gradle:gradle-bintray-plugin:1.8.4")
implementation("org.jetbrains.dokka:dokka-gradle-plugin:0.9.18")
implementation("com.moowork.gradle:gradle-node-plugin:1.3.1")
implementation("org.openjfx:javafx-plugin:0.0.7")
}
gradlePlugin{
plugins {
create("scientifik-publish") {
id = "scientifik.publish"
implementationClass = "ScientifikPublishPlugin"
}
create("scientifik-mpp"){
id = "scientifik.mpp"
implementationClass = "ScientifikMPPlugin"
}
}
}

View File

@ -1,9 +0,0 @@
/**
* Build constants
*/
object Scientifik {
val ioVersion = "0.1.10"
val coroutinesVersion = "1.2.2"
val atomicfuVersion = "0.12.9"
val serializationVersion = "0.11.1"
}

View File

@ -1,79 +0,0 @@
import org.gradle.api.Plugin
import org.gradle.api.Project
import org.gradle.kotlin.dsl.configure
import org.gradle.kotlin.dsl.getValue
import org.gradle.kotlin.dsl.getting
import org.gradle.kotlin.dsl.invoke
import org.jetbrains.kotlin.gradle.dsl.KotlinMultiplatformExtension
open class ScientifikMPPlugin : Plugin<Project> {
override fun apply(project: Project) {
project.plugins.apply("org.jetbrains.kotlin.multiplatform")
project.configure<KotlinMultiplatformExtension> {
jvm {
compilations.all {
kotlinOptions {
jvmTarget = "1.8"
}
}
}
js {
compilations.all {
kotlinOptions {
sourceMap = true
sourceMapEmbedSources = "always"
moduleKind = "umd"
}
}
}
sourceSets.invoke {
val commonMain by getting {
dependencies {
api(kotlin("stdlib"))
}
}
val commonTest by getting {
dependencies {
implementation(kotlin("test-common"))
implementation(kotlin("test-annotations-common"))
}
}
val jvmMain by getting {
dependencies {
api(kotlin("stdlib-jdk8"))
}
}
val jvmTest by getting {
dependencies {
implementation(kotlin("test"))
implementation(kotlin("test-junit"))
}
}
val jsMain by getting {
dependencies {
api(kotlin("stdlib-js"))
}
}
val jsTest by getting {
dependencies {
implementation(kotlin("test-js"))
}
}
}
targets.all {
sourceSets.all {
languageSettings.apply{
progressiveMode = true
enableLanguageFeature("InlineClasses")
useExperimentalAnnotation("ExperimentalUnsignedType")
}
}
}
}
}
}

View File

@ -1,237 +0,0 @@
import com.jfrog.bintray.gradle.BintrayExtension
import com.jfrog.bintray.gradle.tasks.BintrayUploadTask
import groovy.lang.GroovyObject
import org.gradle.api.Plugin
import org.gradle.api.Project
import org.gradle.api.plugins.JavaBasePlugin
import org.gradle.api.publish.PublishingExtension
import org.gradle.api.publish.maven.MavenPublication
import org.gradle.api.publish.maven.internal.artifact.FileBasedMavenArtifact
import org.gradle.api.tasks.bundling.Jar
import org.gradle.kotlin.dsl.*
import org.jetbrains.dokka.gradle.DokkaTask
import org.jetbrains.kotlin.gradle.dsl.KotlinJvmProjectExtension
import org.jetbrains.kotlin.gradle.dsl.KotlinMultiplatformExtension
import org.jfrog.gradle.plugin.artifactory.dsl.ArtifactoryPluginConvention
import org.jfrog.gradle.plugin.artifactory.dsl.PublisherConfig
import org.jfrog.gradle.plugin.artifactory.dsl.ResolverConfig
import org.jfrog.gradle.plugin.artifactory.task.ArtifactoryTask
open class ScientifikExtension {
var vcs: String? = null
var bintrayRepo: String? = null
var kdoc: Boolean = true
}
// recursively search up the project chain for configuration
private val Project.bintrayRepo: String?
get() = extensions.findByType<ScientifikExtension>()?.bintrayRepo
?: parent?.bintrayRepo
?: (findProperty("bintrayRepo") as? String)
private val Project.vcs: String?
get() = extensions.findByType<ScientifikExtension>()?.vcs
?: parent?.vcs
?: (findProperty("vcs") as? String)
open class ScientifikPublishPlugin : Plugin<Project> {
override fun apply(project: Project) {
project.plugins.apply("maven-publish")
val extension = project.extensions.create<ScientifikExtension>("scientifik")
val bintrayRepo = project.bintrayRepo
val vcs = project.vcs
if (bintrayRepo == null || vcs == null) {
project.logger.warn("[${project.name}] Missing deployment configuration. Skipping publish.")
}
project.configure<PublishingExtension> {
repositories {
maven("https://bintray.com/mipt-npm/$bintrayRepo")
}
// Process each publication we have in this project
publications.filterIsInstance<MavenPublication>().forEach { publication ->
@Suppress("UnstableApiUsage")
publication.pom {
name.set(project.name)
description.set(project.description)
url.set(vcs)
licenses {
license {
name.set("The Apache Software License, Version 2.0")
url.set("http://www.apache.org/licenses/LICENSE-2.0.txt")
distribution.set("repo")
}
}
developers {
developer {
id.set("MIPT-NPM")
name.set("MIPT nuclear physics methods laboratory")
organization.set("MIPT")
organizationUrl.set("http://npm.mipt.ru")
}
}
scm {
url.set(extension.vcs)
}
}
}
}
if (extension.kdoc) {
project.plugins.apply("org.jetbrains.dokka")
project.afterEvaluate {
extensions.findByType<KotlinMultiplatformExtension>()?.apply {
val dokka by tasks.getting(DokkaTask::class) {
outputFormat = "html"
outputDirectory = "$buildDir/javadoc"
jdkVersion = 8
kotlinTasks {
// dokka fails to retrieve sources from MPP-tasks so we only define the jvm task
listOf(tasks.getByPath("compileKotlinJvm"))
}
sourceRoot {
// assuming only single source dir
path = sourceSets["commonMain"].kotlin.srcDirs.first().toString()
platforms = listOf("Common")
}
// although the JVM sources are now taken from the task,
// we still define the jvm source root to get the JVM marker in the generated html
sourceRoot {
// assuming only single source dir
path = sourceSets["jvmMain"].kotlin.srcDirs.first().toString()
platforms = listOf("JVM")
}
}
val kdocJar by tasks.registering(Jar::class) {
group = JavaBasePlugin.DOCUMENTATION_GROUP
dependsOn(dokka)
archiveClassifier.set("javadoc")
from("$buildDir/javadoc")
}
configure<PublishingExtension> {
targets.all {
val publication = publications.findByName(name) as MavenPublication
// Patch publications with fake javadoc
publication.artifact(kdocJar.get())
}
tasks.filter { it is ArtifactoryTask || it is BintrayUploadTask }.forEach {
it.doFirst {
publications.filterIsInstance<MavenPublication>()
.forEach { publication ->
val moduleFile =
buildDir.resolve("publications/${publication.name}/module.json")
if (moduleFile.exists()) {
publication.artifact(object : FileBasedMavenArtifact(moduleFile) {
override fun getDefaultExtension() = "module"
})
}
}
}
}
}
}
extensions.findByType<KotlinJvmProjectExtension>()?.apply {
val dokka by tasks.getting(DokkaTask::class) {
outputFormat = "html"
outputDirectory = "$buildDir/javadoc"
jdkVersion = 8
}
val kdocJar by tasks.registering(Jar::class) {
group = JavaBasePlugin.DOCUMENTATION_GROUP
dependsOn(dokka)
archiveClassifier.set("javadoc")
from("$buildDir/javadoc")
}
configure<PublishingExtension> {
publications.filterIsInstance<MavenPublication>().forEach { publication ->
publication.artifact(kdocJar.get())
}
}
}
}
}
project.plugins.apply("com.jfrog.bintray")
project.configure<BintrayExtension> {
user = project.findProperty("bintrayUser") as? String ?: System.getenv("BINTRAY_USER")
key = project.findProperty("bintrayApiKey") as? String? ?: System.getenv("BINTRAY_API_KEY")
publish = true
override = true // for multi-platform Kotlin/Native publishing
// We have to use delegateClosureOf because bintray supports only dynamic groovy syntax
// this is a problem of this plugin
pkg.apply {
userOrg = "mipt-npm"
repo = bintrayRepo
name = project.name
issueTrackerUrl = "${extension.vcs}/issues"
setLicenses("Apache-2.0")
vcsUrl = extension.vcs
version.apply {
name = project.version.toString()
vcsTag = project.version.toString()
released = java.util.Date().toString()
}
}
//workaround bintray bug
project.afterEvaluate {
setPublications(*project.extensions.findByType<PublishingExtension>()!!.publications.names.toTypedArray())
}
// project.tasks.figetByPath("bintrayUpload") {
// dependsOn(publishToMavenLocal)
// }
}
project.plugins.apply("com.jfrog.artifactory")
project.configure<ArtifactoryPluginConvention> {
val artifactoryUser: String? by project
val artifactoryPassword: String? by project
val artifactoryContextUrl = "http://npm.mipt.ru:8081/artifactory"
setContextUrl(artifactoryContextUrl)//The base Artifactory URL if not overridden by the publisher/resolver
publish(delegateClosureOf<PublisherConfig> {
repository(delegateClosureOf<GroovyObject> {
setProperty("repoKey", "gradle-dev-local")
setProperty("username", artifactoryUser)
setProperty("password", artifactoryPassword)
})
defaults(delegateClosureOf<GroovyObject> {
invokeMethod("publications", arrayOf("jvm", "js", "kotlinMultiplatform", "metadata"))
})
})
resolve(delegateClosureOf<ResolverConfig> {
repository(delegateClosureOf<GroovyObject> {
setProperty("repoKey", "gradle-dev")
setProperty("username", artifactoryUser)
setProperty("password", artifactoryPassword)
})
})
}
}
}

View File

@ -3,7 +3,3 @@ plugins {
}
description = "Meta definition and basic operations on meta"
scientifik{
}

View File

@ -3,6 +3,7 @@ pluginManagement {
jcenter()
gradlePluginPortal()
maven("https://dl.bintray.com/kotlin/kotlin-eap")
maven("https://dl.bintray.com/mipt-npm/scientifik")
}
resolutionStrategy {
eachPlugin {
@ -11,6 +12,7 @@ pluginManagement {
"kotlin-multiplatform" -> useModule("org.jetbrains.kotlin:kotlin-gradle-plugin:${requested.version}")
"kotlin2js" -> useModule("org.jetbrains.kotlin:kotlin-gradle-plugin:${requested.version}")
"org.jetbrains.kotlin.frontend" -> useModule("org.jetbrains.kotlin:kotlin-frontend-plugin:0.0.45")
"scientifik.mpp", "scientifik.publish" -> useModule("scientifik:gradle-tools:0.1.0")
}
}
}