Update readme

Alexander Nozik 2024-10-05 11:11:38 +03:00
parent e2ef4b1c5b
commit 52da9fe52e
8 changed files with 203 additions and 115 deletions

102
README.md

@ -1,2 +1,100 @@
# snark # SNARK
Scientific Notation And Representation in Kotlin
In Lewis Carroll's "The Hunting of the Snark", the Snark is something everybody wants to get, but nobody knows what it is. The same is true of this project, although it has a narrower scope. SNARK can be read as "Scientific Notation And Research works in Kotlin" because it can be used for automatic creation of research papers, but it has other purposes as well.
To sum it up, **SNARK is an automated data transformation tool with the main focus on document and web page generation**. It is based on the [DataForge framework](https://github.com/SciProgCentre/dataforge-core).
SNARK **is not a typesetting system** itself, but it can use typesetting systems such as Markdown, LaTeX or Typst to do data transformations.
## Concepts
The SNARK process is the transformation of a data tree. Initial data can include texts, images, static binary or textual data, or even active external data subscriptions. The result is usually a tree of documents or a directly served website.
**Data** is any kind of content, generated lazily with additional metadata (DataForge Meta).
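As a rough illustration of this idea, a tree of lazily computed content with attached metadata could be modelled as below. This is a sketch that uses only the Kotlin standard library; `LazyData`, `DataNode` and `mapTree` are hypothetical names and are not the DataForge API.

```kotlin
// Hypothetical model, not the DataForge API: content is computed on first access only.
class LazyData<T>(val meta: Map<String, String>, producer: () -> T) {
    val value: T by lazy(producer)
}

// A tree node holds named data items and named child nodes.
class DataNode<T>(
    val items: Map<String, LazyData<T>> = emptyMap(),
    val children: Map<String, DataNode<T>> = emptyMap(),
)

// Transform every item; the transformation stays lazy and metadata is carried over.
fun <T, R> DataNode<T>.mapTree(transform: (T) -> R): DataNode<R> = DataNode(
    items = items.mapValues { (_, data) -> LazyData(data.meta) { transform(data.value) } },
    children = children.mapValues { (_, child) -> child.mapTree(transform) },
)

fun main() {
    var reads = 0
    val source = DataNode(
        items = mapOf("index" to LazyData(mapOf("type" to "markdown")) { reads++; "# Title" }),
        children = mapOf(
            "docs" to DataNode(items = mapOf("intro" to LazyData(emptyMap()) { reads++; "text" }))
        ),
    )
    val lengths = source.mapTree { it.length }
    println(reads) // 0: building the transformed tree reads nothing
    println(lengths.items.getValue("index").value) // 7
    println(reads) // 1: only the requested item was computed
}
```

The point of the sketch is that a whole tree can be transformed up front while the actual work is deferred until a specific item is requested.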
## Using DataForge context
DataForge module management is based on **Contexts** and **Plugins**. A Context serves as a dependency injection system, a lifecycle object and an API discoverability root for all executions. To use a subsystem, one needs to:
* Create a Context with a Plugin like this:
```kotlin
val context = Context("Optional context name") {
    plugin(SnarkHtml)
    // Here SnarkHtml is a Plugin factory declared as a companion object of the Plugin itself
}
```
* Get the loaded plugin instance via `val snarkHtml = context.request(SnarkHtml)`
* Use the plugin, for example:
```kotlin
val siteData = snarkHtml.readSiteData(context) {
    directory(snark.io, Name.EMPTY, dataDirectory)
}
```
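The pattern behind these steps (a companion object acting as a plugin factory and as a lookup key) can be sketched without DataForge itself. In the block below, `MiniContext`, `MiniPlugin`, `MiniPluginFactory` and `Greeter` are hypothetical stand-ins; the real DataForge types have a richer API and lifecycle.

```kotlin
// Hypothetical stand-ins for illustration; the real DataForge types have richer APIs.
interface MiniPlugin

interface MiniPluginFactory<T : MiniPlugin> {
    fun build(context: MiniContext): T
}

class MiniContext(val name: String) {
    private val plugins = mutableMapOf<MiniPluginFactory<*>, MiniPlugin>()

    // Register a plugin while the context is being configured.
    fun <T : MiniPlugin> plugin(factory: MiniPluginFactory<T>) {
        plugins.getOrPut(factory) { factory.build(this) }
    }

    // Return the loaded instance, or build it on demand.
    @Suppress("UNCHECKED_CAST")
    fun <T : MiniPlugin> request(factory: MiniPluginFactory<T>): T =
        plugins.getOrPut(factory) { factory.build(this) } as T
}

// The companion object doubles as the factory, so the class name can be passed as a key.
class Greeter(private val context: MiniContext) : MiniPlugin {
    fun greet(): String = "Hello from ${context.name}"

    companion object : MiniPluginFactory<Greeter> {
        override fun build(context: MiniContext): Greeter = Greeter(context)
    }
}

fun main() {
    val context = MiniContext("demo").apply { plugin(Greeter) }
    val greeter = context.request(Greeter)
    println(greeter.greet()) // Hello from demo
}
```

Because the factory is the key, repeated `request` calls return the same plugin instance for a given context.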
## SNARK-html
The SNARK-HTML module defines tools for working with the HTML output format. Its API root is the `SnarkHtml` plugin. Its primary function (the `parse` action) is to parse a raw binary DataTree into objects specific to HTML rendering, assets and metadata. It uses `SnarkReader`, and more specifically `SnarkHtmlReader`, to parse binary data into formats like `Meta` and `PageFragment`. If `parse` cannot recognize the format of the input, it leaves it as a (lazy) binary.
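The dispatch described above (pick a reader by content type, keep unknown input as lazy binary) could look roughly like this. It is a standard-library-only sketch: `Reader`, `Parsed`, `RawBinary` and the `"properties"` and `"txt"` content types are hypothetical and only stand in for `SnarkReader`, `Meta` and `PageFragment`.

```kotlin
// Hypothetical sketch: readers are looked up by content type; unknown input stays binary.
sealed interface Parsed
data class ParsedMeta(val values: Map<String, String>) : Parsed
data class ParsedText(val text: String) : Parsed
class RawBinary(val bytes: () -> ByteArray) : Parsed // still lazy, never decoded here

fun interface Reader {
    fun read(source: String): Parsed
}

// "properties" stands in for a metadata format, "txt" for a renderable text format.
val readers: Map<String, Reader> = mapOf(
    "properties" to Reader { source ->
        ParsedMeta(
            source.lines()
                .filter { '=' in it }
                .associate { it.substringBefore('=').trim() to it.substringAfter('=').trim() }
        )
    },
    "txt" to Reader { source -> ParsedText(source.trim()) },
)

fun parse(contentType: String, bytes: () -> ByteArray): Parsed {
    // No matching reader: hand the lazy binary back untouched.
    val reader = readers[contentType] ?: return RawBinary(bytes)
    return reader.read(bytes().decodeToString())
}

fun main() {
    println(parse("properties") { "title = Snark".encodeToByteArray() })
    println(parse("png") { error("never called: unknown formats are not decoded") } is RawBinary)
}
```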
### Preprocessing and postprocessing
SNARK follows the DataForge approach of data tree transformation, so any number of transformation steps can be applied both before and after parsing. There is a key difference, though. Before parsing, we work with binaries that can be transformed directly (yet lazily, because this is how DataForge works). After parsing, we no longer have plain data but a rendering function, which can only be transformed by wrapping it in another function (which can be complicated). The raw data transformation before parsing is called preprocessing. It can include both raw binary transformation and metadata transformation. Postprocessing is usually done inside the rendering function produced by the parser or created directly from code.
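The difference between the two stages can be shown with a small sketch. It assumes a drastically simplified model in which raw data is a lazy string and parsed data is a function writing into a `StringBuilder`; `Raw`, `Render`, `preprocess`, `parseParagraph` and `postprocess` are hypothetical names, not SNARK API.

```kotlin
// Hypothetical sketch: raw data is a lazy string, parsed data is a rendering function.
typealias Raw = () -> String
typealias Render = (StringBuilder) -> Unit

// Preprocessing: transform the raw content directly (still lazily).
fun preprocess(raw: Raw, variables: Map<String, String>): Raw = {
    variables.entries.fold(raw()) { text, (key, value) -> text.replace("\${$key}", value) }
}

// Parsing: turn raw content into a function that renders it.
fun parseParagraph(raw: Raw): Render = { out -> out.append("<p>").append(raw()).append("</p>") }

// Postprocessing: the output can only be changed by wrapping the rendering function.
fun postprocess(render: Render, transform: (String) -> String): Render = { out ->
    val buffer = StringBuilder()
    render(buffer)
    out.append(transform(buffer.toString()))
}

fun main() {
    val raw: Raw = { "Hello, \${name}!" }
    val render = postprocess(parseParagraph(preprocess(raw, mapOf("name" to "Snark")))) {
        it.uppercase()
    }
    println(StringBuilder().also(render)) // <P>HELLO, SNARK!</P>
}
```

Note that `postprocess` has to capture and re-emit the inner output, which is why wrapping a rendering function is more involved than transforming raw data.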
The interface for `PageFragment` looks like this:
```kotlin
public fun interface PageFragment {
    context(PageContextWithData, FlowContent) public fun renderFragment()
}
```
It takes a reference to the parsed data tree and the rendering context of the page, as well as the HTML mounting root, and provides an action to render HTML. The reason for this complexity is that some things are not known before the actual page rendering happens. For example, absolute links in HTML can be resolved only when the page is rendered for a specific REST request that contains information about the host and port. Another example is automatic counters for chapters, formulas and images in document rendering: the numbering is not known until all fragments are placed in the correct order.
Postprocessors are functions that transform the HTML fragments wrapped in them according to the data tree and the page rendering context.
Other details on HTML rendering can be found in the [snark-html](./snark-html) module.
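Both examples (absolute links and automatic counters) can be reproduced in a minimal sketch. It deliberately replaces the context receivers of the real interface with plain parameters and writes into a `StringBuilder`; `RenderContext`, `Fragment`, `figure` and `renderPage` are hypothetical names used only for illustration.

```kotlin
// Hypothetical sketch: plain parameters replace the context receivers of the real interface.
class RenderContext(val host: String, val port: Int) {
    private var figures = 0
    fun resolve(path: String): String = "http://$host:$port/${path.trimStart('/')}"
    fun nextFigure(): Int = ++figures
}

fun interface Fragment {
    fun render(context: RenderContext, out: StringBuilder)
}

// The figure number and the absolute URL are decided only when render() is called.
fun figure(path: String, caption: String) = Fragment { context, out ->
    out.append("<figure><img src=\"${context.resolve(path)}\">")
    out.append("<figcaption>Figure ${context.nextFigure()}: $caption</figcaption></figure>\n")
}

fun renderPage(context: RenderContext, fragments: List<Fragment>): String =
    StringBuilder().also { out -> fragments.forEach { it.render(context, out) } }.toString()

fun main() {
    val plot = figure("images/plot.png", "Spectrum")
    val setup = figure("/images/setup.png", "Setup")
    // The same fragments get different numbers and links depending on order and request.
    println(renderPage(RenderContext("localhost", 8080), listOf(plot, setup)))
    println(renderPage(RenderContext("example.org", 80), listOf(setup, plot)))
}
```

The fragments themselves are created once, while numbering and link resolution depend entirely on the context passed in at render time.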
### [examples](examples)
>
> **Maturity**: EXPERIMENTAL
### [snark-core](snark-core)
>
> **Maturity**: EXPERIMENTAL
### [snark-gradle-plugin](snark-gradle-plugin)
>
> **Maturity**: EXPERIMENTAL
### [snark-html](snark-html)
>
> **Maturity**: EXPERIMENTAL
>
> **Features:**
> - [data](snark-html/#) : Data-based processing instead of traditional layout-based processing
> - [layouts](snark-html/#) : Use custom layouts to represent a data tree
> - [parsers](snark-html/#) : Add custom file formats and parsers using DataForge dependency injection
> - [preprocessor](snark-html/#) : Preprocessing text files using templates
> - [metadata](snark-html/#) : Trademark DataForge metadata layering and transformations
> - [dynamic](snark-html/#) : Generating a dynamic site using a Ktor server
> - [static](snark-html/#) : Generating a static site
### [snark-ktor](snark-ktor)
>
> **Maturity**: EXPERIMENTAL
### [snark-pandoc](snark-pandoc)
>
> **Maturity**: EXPERIMENTAL
### [examples/document](examples/document)
>
> **Maturity**: EXPERIMENTAL


@ -22,6 +22,10 @@ ksciencePublish {
useApache2Licence() useApache2Licence()
useSPCTeam() useSPCTeam()
} }
repository("spc","https://maven.sciprog.center/kscience") repository("spc", "https://maven.sciprog.center/kscience")
// sonatype() // sonatype()
} }
readme {
this.useDefaultReadmeTemplate
}

62
docs/README-TEMPLATE.md Normal file

@ -0,0 +1,62 @@
# SNARK
In Lewis Carroll's "The Hunting of the Snark", the Snark is something everybody wants to get, but nobody knows what it is. The same is true of this project, although it has a narrower scope. SNARK can be read as "Scientific Notation And Research works in Kotlin" because it can be used for automatic creation of research papers, but it has other purposes as well.
To sum it up, **SNARK is an automated data transformation tool with the main focus on document and web page generation**. It is based on the [DataForge framework](https://github.com/SciProgCentre/dataforge-core).
SNARK **is not a typesetting system** itself, but it can use typesetting systems such as Markdown, LaTeX or Typst to do data transformations.
## Concepts
The SNARK process is the transformation of a data tree. Initial data can include texts, images, static binary or textual data, or even active external data subscriptions. The result is usually a tree of documents or a directly served website.
**Data** is any kind of content, generated lazily with additional metadata (DataForge Meta).
## Using DataForge context
DataForge module management is based on **Contexts** and **Plugins**. A Context serves as a dependency injection system, a lifecycle object and an API discoverability root for all executions. To use a subsystem, one needs to:
* Create a Context with a Plugin like this:
```kotlin
val context = Context("Optional context name") {
    plugin(SnarkHtml)
    // Here SnarkHtml is a Plugin factory declared as a companion object of the Plugin itself
}
```
* Get the loaded plugin instance via `val snarkHtml = context.request(SnarkHtml)`
* Use the plugin, for example:
```kotlin
val siteData = snarkHtml.readSiteData(context) {
    directory(snark.io, Name.EMPTY, dataDirectory)
}
```
## SNARK-html
The SNARK-HTML module defines tools for working with the HTML output format. Its API root is the `SnarkHtml` plugin. Its primary function (the `parse` action) is to parse a raw binary DataTree into objects specific to HTML rendering, assets and metadata. It uses `SnarkReader`, and more specifically `SnarkHtmlReader`, to parse binary data into formats like `Meta` and `PageFragment`. If `parse` cannot recognize the format of the input, it leaves it as a (lazy) binary.
### Preprocessing and postprocessing
SNARK follows the DataForge approach of data tree transformation, so any number of transformation steps can be applied both before and after parsing. There is a key difference, though. Before parsing, we work with binaries that can be transformed directly (yet lazily, because this is how DataForge works). After parsing, we no longer have plain data but a rendering function, which can only be transformed by wrapping it in another function (which can be complicated). The raw data transformation before parsing is called preprocessing. It can include both raw binary transformation and metadata transformation. Postprocessing is usually done inside the rendering function produced by the parser or created directly from code.
The interface for `PageFragment` looks like this:
```kotlin
public fun interface PageFragment {
    context(PageContextWithData, FlowContent) public fun renderFragment()
}
```
It takes a reference to the parsed data tree and the rendering context of the page, as well as the HTML mounting root, and provides an action to render HTML. The reason for this complexity is that some things are not known before the actual page rendering happens. For example, absolute links in HTML can be resolved only when the page is rendered for a specific REST request that contains information about the host and port. Another example is automatic counters for chapters, formulas and images in document rendering: the numbering is not known until all fragments are placed in the correct order.
Postprocessors are functions that transform the HTML fragments wrapped in them according to the data tree and the page rendering context.
Other details on HTML rendering can be found in the [snark-html](./snark-html) module.
${modules}


@ -1,46 +0,0 @@
# SNARK
In Lewis Carroll's "The Hunting of the Snark", the Snark is something everybody wants to get, but nobody knows what it is. The same is true of this project, although it has a narrower scope. SNARK can be read as "Scientific Notation And Research works in Kotlin" because it can be used for automatic creation of research papers, but it has other purposes as well.
To sum it up, **SNARK is an automated data transformation tool with the main focus on document and web page generation**. It is based on the [DataForge framework](https://github.com/SciProgCentre/dataforge-core).
SNARK **is not a typesetting system** itself, but it can use typesetting systems such as Markdown, LaTeX or Typst to do data transformations.
## Concepts
The SNARK process is the transformation of a data tree. Initial data can include texts, images, static binary or textual data, or even active external data subscriptions. The result is usually a tree of documents or a directly served website.
**Data** is any kind of content, generated lazily with additional metadata (DataForge Meta).
## Using DataForge context
DataForge module management is based on **Contexts** and **Plugins**. A Context serves as a dependency injection system, a lifecycle object and an API discoverability root for all executions. To use a subsystem, one needs to:
* Create a Context with a Plugin like this:
```kotlin
Context("Optional context name"){
plugin(SnarkHtml)
// Here SnarkHtml is a Plugin factory declared as a companion object to a Plugin itself
}
```
* Get the loaded plugin instance via `val snarkHtml = context.request(SnarkHtml)`
* Use plugin like
```kotlin
val siteData = snarkHtml.readSiteData(context) {
directory(snark.io, Name.EMPTY, dataDirectory)
}
```
## SNARK-html
SNARK-HTML module defines tools to work with HTML output format.
### Postprocessing
${modules}

32
snark-html/README.md Normal file

@ -0,0 +1,32 @@
# Module snark-html
## Features
- [data](#) : Data-based processing instead of traditional layout-based processing
- [layouts](#) : Use custom layouts to represent a data tree
- [parsers](#) : Add custom file formats and parsers using DataForge dependency injection
- [preprocessor](#) : Preprocessing text files using templates
- [metadata](#) : Trademark DataForge metadata layering and transformations
- [dynamic](#) : Generating a dynamic site using a Ktor server
- [static](#) : Generating a static site
## Usage
## Artifact:
The Maven coordinates of this project are `space.kscience:snark-html:0.2.0-dev-1`.
**Gradle Kotlin DSL:**
```kotlin
repositories {
    maven("https://repo.kotlin.link")
    mavenCentral()
}
dependencies {
    implementation("space.kscience:snark-html:0.2.0-dev-1")
}
```


@ -51,7 +51,7 @@ public class SnarkHtml : WorkspacePlugin() {
override fun content(target: String): Map<Name, Any> = when (target) { override fun content(target: String): Map<Name, Any> = when (target) {
SnarkReader::class.dfType -> mapOf( SnarkReader::class.dfType -> mapOf(
"html".asName() to HtmlReader, "html".asName() to RawHtmlReader,
"markdown".asName() to MarkdownReader, "markdown".asName() to MarkdownReader,
"json".asName() to SnarkReader<Meta>(JsonMetaFormat, ContentType.Application.Json.toString()), "json".asName() to SnarkReader<Meta>(JsonMetaFormat, ContentType.Application.Json.toString()),
"yaml".asName() to SnarkReader<Meta>(YamlMetaFormat, "text/yaml", "yaml"), "yaml".asName() to SnarkReader<Meta>(YamlMetaFormat, "text/yaml", "yaml"),


@ -13,7 +13,7 @@ public interface SnarkHtmlReader : SnarkReader<PageFragment>{
override val outputType: KType get() = typeOf<PageFragment>() override val outputType: KType get() = typeOf<PageFragment>()
} }
public object HtmlReader : SnarkHtmlReader { public object RawHtmlReader : SnarkHtmlReader {
override val inputContentTypes: Set<String> = setOf("html") override val inputContentTypes: Set<String> = setOf("html")
override fun readFrom(source: String): PageFragment = PageFragment { override fun readFrom(source: String): PageFragment = PageFragment {


@ -1,66 +1,4 @@
## Examples # Module snark-pandoc
### Simple converting
Convert from INPUT_FILE to OUTPUT_FILE:
```java
PandocWrapper wrapper = new PandocWrapper();
wrapper.use(p -> {
var command = new PandocCommandBuilder(List.of(INPUT_FILE), OUTPUT_FILE);
PandocWrapper.execute(command);
});
```
Equal to:
```
pandoc --output=OUTPUT_FILE INPUT_FILE
```
### Convert and set formats
Convert from INPUT_FILE to OUTPUT_FILE and set INPUT_FORMAT and OUTPUT_FORMAT:
```java
PandocWrapper wrapper = new PandocWrapper();
wrapper.use(p -> {
var command = new PandocCommandBuilder(List.of(INPUT_FILE), OUTPUT_FILE);
command.formatForm(INPUT_FORMAT);
command.formatTo(OUTPUT_FORMAT);
PandocWrapper.execute(command);
});
```
Equal to:
```
pandoc --output=OUTPUT_FILE --from=INPUT_FORMAT --to=OUTPUT_FORMAT INPUT_FILE
```
### Converting with options
Convert from INPUT_FILE to a standalone OUTPUT_FILE and set the variable KEY to VALUE:
```java
PandocWrapper wrapper = new PandocWrapper();
wrapper.use(p -> {
var command = new PandocCommandBuilder(List.of(INPUT_FILE), OUTPUT_FILE);
command.standalone();
command.setVariable(KEY, VALUE);
PandocWrapper.execute(command);
});
```
Equal to:
```
pandoc --output=OUTPUT_FILE --standalone --variable=KEY:VALUE INPUT_FILE
```
### Write output from pandoc to file
Receive possible input formats in OUTPUT_FILE:
```java
PandocWrapper wrapper = new PandocWrapper();
wrapper.use(p -> {
var command = new PandocCommandBuilder();
command.getInputFormats();
PandocWrapper.execute(command, OUTPUT_FILE);
});
```
OUTPUT_FILE will then contain a list of supported input formats, one per line.
### Write errors from pandoc to file
Receive all from error stream and exit code in ERROR_FILE and output in OUTPUT_FILE:
```java
PandocWrapper wrapper = new PandocWrapper();
wrapper.use(p -> {
var command = new PandocCommandBuilder(List.of(INPUT_FILE), OUTPUT_FILE);
PandocWrapper.execute(command, OUTPUT_FILE, ERROR_FILE);
});
```