Update README.md

This commit is contained in:
Alexander Nozik 2023-09-01 19:57:27 +03:00
parent e2d3c108c2
commit ae0240df2b
2 changed files with 54 additions and 31 deletions

@@ -1,12 +1,6 @@
# Space Document Extractor
# Export everything from Space
The aim of this repository is to help to generate stand-alone version of JetBrains Space documents. Those documents are written in MarkDown format and could include images. In order to do that one have to do several steps:
* Download a page as markdown to a directory.
* Download attached images to specific directory.
* Replace references to attachments in MarkDown files.
This project uses Space SDK to organize those steps.
The aim of this repository is to help export your data from JetBrains Space using the Space SDK.
## Setting up Space Application
@@ -18,39 +12,60 @@ In order to access data in Space, one needs to [create a Space Application](http
* View book metadata
* View content
For restricted projects, one needs to manually add the project and its permission to allowed.
For restricted projects, one needs to manually add the project and its permission.
Then one needs to copy `clientId` and `clientSecret` for the application and use them as command line parameters.
## Downloading texts
## Export documents
Initially, the main idea was to export Space documents. Those documents are written in Markdown and can include images and file references, but there is no dedicated API to download them. Exporting them therefore takes several steps:
* Download a page as Markdown to a directory.
* Download attached images to a specific directory.
* Replace references to attachments in the Markdown files.
### Downloading texts
Text and binary documents are processed recursively, starting at the given `folderId` or at the project root if it is not defined.
## Download images
### Download images
Images in Space documents are inserted in the following format: `![](/d/aaaabbbbcccc?f=0 "name.png")`. The aim is to detect those links in files and download the corresponding images. Those links cannot be replaced directly, because access requires OAuth authentication; for that, an access token from the Space SDK is needed.
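The link detection described above can be sketched with a regular expression. This is only an illustration that assumes the link format shown in the example; the pattern and the names `ImageLink`, `imageLinkRegex` and `findImageLinks` are hypothetical and are not taken from the project's code.

```kotlin
// Hypothetical model of one detected image link; not a type from the project.
data class ImageLink(val id: String, val fileName: String?)

// Assumed pattern for links like ![](/d/aaaabbbbcccc?f=0 "name.png"):
// group 1 is the attachment id, group 2 is the optional quoted file name.
val imageLinkRegex = Regex("""!\[[^\]]*\]\(/d/([^?\s)]+)\?f=\d+(?:\s+"([^"]*)")?\)""")

// Collect every Space image link found in a Markdown text.
fun findImageLinks(markdown: String): List<ImageLink> =
    imageLinkRegex.findAll(markdown).map { match ->
        ImageLink(
            id = match.groupValues[1],
            // groupValues holds an empty string when the optional name is absent.
            fileName = match.groupValues[2].ifEmpty { null }
        )
    }.toList()

fun main() {
    val text = """Intro ![](/d/aaaabbbbcccc?f=0 "name.png") outro"""
    println(findImageLinks(text))
}
```

Each detected id would then have to be downloaded through an authenticated request, which this sketch deliberately leaves out.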
## Replace references
### Replace references
After file is successfully downloaded, the reference in file must be replaced with a local one.
After the file is successfully downloaded, the reference in the file must be replaced with a local one.
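A minimal sketch of this replacement step, assuming the `/d/<id>?f=0 "name"` link format and a map from attachment id to the local path of the downloaded file. The names `spaceImageRegex` and `replaceImageLinks`, and the map itself, are hypothetical and only illustrate the idea; the project may do this differently.

```kotlin
// Assumed pattern for Space image links; group 1 captures the attachment id.
val spaceImageRegex = Regex("""!\[[^\]]*\]\(/d/([^?\s)]+)\?f=\d+(?:\s+"[^"]*")?\)""")

// Rewrite each Space image link to point at its downloaded local copy.
// localPaths maps an attachment id to a relative file path (hypothetical input).
fun replaceImageLinks(markdown: String, localPaths: Map<String, String>): String =
    spaceImageRegex.replace(markdown) { match ->
        val localPath = localPaths[match.groupValues[1]]
        // Keep the original link untouched when the image was not downloaded.
        if (localPath == null) match.value else "![]($localPath)"
    }

fun main() {
    val source = "See ![](/d/aaaabbbbcccc?f=0 \"name.png\") here."
    println(replaceImageLinks(source, mapOf("aaaabbbbcccc" to "images/name.png")))
}
```

Leaving unknown links unchanged means a failed download does not corrupt the exported Markdown.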
## Command line interface
### Document conversion with Pandoc
```commandline
Usage: space-document-extractor options_list
Options:
--spaceUrl -> Url of the space instance like 'https://mipt-npm.jetbrains.space' (always required) { String }
--project -> The key of the exported project (always required) { String }
--path -> Target directory. Default is './output/project-key'. { String }
--folderId -> FolderId for the folder to export. By default uses project root. { String }
--clientId -> Space application client ID (if not defined, use environment value 'space.clientId') { String }
--clientSecret -> Space application client secret (if not defined, use environment value 'space.clientSecret') { String }
--help, -h -> Usage info
The package also includes automatic conversion of documents via Pandoc. See the CLI keys reference for details.
### CLI for document download
The CLI for document extraction is the following:
```
./space-export docs --clientId <Client ID> --clientSecret <Client Secret> <optional keys> <mandatory Space page URL>
```
Typical application usage:
The URL can be either a folder page or a project page. If it is a project page, all documents in the project are exported.
```commandline
.\space-document-extractor --spaceUrl "your space URL" --project "your project key" --clientId "your client ID" --clientSecret "your client secret"
## Export repositories
This is straightforward. The tool scans projects for repositories and then clones them, using the system git and the default user SSH key (support for a custom SSH certificate could be added in the future).
The CLI is the same as for documents, but accepts only a project root as the URL.
## Export chat history
Chat history is exported the same way as documents (including threads).
The URL for chats is either a specific chat page (without threads for now) or the Space base URL (in this case, all chats will be exported).
## Export direct messages
Direct messages require different treatment because they need authorization on behalf of the user. To get it, one needs to create a personal token (use the search to find `Personal token`) with global `View direct messages`, `View messages` and `View profile` access. Then use it with the `--token` key like this:
```
./space-export --token <Token string> <URL>
```
It will download all documents and postprocess the Markdown files, replacing image links with the downloaded images in the `images` directory (each subdirectory will have its own `images`).
The URL can be either the base Space URL or the URL of a chat.

@@ -24,7 +24,11 @@ private abstract class ExtractCommand(name: String, description: String) : Subco
val url by argument(
ArgType.String,
description = "Url of the folder like 'https://spc.jetbrains.space/p/mipt-npm/documents/folders?f=SPC-qn7al1VorKp' or 'https://spc.jetbrains.space/p/mipt-npm/documents/SPC/f/SPC-qn7al1VorKp?f=SPC-qn7al1VorKp'"
description = """
Root URL of the Space instance like `https://spc.jetbrains.space`.
OR
Url of a specific conversation like: `https://spc.jetbrains.space/im/user/TestAccount`.
""".trimIndent()
)
val clientId by option(
@@ -218,12 +222,16 @@ private class ExtractDirectCommand : Subcommand("direct", "Extract direct messag
val url by argument(
ArgType.String,
description = "Url of the folder like 'https://spc.jetbrains.space/p/mipt-npm/documents/folders?f=SPC-qn7al1VorKp' or 'https://spc.jetbrains.space/p/mipt-npm/documents/SPC/f/SPC-qn7al1VorKp?f=SPC-qn7al1VorKp'"
description = """
Root URL of the Space instance like `https://spc.jetbrains.space`.
OR
Url of a specific conversation like: `https://spc.jetbrains.space/im/user/TestAccount`.
""".trimIndent()
)
val token by option(
ArgType.String,
description = "A permanent token. Must have 'View direct messages', 'View messages' and 'View profile' access."
description = "A permanent token. Must have `View direct messages`, `View messages` and `View profile` access."
).required()
val path: String by option(