Update README.md
parent e2d3c108c2
commit ae0240df2b
71 README.md
@@ -1,12 +1,6 @@
-# Space Document Extractor
+# Export everything from Space

-The aim of this repository is to help to generate stand-alone version of JetBrains Space documents. Those documents are written in MarkDown format and could include images. In order to do that one have to do several steps:
-
-* Download a page as markdown to a directory.
-* Download attached images to specific directory.
-* Replace references to attachments in MarkDown files.
-
-This project uses Space SDK to organize those steps.
+The aim of this repository is to help to export your data from JetBrains Space using Space SDK.

 ## Setting up Space Application

@@ -18,39 +12,60 @@ In order to access data in Space, one needs to [create a Space Application](http
 * View book metadata
 * View content

-For restricted projects, one needs to manually add the project and its permission to allowed.
+For restricted projects, one needs to manually add the project and its permission.

 Then one needs to copy `clientId` and `clientSecret` for the application and use them as command line parameters.

-## Downloading texts
+## Export documents
+Initially, the main idea was to export Space documents. Those documents are written in MarkDown format and could include images and file references, but do not have a dedicated API to download them. In order to do that, one has to do several steps:
+
+* Download a page as markdown to a directory.
+* Download attached images to specific directory.
+* Replace references to attachments in MarkDown files.
+
+### Downloading texts

 Text and binary documents are processed recursively starting at given `folderId` or project root if it is not defined.
-## Download images
+
+### Download images

 The images in space documents are inserted in the following format: `![](/d/aaaabbbbcccc?f=0 "name.png")`. Our aim is to detect those links in files and download appropriate images. Those links could not be replaced directly, because access requires OAuth authentication. For that we need to use access token from Space SDK.

-## Replace references
+### Replace references

-After file is successfully downloaded, the reference in file must be replaced with a local one.
+After the file is successfully downloaded, the reference in file must be replaced with a local one.

-## Command line interface
+### Document conversion with Pandoc

-```commandline
-Usage: space-document-extractor options_list
-Options:
-    --spaceUrl -> Url of the space instance like 'https://mipt-npm.jetbrains.space' (always required) { String }
-    --project -> The key of the exported project (always required) { String }
-    --path -> Target directory. Default is './output/project-key'. { String }
-    --folderId -> FolderId for the folder to export. By default uses project root. { String }
-    --clientId -> Space application client ID (if not defined, use environment value 'space.clientId') { String }
-    --clientSecret -> Space application client secret (if not defined, use environment value 'space.clientSecret') { String }
-    --help, -h -> Usage info
+The package also includes an automatic conversion of documents via pandoc. See CLI keys reference for details.
+
+### CLI for document download
+
+The CLI for document extraction is the following:
 ```
+./space-export docs --clientId <Client ID> --clientSecret <Client Secret> <optional keys> <mandatory Space page URL>
+```

-Typical application usage:
+The URL could be either a folder page or a project page. If it is a project page, all documents in the project are exported.

-```commandline
-.\space-document-extractor --spaceUrl "your space URL" --project "your project key" --clientId "your client ID" --clientSecret "your client secret"
+## Export repositories
+
+This is straight-forward. It scans projects for repositories and then clones them, using a system git and default user SSH key (it is possible to add custom SSH certificate in the future).
+
+CLI is the same as for documents, but takes only project root as URL.
+
+## Export chat history
+
+Chat history is exported the same way as documents (including threads).
+
+URL for chats is either a specific chat page (without threads for now) or a Space base URL (in this case, all chats will be exported).
+
+## Export direct messages
+
+Direct messages require different treatment because they require authorization on behalf of the user. In order to do so, one needs to create a personal token (Search for `Personal token` in the search) with global `View direct messages`, `View messages` and `View profile` access. Then use it with a `--token` key like this:
+
 ```
+./space-export --token <Token string> <URL>
+```

-It will download all documents and postprocess markdown files, replacing image links with downloaded image in `images` directory (each subdirectory will have its own `images`).
+Url could be either a base space Url or an Url of the chat.
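The image handling described in the README change above (detect Space attachment links, then point them at local files) can be illustrated with a minimal Kotlin sketch. It is not the project's actual implementation. `AttachmentLink`, `attachmentRegex`, `findAttachments` and `localizeLinks` are hypothetical names, and the pattern assumes that attachment ids are alphanumeric and that the `f` query parameter is always numeric. Only the link shape `![](/d/aaaabbbbcccc?f=0 "name.png")` and the `images` directory name come from the README text. The authenticated download step is left out.

```kotlin
// Hypothetical helper type: one attachment reference found in a Markdown page.
data class AttachmentLink(val id: String, val fileName: String)

// Assumed pattern for links such as ![](/d/aaaabbbbcccc?f=0 "name.png").
// Groups: 1 = alt text, 2 = attachment id, 3 = file name from the link title.
val attachmentRegex = Regex("""!\[([^\]]*)]\(/d/([A-Za-z0-9]+)\?f=\d+ "([^"]+)"\)""")

// Collects every attachment reference so the files can be downloaded separately.
fun findAttachments(markdown: String): List<AttachmentLink> =
    attachmentRegex.findAll(markdown)
        .map { AttachmentLink(id = it.groupValues[2], fileName = it.groupValues[3]) }
        .toList()

// Rewrites each attachment link to a relative path inside imagesDir.
fun localizeLinks(markdown: String, imagesDir: String = "images"): String =
    attachmentRegex.replace(markdown) { match ->
        val (alt, _, name) = match.destructured
        "![$alt]($imagesDir/$name)"
    }

fun main() {
    val page = """Intro ![](/d/aaaabbbbcccc?f=0 "name.png") outro"""
    println(findAttachments(page)) // [AttachmentLink(id=aaaabbbbcccc, fileName=name.png)]
    println(localizeLinks(page))   // Intro ![](images/name.png) outro
}
```

Keeping detection and rewriting as separate functions lets the rewrite run only after a download has succeeded, which matches the order of steps listed in the README.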
@@ -24,7 +24,11 @@ private abstract class ExtractCommand(name: String, description: String) : Subco

     val url by argument(
         ArgType.String,
-        description = "Url of the folder like 'https://spc.jetbrains.space/p/mipt-npm/documents/folders?f=SPC-qn7al1VorKp' or 'https://spc.jetbrains.space/p/mipt-npm/documents/SPC/f/SPC-qn7al1VorKp?f=SPC-qn7al1VorKp'"
+        description = """
+            Root URL of the Space instance like `https://spc.jetbrains.space`.
+            OR
+            Url of a specific conversation like: `https://spc.jetbrains.space/im/user/TestAccount`.
+        """.trimIndent()
     )

     val clientId by option(
@@ -218,12 +222,16 @@ private class ExtractDirectCommand : Subcommand("direct", "Extract direct messag

     val url by argument(
         ArgType.String,
-        description = "Url of the folder like 'https://spc.jetbrains.space/p/mipt-npm/documents/folders?f=SPC-qn7al1VorKp' or 'https://spc.jetbrains.space/p/mipt-npm/documents/SPC/f/SPC-qn7al1VorKp?f=SPC-qn7al1VorKp'"
+        description = """
+            Root URL of the Space instance like `https://spc.jetbrains.space`.
+            OR
+            Url of a specific conversation like: `https://spc.jetbrains.space/im/user/TestAccount`.
+        """.trimIndent()
     )

     val token by option(
         ArgType.String,
-        description = "A permanent token. Must have 'View direct messages', 'View messages' and 'View profile' access."
+        description = "A permanent token. Must have `View direct messages`, `View messages` and `View profile` access."
     ).required()

     val path: String by option(
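The new argument description accepts either a root URL or a conversation URL. One way such input could be told apart is sketched below. This is an illustration only, not the code from this commit. `ExportTarget`, `AllChats`, `SingleChat` and `classify` are hypothetical names, and treating any path that starts with `/im/` as a single conversation is an assumption based on the example URL in the description.

```kotlin
import java.net.URI

// Hypothetical result types: export everything, or export one conversation.
sealed interface ExportTarget
data class AllChats(val baseUrl: String) : ExportTarget
data class SingleChat(val baseUrl: String, val chatPath: String) : ExportTarget

// Assumption: conversation pages live under /im/, as in the example URL.
fun classify(url: String): ExportTarget {
    val uri = URI(url)
    val base = "${uri.scheme}://${uri.authority}"
    val path = uri.path.orEmpty().trimEnd('/')
    return if (path.startsWith("/im/")) SingleChat(base, path) else AllChats(base)
}

fun main() {
    println(classify("https://spc.jetbrains.space"))
    println(classify("https://spc.jetbrains.space/im/user/TestAccount"))
}
```

Splitting the base URL from the path keeps the Space instance address available for authentication in both cases.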