As a Node module dependency, the engine exposes a JavaScript API that can be called in your own code. The following modules are available.
fetch
The fetch module gets the MIME type and content of a document from its URL.
import fetch from '@opentermsarchive/engine/fetch';
Documentation on how to use fetch is provided as JSDoc.
If you pass the executeClientScripts option to fetch, a headless browser is used to download and execute the page before serialising its DOM. For performance reasons, starting and stopping the browser is your responsibility: this avoids instantiating a new browser on each fetch call. Here is an example of how to use this feature:
import fetch, { launchHeadlessBrowser, stopHeadlessBrowser } from '@opentermsarchive/engine/fetch';
await launchHeadlessBrowser();
await fetch({ executeClientScripts: true, ... });
await fetch({ executeClientScripts: true, ... });
await fetch({ executeClientScripts: true, ... });
await stopHeadlessBrowser();
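Since stopping the browser is left to the caller, it may help to guarantee that it is stopped even when one of the fetches throws. The following is a minimal sketch of that idea, not part of the engine's API: `withHeadlessBrowser` is a hypothetical helper, and the `launch` and `stop` callbacks are assumed to be the `launchHeadlessBrowser` and `stopHeadlessBrowser` functions imported from the fetch module.

```javascript
// Hypothetical helper (not provided by the engine): runs `task` between a
// browser launch and a browser stop, and stops the browser even on failure.
async function withHeadlessBrowser({ launch, stop }, task) {
  await launch(); // start the shared browser once
  try {
    // every fetch performed inside `task` reuses the same browser instance
    return await task();
  } finally {
    // runs whether `task` resolved or threw, so no browser is left running
    await stop();
  }
}

// Assumed usage with the engine's exports (options elided as in the example above):
//
//   import fetch, { launchHeadlessBrowser, stopHeadlessBrowser } from '@opentermsarchive/engine/fetch';
//
//   const documents = await withHeadlessBrowser(
//     { launch: launchHeadlessBrowser, stop: stopHeadlessBrowser },
//     () => Promise.all(optionsList.map(options => fetch({ executeClientScripts: true, ...options }))),
//   );
```

Passing the launch and stop functions as parameters keeps the helper independent of the engine, which also makes it easy to exercise with stubs.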
The fetch module options are defined as a node-config submodule. The default fetcher configuration can be overridden by adding a fetcher object to the local configuration file.
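As an illustration, such an override could look like the fragment below in a node-config local file (for example `config/local.json`). This is only a sketch: the key names and values inside `fetcher` are assumptions made for the example and should be checked against the engine's default configuration before use.

```json
{
  "fetcher": {
    "language": "en",
    "navigationTimeout": 30000
  }
}
```

Only the keys present in the local file are overridden; node-config merges them over the defaults.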
extract
The extract module transforms HTML or PDF content into a Markdown string according to a declaration.
import extract from '@opentermsarchive/engine/extract';
The extract function documentation is available as JSDoc.
SourceDocument
The SourceDocument class encapsulates information about a terms’ source document tracked by Open Terms Archive.
import SourceDocument from '@opentermsarchive/engine/sourceDocument';
The SourceDocument format is defined in source code.