langchain.document_loaders.docugami.DocugamiLoader

class langchain.document_loaders.docugami.DocugamiLoader(*, api: str = 'https://api.docugami.com/v1preview1', access_token: Optional[str] = None, docset_id: Optional[str] = None, document_ids: Optional[Sequence[str]] = None, file_paths: Optional[Sequence[Union[Path, str]]] = None, min_chunk_size: int = 32)

Bases: BaseLoader, BaseModel

Loads processed docs from Docugami.

To use this loader, install the lxml Python package.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param access_token: Optional[str] = None

The Docugami API access token to use.

param api: str = 'https://api.docugami.com/v1preview1'

The Docugami API endpoint to use.

param docset_id: Optional[str] = None

The Docugami API docset ID to use.

param document_ids: Optional[Sequence[str]] = None

The Docugami API document IDs to use.

param file_paths: Optional[Sequence[Union[pathlib.Path, str]]] = None

The local file paths to use.

param min_chunk_size: int = 32

The minimum chunk size to use when parsing DGML. Defaults to 32.

lazy_load() → Iterator[Document]

A lazy loader for Documents.

load() → List[Document]

Load documents.

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Parameters

text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns

List of Documents.
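The load-then-split behavior described above can be illustrated with a small stand-in written in plain Python. This is only a sketch: Doc and split_into_chunks are hypothetical names, and fixed-size slicing is a deliberate simplification, not how RecursiveCharacterTextSplitter chooses its boundaries. It shows only that each chunk comes back as its own document carrying the metadata of its parent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document (text plus metadata)."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_chunks(docs: List[Doc], chunk_size: int) -> List[Doc]:
    """Cut each document into fixed-size pieces, copying its metadata."""
    chunks: List[Doc] = []
    for doc in docs:
        for start in range(0, len(doc.page_content), chunk_size):
            piece = doc.page_content[start:start + chunk_size]
            # Each chunk is itself a Doc, so callers see one uniform type.
            chunks.append(Doc(piece, dict(doc.metadata)))
    return chunks


docs = [Doc("abcdefghij", {"source": "a.xml"})]
chunks = split_into_chunks(docs, chunk_size=4)
print([c.page_content for c in chunks])  # ['abcd', 'efgh', 'ij']
```

With the real loader, the equivalent step would be a call to load_and_split, optionally passing a TextSplitter instance.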

validator validate_local_or_remote » all fields

Validate that either local file paths or a remote API docset ID is given.

Parameters

values – The values to validate.

Returns

The validated values.
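The rule above can be sketched in plain Python. This is a minimal stand-in, not the library's implementation: check_local_or_remote and its error messages are hypothetical, and rejecting the case where both file_paths and docset_id are supplied is an assumption that the description does not state.

```python
from typing import Any, Dict


def check_local_or_remote(values: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the validation rule described above."""
    has_local = bool(values.get("file_paths"))
    has_remote = bool(values.get("docset_id"))

    if not has_local and not has_remote:
        raise ValueError("Specify either file_paths or docset_id")

    # Assumption: the two sources are treated as mutually exclusive.
    if has_local and has_remote:
        raise ValueError("Specify only one of file_paths and docset_id")

    # Like the validator, hand the values back unchanged on success.
    return values


print(check_local_or_remote({"file_paths": ["contract.xml"]}))
```

Returning the values unchanged mirrors the validator's documented contract: it takes the values to validate and returns the validated values.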

Examples using DocugamiLoader
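A minimal sketch of local-file usage, based only on the constructor signature documented above. The helper name load_local_dgml is hypothetical, and the sketch assumes that the files are Docugami XML output already downloaded to disk and that both langchain and lxml are installed.

```python
from pathlib import Path
from typing import List, Sequence, Union


def load_local_dgml(
    file_paths: Sequence[Union[Path, str]], min_chunk_size: int = 32
) -> List:
    """Load documents from local Docugami XML files (hypothetical helper)."""
    # Imported inside the function so that defining this helper does not
    # require langchain and lxml to be installed.
    from langchain.document_loaders.docugami import DocugamiLoader

    # All constructor arguments are keyword-only, per the signature above.
    loader = DocugamiLoader(
        file_paths=list(file_paths),
        min_chunk_size=min_chunk_size,
    )
    return loader.load()
```

A call might look like load_local_dgml(["path/to/processed.xml"]), where the path is a placeholder. For remote loading, the signature suggests passing docset_id together with access_token instead of file_paths.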