`langchain.document_loaders.docugami`.DocugamiLoader

class langchain.document_loaders.docugami.DocugamiLoader(*, api: str = 'https://api.docugami.com/v1preview1', access_token: Optional[str] = None, docset_id: Optional[str] = None, document_ids: Optional[Sequence[str]] = None, file_paths: Optional[Sequence[Union[Path, str]]] = None, min_chunk_size: int = 32)

Bases: BaseLoader, BaseModel

Loads processed docs from Docugami.

To use this loader, you need the lxml Python package installed.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param access_token: Optional[str] = None: The Docugami API access token to use.

param api: str = 'https://api.docugami.com/v1preview1': The Docugami API endpoint to use.

param docset_id: Optional[str] = None: The Docugami API docset ID to use.

param document_ids: Optional[Sequence[str]] = None: The Docugami API document IDs to use.

param file_paths: Optional[Sequence[Union[pathlib.Path, str]]] = None: The local file paths to use.

param min_chunk_size: int = 32: The minimum chunk size to use when parsing DGML. Defaults to 32.
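To make the role of min_chunk_size more concrete, here is a minimal, self-contained sketch of one plausible strategy for enforcing a minimum chunk size: text shorter than the threshold is carried forward and prepended to the next chunk, and a short fragment left at the end is appended to the last chunk. This is an assumption for illustration only. The helper merge_small_chunks is hypothetical and is not part of the DocugamiLoader API, and the loader's actual DGML parsing (including whether the comparison is inclusive) may differ.

```python
from typing import List


def merge_small_chunks(texts: List[str], min_chunk_size: int = 32) -> List[str]:
    """Hypothetical helper (not the DocugamiLoader API).

    Fragments shorter than min_chunk_size are carried forward and prepended
    to the next fragment; a short fragment left over at the end is appended
    to the last emitted chunk.
    """
    chunks: List[str] = []
    carry = ""  # pending text that was too short to stand on its own
    for text in texts:
        if carry:
            text = carry + " " + text
            carry = ""
        # Assumption: a chunk exactly min_chunk_size long is kept as-is.
        if len(text) >= min_chunk_size:
            chunks.append(text)
        else:
            carry = text
    if carry:
        if chunks:
            chunks[-1] = chunks[-1] + " " + carry
        else:
            chunks.append(carry)  # nothing to attach it to, so keep it alone
    return chunks


print(merge_small_chunks(["Title", "A paragraph that is comfortably long enough.", "End."]))
# → ['Title A paragraph that is comfortably long enough. End.']
```

Merging instead of dropping short fragments keeps headings and short labels attached to nearby content rather than producing tiny, context-free chunks.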

lazy_load() → Iterator[Document]: A lazy loader for Documents.

load() → List[Document]: Load documents.

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Parameters: text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
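The contract of load_and_split (load, split each document, return every chunk as a Document) can be sketched without LangChain installed. In this sketch, Document is a hypothetical stand-in that only mirrors the page_content and metadata fields, and fixed_size_split is a hypothetical naive splitter swapped in for the real default, RecursiveCharacterTextSplitter. Copying the source metadata onto each chunk is an assumption about the intended behavior, shown here so chunks stay traceable to their origin.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def fixed_size_split(text: str, chunk_size: int = 20) -> List[str]:
    """Hypothetical splitter: cut text into pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def load_and_split(
    load: Callable[[], List[Document]],
    split: Optional[Callable[[str], List[str]]] = None,
) -> List[Document]:
    """Load documents, split each one, and wrap every chunk in a new Document."""
    split = split or fixed_size_split
    chunks: List[Document] = []
    for doc in load():
        for piece in split(doc.page_content):
            # Each chunk gets its own copy of the source document's metadata.
            chunks.append(Document(page_content=piece, metadata=dict(doc.metadata)))
    return chunks


def fake_load() -> List[Document]:
    # Hypothetical loader output standing in for parsed Docugami documents.
    return [Document("Docugami turns documents into XML.", {"name": "a.pdf"})]


for chunk in load_and_split(fake_load):
    print(repr(chunk.page_content), chunk.metadata)
```

The naive splitter cuts mid-word; a separator-aware splitter such as the default RecursiveCharacterTextSplitter prefers to break on paragraph, line, and word boundaries instead.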

validator validate_local_or_remote » all fields

Validate that either local file paths or a remote API docset ID is given.

Parameters: values – The values to validate.
Returns: The validated values.
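The rule checked by validate_local_or_remote can be sketched as a plain function over the field values. This is a minimal sketch, not the library's code: treating "both file_paths and docset_id given" as an error (an exclusive either/or) is an assumption, as are the error messages.

```python
from typing import Any, Dict


def validate_local_or_remote(values: Dict[str, Any]) -> Dict[str, Any]:
    """Sketch: require exactly one of file_paths (local) or docset_id (remote).

    Rejecting the case where both are set is an assumption about the real validator.
    """
    has_files = bool(values.get("file_paths"))
    has_docset = bool(values.get("docset_id"))
    if has_files and has_docset:
        raise ValueError("Specify either file_paths or docset_id, not both")
    if not has_files and not has_docset:
        raise ValueError("Specify either file_paths or docset_id")
    return values


print(validate_local_or_remote({"docset_id": "abc"}))
# → {'docset_id': 'abc'}
```

Because the real check runs as a pydantic validator during construction, a ValueError like the ones above would surface to the caller as the ValidationError mentioned earlier.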

Examples using DocugamiLoader

Docugami
