langchain.document_loaders.docugami.DocugamiLoader
- class langchain.document_loaders.docugami.DocugamiLoader(*, api: str = 'https://api.docugami.com/v1preview1', access_token: Optional[str] = None, docset_id: Optional[str] = None, document_ids: Optional[Sequence[str]] = None, file_paths: Optional[Sequence[Union[Path, str]]] = None, min_chunk_size: int = 32)
Bases: BaseLoader, BaseModel
Loads processed docs from Docugami.
To use, you should have the lxml Python package installed.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param access_token: Optional[str] = None
The Docugami API access token to use.
- param api: str = 'https://api.docugami.com/v1preview1'
The Docugami API endpoint to use.
- param docset_id: Optional[str] = None
The Docugami API docset ID to use.
- param document_ids: Optional[Sequence[str]] = None
The Docugami API document IDs to use.
- param file_paths: Optional[Sequence[Union[pathlib.Path, str]]] = None
The local file paths to use.
- param min_chunk_size: int = 32
The minimum chunk size to use when parsing DGML. Defaults to 32.
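The parameters above describe two ways to point the loader at content: local files (file_paths) or a remote docset (docset_id, optionally narrowed by document_ids). The following is a minimal sketch of assembling keyword arguments for each case. The helper names docugami_kwargs and make_loader, the sample path, and the sample IDs are hypothetical, and the two validation rules are assumptions that are not stated on this page.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Sequence, Union


def docugami_kwargs(
    *,
    file_paths: Optional[Sequence[Union[Path, str]]] = None,
    docset_id: Optional[str] = None,
    document_ids: Optional[Sequence[str]] = None,
    access_token: Optional[str] = None,
    min_chunk_size: int = 32,
) -> Dict[str, Any]:
    """Assemble keyword arguments for DocugamiLoader (hypothetical helper)."""
    # Assumption: exactly one source is given, local files or a remote docset.
    if bool(file_paths) == bool(docset_id):
        raise ValueError("Pass either file_paths or docset_id, not both or neither.")
    # Assumption: reading a remote docset requires an access token.
    if docset_id and not access_token:
        raise ValueError("docset_id requires an access_token.")

    kwargs: Dict[str, Any] = {"min_chunk_size": min_chunk_size}
    if file_paths:
        kwargs["file_paths"] = [Path(p) for p in file_paths]
    else:
        kwargs["docset_id"] = docset_id
        kwargs["access_token"] = access_token
        if document_ids:
            kwargs["document_ids"] = list(document_ids)
    return kwargs


def make_loader(**kwargs: Any) -> Any:
    """Build the loader; the import is lazy so the sketch loads without langchain."""
    from langchain.document_loaders.docugami import DocugamiLoader

    return DocugamiLoader(**docugami_kwargs(**kwargs))


# Local mode: parse files already downloaded from Docugami (sample path).
local = docugami_kwargs(file_paths=["contracts/lease.xml"])

# Remote mode: fetch selected documents from a docset (sample IDs and token).
remote = docugami_kwargs(
    docset_id="my-docset-id",
    access_token="my-token",
    document_ids=["doc-1"],
)
```

With langchain and lxml installed, make_loader(file_paths=["contracts/lease.xml"]) would return a loader configured for local files.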
- load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
- Parameters
text_splitter – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns
List of Documents.
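As an illustration of consuming the return value of load_and_split, the sketch below reports the size of each chunk. The helper name chunk_lengths is hypothetical. It accepts any object exposing load_and_split, and it assumes each returned Document stores its text in a page_content attribute.

```python
from typing import Any, List, Optional


def chunk_lengths(loader: Any, text_splitter: Optional[Any] = None) -> List[int]:
    """Return the character length of each chunk produced by load_and_split."""
    # Passing text_splitter=None lets the loader fall back to its default
    # splitter (RecursiveCharacterTextSplitter, per the parameter description).
    chunks = loader.load_and_split(text_splitter=text_splitter)
    # Assumption: each chunk is a Document whose text is in page_content.
    return [len(chunk.page_content) for chunk in chunks]
```

Passing a DocugamiLoader instance as loader gives a quick check of how min_chunk_size and the chosen splitter affect chunk sizes.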