`langchain.document_loaders.generic`.GenericLoader

class langchain.document_loaders.generic.GenericLoader(blob_loader: BlobLoader, blob_parser: BaseBlobParser)

Bases: BaseLoader

A generic document loader that combines an arbitrary blob loader with a blob parser.

Examples

.. code-block:: python

    from langchain.document_loaders import GenericLoader
    from langchain.document_loaders.blob_loaders import FileSystemBlobLoader

    loader = GenericLoader.from_filesystem(
        path="path/to/directory",
        glob="**/[!.]*",
        suffixes=[".pdf"],
        show_progress=True,
    )

    docs = loader.lazy_load()
    next(docs)

Example instantiations to change which files are loaded:

.. code-block:: python

    # Recursively load all text files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/*.txt")

    # Recursively load all non-hidden files in a directory.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="**/[!.]*")

    # Load all files in a directory without recursion.
    loader = GenericLoader.from_filesystem("/path/to/dir", glob="*")

Example instantiations to change which parser is used:

.. code-block:: python

    from langchain.document_loaders.parsers.pdf import PyPDFParser

    # Recursively load all PDF files in a directory, parsing them with PyPDFParser.
    loader = GenericLoader.from_filesystem(
        "/path/to/dir",
        glob="**/*.pdf",
        parser=PyPDFParser(),
    )

A generic document loader.

Parameters

blob_loader – A blob loader which knows how to yield blobs
blob_parser – A blob parser which knows how to parse blobs into documents
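
How the two parameters divide the work can be illustrated without LangChain itself. The sketch below is a stdlib-only approximation of the pattern, not the library's implementation: `Blob`, `Document`, `ListBlobLoader`, `LineParser`, and `SketchGenericLoader` are hypothetical stand-ins, and the method names `yield_blobs` and `lazy_parse` are assumed here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class Blob:
    """Hypothetical stand-in: raw content plus a label for where it came from."""
    data: str
    source: str


@dataclass
class Document:
    """Hypothetical stand-in for a parsed document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ListBlobLoader:
    """Blob loader role: knows how to yield blobs (here, from memory)."""

    def __init__(self, blobs: Iterable[Blob]) -> None:
        self._blobs = list(blobs)

    def yield_blobs(self) -> Iterator[Blob]:
        yield from self._blobs


class LineParser:
    """Blob parser role: turns each non-empty line of a blob into a document."""

    def lazy_parse(self, blob: Blob) -> Iterator[Document]:
        for line in blob.data.splitlines():
            if line.strip():
                yield Document(line.strip(), {"source": blob.source})


class SketchGenericLoader:
    """Combines any blob loader with any blob parser (illustrative only)."""

    def __init__(self, blob_loader: ListBlobLoader, blob_parser: LineParser) -> None:
        self.blob_loader = blob_loader
        self.blob_parser = blob_parser

    def lazy_load(self) -> Iterator[Document]:
        # Each blob is handed to the parser only when the caller asks for more.
        for blob in self.blob_loader.yield_blobs():
            yield from self.blob_parser.lazy_parse(blob)

    def load(self) -> List[Document]:
        return list(self.lazy_load())


loader = SketchGenericLoader(
    ListBlobLoader([Blob("alpha\n\nbeta", "a.txt"), Blob("gamma", "b.txt")]),
    LineParser(),
)
docs = loader.load()
```

Because the loader only depends on the two roles, either side can be swapped independently: a different blob source with the same parser, or the same source with a different parser.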

Methods

`__init__`(blob_loader, blob_parser)	A generic document loader.
`from_filesystem`(path, *[, glob, suffixes, ...])	Create a generic document loader using a filesystem blob loader.
`lazy_load`()	Load documents lazily.
`load`()	Load all documents.
`load_and_split`([text_splitter])	Load all documents and split them into sentences.

classmethod from_filesystem(path: Union[str, Path], *, glob: str = '**/[!.]*', suffixes: Optional[Sequence[str]] = None, show_progress: bool = False, parser: Union[Literal['default'], BaseBlobParser] = 'default') → GenericLoader

Create a generic document loader using a filesystem blob loader.

Parameters

path – The path to the directory to load documents from.
glob – The glob pattern to use to find documents.
suffixes – The suffixes to use to filter documents. If None, all files matching the glob will be loaded.
show_progress – Whether to show a progress bar or not (requires tqdm). Proxies to the file system loader.
parser – A blob parser which knows how to parse blobs into documents. Either a BaseBlobParser instance or the string 'default' (the default value).

Returns

A generic document loader.
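
The interaction between `glob` and `suffixes` can be explored with the standard library alone. The sketch below assumes the filesystem blob loader applies a pathlib-style glob relative to `path`, keeps only regular files, and then filters by suffix; `iter_matching_files` is a hypothetical helper written for this illustration, not part of LangChain.

```python
import tempfile
from pathlib import Path
from typing import Iterator, Optional, Sequence


def iter_matching_files(
    path: Path,
    glob: str = "**/[!.]*",
    suffixes: Optional[Sequence[str]] = None,
) -> Iterator[Path]:
    """Hypothetical helper: yield files under `path` matching `glob` and `suffixes`."""
    for candidate in path.glob(glob):
        if not candidate.is_file():
            continue  # the glob can also match directories
        if suffixes is not None and candidate.suffix not in suffixes:
            continue  # suffix filter is applied after the glob
        yield candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ["a.txt", "b.pdf", ".hidden.pdf", "sub/c.pdf"]:
        (root / name).write_text("x")

    # Default glob: recursive, skipping names that start with a dot.
    all_visible = sorted(p.name for p in iter_matching_files(root))
    # Same glob, narrowed to PDF files by suffix.
    pdf_only = sorted(p.name for p in iter_matching_files(root, suffixes=[".pdf"]))
    # Non-recursive glob: only direct children of the directory.
    top_level = sorted(p.name for p in iter_matching_files(root, glob="*"))
```

In this sketch the glob decides which paths are visited and `suffixes` narrows the result afterwards, so `suffixes=None` leaves the glob as the only filter.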

lazy_load() → Iterator[Document]: Load documents lazily. Use this when working at a large scale.

load() → List[Document]: Load all documents.
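
The practical difference between the lazy and eager methods is when the work happens. The following stdlib-only sketch uses a plain generator to show this; the functions `lazy_load` and `load` here are hypothetical module-level stand-ins, and the assumption that eager loading is equivalent to exhausting the lazy iterator is made for illustration only.

```python
from typing import Iterator, List

parsed: List[str] = []  # records which inputs have actually been processed


def lazy_load(names: List[str]) -> Iterator[str]:
    """Process one input at a time, only when the caller asks for it."""
    for name in names:
        parsed.append(name)
        yield f"document from {name}"


def load(names: List[str]) -> List[str]:
    """Process every input up front and return the full list."""
    return list(lazy_load(names))


names = ["a.txt", "b.txt", "c.txt"]

docs = lazy_load(names)
first = next(docs)
parsed_after_first = list(parsed)  # only the first input has been touched

parsed.clear()
everything = load(names)
parsed_after_load = list(parsed)  # all inputs have been processed
```

With the lazy form, memory use and latency scale with how many documents are consumed rather than with the size of the whole collection, which is why it suits large directories.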

load_and_split(text_splitter: Optional[TextSplitter] = None) → List[Document]: Load all documents and split them into sentences.
