langchain.retrievers.document_compressors.embeddings_filter.EmbeddingsFilter
- class langchain.retrievers.document_compressors.embeddings_filter.EmbeddingsFilter(*, embeddings: langchain.embeddings.base.Embeddings, similarity_fn: Callable = <function cosine_similarity>, k: Optional[int] = 20, similarity_threshold: Optional[float] = None)
Bases: BaseDocumentCompressor
Document compressor that uses embeddings to drop documents unrelated to the query.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param embeddings: langchain.embeddings.base.Embeddings [Required]
Embeddings to use for embedding document contents and queries.
- param k: Optional[int] = 20
The number of relevant documents to return. Can be set to None, in which case similarity_threshold must be specified. Defaults to 20.
- param similarity_fn: Callable = <function cosine_similarity>
Similarity function for comparing the query embedding with the document embeddings. The function is expected to take two matrices (List[List[float]]) as input and return a matrix of scores, where higher values indicate greater similarity.
- param similarity_threshold: Optional[float] = None
Threshold on the similarity between a document and the query; documents scoring below it are dropped. Defaults to None; must be specified if k is set to None.
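The contract described for similarity_fn (two matrices in, one score matrix out) can be illustrated with a small pure-Python sketch. This is not the library's cosine_similarity implementation; the helper name cosine_similarity_matrix and the sample vectors are hypothetical, and returning 0.0 for zero-length vectors is an assumption made here for safety.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(X: Matrix, Y: Matrix) -> Matrix:
    """Hypothetical stand-in for a similarity_fn.

    Returns a len(X) x len(Y) matrix where entry [i][j] is the cosine
    similarity between row i of X and row j of Y.
    """

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(a * a for a in v))

    result: Matrix = []
    for x in X:
        row: List[float] = []
        for y in Y:
            denom = norm(x) * norm(y)
            dot = sum(a * b for a, b in zip(x, y))
            # Assumption: treat a zero-length vector as "no similarity".
            row.append(dot / denom if denom else 0.0)
        result.append(row)
    return result


# One query embedding compared against three document embeddings.
query_embedding = [[1.0, 0.0]]
doc_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = cosine_similarity_matrix(query_embedding, doc_embeddings)[0]
```

Any callable with this shape, where higher scores mean greater similarity, should be usable in place of the default.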
- async acompress_documents(documents: Sequence[Document], query: str, callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None) -> Sequence[Document]
Filter down documents.
- compress_documents(documents: Sequence[Document], query: str, callbacks: Optional[Union[List[BaseCallbackHandler], BaseCallbackManager]] = None) -> Sequence[Document]
Filter documents based on similarity of their embeddings to the query.
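As a rough illustration of how k and similarity_threshold might combine when filtering, the sketch below selects document indices from a list of precomputed query-to-document scores. It is a minimal sketch under stated assumptions, not the library's code: the function name select_indices is hypothetical, and it assumes that the top k scores are taken first and that the threshold comparison is strict.

```python
from typing import List, Optional


def select_indices(
    scores: List[float],
    k: Optional[int] = 20,
    similarity_threshold: Optional[float] = None,
) -> List[int]:
    """Hypothetical helper: pick which documents to keep, by index."""
    if k is None and similarity_threshold is None:
        # Mirrors the documented rule that one of the two must be set.
        raise ValueError("Specify k, similarity_threshold, or both.")

    indices = list(range(len(scores)))
    if k is not None:
        # Assumption: keep the k highest-scoring documents, best first.
        indices = sorted(indices, key=lambda i: scores[i], reverse=True)[:k]
    if similarity_threshold is not None:
        # Assumption: strict comparison against the threshold.
        indices = [i for i in indices if scores[i] > similarity_threshold]
    return indices


scores = [0.2, 0.9, 0.5, 0.7]
top_two = select_indices(scores, k=2)
above_threshold = select_indices(scores, k=None, similarity_threshold=0.4)
both = select_indices(scores, k=3, similarity_threshold=0.6)
```

With both parameters set, the threshold can return fewer than k documents, which is often the desired behavior when only a few candidates are actually related to the query.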