langchain.retrievers.web_research.WebResearchRetriever

class langchain.retrievers.web_research.WebResearchRetriever(*, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, vectorstore: VectorStore, llm_chain: LLMChain, search: GoogleSearchAPIWrapper, num_search_results: int = 1, text_splitter: RecursiveCharacterTextSplitter = <langchain.text_splitter.RecursiveCharacterTextSplitter object>, url_database: List[str] = None)

Bases: BaseRetriever

Retriever for web research based on the Google Search API.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param llm_chain: langchain.chains.llm.LLMChain [Required]

LLM chain used to generate search questions

param metadata: Optional[Dict[str, Any]] = None

Optional metadata associated with the retriever. Defaults to None. This metadata is associated with each call to this retriever and passed as arguments to the handlers defined in callbacks. You can use it, for example, to identify a specific instance of a retriever with its use case.

param num_search_results: int = 1

Number of pages per Google search

param search: langchain.utilities.google_search.GoogleSearchAPIWrapper [Required]

Google Search API Wrapper

param tags: Optional[List[str]] = None

Optional list of tags associated with the retriever. Defaults to None. These tags are associated with each call to this retriever and passed as arguments to the handlers defined in callbacks. You can use them, for example, to identify a specific instance of a retriever with its use case.

param text_splitter: langchain.text_splitter.RecursiveCharacterTextSplitter = <langchain.text_splitter.RecursiveCharacterTextSplitter object>

Text splitter for splitting web pages into chunks
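
To make "splitting web pages into chunks" concrete, here is a minimal sketch of fixed-width chunking with overlap. It is a simplified stand-in and not RecursiveCharacterTextSplitter itself, which also tries to break on separators such as paragraph and line boundaries. The helper name `split_with_overlap` and the sizes used are assumptions for illustration only.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-width chunks that share `chunk_overlap` characters.

    Hypothetical helper for illustration only; it is not part of LangChain.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


page_text = "abcdefghij"
chunks = split_with_overlap(page_text, chunk_size=4, chunk_overlap=1)
```

The overlap means the last character of one chunk is repeated at the start of the next, so a sentence cut at a chunk boundary still appears with some context on both sides.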

param url_database: List[str] [Optional]

List of processed URLs
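
One plausible use of a list of processed URLs is to skip pages that were already fetched and indexed in an earlier call. The sketch below assumes that behavior; `select_new_urls` is a hypothetical helper, not a method of WebResearchRetriever, and the URLs are placeholders.

```python
from typing import Iterable, List


def select_new_urls(candidates: Iterable[str], url_database: List[str]) -> List[str]:
    """Return URLs not yet in `url_database` and record them as processed.

    Hypothetical helper; the order of first appearance is preserved.
    """
    seen = set(url_database)
    new_urls: List[str] = []
    for url in candidates:
        if url not in seen:
            seen.add(url)
            new_urls.append(url)
    url_database.extend(new_urls)  # remember them for later searches
    return new_urls


url_database: List[str] = ["https://example.com/a"]
first = select_new_urls(["https://example.com/a", "https://example.com/b"], url_database)
second = select_new_urls(["https://example.com/b", "https://example.com/c"], url_database)
```

Under this assumption, repeated searches that return the same pages only pay the loading and indexing cost once.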

param vectorstore: langchain.vectorstores.base.VectorStore [Required]

Vector store for storing web pages

async aget_relevant_documents(query: str, *, callbacks: Callbacks = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) → List[Document]

Asynchronously get documents relevant to a query.

Parameters
  • query – string to find relevant documents for

  • callbacks – Callback manager or list of callbacks

  • tags – Optional list of tags associated with the retriever. Defaults to None. These tags will be associated with each call to this retriever, and passed as arguments to the handlers defined in callbacks.

  • metadata – Optional metadata associated with the retriever. Defaults to None. This metadata will be associated with each call to this retriever, and passed as arguments to the handlers defined in callbacks.

Returns

List of relevant documents

async ainvoke(input: str, config: Optional[RunnableConfig] = None) → List[Document]
clean_search_query(query: str) → str
classmethod from_llm(vectorstore: VectorStore, llm: BaseLLM, search: GoogleSearchAPIWrapper, prompt: Optional[BasePromptTemplate] = None, num_search_results: int = 1, text_splitter: RecursiveCharacterTextSplitter = <langchain.text_splitter.RecursiveCharacterTextSplitter object>) → WebResearchRetriever

Initialize from an LLM using the default prompt template.

Parameters
  • vectorstore – Vector store for storing web pages

  • llm – LLM for search question generation

  • search – GoogleSearchAPIWrapper

  • prompt – prompt for generating search questions

  • num_search_results – Number of pages per Google search

  • text_splitter – Text splitter for splitting web pages into chunks

Returns

WebResearchRetriever
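
As a rough sketch of how from_llm might be called, the function below wires together a vector store, an LLM, and a search wrapper. The choice of Chroma, OpenAIEmbeddings, and OpenAI, the `persist_directory` value, and the function name `build_web_research_retriever` are assumptions for illustration; per the signature above, any VectorStore and BaseLLM should be acceptable. It also assumes the relevant API credentials are available in the environment.

```python
def build_web_research_retriever(persist_directory: str = "./chroma_db"):
    """Build a WebResearchRetriever with from_llm (illustrative sketch).

    Imports live inside the function so that defining it does not require
    LangChain, Chroma, or API credentials to be available.
    """
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.retrievers.web_research import WebResearchRetriever
    from langchain.utilities import GoogleSearchAPIWrapper
    from langchain.vectorstores import Chroma

    # Assumption: OPENAI_API_KEY, GOOGLE_API_KEY, and GOOGLE_CSE_ID are set
    # in the environment before this function is called.
    vectorstore = Chroma(
        embedding_function=OpenAIEmbeddings(),
        persist_directory=persist_directory,
    )
    return WebResearchRetriever.from_llm(
        vectorstore=vectorstore,
        llm=OpenAI(temperature=0),
        search=GoogleSearchAPIWrapper(),
        num_search_results=1,  # same as the documented default
    )
```

Omitting `prompt` and `text_splitter` leaves both at their defaults, which is the case the method description refers to.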

get_relevant_documents(query: str, *, callbacks: Callbacks = None, tags: Optional[List[str]] = None, metadata: Optional[Dict[str, Any]] = None, **kwargs: Any) → List[Document]

Retrieve documents relevant to a query.

Parameters
  • query – string to find relevant documents for

  • callbacks – Callback manager or list of callbacks

  • tags – Optional list of tags associated with the retriever. Defaults to None. These tags will be associated with each call to this retriever, and passed as arguments to the handlers defined in callbacks.

  • metadata – Optional metadata associated with the retriever. Defaults to None. This metadata will be associated with each call to this retriever, and passed as arguments to the handlers defined in callbacks.

Returns

List of relevant documents
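
The per-call `tags` and `metadata` arguments can be sketched as follows. To keep the snippet runnable without network access or API keys, it calls a hypothetical `RecordingRetriever` stand-in that mirrors the signature above; the tag and metadata values are made up for illustration.

```python
from typing import Any, Dict, List, Optional


def research(retriever: Any, question: str) -> List[Any]:
    """Run one query and label the call with tags and metadata for callback handlers."""
    return retriever.get_relevant_documents(
        question,
        tags=["web-research"],
        metadata={"use_case": "docs-demo"},
    )


class RecordingRetriever:
    """Hypothetical stand-in that records how it was called instead of searching the web."""

    def __init__(self) -> None:
        self.last_call: Dict[str, Any] = {}

    def get_relevant_documents(
        self,
        query: str,
        *,
        callbacks: Any = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> List[str]:
        self.last_call = {"query": query, "tags": tags, "metadata": metadata}
        return [f"stub document for: {query}"]


stub = RecordingRetriever()
docs = research(stub, "What is LangChain?")
```

With a real retriever, the same `research` function would be called unchanged; only the object passed in differs.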

invoke(input: str, config: Optional[RunnableConfig] = None) → List[Document]
search_tool(query: str, num_search_results: int = 1) → List[dict]

Returns num_search_results pages per Google search.
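
Since search_tool returns a list of dicts, a caller typically needs to pull the page URLs out of them. The sketch below assumes each result dict may carry a "link" key alongside fields such as "title" and "snippet"; that shape, the helper name `extract_links`, and the sample data are assumptions, not something this page specifies.

```python
from typing import Dict, List


def extract_links(results: List[Dict[str, str]]) -> List[str]:
    """Collect the "link" value from each result dict, skipping entries without one."""
    return [result["link"] for result in results if "link" in result]


# Hand-written sample in the assumed shape; not real search output.
sample_results = [
    {"title": "Page A", "link": "https://example.com/a", "snippet": "..."},
    {"title": "Entry without a link"},
    {"title": "Page B", "link": "https://example.com/b", "snippet": "..."},
]
links = extract_links(sample_results)
```

Skipping entries without a "link" key keeps the helper from raising KeyError when a search yields a placeholder or malformed result.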

to_json() → Union[SerializedConstructor, SerializedNotImplemented]
to_json_not_implemented() → SerializedNotImplemented
property lc_attributes: Dict

Return a list of attribute names that should be included in the serialized kwargs. These attributes must be accepted by the constructor.

property lc_namespace: List[str]

Return the namespace of the langchain object, e.g. ["langchain", "llms", "openai"].

property lc_secrets: Dict[str, str]

Return a map of constructor argument names to secret ids, e.g. {"openai_api_key": "OPENAI_API_KEY"}.

property lc_serializable: bool

Return whether or not the class is serializable.

model Config

Bases: object

Configuration for this pydantic object.

arbitrary_types_allowed = True

Examples using WebResearchRetriever