Submit #833652: zilliztech deep-searcher 0.0.2 Improper Authorization

Title	zilliztech deep-searcher 0.0.2 Improper Authorization
Description

## Vulnerability Title

zilliztech deep-searcher CollectionRouter authorization context ignored

## Affected Component

`deepsearcher.agent.collection_router.CollectionRouter`

Repository: https://github.com/zilliztech/deep-searcher

## Summary

The `CollectionRouter.invoke()` method routes queries to vector database collections without considering the caller's authorization context. It accepts `**kwargs`, but collection selection uses only the query text and the globally listed collection names and descriptions. In deployments where collections represent different tenants, users, roles, or permission scopes, a caller can be routed to a collection outside their authorized collection set. Downstream RAG implementations then search the selected collections directly.

## Technical Details

`deepsearcher/agent/collection_router.py` defines:

    def invoke(self, query: str, dim: int, **kwargs) -> Tuple[List[str], int]:

However, `kwargs` is not used. The method obtains all collection metadata through:

    collection_infos = self.vector_db.list_collections(dim=dim)

The prompt sent to the LLM includes only:

    {
        "collection_name": collection_info.collection_name,
        "collection_description": collection_info.description,
    }

The selected collection names are returned without checking tenant, user, permissions, or an authorized collection allowlist.

The result is then used by RAG agents. For example, `deepsearcher/agent/naive_rag.py` calls:

    selected_collections, n_token_route = self.collection_router.invoke(
        query=query, dim=self.embedding_model.dimension
    )

and then searches each selected collection:

    self.vector_db.search_data(collection=collection, ...)

Similar patterns exist in `deepsearcher/agent/deep_search.py` and `deepsearcher/agent/chain_of_rag.py`.

## Attack Scenario

Assume a multi-tenant deployment with these collections:

- `board_docs`: restricted to executive users.
- `public_docs`: available to public users.

An executive user has:

    {
        "tenant_id": "tenant_a",
        "user_id": "alice",
        "permissions": "executive-read",
        "authorized_collection_set": ["board_docs", "public_docs"]
    }

A public user has:

    {
        "tenant_id": "tenant_b",
        "user_id": "mallory",
        "permissions": "public-read",
        "authorized_collection_set": ["public_docs"]
    }

For the same query, such as `quarterly revenue`, the router can select `board_docs` for both users because it does not evaluate the authorization context. If the downstream vector database does not independently enforce access control, the public user can retrieve restricted board content.

## Proof of Concept

The following behavioral check demonstrates the issue:

    selected_victim, _ = router.invoke(
        "quarterly revenue",
        dim=embedding.dimension,
        tenant_id="tenant_a",
        user_id="alice",
        permissions="executive-read",
        authorized_collection_set=["board_docs", "public_docs"],
    )
    selected_attacker, _ = router.invoke(
        "quarterly revenue",
        dim=embedding.dimension,
        tenant_id="tenant_b",
        user_id="mallory",
        permissions="public-read",
        authorized_collection_set=["public_docs"],
    )
    print(selected_victim)
    print(selected_attacker)

Observed behavior:

    Victim selected collections: ['board_docs']
    Attacker selected collections: ['board_docs']

Expected behavior: the attacker's selected collections should not include `board_docs`.

## Impact

Unauthorized users may be able to search or retrieve documents from collections outside their intended authorization scope. This can lead to disclosure of confidential enterprise data when DeepSearcher is exposed as a shared multi-user or multi-tenant RAG service.

## Preconditions

- The application exposes DeepSearcher retrieval/query APIs to multiple users or tenants.
- Collections are used as a security boundary.
- The application expects DeepSearcher to honor authorization context or does not enforce collection-level access control before vector search.
- The underlying vector database does not independently block unauthorized collection access.

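
The proof-of-concept check in this report presumes an already configured `router` and `embedding`. To reproduce the reported pattern without an LLM or a vector database, the sketch below uses a hypothetical `StubRouter`. It is not DeepSearcher code: the keyword table stands in for the LLM routing step, and `dim=768` is an arbitrary placeholder. It only illustrates that a router which accepts `**kwargs` but never reads them returns the same selection for every caller.

```python
from typing import Dict, List, Tuple


class StubRouter:
    """Hypothetical stand-in (NOT DeepSearcher code) for a router that
    accepts **kwargs but never reads them."""

    def __init__(self, routes: Dict[str, str]) -> None:
        # Maps a query keyword to a collection name; replaces the LLM call.
        self.routes = routes

    def invoke(self, query: str, dim: int, **kwargs) -> Tuple[List[str], int]:
        # Selection depends only on the query text; kwargs is ignored,
        # mirroring the behavior described in this report.
        selected = [name for keyword, name in self.routes.items() if keyword in query]
        return selected, 0  # second value mimics a token count


router = StubRouter({"revenue": "board_docs", "faq": "public_docs"})

selected_victim, _ = router.invoke(
    "quarterly revenue",
    dim=768,  # placeholder dimension, assumed for illustration
    tenant_id="tenant_a",
    user_id="alice",
    permissions="executive-read",
    authorized_collection_set=["board_docs", "public_docs"],
)
selected_attacker, _ = router.invoke(
    "quarterly revenue",
    dim=768,
    tenant_id="tenant_b",
    user_id="mallory",
    permissions="public-read",
    authorized_collection_set=["public_docs"],
)

print(selected_victim)
print(selected_attacker)
```

Both calls select `board_docs`, because nothing in `invoke()` consults `authorized_collection_set`.
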
## Severity

Suggested severity: Medium

Rationale: the issue can expose restricted retrieval results in common multi-user RAG deployments, but exploitability depends on the surrounding application treating collections as an authorization boundary.

## Suggested Remediation

- Filter `CollectionRouter` candidates by caller authorization before sending collection metadata to the LLM.
- Intersect selected collections with an explicit authorized collection set before calling `search_data()`.
- Forward and enforce authorization context consistently in `NaiveRAG`, `DeepSearch`, and `ChainOfRAG`.
- Document that DeepSearcher does not provide collection-level authorization if this behavior is intentional.

## References

- `deepsearcher/agent/collection_router.py`
- `deepsearcher/agent/naive_rag.py`
- `deepsearcher/agent/deep_search.py`
- `deepsearcher/agent/chain_of_rag.py`

## Reporter Notes

No repository-native tenant/user/permission model was found. This report is therefore most applicable to deployments that layer a multi-user or multi-tenant service on top of DeepSearcher.
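
The first two remediation suggestions in this report (filtering candidates before the LLM prompt and intersecting the router's output with an allowlist) could be sketched roughly as follows. This is a minimal illustration, not a patch: `restrict_to_authorized` and its `authorized` parameter are hypothetical names, and DeepSearcher exposes no such API according to the reporter notes. An application layered on top of DeepSearcher would have to supply the allowlist itself.

```python
from typing import Iterable, List, Optional, Set


def restrict_to_authorized(
    collections: Iterable[str], authorized: Optional[Set[str]]
) -> List[str]:
    """Keep only collections present in the caller's allowlist.

    Hypothetical helper: a missing or empty allowlist yields an empty
    result (default deny) rather than falling back to all collections.
    """
    if not authorized:
        return []
    return [name for name in collections if name in authorized]


all_collections = ["board_docs", "public_docs"]
mallory_allowlist = {"public_docs"}

# Step 1: filter candidates before their metadata reaches the LLM prompt.
candidates = restrict_to_authorized(all_collections, mallory_allowlist)

# Step 2: intersect the router's output again before any search, in case
# the model returns a collection name that was never offered to it.
router_output = ["board_docs", "public_docs"]  # simulated router result
searchable = restrict_to_authorized(router_output, mallory_allowlist)

print(candidates)
print(searchable)
```

Applying the same check at both points is deliberate: the first limits what collection metadata is disclosed to the model, and the second guards the search call even if the routing step misbehaves.
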
Source	⚠️ https://github.com/zilliztech/deep-searcher/issues/267
User	Dem000 (UID 98389)
Submission	20/05/2026 04:24 (20 days ago)
Moderation	07/06/2026 11:20 (18 days later)
Status	Accepted
VulDB Entry	369086 [zilliztech deep-searcher up to 0.0.2 collection_router.py CollectionRouter.invoke kwargs privilege escalation]
Points	20
