| Title | zilliztech deep-searcher 0.0.2 Improper Authorization |
|---|
| Description | ## Vulnerability Title
zilliztech deep-searcher CollectionRouter authorization context ignored
## Affected Component
`deepsearcher.agent.collection_router.CollectionRouter`
Repository: https://github.com/zilliztech/deep-searcher
## Summary
The `CollectionRouter.invoke()` method routes queries to vector database collections without considering caller authorization context. It accepts `**kwargs`, but collection selection only uses the query text and globally listed collection names/descriptions.
In deployments where collections represent different tenants, users, roles, or permission scopes, a caller can be routed to a collection outside their authorized collection set. Downstream RAG implementations then search the selected collections directly.
## Technical Details
`deepsearcher/agent/collection_router.py` defines:

    def invoke(self, query: str, dim: int, **kwargs) -> Tuple[List[str], int]:

However, `kwargs` is not used. The method obtains all collection metadata through:

    collection_infos = self.vector_db.list_collections(dim=dim)

The prompt sent to the LLM includes only:

    {
        "collection_name": collection_info.collection_name,
        "collection_description": collection_info.description,
    }

The selected collection names are returned without checking tenant, user, permissions, or an authorized collection allowlist.
The result is then used by RAG agents. For example, `deepsearcher/agent/naive_rag.py` calls:

    selected_collections, n_token_route = self.collection_router.invoke(
        query=query, dim=self.embedding_model.dimension
    )

and then searches each selected collection:

    self.vector_db.search_data(collection=collection, ...)

Similar patterns exist in `deepsearcher/agent/deep_search.py` and `deepsearcher/agent/chain_of_rag.py`.
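The data flow above can be modelled without a live LLM or vector database. The sketch below is a simplified stand-in, not DeepSearcher's code: `StubVectorDB`, `StubRouter`, and `naive_retrieve` are hypothetical names, the stub `search_data` takes only a collection name, and a keyword match replaces the LLM's choice. It only mirrors the shape of the calls quoted above, to show that context passed through `**kwargs` has no influence on which collections get searched.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CollectionInfo:
    collection_name: str
    description: str


class StubVectorDB:
    """Hypothetical in-memory stand-in for the vector database."""

    def __init__(self, data: Dict[str, Tuple[str, List[str]]]):
        # Maps collection name -> (description, documents).
        self._data = data

    def list_collections(self, dim: int) -> List[CollectionInfo]:
        return [CollectionInfo(name, desc) for name, (desc, _) in self._data.items()]

    def search_data(self, collection: str) -> List[str]:
        # No access check here either, matching the report's preconditions.
        return list(self._data[collection][1])


class StubRouter:
    """Mirrors the shape of CollectionRouter.invoke(); a keyword match replaces the LLM."""

    def __init__(self, vector_db: StubVectorDB):
        self.vector_db = vector_db

    def invoke(self, query: str, dim: int, **kwargs) -> Tuple[List[str], int]:
        # kwargs is accepted but never read, as described in the report.
        infos = self.vector_db.list_collections(dim=dim)
        words = query.lower().split()
        selected = [
            info.collection_name
            for info in infos
            if any(word in info.description.lower() for word in words)
        ]
        return selected, 0


def naive_retrieve(router: StubRouter, query: str, dim: int, **caller_context) -> List[str]:
    # Even when the caller context is forwarded, it has no effect on the result.
    selected, _ = router.invoke(query, dim, **caller_context)
    results: List[str] = []
    for collection in selected:
        results.extend(router.vector_db.search_data(collection))
    return results


db = StubVectorDB({
    "board_docs": ("Board minutes and quarterly revenue figures", ["Q3 revenue (confidential)"]),
    "public_docs": ("Public product documentation", ["Getting started guide"]),
})
router = StubRouter(db)

alice = naive_retrieve(
    router, "quarterly revenue", dim=8,
    authorized_collection_set=["board_docs", "public_docs"],
)
mallory = naive_retrieve(
    router, "quarterly revenue", dim=8,
    authorized_collection_set=["public_docs"],
)
print(alice == mallory, mallory)
```

In this model both callers receive the same restricted document, because nothing between `invoke()` and `search_data()` consults the allowlist.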
## Attack Scenario
Assume a multi-tenant deployment with these collections:
- `board_docs`: restricted to executive users.
- `public_docs`: available to public users.
An executive user has:

    {
        "tenant_id": "tenant_a",
        "user_id": "alice",
        "permissions": "executive-read",
        "authorized_collection_set": ["board_docs", "public_docs"]
    }

A public user has:

    {
        "tenant_id": "tenant_b",
        "user_id": "mallory",
        "permissions": "public-read",
        "authorized_collection_set": ["public_docs"]
    }

For the same query, such as `quarterly revenue`, the router can select `board_docs` for both users because it does not evaluate the authorization context. If the downstream vector database does not independently enforce access control, the public user can retrieve restricted board content.
## Proof of Concept
The following behavioral check demonstrates the issue:

    # `router` is a configured CollectionRouter; `embedding` is the embedding model in use.
    selected_victim, _ = router.invoke(
        "quarterly revenue",
        dim=embedding.dimension,
        tenant_id="tenant_a",
        user_id="alice",
        permissions="executive-read",
        authorized_collection_set=["board_docs", "public_docs"],
    )
    selected_attacker, _ = router.invoke(
        "quarterly revenue",
        dim=embedding.dimension,
        tenant_id="tenant_b",
        user_id="mallory",
        permissions="public-read",
        authorized_collection_set=["public_docs"],
    )
    print("Victim selected collections:", selected_victim)
    print("Attacker selected collections:", selected_attacker)

Observed behavior:

    Victim selected collections: ['board_docs']
    Attacker selected collections: ['board_docs']

Expected behavior: the attacker's selected collections should not include `board_docs`.
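To make the check above repeatable, the comparison can be phrased as a helper that reports any selected collection outside the caller's allowlist. This is a sketch under assumptions: `unauthorized_selections` and `context_blind_invoke` are hypothetical names, and the stand-in router returns a fixed answer in place of a configured `CollectionRouter`.

```python
from typing import Any, Callable, Dict, List, Tuple

# Any callable with the same shape as CollectionRouter.invoke().
RouterInvoke = Callable[..., Tuple[List[str], int]]


def unauthorized_selections(
    invoke: RouterInvoke, query: str, dim: int, context: Dict[str, Any]
) -> List[str]:
    """Return the selected collections that are not in the caller's allowlist."""
    selected, _ = invoke(query, dim=dim, **context)
    allowed = set(context["authorized_collection_set"])
    return [name for name in selected if name not in allowed]


def context_blind_invoke(query: str, dim: int, **kwargs) -> Tuple[List[str], int]:
    # Hypothetical stand-in reproducing the observed behavior:
    # the same answer for every caller, regardless of kwargs.
    return ["board_docs"], 0


alice_ctx = {
    "tenant_id": "tenant_a",
    "user_id": "alice",
    "permissions": "executive-read",
    "authorized_collection_set": ["board_docs", "public_docs"],
}
mallory_ctx = {
    "tenant_id": "tenant_b",
    "user_id": "mallory",
    "permissions": "public-read",
    "authorized_collection_set": ["public_docs"],
}

alice_leaks = unauthorized_selections(context_blind_invoke, "quarterly revenue", 8, alice_ctx)
mallory_leaks = unauthorized_selections(context_blind_invoke, "quarterly revenue", 8, mallory_ctx)
print(alice_leaks)    # []
print(mallory_leaks)  # ['board_docs']
```

Passing a real `router.invoke` in place of the stand-in would turn this into a regression test: a non-empty result for any caller indicates that routing ignored that caller's allowlist.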
## Impact
Unauthorized users may be able to search or retrieve documents from collections outside their intended authorization scope. This can lead to disclosure of confidential enterprise data when DeepSearcher is exposed as a shared multi-user or multi-tenant RAG service.
## Preconditions
- The application exposes DeepSearcher retrieval/query APIs to multiple users or tenants.
- Collections are used as a security boundary.
- The application expects DeepSearcher to honor authorization context or does not enforce collection-level access control before vector search.
- The underlying vector database does not independently block unauthorized collection access.
## Severity
Suggested severity: Medium
Rationale: the issue can expose restricted retrieval results in common multi-user RAG deployments, but exploitability depends on the surrounding application treating collections as an authorization boundary.
## Suggested Remediation
- Filter `CollectionRouter` candidates by caller authorization before sending collection metadata to the LLM.
- Intersect selected collections with an explicit authorized collection set before calling `search_data()`.
- Forward and enforce authorization context consistently in `NaiveRAG`, `DeepSearch`, and `ChainOfRAG`.
- Document that DeepSearcher does not provide collection-level authorization if this behavior is intentional.
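The first two remediation points could be combined roughly as follows. This is a minimal sketch, not a patch against the repository: `authorized_route`, `select_fn`, and the `authorized_collection_set` parameter are assumed names, and `select_fn` stands in for the LLM-based selection step. Collection metadata is filtered before selection, and the selection is intersected with the allowlist again afterwards, since a model may return names it was never shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class CollectionInfo:
    collection_name: str
    description: str


# Stand-in for the LLM call: given a query and candidates, return chosen names.
SelectFn = Callable[[str, Sequence[CollectionInfo]], List[str]]


def authorized_route(
    query: str,
    all_collections: Iterable[CollectionInfo],
    authorized_collection_set: Iterable[str],
    select_fn: SelectFn,
) -> List[str]:
    allowed = set(authorized_collection_set)
    # Step 1: only authorized collections are ever shown to the selector.
    candidates = [c for c in all_collections if c.collection_name in allowed]
    if not candidates:
        return []  # fail closed: an empty allowlist means nothing is searched
    selected = select_fn(query, candidates)
    # Step 2: drop anything outside the allowlist, then de-duplicate in order.
    return list(dict.fromkeys(name for name in selected if name in allowed))


collections = [
    CollectionInfo("board_docs", "Board minutes and quarterly revenue"),
    CollectionInfo("public_docs", "Public product documentation"),
]


def greedy_selector(query: str, candidates: Sequence[CollectionInfo]) -> List[str]:
    # Deliberately misbehaving selector: always asks for board_docs as well.
    return [c.collection_name for c in candidates] + ["board_docs"]


public_result = authorized_route("quarterly revenue", collections, ["public_docs"], greedy_selector)
empty_result = authorized_route("quarterly revenue", collections, [], greedy_selector)
print(public_result)  # ['public_docs']
print(empty_result)   # []
```

Applying the filter on both sides of the selection step keeps restricted collection names and descriptions out of the prompt, and also guards against a selector that returns a name it was not offered.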
## References
- `deepsearcher/agent/collection_router.py`
- `deepsearcher/agent/naive_rag.py`
- `deepsearcher/agent/deep_search.py`
- `deepsearcher/agent/chain_of_rag.py`
## Reporter Notes
No repository-native tenant/user/permission model was found. This report is therefore most applicable to deployments that layer a multi-user or multi-tenant service on top of DeepSearcher. |
|---|
| Source | https://github.com/zilliztech/deep-searcher/issues/267 |
|---|
| User | Dem000 (UID 98389) |
|---|
| Submission | 05/20/2026 04:24 (19 days ago) |
|---|
| Moderation | 06/07/2026 11:20 (18 days later) |
|---|
| Status | Accepted |
|---|
| VulDB entry | 369086 [zilliztech deep-searcher up to 0.0.2 collection_router.py CollectionRouter.invoke kwargs access control] |
|---|
| Points | 20 |
|---|