One Collection to Rule Them All: Efficient Multitenancy in Qdrant

This is a condensed version of my article. You can read the full story on Medium.

1,000 Users ≠ 1,000 Collections

Building a multi-user AI application with thousands of tenants? Creating a separate collection for each user might seem clean, but it leads to high costs and cluster instability.

The solution is Multitenancy: letting tenants share the same database infrastructure while keeping their data private. Think of it as an apartment building—everyone has their own private space, but they share the same foundation.

The Simple Solution: Payload-based Multitenancy

Instead of separate collections, use one collection and tag each vector with a tenant identifier (e.g., group_id).

Implementation

Upserting Data:

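from qdrant_client import QdrantClient, models

# Assumes a Qdrant instance reachable at this URL; adjust for your deployment.
client = QdrantClient(url="http://localhost:6333")

# Every point carries its tenant identifier in the payload.
client.upsert(
    collection_name="multitenant_collection",
    points=[
        models.PointStruct(id=1, vector=[...], payload={"group_id": "user_1"}),
        models.PointStruct(id=2, vector=[...], payload={"group_id": "user_2"}),
    ],
)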

Querying Data:

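# Restrict the search to a single tenant's points.
# client.search is deprecated in recent qdrant-client releases in favor of query_points.
client.query_points(
    collection_name="multitenant_collection",
    query=[...],
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="group_id", match=models.MatchValue(value="user_1"))
        ]
    ),
)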

Optimization: Set is_tenant=true on the payload index for group_id to co-locate vectors for the same tenant, ensuring fast, sequential disk reads.
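To make the index setting concrete, here is a minimal sketch of the request body that Qdrant's REST endpoint for creating a payload index (PUT /collections/{collection_name}/index) accepts. The helper name tenant_index_body is my own, and the field name group_id is taken from the examples above; treat this as an illustration rather than the only way to configure it.

```python
import json


def tenant_index_body(field_name: str = "group_id") -> dict:
    """Build the body for PUT /collections/{collection_name}/index.

    Hypothetical helper for illustration; only the JSON shape matters.
    """
    return {
        "field_name": field_name,
        "field_schema": {
            "type": "keyword",   # tenant IDs are matched exactly
            "is_tenant": True,   # marks this field as the tenant partition key
        },
    }


print(json.dumps(tenant_index_body(), indent=2))
```

With the Python client, the same schema can be passed as field_schema=models.KeywordIndexParams(type="keyword", is_tenant=True) to client.create_payload_index.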

The “Noisy Neighbor” Problem

Payload-based multitenancy works well until one tenant becomes a “Whale”: a high-volume user that dominates shared resources and degrades performance for everyone else.

The Advanced Solution: Tiered Multitenancy

Qdrant 1.16 introduces Tiered Multitenancy, a hybrid architecture where:

Small tenants share a “fallback” shard efficiently.
Large tenants (Whales) get promoted to their own dedicated shards.

How it works

Enable Custom Sharding: Create the collection with custom sharding (sharding_method=models.ShardingMethod.CUSTOM in the Python client).
Create a Fallback Shard: A shared space for all small tenants.
Tenant Promotion: Seamlessly move a growing tenant from the fallback shard to a dedicated shard without downtime.
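The promotion step presupposes some rule for deciding which tenants count as Whales. A minimal sketch, assuming a simple point-count rule: the threshold value and the function name tenants_to_promote are hypothetical choices of mine, not Qdrant defaults, and the per-tenant counts are assumed to be collected elsewhere.

```python
# Hypothetical threshold; not a Qdrant default. Tune it for your workload.
PROMOTION_THRESHOLD = 20_000


def tenants_to_promote(
    points_per_tenant: dict[str, int],
    dedicated: set[str],
    threshold: int = PROMOTION_THRESHOLD,
) -> list[str]:
    """Return tenants still on the fallback shard whose point count reached the threshold."""
    return sorted(
        tenant
        for tenant, count in points_per_tenant.items()
        if count >= threshold and tenant not in dedicated
    )


counts = {"user_1": 250_000, "user_2": 1_200, "user_3": 48_000, "user_4": 300}

# user_3 already has a dedicated shard, so only user_1 is a candidate.
print(tenants_to_promote(counts, dedicated={"user_3"}))
```

In practice the counts could come from client.count with a count_filter on group_id, run periodically per tenant.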

# Routing requests
client.upsert(...,
    shard_key_selector=models.ShardKeyWithFallback(
        target="user_1",   # Try dedicated shard
        fallback="default" # Use shared shard if dedicated doesn't exist
    )
)
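To see what the target/fallback pair means in practice, here is a toy in-memory model of the routing rule and of promotion. It is only a sketch of the semantics described above, not how Qdrant implements it: the class TieredRouter and its methods are hypothetical, and a real promotion transfers data between shards without downtime rather than swapping Python lists.

```python
class TieredRouter:
    """Toy model of tiered routing; illustrative only, not the Qdrant implementation."""

    def __init__(self, fallback: str = "default") -> None:
        self.fallback = fallback
        # shard key -> list of (tenant_id, point_id)
        self.shards: dict[str, list[tuple[str, int]]] = {fallback: []}

    def resolve(self, tenant_id: str) -> str:
        """Use the tenant's dedicated shard if it exists, otherwise the fallback shard."""
        return tenant_id if tenant_id in self.shards else self.fallback

    def upsert(self, tenant_id: str, point_id: int) -> str:
        """Store a point and return the shard key it was routed to."""
        shard = self.resolve(tenant_id)
        self.shards[shard].append((tenant_id, point_id))
        return shard

    def promote(self, tenant_id: str) -> None:
        """Move a tenant's points from the fallback shard to a dedicated shard."""
        if tenant_id in self.shards:
            return  # already has a dedicated shard
        shared = self.shards[self.fallback]
        self.shards[tenant_id] = [p for p in shared if p[0] == tenant_id]
        self.shards[self.fallback] = [p for p in shared if p[0] != tenant_id]


router = TieredRouter()
router.upsert("user_1", 1)  # routed to "default"
router.upsert("user_2", 2)  # routed to "default"
router.promote("user_1")    # user_1 now owns a dedicated shard
router.upsert("user_1", 3)  # routed to "user_1"
```

The point of the fallback is that callers never need to know whether a tenant has been promoted: the same request shape works before and after.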

The key differences between the two multitenancy strategies are summarized in the following table:

[Table: comparison of payload-based and tiered multitenancy]

Conclusion

Start with standard, payload-based multitenancy: it’s simple and cost-effective. As your application grows and “power users” emerge, adopt Tiered Multitenancy to promote them to dedicated resources.

