Retrieval underpins modern AI systems, and the quality of the embedding model determines how effectively applications can find and reason over enterprise data. Today we are launching Qwen3-Embedding-0.6B on Databricks, a state-of-the-art embedding model delivering strong retrieval performance and multilingual coverage, served on secure serverless infrastructure.
Together with Agent Bricks and Vector Search, this model enables teams to build AI agents directly on enterprise data in Databricks, retrieving relevant context and reasoning over governed data without moving it outside the platform.
Build Retrieval-Powered Agents with Agent Bricks
State-of-the-art embedding models are a critical foundation for modern AI systems, enabling applications to retrieve the right context from large collections of enterprise data. Qwen3-Embedding-0.6B, now available on Databricks, delivers strong retrieval performance for these workloads.
Qwen3-Embedding-0.6B is built on the powerful Qwen3 foundation and comes from the same research team behind the widely adopted GTE series. With a max context length of 32k tokens, this model provides broad flexibility for chunking documents at a wide range of sizes. Moreover, its instruction-aware design lets developers tailor the model to specific tasks and languages with a simple prompt, typically boosting retrieval performance by 1–5%.
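Instruction-aware prompting amounts to prepending a short task description to each query before embedding it, following the "Instruct: ... / Query: ..." pattern published with the Qwen3 embedding models (documents are typically embedded without an instruction). A minimal sketch, with a hypothetical task description:

```python
def format_query(task: str, query: str) -> str:
    """Prepend a task instruction to a query, following the
    'Instruct: ...\\nQuery: ...' pattern used by Qwen3 embedding models.
    Documents are embedded as-is, without an instruction."""
    return f"Instruct: {task}\nQuery: {query}"

# Hypothetical task description -- tailor it to your retrieval domain and language.
task = "Given a customer support question, retrieve passages that answer it"
formatted = format_query(task, "How do I reset my password?")
```

The formatted string is then sent to the embedding endpoint in place of the raw query.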
On Databricks, this can be combined with Agent Bricks and Vector Search to build retrieval-powered AI agents directly on enterprise data. Teams can index documents with Vector Search and retrieve relevant context during agent execution, grounding agents in governed data stored in Databricks.
How This Embedding Model Improves AI Agents on Databricks
Qwen3-Embedding-0.6B delivers state-of-the-art quality for its size. On the MTEB multilingual and English v2 leaderboards, it outperforms most other 0.6B-class models and surpasses flagship embedding models from OpenAI and Cohere, while rivaling much larger 7B+ models. This means you can achieve top-tier retrieval performance without the latency and cost of very large models.
The model also offers fine-grained control over cost and recall through Matryoshka Representation Learning (MRL), which concentrates the most important information in the early vector dimensions. This allows embeddings to be safely truncated for cheaper storage and faster search while preserving most of the signal. With Qwen3-Embedding-0.6B, you can choose the embedding size at request time (any power of two from 32 to 1024 dimensions), using smaller vectors for large-scale recall indexes and full-size vectors for higher-precision reranking.
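Downstream, MRL truncation is simply keeping the leading dimensions of a full-size embedding and re-normalizing, so cosine similarity stays well-behaved. A pure-Python sketch (assuming embeddings arrive as lists of floats):

```python
import math

def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` MRL dimensions and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Toy 3-dim "embedding" standing in for a real 1024-dim vector.
full = [3.0, 4.0, 1.0]
small = truncate_embedding(full, 2)  # unit-length 2-dim vector
```

In practice you would request the truncated size directly from the endpoint rather than truncating client-side, but the effect on the stored vectors is the same.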
To use this feature with databricks-qwen3-embedding-0-6b, set the optional dimensions field in your Embeddings REST API request to the desired output size (a power of two between 32 and 1024). See the Foundation Model REST API documentation for details.
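A request body with a truncated output size might be built as follows. The payload shape mirrors a standard embeddings request; the endpoint path and auth header in the comments are assumptions, so check the Foundation Model REST API documentation for your workspace:

```python
import json

# Hypothetical invocation URL -- substitute your own workspace host.
url = "https://<workspace>/serving-endpoints/databricks-qwen3-embedding-0-6b/invocations"

payload = {
    "input": ["What is Unity Catalog?"],
    "dimensions": 256,  # optional; a power of two between 32 and 1024
}
body = json.dumps(payload)

# Send with your HTTP client of choice, e.g.:
# requests.post(url, headers={"Authorization": f"Bearer {token}"}, data=body)
```

Omitting `dimensions` returns the full 1024-dimensional embedding.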
Multilingual by Design
Qwen3-Embedding-0.6B is the first multilingual embedding model hosted by Databricks, designed for global workloads from the start. While many embedding models are English-first with limited multilingual support, Qwen3-Embedding-0.6B inherits broad language coverage from the Qwen3 base model, which was pretrained on text spanning more than 100 languages.
This enables strong performance not only for English retrieval but also for multilingual and cross-lingual tasks. Applications can search in one language and retrieve results in another, or support mixed-language datasets and code retrieval across multiple programming languages.
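Because queries and documents in different languages land in the same embedding space, cross-lingual retrieval reduces to nearest-neighbor search by cosine similarity. A minimal ranker, with toy 2-dim vectors standing in for real embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_match(query_vec: list[float], doc_vecs: list[list[float]]) -> int:
    """Index of the document most similar to the query."""
    return max(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]))

# Toy vectors: with a multilingual model, an English query and a
# non-English document about the same topic embed close together.
query = [0.9, 0.1]
docs = [[0.1, 0.9], [0.8, 0.2]]
best = top_match(query, docs)  # -> 1
```

At scale, Vector Search performs this nearest-neighbor step over a managed index instead of a Python loop.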
Secure Serverless Deployment
Like other Databricks-hosted foundation models, Qwen3-Embedding-0.6B runs on secure, fully managed serverless GPUs inside the Databricks platform.
Simply call the Foundation Model APIs, and Databricks handles provisioning, autoscaling, and reliability. Because the model runs on geo-aware, compliant infrastructure, you can keep embeddings close to your data, respect data residency requirements, and integrate retrieval directly with existing Databricks workloads.
Try out Qwen3-Embedding-0.6B today!
Whether you’re building semantic search, RAG pipelines, multilingual retrieval, or text classification systems, Qwen3-Embedding-0.6B offers an exceptional combination of speed, efficiency, and state-of-the-art accuracy. The model is available as databricks-qwen3-embedding-0-6b across all clouds, in all regions that support Foundation Model Serving, and you can try it out from the Serving page in your Databricks workspace. It is available on all Model Serving surfaces: Pay-Per-Token, AI Functions (batch inference), and Provisioned Throughput. You can also select this model for Vector Search use cases.
