Embeddings

Generate vector embeddings from text. The endpoint is OpenAI-compatible and works with any client that targets the OpenAI Embeddings API.

POST /v1/embeddings
{
  "model": "text-embedding-3-small",
  "input": "The quick brown fox jumps over the lazy dog"
}

input also accepts an array of strings for batch embedding:

{
  "model": "text-embedding-3-small",
  "input": ["first sentence", "second sentence"]
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No* | Embedding model to use. Defaults to llm.default_embedding_model if configured. |
| input | string or string[] | Yes | Text to embed. |

*Required if llm.default_embedding_model is not set in config.
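
The rule in the table can be sketched as a small client-side helper. This is an illustration only: `build_embedding_request` and its `server_has_default` flag are hypothetical names, and the flag stands in for whether llm.default_embedding_model is set on the server, which a client would have to know out of band.

```python
from typing import Optional, Union


def build_embedding_request(
    text: Union[str, list[str]],
    model: Optional[str] = None,
    server_has_default: bool = False,  # assumption: caller knows the server config
) -> dict:
    """Build a JSON body for POST /v1/embeddings (hypothetical helper)."""
    body: dict = {"input": text}
    if model is not None:
        # An explicit model is always sent as-is.
        body["model"] = model
    elif not server_has_default:
        # Without a configured default, "model" is required.
        raise ValueError("model is required when no default embedding model is configured")
    return body
```

With a default configured, `build_embedding_request(["first sentence", "second sentence"], server_has_default=True)` yields a body that contains only `input`.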

The response uses the OpenAI embeddings format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
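
When calling the endpoint without an SDK, the vectors can be pulled out of the parsed JSON as in the sketch below. `extract_embeddings` is a hypothetical helper, and the sample payload uses made-up two-dimensional vectors and token counts purely for illustration. Sorting by `index` is a defensive choice so that each vector lines up with its position in `input`.

```python
def extract_embeddings(response: dict) -> list[list[float]]:
    """Return the embedding vectors ordered by their "index" field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Illustrative payload only: shortened, made-up vectors and token counts.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "text-embedding-3-small",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

vectors = extract_embeddings(sample)
print(len(vectors), "vectors,", sample["usage"]["total_tokens"], "tokens")
```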

Both Relay-issued keys (gr-...) and passthrough keys work:

# Relay-issued key: uses server-configured OpenAI/Anthropic credentials
curl https://relay.company.com/v1/embeddings \
  -H "Authorization: Bearer gr-..." \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello world"}'

# Passthrough: your own OpenAI key, routed through Relay middleware
curl https://relay.company.com/v1/embeddings \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello world"}'
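
The same request with the OpenAI Python client, pointed at Relay through base_url:

from openai import OpenAI

# Point the standard OpenAI client at Relay instead of api.openai.com.
client = OpenAI(
    api_key="gr-...",
    base_url="https://relay.company.com/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello world",
)

# Each item in response.data holds one embedding vector.
print(response.data[0].embedding)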
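
Batch input works the same way through the client. The sketch below assumes a client object shaped like the OpenAI Python client (an `embeddings.create` method returning items with `index` and `embedding` attributes). `embed_batch` is a hypothetical helper name, not part of Relay or the OpenAI SDK.

```python
from typing import Any


def embed_batch(
    client: Any,
    texts: list[str],
    model: str = "text-embedding-3-small",
) -> list[list[float]]:
    """Embed several strings in one request and return vectors in input order."""
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so vectors[i] corresponds to texts[i].
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


# Usage, assuming the client configured for Relay as in the example above:
# vectors = embed_batch(client, ["first sentence", "second sentence"])
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests, without network access.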

Any embedding model supported by LiteLLM works — the model name is passed through directly. Common options:

| Provider | Model |
| --- | --- |
| OpenAI | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 |
| Anthropic | Not supported (no embedding API) |
| Cohere | embed-english-v3.0, embed-multilingual-v3.0 |
| Azure OpenAI | azure/text-embedding-3-small |

Set a default embedding model so clients don’t need to specify it in every request:

llm:
  default_embedding_model: text-embedding-3-small

The corresponding Helm value is config.llm.defaultEmbeddingModel.