# Rate Limits
Message and request limits by plan.
## How rate limits work
Rate limits are applied per account across a rolling 3-hour window. Both chat UI messages and API requests count toward the same limit.
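To make the rolling-window behavior concrete, here is a minimal sketch of how such a counter could work. It is illustrative only, not the service's actual implementation; the `limit` value stands in for a hypothetical per-plan request count.

```python
import time
from collections import deque

WINDOW_SECONDS = 3 * 60 * 60  # rolling 3-hour window, per the description above

class RollingWindowCounter:
    """Counts requests in a rolling window (illustrative sketch only)."""

    def __init__(self, limit):
        self.limit = limit          # hypothetical per-plan request limit
        self.timestamps = deque()   # times of recent requests

    def allow(self):
        now = time.time()
        # Drop requests that have aged out of the window
        while self.timestamps and now - self.timestamps[0] > WINDOW_SECONDS:
            self.timestamps.popleft()
        # Allow the request only if the window still has room
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```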
## Limits by plan
| Plan | Rate limit | Window |
|---|---|---|
| Free | Standard | 3 hours |
| Pro | 10x standard | 3 hours |
| Enterprise | Custom | Custom |
## What happens when you hit the limit
### In the chat UI
A modal appears informing you that you've reached your limit. You can:
- Upgrade to Pro for 10x the limit
- Wait for the window to reset
### Via the API
The API returns a `429 Too Many Requests` response:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please try again later.",
    "type": "rate_limit_error"
  }
}
```

Include retry logic in your application:
```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://app.commissioned.tech/v1",
    api_key="your-api-key",
)

def chat_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="your-model-id",
                messages=messages,
            )
        except RateLimitError:
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # exponential backoff
            else:
                raise
```

## Tips for staying within limits
- Cache responses — if the same question comes up often, cache the answer (see the sketch after this list)
- Batch requests — combine multiple questions into a single multi-turn conversation
- Use shorter conversations — trim message history to reduce per-request token usage
- Upgrade to Pro — the simplest solution if you consistently hit limits
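As an example of the caching tip above, the sketch below memoizes answers keyed on the question text. It reuses the client settings from the retry example; `cached_answer` is a hypothetical helper, not part of the API, and the cache size is an arbitrary choice.

```python
from functools import lru_cache

from openai import OpenAI

client = OpenAI(
    base_url="https://app.commissioned.tech/v1",
    api_key="your-api-key",
)

@lru_cache(maxsize=256)
def cached_answer(question):
    # Repeats of the same question are served from the cache
    # instead of spending another request against your limit.
    response = client.chat.completions.create(
        model="your-model-id",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```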
Enterprise plans include custom rate limits tailored to your usage. Contact sales to discuss.