Rate Limiting
The Serenity* Star API implements rate limiting to ensure fair usage, maintain system stability, and protect against abuse. This guide explains how rate limiting works and how to handle rate limit responses in your applications.
Overview
Rate limiting controls the number of API requests a client can make within specific time windows. When you exceed the allowed limits, the API returns a 429 Too Many Requests HTTP status code.
Rate limits are applied at multiple levels:
- General Limits: Default limits that apply to all API requests
- Subscription Plan Limits: Custom limits based on your subscription plan
- Agent-Level Limits: Specific limits configured per agent
How Rate Limiting Works
Client Identification
Rate limits are applied per tenant (organization). The system identifies your tenant using the API key provided in the X-API-KEY header of your requests.
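As a minimal sketch of how a client might attach the key, the snippet below builds a request with the X-API-KEY header using Python's standard library. The host api.example.com, the agent code my-agent, and the key value are placeholders chosen for illustration, not values from this guide.

```python
import urllib.request


def build_request(url: str, api_key: str, body: bytes = b"{}") -> urllib.request.Request:
    """Build a POST request that carries the tenant's API key in X-API-KEY."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    )


# Placeholder host, agent code, and key: replace them with your own values.
request = build_request(
    "https://api.example.com/api/agent/my-agent/execute",
    "YOUR_API_KEY",
)
# urllib.request.urlopen(request) would send it; it is not called here.
```

Because limits are tracked per tenant, every application that sends the same organization's key draws from the same quota.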
Rate Limit Windows
Rate limits are enforced across four time windows:
| Time Window | Description |
|---|---|
| Per Second | Maximum requests allowed per second |
| Per Minute | Maximum requests allowed per minute |
| Per Hour | Maximum requests allowed per hour |
| Per 12 Hours | Maximum requests allowed per 12-hour period |
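To stay under all four windows at once, a client can throttle itself before sending. The sketch below is one possible client-side approach using sliding windows. It is not a description of how the server counts requests, and the class name and the limit values in the usage comment are hypothetical.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side sliding-window limiter that checks several windows at once."""

    def __init__(self, limits, clock=time.monotonic):
        # limits maps window length in seconds -> max requests in that window.
        self.limits = dict(limits)
        self.clock = clock
        self.sent = deque()  # timestamps of requests that were allowed

    def try_acquire(self) -> bool:
        """Return True and record the request if every window has room."""
        now = self.clock()
        longest = max(self.limits)
        # Drop timestamps that no longer fall inside the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        for window, limit in self.limits.items():
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= limit:
                return False
        self.sent.append(now)
        return True


# Hypothetical limits: per second, per minute, per hour, per 12 hours.
# limiter = MultiWindowLimiter({1: 5, 60: 100, 3600: 2000, 43200: 10000})
# if limiter.try_acquire(): send the request; otherwise wait and try again.
```

Take the real numbers from your plan. A local limiter only sees traffic from one process, so other clients on the same tenant can still exhaust the shared quota.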
Affected Endpoints
Subscription-based rate limits apply to specific high-resource endpoints:
- Agent Execution: POST /api/agent/{agentCode}/execute and POST /api/v*/agent/{agentCode}/execute
- Conversation Management: POST /api/agent/{agentCode}/conversation and PATCH /api/agent/{agentCode}/conversation/{conversationId}/context
- Audio Transcription: POST /api/audio/transcribe
- Real-time Connections: POST /api/agent/{agentCode}/realtime
- Volatile Knowledge: POST /api/volatileknowledge
- Connector Status: GET /api/connection/agentInstance/{id}/connector/{connectorId}/status
Other endpoints use the general rate limits, which are typically more permissive.
All affected endpoints share the same rate limit counter for your tenant. This means that calls to different endpoints are combined when calculating your rate limit usage.
Example: If your plan allows 100 requests per minute:
- 50 calls to /agent/{code}/execute
- 30 calls to /api/audio/transcribe
- 20 calls to /api/volatileknowledge
This totals 100 requests, reaching your limit for that minute.
Nested Agent Executions
When an agent executes another agent internally (e.g., Agent A calls Agent B as part of its workflow), each execution counts separately toward your rate limit.
Example: If Agent A executes and internally triggers Agent B:
- 1 request counted for Agent A execution
- 1 request counted for Agent B execution
- Total: 2 requests against your rate limit
This applies to any level of nesting. Plan your agent architecture accordingly if you have agents that trigger other agents.
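To estimate how many requests a single top-level call consumes, you can walk the agent call graph. The sketch below assumes each agent triggers each listed child exactly once per run and that the graph has no cycles. The function name and agent codes are illustrative.

```python
def count_executions(agent: str, calls: dict) -> int:
    """Return how many executions one run of `agent` adds to the rate limit count.

    `calls` maps an agent code to the list of agent codes it triggers per run.
    Assumes the call graph has no cycles.
    """
    return 1 + sum(count_executions(child, calls) for child in calls.get(agent, []))


# Agent A triggers Agent B, as in the example above.
calls = {"agent-a": ["agent-b"]}
total = count_executions("agent-a", calls)  # 2
```

Dividing your per-minute limit by this total gives a rough upper bound on how many top-level runs fit in one minute.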
Rate Limit Responses
When you exceed a rate limit, the API responds with:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60
The Retry-After header indicates the number of seconds to wait before making another request.
Example Error Response
{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please retry after 60 seconds.",
  "retryAfter": 60
}
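One way to handle a 429 is to wait for the number of seconds given in Retry-After and then try again. The sketch below keeps the HTTP transport out of the picture: send is any caller-supplied function that performs the request and returns a (status, headers, body) tuple. The function name, the attempt cap, and the exponential fallback used when the header is missing or unparsable are assumptions, not documented behavior of the API.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting and retrying while the response status is 429."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Return anything that is not a 429, or the last 429 once attempts run out.
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        try:
            # Prefer the server's hint (seconds); lookup assumes this exact key spelling.
            wait = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Header missing or not numeric: fall back to exponential backoff.
            wait = default_wait * 2 ** (attempt - 1)
        sleep(wait)
    raise ValueError("max_attempts must be at least 1")
```

Capping the number of attempts keeps a client from retrying indefinitely when the longer windows (per hour or per 12 hours) are exhausted.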
Rate Limits by Subscription Plan
Your subscription plan determines your rate limits. Higher-tier plans typically include higher limits to accommodate increased API usage.
| Feature | Description |
|---|---|
| Rate Limit - Per Second | Maximum API calls allowed per second |
| Rate Limit - Per Minute | Maximum API calls allowed per minute |
| Rate Limit - Per Hour | Maximum API calls allowed per hour |
| Rate Limit - Per 12 Hours | Maximum API calls allowed per 12-hour period |
To view your current plan's rate limits, check your subscription details in the Serenity* Star dashboard under Billing > Your Plan.
If you consistently hit rate limits, consider upgrading your subscription plan for higher limits. Visit the billing section to compare plans and their rate limit allocations.
Agent-Level Rate Limits
In addition to subscription plan limits, you can configure rate limits at the individual agent level. This is useful for:
- Protecting specific agents from excessive usage
- Managing costs for high-traffic agents
- Ensuring fair resource distribution across multiple agents
Configuring Agent Rate Limits
Agent rate limits can be configured in the Execution Limits section of your agent settings:
| Setting | Description |
|---|---|
| Executions Per Minute | Maximum agent executions allowed per minute |
When both subscription and agent-level limits exist, the more restrictive limit applies. For example:
- If your plan allows 100 requests per minute and your agent is configured for 50 executions per minute, the agent is limited to 50 executions per minute.
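The "more restrictive limit applies" rule amounts to taking the minimum of the two values. A small sketch follows; the function name is hypothetical, and None stands for an agent with no limit configured.

```python
from typing import Optional


def effective_per_minute(plan_limit: int, agent_limit: Optional[int]) -> int:
    """Return the per-minute limit that actually constrains an agent."""
    if agent_limit is None:  # no agent-level limit configured
        return plan_limit
    return min(plan_limit, agent_limit)


limit = effective_per_minute(100, 50)  # 50, matching the example above
```

An agent-level value higher than the plan limit has no effect in this model, because the plan limit remains the smaller of the two.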
Troubleshooting
Consistently Hitting Rate Limits
If you're frequently hitting rate limits:
- Review your usage patterns: Identify if certain operations are causing excessive API calls
- Implement caching: Cache responses to reduce redundant requests
- Optimize your code: Ensure you're not making unnecessary API calls
- Consider a plan upgrade: If legitimate usage exceeds your limits, upgrade your subscription
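For the caching suggestion above, a small time-based cache is often enough to avoid repeating identical calls. The sketch below is a generic in-memory example, not part of any Serenity* Star SDK. The class name and TTL are illustrative, and it only suits requests whose responses can safely be reused for a short time.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache that reuses a value until it is ttl_seconds old."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_call(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling fetch() only on a miss or expiry."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch()  # this is the only place an API call would be made
        self.entries[key] = (now, value)
        return value
```

For example, wrapping a connector status check in get_or_call with a 30-second TTL would turn repeated polls within that interval into a single counted request.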
Unexpected Rate Limit Errors
If you receive 429 errors unexpectedly:
- Check for multiple clients: Ensure other applications or team members aren't consuming your rate limit quota
- Verify your API key: Make sure you're using the correct API key for your organization
- Review recent changes: Check if any recent code changes increased API call frequency
Need Higher Limits?
If your use case requires higher rate limits than available in standard subscription plans:
- Contact Support: Reach out to discuss your specific requirements
- Enterprise Plans: Consider an enterprise plan with customized rate limits
- Optimize Usage: Work with our team to optimize your API usage patterns
For more information about subscription plans and their features, visit the Billing section.