Rate Limiting

The Serenity* Star API implements rate limiting to ensure fair usage, maintain system stability, and protect against abuse. This guide explains how rate limiting works and how to handle rate limit responses in your applications.

Overview

Rate limiting controls the number of API requests a client can make within specific time windows. When you exceed the allowed limits, the API returns a 429 Too Many Requests HTTP status code.

Rate limits are applied at multiple levels:

General Limits: Default limits that apply to all API requests
Subscription Plan Limits: Custom limits based on your subscription plan
Agent-Level Limits: Specific limits configured per agent

How Rate Limiting Works

Client Identification

Rate limits are applied per tenant (organization). The system identifies your tenant using the API key provided in the X-API-KEY header of your requests.
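As a minimal sketch of the identification step, the snippet below builds a request that carries the API key in the X-API-KEY header. The base URL, the agent code my-agent, and the helper name build_request are placeholders for illustration and are not taken from the Serenity* Star documentation; only the header name and the endpoint path shape come from this guide.

```python
import urllib.request
from typing import Optional

# Hypothetical base URL: replace it with the one for your environment.
BASE_URL = "https://api.example.com"


def build_request(path: str, api_key: str, body: Optional[bytes] = None,
                  method: str = "POST") -> urllib.request.Request:
    """Build a request that identifies the tenant through the X-API-KEY header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=body,
        method=method,
        headers={"X-API-KEY": api_key},
    )


# "my-agent" stands in for a real agentCode.
req = build_request("/api/agent/my-agent/execute", api_key="YOUR_API_KEY")
```

Because the limit is tracked per tenant, every client that sends the same organization's key draws from the same quota.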

Rate Limit Windows

Rate limits are enforced across four time windows:

Time Window	Description
Per Second	Maximum requests allowed per second
Per Minute	Maximum requests allowed per minute
Per Hour	Maximum requests allowed per hour
Per 12 Hours	Maximum requests allowed per 12-hour period
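To stay under all four windows at once, a client can track its own send times and refuse to send when any window is full. The sketch below is a client-side approximation only: this guide does not say whether the server uses fixed or sliding windows, so the sliding-window logic, the class name MultiWindowLimiter, and the limit values are assumptions.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side sliding-window check across several time windows."""

    def __init__(self, limits):
        # limits maps a window length in seconds to the max requests allowed in it.
        self.limits = dict(limits)
        self.timestamps = deque()  # send times of recent requests, oldest first

    def try_acquire(self, now=None):
        """Record a request and return True only if every window has room."""
        now = time.monotonic() if now is None else now
        longest = max(self.limits)
        # Entries older than the longest window can no longer affect any check.
        while self.timestamps and now - self.timestamps[0] >= longest:
            self.timestamps.popleft()
        for window, limit in self.limits.items():
            recent = sum(1 for t in self.timestamps if now - t < window)
            if recent >= limit:
                return False
        self.timestamps.append(now)
        return True


# Placeholder numbers, not real plan values: per second, minute, hour, 12 hours.
limiter = MultiWindowLimiter({1: 2, 60: 3, 3600: 100, 12 * 3600: 1000})
```

Call try_acquire() before each request and delay the request when it returns False. The server remains the source of truth, so a 429 response is still possible.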

Affected Endpoints

Subscription-based rate limits apply to specific high-resource endpoints:

Agent Execution: POST /api/agent/{agentCode}/execute and POST /api/v*/agent/{agentCode}/execute
Conversation Management: POST /api/agent/{agentCode}/conversation and PATCH /api/agent/{agentCode}/conversation/{conversationId}/context
Audio Transcription: POST /api/audio/transcribe
Real-time Connections: POST /api/agent/{agentCode}/realtime
Volatile Knowledge: POST /api/volatileknowledge
Connector Status: GET /api/connection/agentInstance/{id}/connector/{connectorId}/status

All other endpoints use the general rate limits, which are typically more permissive.

Shared Rate Limit Counter

All affected endpoints share the same rate limit counter for your tenant. This means that calls to different endpoints are combined when calculating your rate limit usage.

Example: If your plan allows 100 requests per minute:

50 calls to /api/agent/{agentCode}/execute
30 calls to /api/audio/transcribe
20 calls to /api/volatileknowledge

This totals 100 requests, reaching your limit for that minute.
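The arithmetic of the example above can be sketched as a single shared tally. The limit of 100 is the illustrative figure from the example, not a real plan value.

```python
from collections import Counter

PER_MINUTE_LIMIT = 100  # illustrative figure from the example, not a real plan value

# Calls made in the same minute, grouped by endpoint.
calls = Counter({
    "/api/agent/{agentCode}/execute": 50,
    "/api/audio/transcribe": 30,
    "/api/volatileknowledge": 20,
})

# All affected endpoints draw from one counter, so usage is the sum of all calls.
used = sum(calls.values())
remaining = max(PER_MINUTE_LIMIT - used, 0)
print(f"used={used}, remaining={remaining}")  # prints "used=100, remaining=0"
```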

Nested Agent Executions

When an agent executes another agent internally (e.g., Agent A calls Agent B as part of its workflow), each execution counts separately toward your rate limit.

Example: If Agent A executes and internally triggers Agent B:

1 request counted for Agent A execution
1 request counted for Agent B execution
Total: 2 requests against your rate limit

This applies to any level of nesting. Plan your agent architecture accordingly if you have agents that trigger other agents.
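To estimate how many requests one top-level call consumes, count every execution in the call tree. The sketch below assumes you can describe your agents as a mapping from each agent to the agents it triggers, and that the mapping has no cycles. The agent names and the function count_executions are hypothetical.

```python
def count_executions(agent, workflow):
    """Count the agent's own execution plus every execution it triggers."""
    children = workflow.get(agent, [])
    return 1 + sum(count_executions(child, workflow) for child in children)


# The example from the text: Agent A triggers Agent B.
simple = {"A": ["B"]}
print(count_executions("A", simple))  # prints 2

# A deeper hypothetical workflow: A triggers B and C, and B triggers D.
nested = {"A": ["B", "C"], "B": ["D"]}
print(count_executions("A", nested))  # prints 4
```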

Rate Limit Responses

When you exceed a rate limit, the API responds with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60

The Retry-After header indicates the number of seconds to wait before making another request.

Example Error Response

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please retry after 60 seconds.",
  "retryAfter": 60
}
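One way to handle this response is to wait for the advertised delay and then retry. The sketch below is an illustration under stated assumptions, not an official client: send is a hypothetical callable you supply that performs the request and returns a (status, headers, body) tuple, headers is a plain dict keyed by "Retry-After", and the one-second fallback delay is arbitrary. The delay is read from the Retry-After header first and from the retryAfter field of the JSON body second.

```python
import json
import time


def retry_delay(headers, body, default=1.0):
    """Seconds to wait after a 429: header first, then JSON body, then default."""
    value = headers.get("Retry-After")
    if value is None:
        try:
            value = json.loads(body).get("retryAfter")
        except (TypeError, ValueError, AttributeError):
            value = None  # body was missing, not JSON, or not a JSON object
    try:
        return max(float(value), 0.0)
    except (TypeError, ValueError):
        return default  # no usable value, e.g. a date instead of seconds


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_delay(headers, body))
```

Capping the number of attempts keeps a client from looping forever when the limit that was hit is the hourly or 12-hour one.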

Rate Limits by Subscription Plan

Your subscription plan determines your rate limits. Higher-tier plans typically include higher limits to accommodate increased API usage.

Feature	Description
Rate Limit - Per Second	Maximum API calls allowed per second
Rate Limit - Per Minute	Maximum API calls allowed per minute
Rate Limit - Per Hour	Maximum API calls allowed per hour
Rate Limit - Per 12 Hours	Maximum API calls allowed per 12-hour period

To view your current plan's rate limits, check your subscription details in the Serenity* Star dashboard under Billing > Your Plan.

Upgrading Your Plan

If you consistently hit rate limits, consider upgrading your subscription plan for higher limits. Visit the billing section to compare plans and their rate limit allocations.

Agent-Level Rate Limits

In addition to subscription plan limits, you can configure rate limits at the individual agent level. This is useful for:

Protecting specific agents from excessive usage
Managing costs for high-traffic agents
Ensuring fair resource distribution across multiple agents

Configuring Agent Rate Limits

Agent rate limits can be configured in the Execution Limits section of your agent settings:

Setting	Description
Executions Per Minute	Maximum agent executions allowed per minute

When both subscription and agent-level limits exist, the more restrictive limit applies. For example:

If your plan allows 100 requests per minute and your agent is configured for 50 executions per minute, the agent is limited to 50 executions per minute.
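The rule can be expressed as taking the smaller of the two values. The function name effective_per_minute is hypothetical, and treating an unconfigured agent limit as None is an assumption made for this sketch.

```python
def effective_per_minute(plan_limit, agent_limit=None):
    """Return the stricter of the plan limit and the optional agent limit."""
    if agent_limit is None:
        return plan_limit  # no agent-level limit configured
    return min(plan_limit, agent_limit)


print(effective_per_minute(100, 50))  # prints 50
print(effective_per_minute(100))      # prints 100
```

Keep in mind that the plan limit is shared across all affected endpoints, so an agent may be throttled before reaching its own limit if other calls have already used the tenant's quota.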

Troubleshooting

Consistently Hitting Rate Limits

If you're frequently hitting rate limits:

Review your usage patterns: Identify if certain operations are causing excessive API calls
Implement caching: Cache responses to reduce redundant requests
Optimize your code: Ensure you're not making unnecessary API calls
Consider a plan upgrade: If legitimate usage exceeds your limits, upgrade your subscription
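The caching suggestion above can be sketched as a small time-to-live cache placed in front of the call. This is an illustration under assumptions: the class TTLCache, the 30-second lifetime, and the fetch callable are hypothetical, and which responses are safe to reuse depends on your application.

```python
import time


class TTLCache:
    """Reuse a fetched value until its time-to-live expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only on a miss."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = fetch()  # this is the only place an API request would be made
        self.entries[key] = (now + self.ttl, value)
        return value


# Hypothetical usage: the lambda stands in for a real API call.
cache = TTLCache(ttl_seconds=30)
status = cache.get_or_fetch("connector-status", lambda: {"status": "ok"})
```

Repeated lookups within the lifetime are served from memory and do not count against the rate limit.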

Unexpected Rate Limit Errors

If you receive 429 errors unexpectedly:

Check for multiple clients: Ensure other applications or team members aren't consuming your rate limit quota
Verify your API key: Make sure you're using the correct API key for your organization
Review recent changes: Check if any recent code changes increased API call frequency

Need Higher Limits?

If your use case requires higher rate limits than available in standard subscription plans:

Contact Support: Reach out to discuss your specific requirements
Enterprise Plans: Consider an enterprise plan with customized rate limits
Optimize Usage: Work with our team to optimize your API usage patterns

For more information about subscription plans and their features, visit the Billing section.
