Rate Limiting

The Serenity* Star API implements rate limiting to ensure fair usage, maintain system stability, and protect against abuse. This guide explains how rate limiting works and how to handle rate limit responses in your applications.

Overview

Rate limiting controls the number of API requests a client can make within specific time windows. When you exceed the allowed limits, the API returns a 429 Too Many Requests HTTP status code.

Rate limits are applied at multiple levels:

  1. General Limits: Default limits that apply to all API requests
  2. Subscription Plan Limits: Custom limits based on your subscription plan
  3. Agent-Level Limits: Specific limits configured per agent

How Rate Limiting Works

Client Identification

Rate limits are applied per tenant (organization). The system identifies your tenant using the API key provided in the X-API-KEY header of your requests.
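
As a minimal sketch of attaching the key, the Python snippet below builds (but does not send) an execute request carrying the X-API-KEY header. The base URL, agent code, API key, and request body shape are placeholders chosen for illustration, not values taken from this guide.

```python
import json
import urllib.request


def build_execute_request(base_url: str, agent_code: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request to the agent execute endpoint with the tenant's API key."""
    url = f"{base_url.rstrip('/')}/api/agent/{agent_code}/execute"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,  # identifies the tenant whose limits apply
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only; the payload shape is hypothetical.
request = build_execute_request(
    "https://example.invalid", "my-agent", "YOUR_API_KEY", {"input": "hello"}
)
```

Because limits are tracked per tenant, every application that sends this same key draws from the same quota.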

Rate Limit Windows

Rate limits are enforced across four time windows:

| Time Window | Description |
| --- | --- |
| Per Second | Maximum requests allowed per second |
| Per Minute | Maximum requests allowed per minute |
| Per Hour | Maximum requests allowed per hour |
| Per 12 Hours | Maximum requests allowed per 12-hour period |
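
To pace requests on the client side, one option is to track recent request times against all four windows at once. The sketch below assumes sliding windows and uses made-up limits; this guide does not say whether the server counts fixed or sliding windows, so treat it as a conservative local guard rather than a mirror of the server's behavior. The class name and the limit values are hypothetical.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side guard that checks request timestamps against several windows."""

    def __init__(self, limits: dict, clock=time.monotonic):
        # limits maps window length in seconds -> max requests in that window
        self.limits = limits
        self.clock = clock
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def try_acquire(self) -> bool:
        """Record a request and return True only if every window has room."""
        now = self.clock()
        longest = max(self.limits)
        # Forget timestamps that fall outside the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        for window, max_requests in self.limits.items():
            recent = sum(1 for t in self.sent if now - t < window)
            if recent >= max_requests:
                return False
        self.sent.append(now)
        return True


# Hypothetical limits for the second, minute, hour, and 12-hour windows.
limiter = MultiWindowLimiter({1: 5, 60: 100, 3600: 2000, 43200: 10000})
```

Call `limiter.try_acquire()` before each request and delay the request when it returns False.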

Affected Endpoints

Subscription-based rate limits apply to specific high-resource endpoints:

  • Agent Execution: POST /api/agent/{agentCode}/execute and POST /api/v*/agent/{agentCode}/execute
  • Conversation Management: POST /api/agent/{agentCode}/conversation and PATCH /api/agent/{agentCode}/conversation/{conversationId}/context
  • Audio Transcription: POST /api/audio/transcribe
  • Real-time Connections: POST /api/agent/{agentCode}/realtime
  • Volatile Knowledge: POST /api/volatileknowledge
  • Connector Status: GET /api/connection/agentInstance/{id}/connector/{connectorId}/status

All other endpoints use the general rate limits, which are typically more permissive.
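
If a client needs to know which of its calls count against the subscription limits, the endpoint list above can be encoded as patterns. This is a sketch only: the patterns are transcribed from the list, reading "v*" as a single version path segment is an assumption, and the function name is hypothetical.

```python
import re

# Patterns transcribed from the endpoint list. Treating "v*" as one
# version path segment (for example "v2") is an assumption.
SUBSCRIPTION_LIMITED = [
    ("POST", r"/api/(v[^/]+/)?agent/[^/]+/execute"),
    ("POST", r"/api/agent/[^/]+/conversation"),
    ("PATCH", r"/api/agent/[^/]+/conversation/[^/]+/context"),
    ("POST", r"/api/audio/transcribe"),
    ("POST", r"/api/agent/[^/]+/realtime"),
    ("POST", r"/api/volatileknowledge"),
    ("GET", r"/api/connection/agentInstance/[^/]+/connector/[^/]+/status"),
]


def is_subscription_limited(method: str, path: str) -> bool:
    """Return True if the method and path match one of the listed endpoints."""
    return any(
        method.upper() == listed_method and re.fullmatch(pattern, path)
        for listed_method, pattern in SUBSCRIPTION_LIMITED
    )
```

A client could use this check to decide which outgoing calls to pass through a local throttle.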

Shared Rate Limit Counter

All affected endpoints share the same rate limit counter for your tenant. This means that calls to different endpoints are combined when calculating your rate limit usage.

Example: If your plan allows 100 requests per minute:

  • 50 calls to /api/agent/{agentCode}/execute
  • 30 calls to /api/audio/transcribe
  • 20 calls to /api/volatileknowledge

This totals 100 requests, reaching your limit for that minute.
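
The same arithmetic can be written as a small tally. The 100-per-minute figure is the example value from the text, not an actual plan limit.

```python
from collections import Counter

PLAN_LIMIT_PER_MINUTE = 100  # example figure from the text, not a real plan value

# Calls made to different affected endpoints within the same minute.
calls_this_minute = Counter({
    "/api/agent/{agentCode}/execute": 50,
    "/api/audio/transcribe": 30,
    "/api/volatileknowledge": 20,
})

# All affected endpoints draw from one counter, so the usage is the sum.
used = sum(calls_this_minute.values())
remaining = PLAN_LIMIT_PER_MINUTE - used
print(f"used={used} remaining={remaining}")  # used=100 remaining=0
```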

Nested Agent Executions

When an agent executes another agent internally (e.g., Agent A calls Agent B as part of its workflow), each execution counts separately toward your rate limit.

Example: If Agent A executes and internally triggers Agent B:

  • 1 request counted for Agent A execution
  • 1 request counted for Agent B execution
  • Total: 2 requests against your rate limit

This applies to any level of nesting. Plan your agent architecture accordingly if you have agents that trigger other agents.
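
To estimate how many requests a single top-level call consumes, the nesting can be counted recursively. The sketch below assumes each agent triggers each of its nested agents exactly once per run; the workflow structure and the "calls" key are hypothetical.

```python
def count_executions(agent: dict) -> int:
    """Count one execution for this agent plus one for every nested agent it triggers."""
    return 1 + sum(count_executions(child) for child in agent.get("calls", []))


# Hypothetical workflow: A triggers B, and B triggers C and D.
workflow = {
    "name": "A",
    "calls": [
        {"name": "B", "calls": [{"name": "C"}, {"name": "D"}]},
    ],
}

print(count_executions(workflow))  # 4
```

With this workflow, one client call to Agent A uses four requests, so a limit of 100 requests per minute allows about 25 such calls.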

Rate Limit Responses

When you exceed a rate limit, the API responds with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 60

The Retry-After header indicates the number of seconds to wait before making another request.

Example Error Response

{
  "error": "Rate limit exceeded",
  "message": "Too many requests. Please retry after 60 seconds.",
  "retryAfter": 60
}
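
One way to handle this response is to wait for the indicated time and retry. The sketch below reads the delay from the Retry-After header, falls back to the retryAfter field of the body, and otherwise uses a default. The function names, the default delay, the attempt cap, and the `send` callable (which is expected to return a status, headers, and body) are all assumptions made for illustration.

```python
import json
import time


def retry_delay(headers: dict, body: str, default: float = 1.0) -> float:
    """Pick the wait from the Retry-After header, then the JSON body, then a default."""
    for key, value in headers.items():
        if key.lower() == "retry-after":
            try:
                return float(value)
            except (ValueError, TypeError):
                break  # not a plain number of seconds; fall back to the body
    try:
        parsed = json.loads(body)
    except (ValueError, TypeError):
        return default
    if isinstance(parsed, dict):
        value = parsed.get("retryAfter")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    return default


def call_with_retry(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a caller-supplied function returning (status, headers, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        sleep(retry_delay(headers, body))
```

Capping the number of attempts keeps a client from retrying indefinitely when one of the longer windows, such as the 12-hour limit, has been exhausted.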

Rate Limits by Subscription Plan

Your subscription plan determines your rate limits. Higher-tier plans typically include higher limits to accommodate increased API usage.

| Feature | Description |
| --- | --- |
| Rate Limit - Per Second | Maximum API calls allowed per second |
| Rate Limit - Per Minute | Maximum API calls allowed per minute |
| Rate Limit - Per Hour | Maximum API calls allowed per hour |
| Rate Limit - Per 12 Hours | Maximum API calls allowed per 12-hour period |

To view your current plan's rate limits, check your subscription details in the Serenity* Star dashboard under Billing > Your Plan.

Upgrading Your Plan

If you consistently hit rate limits, consider upgrading your subscription plan for higher limits. Visit the Billing section to compare plans and their rate limit allocations.

Agent-Level Rate Limits

In addition to subscription plan limits, you can configure rate limits at the individual agent level. This is useful for:

  • Protecting specific agents from excessive usage
  • Managing costs for high-traffic agents
  • Ensuring fair resource distribution across multiple agents

Configuring Agent Rate Limits

Agent rate limits can be configured in the Execution Limits section of your agent settings:

| Setting | Description |
| --- | --- |
| Executions Per Minute | Maximum agent executions allowed per minute |

When both subscription and agent-level limits exist, the more restrictive limit applies. For example:

  • If your plan allows 100 requests per minute and your agent is configured for 50 executions per minute, the agent is limited to 50 executions per minute.
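
The "more restrictive limit applies" rule amounts to taking the minimum of the two values. This sketch is a simplification: the plan limit counts requests across all affected endpoints, while the agent limit counts executions of a single agent. The function name is hypothetical.

```python
from typing import Optional


def effective_per_minute(plan_limit: int, agent_limit: Optional[int] = None) -> int:
    """Return the stricter of the plan limit and the agent limit, if one is set."""
    if agent_limit is None:
        return plan_limit
    return min(plan_limit, agent_limit)


print(effective_per_minute(100, 50))  # 50
print(effective_per_minute(100))      # 100
```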

Troubleshooting

Consistently Hitting Rate Limits

If you're frequently hitting rate limits:

  1. Review your usage patterns: Identify if certain operations are causing excessive API calls
  2. Implement caching: Cache responses to reduce redundant requests
  3. Optimize your code: Ensure you're not making unnecessary API calls
  4. Consider a plan upgrade: If legitimate usage exceeds your limits, upgrade your subscription

Unexpected Rate Limit Errors

If you receive 429 errors unexpectedly:

  1. Check for multiple clients: Ensure other applications or team members aren't consuming your rate limit quota
  2. Verify your API key: Make sure you're using the correct API key for your organization
  3. Review recent changes: Check if any recent code changes increased API call frequency

Need Higher Limits?

If your use case requires higher rate limits than available in standard subscription plans:

  1. Contact Support: Reach out to discuss your specific requirements
  2. Enterprise Plans: Consider an enterprise plan with customized rate limits
  3. Optimize Usage: Work with our team to optimize your API usage patterns

For more information about subscription plans and their features, visit the Billing section.