Vision AI Agents — Rate Limits & Usage Tiers
Vision AI Agents enforces usage limits to ensure fair platform access, stable performance, and predictable scaling for all developers.
API limits are applied based on developer usage tiers. Each tier defines limits for concurrent API requests, daily request volume, and monthly token usage.
Developers can upgrade tiers to increase throughput and processing capacity.
How Rate Limits Work
Each API request counts toward one or more usage limits depending on the operation.
Rate limits help ensure:
- reliable performance for all developers
- predictable system throughput
- controlled infrastructure scaling
- fair platform usage
Rate limits apply across all Vision AI Agents APIs including:
- Video Ingest APIs
- Video Intelligence APIs
- Audience Testing APIs
- Search APIs
Usage Tiers
Vision AI Agents provides multiple usage tiers designed to support developers from early experimentation through large-scale enterprise deployments.
| Tier | Intended Use | Concurrent Requests | Daily Max Requests | Monthly Token Limit |
|---|---|---|---|---|
| Tier 0 | Developer / Testing | 5 | 500 | 50,000 |
| Tier 1 | Small Applications | 10 | 2,000 | 250,000 |
| Tier 2 | Production Applications | 20 | 10,000 | 1,000,000 |
| Tier 3 | Large Scale Applications | 40 | 25,000 | 5,000,000 |
| Tier 4 | Enterprise | 80 | 100,000 | Custom |
The Enterprise tier (Tier 4) may include custom limits based on application requirements.
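To make the table easier to apply, the sketch below picks the lowest tier that covers an expected workload. The limits are copied from the table above. The function name `smallest_fitting_tier` is hypothetical, and treating the "Custom" token limit as unbounded is an assumption made for illustration only.

```python
from typing import Optional

# Limits copied from the tier table; None stands for the "Custom" token limit.
TIERS = [
    # (name, concurrent requests, daily requests, monthly tokens)
    ("Tier 0", 5, 500, 50_000),
    ("Tier 1", 10, 2_000, 250_000),
    ("Tier 2", 20, 10_000, 1_000_000),
    ("Tier 3", 40, 25_000, 5_000_000),
    ("Tier 4", 80, 100_000, None),
]


def smallest_fitting_tier(concurrent: int, daily: int, monthly_tokens: int) -> Optional[str]:
    """Return the first tier whose limits cover the expected workload."""
    for name, max_concurrent, max_daily, max_tokens in TIERS:
        # Assumption: a "Custom" token limit is treated as unbounded here.
        tokens_ok = max_tokens is None or monthly_tokens <= max_tokens
        if concurrent <= max_concurrent and daily <= max_daily and tokens_ok:
            return name
    return None  # the workload exceeds every published limit


print(smallest_fitting_tier(concurrent=8, daily=1_500, monthly_tokens=200_000))
```

A result of `None` suggests the workload needs custom limits rather than a published tier.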
Rate Limit Types
Vision AI Agents enforces several types of limits to manage platform usage.
| Limit Type | Description | Example |
|---|---|---|
| Concurrent Requests | Maximum number of simultaneous API requests allowed | 20 concurrent requests |
| Daily Requests | Total number of API calls allowed within a 24-hour window | 10,000 requests per day |
| Monthly Tokens | Total computational processing capacity used per billing cycle | 1,000,000 tokens |
Concurrent Request Limits
Concurrent request limits define the number of API requests that may be processed simultaneously.
For example, a Tier 2 developer may have up to 20 API requests in flight at once.
If the concurrent request limit is exceeded, additional requests may be queued or rejected with an HTTP 429 rate limit response.
| Scenario | Behavior |
|---|---|
| Within limit | Requests are processed immediately |
| Near limit | Requests may be queued briefly |
| Exceeded limit | API returns HTTP 429 rate limit response |
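One way to stay within a concurrent request limit is to cap in-flight requests on the client side. The sketch below uses `asyncio.Semaphore` from the Python standard library. The helper `run_limited` and the stand-in coroutine `fake_call` are hypothetical and do not call any Vision AI Agents endpoint.

```python
import asyncio

MAX_CONCURRENT = 20  # Tier 2 concurrent request limit from the tier table


async def run_limited(factories, limit=MAX_CONCURRENT):
    """Run coroutine factories with at most `limit` of them in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(factory):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))


async def fake_call():
    """Hypothetical stand-in for a real API request."""
    await asyncio.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    results = asyncio.run(run_limited([fake_call] * 50))
    print(len(results), "requests completed")
```

Capping concurrency on the client does not replace handling of HTTP 429 responses, since other processes using the same account may also consume slots.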
Daily Request Limits
Daily request limits control the maximum number of API calls that can be made within a 24-hour period.
Daily limits help prevent runaway request volumes and ensure stable infrastructure usage across the platform.
When the daily request limit is reached, additional API calls return an HTTP 429 rate limit response.
| Tier | Daily Limit |
|---|---|
| Tier 0 | 500 requests/day |
| Tier 1 | 2,000 requests/day |
| Tier 2 | 10,000 requests/day |
| Tier 3 | 25,000 requests/day |
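A client can track its own daily consumption to avoid sending requests that would be rejected. The sketch below is a minimal counter. The class name `DailyBudget` is hypothetical, and the reset at midnight UTC is an assumption, since this page only describes a 24-hour period without saying when it starts.

```python
from datetime import datetime, timezone
from typing import Optional


class DailyBudget:
    """Client-side counter for a daily request limit.

    Assumption: the window resets at midnight UTC. Adjust this if the
    platform uses a rolling 24-hour window instead.
    """

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.day = None
        self.used = 0

    def try_acquire(self, now: Optional[datetime] = None) -> bool:
        """Record one request and return True, or return False if the budget is spent."""
        now = now or datetime.now(timezone.utc)
        today = now.date()
        if today != self.day:  # the first call of a new day resets the counter
            self.day = today
            self.used = 0
        if self.used >= self.daily_limit:
            return False
        self.used += 1
        return True


budget = DailyBudget(daily_limit=500)  # Tier 0 daily limit
if budget.try_acquire():
    print("request allowed")
```

The counter is local to one process, so deployments with several workers would need a shared store to keep a single count.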
Monthly Token Limits
Monthly token limits control the total amount of platform processing capacity used within a billing period.
Tokens represent computational usage across platform services such as:
- video analysis processing
- intelligence extraction
- vector embedding generation
- search query processing
| Service Operation | Token Usage Category |
|---|---|
| Video Analysis | Video processing tokens |
| Scene Intelligence Extraction | Analysis tokens |
| Vector Embedding Generation | Embedding tokens |
| Search Queries | Search tokens |
Developers can monitor token usage in the developer dashboard.
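Token figures read from the dashboard can be extrapolated to estimate whether a monthly limit will be reached before the billing cycle ends. The sketch below uses a simple linear projection. The function names and the 30-day cycle length are assumptions, and real usage is rarely perfectly linear.

```python
def projected_monthly_tokens(tokens_used: int, days_elapsed: int, days_in_cycle: int = 30) -> float:
    """Linearly extrapolate token usage to the end of the billing cycle."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return tokens_used / days_elapsed * days_in_cycle


def will_exceed(tokens_used: int, days_elapsed: int, monthly_limit: int, days_in_cycle: int = 30) -> bool:
    """Return True if the projection is above the monthly token limit."""
    return projected_monthly_tokens(tokens_used, days_elapsed, days_in_cycle) > monthly_limit


# 400,000 tokens used after 10 days, checked against the Tier 2 limit of 1,000,000.
print(will_exceed(tokens_used=400_000, days_elapsed=10, monthly_limit=1_000_000))
```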
Rate Limit Responses
If a rate limit is exceeded, the API will return an HTTP 429 response.
Example response:
{
  "error": "rate_limit_exceeded",
  "message": "API rate limit exceeded for current usage tier"
}
Rate Limit Response Fields
| Field | Type | Description |
|---|---|---|
| error | string | Machine-readable error code |
| message | string | Human-readable explanation of the limit violation |
Developers should implement retry logic for HTTP 429 responses, and request a tier upgrade if limits are reached frequently.
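Retry logic for rate limit responses is commonly written as exponential backoff with jitter. The sketch below shows that pattern under stated assumptions. `call_with_retry` and the `send` callable, which returns a `(status_code, body)` pair, are hypothetical, and the delays are illustrative because this page does not specify how long a client should wait.

```python
import random
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable that performs one API request and
    returns a (status_code, body) tuple.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base delays of 1s, 2s, 4s, ... plus up to 50% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status, body  # still rate limited after the final attempt


# Stand-in for a real request: rate limited twice, then successful.
responses = [
    (429, {"error": "rate_limit_exceeded"}),
    (429, {"error": "rate_limit_exceeded"}),
    (200, {"status": "ok"}),
]
status, body = call_with_retry(lambda: responses.pop(0), base_delay=0.01)
print(status, body)
```

Jitter spreads retries from many clients over time so they do not all hit the API again at the same moment.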
Best Practices for Scaling
Developers building production systems should follow these best practices.
| Best Practice | Description |
|---|---|
| Queue ingest workloads | Process large uploads gradually rather than all at once |
| Avoid repeated analysis | Reuse previously generated analysis results when possible |
| Cache search responses | Reduce repeated queries for identical searches |
| Batch operations | Process multiple videos in grouped requests |
These practices help maintain optimal performance and reduce unnecessary API calls.
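As an illustration of the "Cache search responses" practice, the sketch below wraps a search function in a small time-limited cache so that identical queries within a short window do not trigger new API calls. `SearchCache`, `fake_search`, and the five-minute lifetime are hypothetical choices and not part of the Vision AI Agents API.

```python
import time


class SearchCache:
    """Cache results of identical search queries for a limited time."""

    def __init__(self, search_fn, ttl_seconds=300.0, clock=time.monotonic):
        self._search_fn = search_fn  # hypothetical function that calls the Search API
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # query -> (time stored, result)

    def search(self, query: str):
        key = query.strip()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cached result, no API call made
        result = self._search_fn(query)
        self._entries[key] = (now, result)
        return result


def fake_search(query):
    """Stand-in for a real Search API request."""
    print("calling API for:", query)
    return ["result for " + query]


cache = SearchCache(fake_search)
cache.search("red car at night")  # calls the stand-in
cache.search("red car at night")  # served from the cache
```

The cache lifetime trades freshness for fewer requests, so a shorter value suits content that changes often.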
Requesting Higher Limits
Developers who need higher throughput can request an upgrade to a higher usage tier.
Enterprise plans may include:
- higher concurrency limits
- higher request volumes
- dedicated processing capacity
- custom rate limits
For enterprise access, contact:
Monitoring Usage
Developers can monitor their API usage through the Vision AI Agents developer dashboard.
| Metric | Description |
|---|---|
| Current Usage Tier | The developer's assigned rate limit tier |
| Concurrent Requests | Number of active simultaneous API requests |
| Daily Request Consumption | Total requests used within the current day |
| Monthly Token Usage | Total tokens consumed during the billing cycle |
Monitoring usage helps developers plan scaling and manage application workloads effectively.
Related Documentation
Developers integrating Vision AI Agents may also review:
- Getting Started
- Platform Architecture
- API Reference
- Authentication
- Search Integration Guide