LLM API Rate Limiting with Datawiza Agent Gateway


Datawiza Agent Gateway enforces rate limits at the service level. Each service carries its own rate limit policies, and each policy targets one of two dimensions — the authenticated user or the virtual API key. When a limit is exceeded, the gateway returns 429 Too Many Requests immediately, without forwarding the request to the upstream LLM.

No client-side changes are required. Rate limits apply to all clients connecting through the gateway — whether they are talking to Anthropic, OpenAI, or Gemini — as long as they use a virtual API key.

Policy Dimensions

Level	What is tracked	Typical use case
User	All requests made by the same user, across all their virtual keys, on this service	Enforce per-seat fairness
Virtual API Key	Requests made by one specific virtual key on this service	Limit a single app or integration

Both dimensions are evaluated on every request. A request is rejected if either limit is exceeded.

Rate limit buckets are always scoped to a service. A user who exceeds their quota on one service is not affected on any other service.
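One way to picture this scoping is a counter key that always contains the service. The sketch below is illustrative only: the key layout, the helper names `bucket_key` and `keys_for_request`, and the sample identifiers (`svc-chat`, `vk-1`, `alice`) are assumptions, not the gateway's actual storage format.

```python
from collections import Counter

def bucket_key(service: str, level: str, subject: str) -> tuple[str, str, str]:
    """Build a counter key; the service is always part of it (hypothetical layout)."""
    return (service, level, subject)

def keys_for_request(service: str, user: str, virtual_key: str) -> list[tuple[str, str, str]]:
    """Each request touches two buckets: one for the user, one for the virtual key."""
    return [
        bucket_key(service, "User", user),
        bucket_key(service, "Virtual API Key", virtual_key),
    ]

counters: Counter = Counter()

# alice calls "svc-chat" with two different virtual keys,
# then calls a second service, "svc-embed", once.
for service, key in [("svc-chat", "vk-1"), ("svc-chat", "vk-2"), ("svc-embed", "vk-1")]:
    for k in keys_for_request(service, "alice", key):
        counters[k] += 1
```

In this sketch the user bucket on `svc-chat` counts both calls regardless of which virtual key was used, while the `svc-embed` buckets are counted separately.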

Limit Types and Windows

Each policy rule is defined by the following fields:

Field	Options	Description
Name	string	A label for this policy rule (e.g. `user-rpm`)
Type	`Requests` | `Tokens`	What to count: HTTP requests, or LLM tokens (input + output)
Level	`User` | `Virtual API Key`	Which dimension to track
Limit	integer	Max allowed in the window. `0` = hard block — all matching requests are rejected immediately.
Frequency	`Per Minute` | `Per Day`	Quota window: 60 seconds or 24 hours
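For readers who think in data structures, the fields above can be modeled roughly as follows. This is a sketch for reasoning about policies, not the gateway's configuration schema; the class name `RateLimitRule` and its attribute names are hypothetical.

```python
from dataclasses import dataclass

# Window lengths taken from the Frequency row of the table above.
WINDOW_SECONDS = {"Per Minute": 60, "Per Day": 24 * 60 * 60}

@dataclass(frozen=True)
class RateLimitRule:
    name: str        # label, e.g. "user-rpm"
    type: str        # "Requests" or "Tokens"
    level: str       # "User" or "Virtual API Key"
    limit: int       # max allowed in the window; 0 means hard block
    frequency: str   # "Per Minute" or "Per Day"

    def __post_init__(self) -> None:
        # Reject values outside the options listed in the table.
        if self.type not in ("Requests", "Tokens"):
            raise ValueError(f"unknown type: {self.type!r}")
        if self.level not in ("User", "Virtual API Key"):
            raise ValueError(f"unknown level: {self.level!r}")
        if self.frequency not in WINDOW_SECONDS:
            raise ValueError(f"unknown frequency: {self.frequency!r}")

    @property
    def window_seconds(self) -> int:
        return WINDOW_SECONDS[self.frequency]

    @property
    def is_hard_block(self) -> bool:
        return self.limit == 0

user_rpm = RateLimitRule("user-rpm", "Requests", "User", 60, "Per Minute")
blocked = RateLimitRule("key-block", "Requests", "Virtual API Key", 0, "Per Day")
```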

Token check timing

`Requests` limits are checked and committed before the request is forwarded to the upstream LLM. `Tokens` limits are deducted after the response is received, once the actual input + output token count is known. Token deduction occurs only on responses that include token usage data; upstream errors that return no usage payload do not affect the token bucket.
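The deduction rule can be sketched as a small function. The usage field names `input_tokens` and `output_tokens` are assumptions chosen for illustration; providers name these fields differently, and this is not the gateway's internal code.

```python
from typing import Optional

def tokens_to_deduct(usage: Optional[dict]) -> int:
    """Tokens to charge after a response; 0 when the response carries no usage data."""
    if not usage:
        return 0
    # Hypothetical field names; the charge is input plus output tokens.
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)

remaining = 10_000  # e.g. a 10000-token per-minute budget

# Successful response that reports usage: 1200 + 300 tokens are deducted.
remaining -= tokens_to_deduct({"input_tokens": 1200, "output_tokens": 300})

# Upstream error with no usage payload: the bucket is left unchanged.
remaining -= tokens_to_deduct(None)
```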

Configure Rate Limits in DAGC

Rate limit policies are configured on the Service, not on individual virtual keys.

1. In the Datawiza Agent Gateway Console (DAGC), go to Services in the left sidebar and click the service you want to configure.
2. Open the Policies tab, then select the Rate Limit sub-tab. Click Create Policy.
3. The Configure Rate Limit Policy dialog opens. Fill in the fields and click Add Policy.

Repeat to add more rules. The values below are illustrative — set limits that match your upstream provider's actual quota for your account tier:

Name	Type	Level	Limit	Frequency
`key-rpm`	`Requests`	`Virtual API Key`	`20`	`Per Minute`
`key-rpd`	`Requests`	`Virtual API Key`	`1000`	`Per Day`
`user-rpm`	`Requests`	`User`	`60`	`Per Minute`
`user-rpd`	`Requests`	`User`	`5000`	`Per Day`
`key-tpm`	`Tokens`	`Virtual API Key`	`10000`	`Per Minute`
`key-tpd`	`Tokens`	`Virtual API Key`	`200000`	`Per Day`
`user-tpm`	`Tokens`	`User`	`40000`	`Per Minute`
`user-tpd`	`Tokens`	`User`	`1000000`	`Per Day`

Note

Changes take effect after the gateway applies the updated configuration — typically within a few seconds, no restart required.

How Limits Are Applied

Request arrives
   │
   ├── 1. Authenticate virtual key → 401 on failure (no rate-limit state touched)
   │
   ├── 2. Check all active rules (non-destructive peek)
   │         └── 429 + Retry-After if any rule is exceeded
   │
   ├── 3. Commit Requests counters (consume 1 unit per counter)
   │         └── 429 if a concurrent request consumed the last slot
   │
   ├── 4. Forward to upstream LLM
   │
   └── 5. Deduct Tokens counters with actual token count (input + output)

When a limit is exceeded, the response includes a Retry-After header set to the number of seconds (delta-seconds) to wait before retrying.
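Clients do not have to change to work behind the gateway, but one that wants to back off politely can honor the header. The sketch below is one possible approach, not a required integration: `retry_delay` and `call_with_retry` are hypothetical helpers, and `send` stands in for whatever function performs the HTTP call and returns a status code and headers.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)

def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[int]:
    """Seconds to wait for a 429 with a delta-seconds Retry-After, else None."""
    if status != 429:
        return None
    value = headers.get("Retry-After", "")
    return int(value) if value.isdigit() else None

def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting as instructed by Retry-After between attempts."""
    for attempt in range(max_attempts):
        status, headers = send()
        delay = retry_delay(status, headers)
        # Return on success, on a 429 without a usable header, or on the last attempt.
        if delay is None or attempt == max_attempts - 1:
            return status, headers
        sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The lookup assumes the headers mapping uses the exact key `Retry-After`; HTTP header names are case-insensitive, so a real client should rely on its HTTP library's header handling.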

Window Refill

Windows use a rolling refill: the quota resets 60 seconds (or 24 hours) after the bucket was first written, not at a fixed clock boundary. A 1-minute bucket first written at 10:35:42 refills at 10:36:42, not at 10:36:00.

Windows refill in full at the boundary, not gradually. The entire quota is restored in one step.
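The refill behavior described above can be modeled with a small in-memory bucket. This is a minimal sketch under stated assumptions, not the gateway's implementation: the class `WindowBucket`, its method names, and the way `retry_after` is derived are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class WindowBucket:
    limit: int
    window_seconds: int
    used: int = 0
    started_at: Optional[float] = None  # time of the first write in the current window

    def _refill_if_due(self, now: float) -> None:
        # Full refill in one step once the window has elapsed since the first write.
        if self.started_at is not None and now - self.started_at >= self.window_seconds:
            self.used = 0
            self.started_at = None

    def try_consume(self, now: float, amount: int = 1) -> bool:
        self._refill_if_due(now)
        if self.used + amount > self.limit:
            return False
        if self.started_at is None:
            self.started_at = now  # the window is anchored to this first write
        self.used += amount
        return True

    def retry_after(self, now: float) -> int:
        """Whole seconds until the next refill (0 if no window is open)."""
        self._refill_if_due(now)
        if self.started_at is None:
            return 0
        return math.ceil(self.started_at + self.window_seconds - now)

bucket = WindowBucket(limit=2, window_seconds=60)
first = bucket.try_consume(now=0.0)    # True: opens the window at t=0
second = bucket.try_consume(now=10.0)  # True: quota is now used up
third = bucket.try_consume(now=59.0)   # False: window has not elapsed yet
wait = bucket.retry_after(now=59.0)    # 1 second until the refill at t=60
fourth = bucket.try_consume(now=60.0)  # True: full quota restored, new window at t=60
```

Times are plain second offsets to keep the example deterministic. A `limit` of `0` never admits a request in this model, which matches the hard-block behavior described earlier.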
