Resellers

Bulk-provisioning 1,000 eSIMs without hitting rate limits

Exponential backoff, idempotency keys, and how to design a queue that survives partial failures.

Gauthier Thierens · May 21, 2026 · 10 min read

A corporate travel manager messages you on a Friday afternoon. Their team of 350 sales reps is flying to a conference in Singapore on Monday. Each needs a SIM. Can you provision them by end of weekend?

Sure. You're a reseller. You have an API integration with an eSIM provider. You write a loop that hits the provisioning endpoint 350 times.

Two minutes in, the API starts returning 429 Too Many Requests. Twenty minutes in, you've successfully provisioned 89 SIMs and you have no idea which ones. You're going to have a bad weekend.

This post covers how to design bulk provisioning correctly.

The problem

Most eSIM APIs (Firsty included) enforce window-based rate limits. Firsty's general endpoints allow 1000 requests per 15 minutes. Bursting past that returns 429 errors with a `Retry-After` header telling you exactly when you can retry.

A naive bulk script:

javascript
for (const employee of employees) {
  await provisionEsim(employee);
}

This is slow (350 seconds at 1 second per call), and a batch big enough to exceed the rate window starts failing partway through.

A naive parallel script is worse:

javascript
await Promise.all(employees.map(e => provisionEsim(e)));

This hits the API as fast as Node can spawn requests. The first batch might succeed. The rest fail with 429. You retry the failures. Some succeed, some fail again. You have no idea which employees got SIMs.

What you actually need

A queue that:

Respects the rate limit (stays under 1000 per 15 minutes)
Uses idempotency to prevent duplicate provisioning on retry
Recovers gracefully from failures
Reports progress so you can tell the customer how many are done
Is resumable: if your script crashes at SIM 217, you can restart and pick up where you left off

This is rate-limited job processing. The pattern works for any bulk API operation.

Idempotency keys first

Before we touch rate limits, fix the duplicate provisioning problem. Every provisioning request should include a unique idempotency key in the `X-Idempotency-Key` header:

javascript
const idempotencyKey = `${customerId}-${employeeId}`;

await axios.post('/api/v3/esims', {}, {
  headers: {
    Authorization: `Bearer ${token}`,
    'X-Idempotency-Key': idempotencyKey,
  }
});

If you retry the same request with the same idempotency key within 48 hours, Firsty returns the original response instead of creating a duplicate. The key should be deterministic: same logical operation, same key.

If a request with the same key is still being processed, you get `409 Conflict`. If you reuse a key with a different request body, you get `422 Unprocessable Entity`.

Common mistake: using a UUID generated at request time as the idempotency key. This doesn't help because retries generate a new UUID. The key must be derived from the logical operation, not the request.
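
One way to keep keys deterministic is to build them in a single helper from the identifiers of the logical operation. The sketch below is illustrative only: `idempotencyKeyFor` and the `step` labels are hypothetical names, not part of Firsty's API.

```javascript
// Hypothetical helper: derive the key from the logical operation, never from the attempt.
function idempotencyKeyFor(batchId, employeeId, step) {
  // `step` separates the different calls made for one employee, e.g. 'esim' and 'pkg'.
  return [batchId, employeeId, step].join('-');
}

const firstAttempt = idempotencyKeyFor('batch-42', 'emp-007', 'esim');
const retry = idempotencyKeyFor('batch-42', 'emp-007', 'esim');
console.log(firstAttempt === retry); // true: a retry reuses the same key
```

Because the key depends only on its inputs, a retry after a crash or a timeout produces the same key as the first attempt, which is what lets the server deduplicate it.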

Window-based rate limiting

Firsty's rate limit is 1000 requests per 15 minutes. That's roughly 67 requests per minute on average, but the window is what matters: you could burst 1000 requests in 30 seconds and then be locked out for 14.5 minutes.

The cleanest pattern is to pace yourself to use the window evenly. A token bucket works well:

javascript
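class TokenBucket {
  constructor(capacity, refillPerSecond) {
    this.capacity = capacity;               // largest burst the bucket allows
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond; // sustained request rate
    this.lastRefill = Date.now();
  }

  async acquire() {
    while (true) {
      this.refill();
      if (this.tokens >= 1) {
        this.tokens -= 1;
        return;
      }
      // Sleep just long enough for the missing fraction of a token to refill.
      const waitMs = Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
      await new Promise(r => setTimeout(r, waitMs));
    }
  }

  refill() {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefill = now;
  }
}

// Keep the burst small. A bucket that starts with 1000 tokens and also refills
// could admit close to 2000 requests in its first 15 minutes.
// 10 burst tokens + 990 refilled over 900 seconds stays within 1000 per window.
const bucket = new TokenBucket(10, 990 / 900);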

Before each API call:

javascript
await bucket.acquire();
await provisionEsim(employee);

This paces your requests automatically. `acquire()` waits whenever the bucket is empty, so your sustained rate never exceeds the refill rate.
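
The bucket's two parameters interact: over any one window it can admit up to its capacity plus whatever it refills in that time. A rough way to pick a refill rate that leaves room for the burst is sketched below, assuming the server counts requests over a sliding window (`refillRateFor` is a hypothetical helper, not something from the Firsty API).

```javascript
// Hypothetical helper: refill rate (tokens per second) that keeps
// capacity + refills within `limit` requests per `windowSec` seconds.
function refillRateFor(limit, windowSec, capacity) {
  if (capacity > limit) throw new RangeError('capacity cannot exceed the limit');
  return (limit - capacity) / windowSec;
}

console.log(refillRateFor(1000, 900, 10));   // 1.1 tokens per second
console.log(refillRateFor(1000, 900, 1000)); // 0: a full-size burst leaves no room to refill
```

A small capacity costs almost nothing in throughput and keeps the worst case inside the limit, which is why pacing evenly is usually the safer default.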

Exponential backoff on 429

Even with rate limiting, you'll occasionally get 429s: network jitter, clock skew between you and the server, or other clients sharing your IP can all push you over the limit. Handle them with the `Retry-After` header:

javascript
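async function withRetry(fn, maxRetries = 5) {
  let lastErr;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only rate-limit responses are retried; everything else surfaces immediately.
      if (err.response?.status !== 429) throw err;
      lastErr = err;
      // No point sleeping after the final attempt.
      if (attempt === maxRetries - 1) break;

      // Prefer the server's Retry-After (in seconds); fall back to exponential backoff.
      const retryAfter = parseInt(err.response.headers['retry-after'], 10) || (2 ** attempt);
      const jitter = Math.random() * 0.3; // up to 300 ms, so parallel workers don't retry in lockstep
      const waitMs = Math.round((retryAfter + jitter) * 1000);

      console.log(`Rate limited, waiting ${waitMs}ms before retry ${attempt + 1}`);
      await new Promise(r => setTimeout(r, waitMs));
    }
  }
  throw lastErr;
}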

Three details that matter:

Honor the `Retry-After` header if present. The server is telling you exactly when to retry.
Add jitter (random delay) to prevent thundering herd if multiple workers are retrying simultaneously.
Cap the retries. A permanent 429 (account suspension, for example) shouldn't retry forever.
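
HTTP defines `Retry-After` as either a number of seconds or an HTTP date, so a parser that only understands integers silently falls back to backoff when it receives the date form. The sketch below handles both. The function name, the 60-second cap on the fallback, and the injectable `now` and `random` parameters are assumptions made for illustration.

```javascript
// Hypothetical helper: turn a Retry-After header value into a wait in milliseconds.
function retryDelayMs(headerValue, attempt, now = Date.now(), random = Math.random) {
  const jitterMs = random() * 300; // up to 0.3 s of jitter
  const capMs = 60000;             // assumed ceiling for the fallback backoff

  // delay-seconds form, e.g. "30"
  if (headerValue != null && /^\s*\d+\s*$/.test(headerValue)) {
    return Number(headerValue) * 1000 + jitterMs;
  }

  // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = headerValue == null ? NaN : Date.parse(headerValue);
  if (!Number.isNaN(dateMs)) {
    return Math.max(0, dateMs - now) + jitterMs;
  }

  // No usable header: exponential backoff, capped.
  return Math.min(capMs, 2 ** attempt * 1000) + jitterMs;
}
```

Passing `now` and `random` as parameters keeps the function deterministic in tests; production callers can rely on the defaults.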

Persistent job queue

In-memory rate limiting works for a single script. For real bulk operations, persist your work to a database:

sql
CREATE TABLE provisioning_jobs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  batch_id UUID NOT NULL,
  employee_id TEXT NOT NULL,
  plan_reference TEXT NOT NULL,
  status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'in_progress', 'completed', 'failed')),
  attempts INT DEFAULT 0,
  iccid TEXT,
  activation_code TEXT,
  error TEXT,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ,
  UNIQUE (batch_id, employee_id)
);

CREATE INDEX provisioning_jobs_status_idx ON provisioning_jobs(status);

Your worker pulls one job at a time:

sql
UPDATE provisioning_jobs
SET status = 'in_progress', attempts = attempts + 1
WHERE id = (
  SELECT id FROM provisioning_jobs
  WHERE status = 'pending'
  ORDER BY created_at
  FOR UPDATE SKIP LOCKED
  LIMIT 1
)
RETURNING *;

The `FOR UPDATE SKIP LOCKED` clause is critical: it lets multiple workers run in parallel without grabbing the same job.
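
From JavaScript, the claim query can be wrapped in a small function. This sketch assumes a database client whose `query(sql)` resolves to an object with a `rows` array, which is the shape node-postgres uses. Passing the client in as an argument is a choice made here so the function can be exercised with a fake.

```javascript
const CLAIM_SQL = `
  UPDATE provisioning_jobs
  SET status = 'in_progress', attempts = attempts + 1
  WHERE id = (
    SELECT id FROM provisioning_jobs
    WHERE status = 'pending'
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
  )
  RETURNING *`;

// Resolves to the claimed job row, or null when nothing is pending.
async function claimNextJob(db) {
  const { rows } = await db.query(CLAIM_SQL);
  return rows[0] ?? null;
}
```

Returning `null` for an empty queue gives the worker loop a simple stop condition.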

After each provisioning:

sql
UPDATE provisioning_jobs
SET status = 'completed', iccid = $1, activation_code = $2, completed_at = NOW()
WHERE id = $3;

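If your script crashes, you restart and every job that was never claimed is still `pending`. Jobs that were mid-flight stay `in_progress` and must be reset to `pending` before a worker will pick them up again; the idempotency keys make re-running them safe. No state is lost.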
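
Rows claimed by a worker that died mid-call remain `in_progress`, and the claim query never selects them. A sketch of a startup step that returns them to the queue follows. It assumes no other worker is processing the batch at that moment and a client whose `query(text, values)` resolves to a result with `rowCount`, as node-postgres does. `requeueInterruptedJobs` is a hypothetical name.

```javascript
// Hypothetical startup step: return interrupted jobs to the queue.
// Only safe when no other worker is processing this batch.
async function requeueInterruptedJobs(db, batchId) {
  const result = await db.query(
    `UPDATE provisioning_jobs
     SET status = 'pending'
     WHERE batch_id = $1 AND status = 'in_progress'`,
    [batchId]
  );
  return result.rowCount; // number of jobs put back
}
```

If several worker processes can run at once, a timestamp column recording when each job was claimed would be needed to tell a stalled job from a live one; the schema above does not include one.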

Reporting progress

For a 350-SIM batch, you want to tell the customer "247 of 350 done." A simple status query:

sql
SELECT
  COUNT(*) FILTER (WHERE status = 'completed') AS completed,
  COUNT(*) FILTER (WHERE status = 'failed') AS failed,
  COUNT(*) FILTER (WHERE status IN ('pending', 'in_progress')) AS remaining
FROM provisioning_jobs
WHERE batch_id = $1;

Poll this every few seconds in the UI. Show a progress bar.
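
Turning that row into a message takes one conversion step: node-postgres returns `COUNT(*)` values (bigint) as strings by default, so they need to become numbers before any arithmetic. A sketch, with `formatProgress` as a hypothetical helper:

```javascript
// Hypothetical helper: build a progress message from the status query's row.
function formatProgress(row) {
  const completed = Number(row.completed);
  const failed = Number(row.failed);
  const remaining = Number(row.remaining);
  const total = completed + failed + remaining;
  return `${completed} of ${total} done (${failed} failed, ${remaining} remaining)`;
}

console.log(formatProgress({ completed: '247', failed: '3', remaining: '100' }));
// prints "247 of 350 done (3 failed, 100 remaining)"
```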

Putting it all together

javascript
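const WORKERS = 5;
// One bucket shared by all workers, so their combined rate stays under the limit.
// A small burst (10) plus 990 refilled per 15 minutes stays within 1000 per window.
const bucket = new TokenBucket(10, 990 / 900);

async function worker() {
  while (true) {
    const job = await claimNextJob();
    if (!job) return; // queue drained

    try {
      // Call 1 of 2: create the eSIM.
      await bucket.acquire();

      const esim = await withRetry(() =>
        provisionEsim({
          idempotencyKey: `${job.batch_id}-${job.employee_id}-esim`,
        })
      );

      // Call 2 of 2: order the package. It gets its own key, so a retry of
      // this step cannot collide with the eSIM creation above.
      await bucket.acquire();

      await withRetry(() =>
        orderPackage(esim.profileReference, esim.esimReference, job.plan_reference, {
          idempotencyKey: `${job.batch_id}-${job.employee_id}-pkg`,
        })
      );

      await markJobCompleted(job.id, esim);
    } catch (err) {
      // Record the failure and move on; one bad job should not stop the batch.
      await markJobFailed(job.id, err.message);
    }
  }
}

await Promise.all(Array.from({ length: WORKERS }, worker));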

Note that each SIM requires 2 API calls (create eSIM + order package), so 350 SIMs is 700 API calls total. At 1000 per 15 minutes, that's about 11 minutes of API work, plus carrier-side latency.
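
The same arithmetic as a small function, which is handy for quoting a customer an estimate before the batch starts. It assumes perfectly even pacing at the full limit and ignores request latency and retries, so treat the result as a lower bound. `estimateMinutes` is a hypothetical name.

```javascript
// Hypothetical helper: minimum minutes of API time for a batch.
function estimateMinutes(simCount, callsPerSim, limit, windowMinutes) {
  const totalCalls = simCount * callsPerSim;
  return (totalCalls * windowMinutes) / limit;
}

console.log(estimateMinutes(350, 2, 1000, 15));  // 10.5
console.log(estimateMinutes(1000, 2, 1000, 15)); // 30
```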

With proper progress reporting, the customer can watch it complete. If something fails, you have a database record of exactly which SIM failed and why.

What this costs

The naive approach: 1 hour of code, 0 working SIMs, angry customer.

The correct approach: 1 day of code, working bulk provisioning forever, repeat business.

If you're a reseller doing volume, this is week-one infrastructure, not a nice-to-have.
