Specifications
We provide extensive specifications for each connector type, which you can feed into the LLM of your choice to have a connector built for you.
Analytical Connectors Common Specification
Purpose-built integrations that extract data from source systems and load it into analytical stores (data warehouses or lakes) for reporting, modeling, and BI. They prioritize correctness, incremental delivery, and schema stability.
Data Model
- Favor a normalized schema: entities (e.g., users, accounts), events (e.g., pageview, charge), and reference tables
- Required columns per table: primary key, `updated_at` (source clock), `ingested_at` (connector clock, UTC)
- Prefer scalar columns; keep nested payloads in a `_raw` JSON column when needed
Sync Semantics
- Support both initial full sync and ongoing incremental syncs
- Use a deterministic cursor (e.g., `updated_at`, `event_timestamp`, or a CDC offset); CDC is preferred when available
- Perform idempotent loads via MERGE/UPSERT on the primary key (and cursor where relevant)
- Chunk and paginate reads; stream writes to avoid unbounded memory
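The cursor-plus-upsert loop above can be sketched as follows. The spec is language-agnostic; Python is used here purely for concreteness, and the `fetch_page` helper and in-memory `target` dict are hypothetical stand-ins for a real source reader and a warehouse MERGE.

```python
def incremental_sync(fetch_page, target, cursor=None):
    """Idempotent incremental sync: upsert by primary key, advance a cursor.

    fetch_page(cursor) -> (rows, next_cursor); rows carry 'id' and 'updated_at'.
    target is a dict keyed by primary key, standing in for a warehouse MERGE.
    """
    while True:
        rows, next_cursor = fetch_page(cursor)
        if not rows:
            return cursor  # final checkpoint for the next run
        for row in rows:
            existing = target.get(row["id"])
            # MERGE semantics: only overwrite with data at least as new
            if existing is None or row["updated_at"] >= existing["updated_at"]:
                target[row["id"]] = row
        cursor = next_cursor  # checkpoint after each page so reruns resume here
```

Because the load is keyed by primary key, replaying a page (e.g., after a crash between pages) produces the same final state.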
Schema Evolution
- Backwards-compatible, additive by default; avoid breaking renames/drops
- Use stable, documented naming conventions (snake_case; UTC timestamps)
- Emit clear migration notes when columns are added or semantics change
Data Quality and Deletes
- Deduplicate by primary key and the latest `updated_at`/version
- Represent soft deletes with `is_deleted` and `deleted_at`; propagate hard deletes when the source exposes them
- Validate basic types and required fields; route malformed rows to a dead-letter path/table
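Dedup-by-latest-version can be sketched in a few lines (a Python sketch; the `id` and `updated_at` field names follow the required-columns convention above):

```python
def dedupe(rows, key="id", version="updated_at"):
    """Keep only the latest version of each row, grouped by primary key."""
    latest = {}
    for row in rows:
        seen = latest.get(row[key])
        # Keep the row with the greatest version/updated_at per key
        if seen is None or row[version] >= seen[version]:
            latest[row[key]] = row
    return list(latest.values())
```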
Performance and Limits
- Respect source rate limits; use concurrency controls and adaptive backoff with jitter
- Use incremental checkpoints after each page/batch so jobs can resume safely
- Prefer server-side filtering and projection to minimize transfer size
Observability
- Logs: structured, with job/run IDs and page/batch numbers; never log secrets
- Metrics: `rows_read`, `rows_written`, `lag_seconds`, `duplicate_rows`, `retries`, and `duration_seconds`
- Optional tracing spans around source fetch, transform, and load
Security
- TLS by default; least-privilege access to sources and targets
- PII handling: configurable field redaction/masking; scrub sensitive data from logs and metrics
Documentation
- List covered entities and their cursors, limitations/quotas, and expected sync cadences
- Provide example schemas, sample queries, and recovery steps for common failures
API Connector Specification
This specification defines the requirements for implementing a robust, production‑ready API connector. The connector must be language‑agnostic. Any illustrative snippets must be treated as pseudocode, not tied to a specific language or framework.
Scope and Principles
- Language‑agnostic: The spec describes behaviors, contracts, and data shapes, not language constructs.
- Separation of concerns: Request execution, authentication, retries, rate limits, and pagination are composable, swappable modules.
- Deterministic, observable, testable: Deterministic defaults, structured logs/metrics/traces, and clear test surfaces.
- Secure by default: Credentials are redacted, transport is encrypted where applicable, and inputs/outputs are validated.
- Resilient: Backoff with jitter, circuit breaking, idempotency, and graceful degradation built in.
- Extensible: Hooks/middleware enable customization without forking core.
Core Methods
Every API connector must implement the following core functionality:
Initialization and Lifecycle
- `initialize(configuration)`: Sets up the connector with the provided configuration. Should validate the configuration and prepare any internal state.
- `connect()`: Establishes a connection to the API service. May include authentication, session creation, or connection pooling.
- `disconnect()`: Gracefully closes the connection and cleans up resources. Should complete any pending requests before disconnecting.
- `isConnected()`: Returns true if the connector is currently connected and ready to make requests, false otherwise.
Request Methods
- `request(options)`: Core method for making HTTP requests; all other HTTP methods should use it internally. Options should include: method, path, headers, query parameters, body, timeout, and any method-specific settings.
- `get(path, options)`: Performs an HTTP GET request to the specified path.
- `post(path, data, options)`: Performs an HTTP POST request with the provided data payload.
- `put(path, data, options)`: Performs an HTTP PUT request to update a resource.
- `patch(path, data, options)`: Performs an HTTP PATCH request for partial updates.
- `delete(path, options)`: Performs an HTTP DELETE request to remove a resource.
Advanced Operations
- `batch(requests)`: Executes multiple requests in a single operation where the API supports it. Should handle partial failures gracefully.
- `paginate(options)`: Returns an iterator that automatically handles pagination, fetching subsequent pages as needed. Should support different pagination strategies.
Optional Operations (if applicable)
- `stream(options)`: Reads streaming responses (e.g., chunked transfer, SSE) with backpressure and cancellation.
- `upload(options)` / `download(options)`: Handle large payload transfers, with multi-part or resumable strategies when supported.
Configuration Structure
The connector configuration should support the following settings:
Base Configuration
- baseUrl - The base URL for all API requests
- timeout - Request timeout in milliseconds (default: 30000)
- userAgent - Identifier for outbound requests (include app version/commit when available)
- proxy - Optional proxy configuration (host, port, protocol, credentials)
- tls - TLS options (verify, min version, CA bundle, mTLS certificates) where applicable
- pooling - Connection pooling/keep‑alive settings
Authentication Configuration
Support for multiple authentication types:
- type - One of: api_key, bearer, basic, oauth2, or custom
- credentials - Authentication credentials specific to the chosen type
Retry Configuration
- maxAttempts - Maximum number of retry attempts (default: 3)
- initialDelay - Initial retry delay in milliseconds (default: 1000)
- maxDelay - Maximum retry delay in milliseconds (default: 30000)
- backoffMultiplier - Multiplier for exponential backoff (default: 2)
- retryableStatusCodes - HTTP status codes that trigger retries (default: [429, 500, 502, 503, 504])
- retryableErrors - Error types/codes that should trigger retries
- retryBudgetMs - Hard cap on total time spent retrying a single logical operation
- respectRetryAfter - Whether to honor server Retry‑After hints (default: true)
- idempotency - Enable idempotency key strategy for unsafe methods (default: enabled)
Rate Limiting Configuration
- requestsPerSecond - Maximum requests per second
- requestsPerMinute - Maximum requests per minute
- requestsPerHour - Maximum requests per hour
- concurrentRequests - Maximum concurrent requests (default: 10)
- burstCapacity - Allowed burst above steady rate (token bucket)
- adaptiveFromHeaders - Update limits from response headers when available (default: true)
Default Settings
- defaultHeaders - Headers to include with every request
- defaultQueryParams - Query parameters to include with every request
Hooks Configuration
Arrays of hooks to execute at different stages:
- beforeRequest - Executed before sending a request
- afterResponse - Executed after receiving a response
- onError - Executed when an error occurs
- onRetry - Executed before retrying a request
Retry Mechanism
The connector must implement a robust retry strategy with the following requirements:
Retry Strategy Methods
- `shouldRetry(error, attemptNumber)`: Determines whether a request should be retried based on the error and the current attempt count.
- `calculateDelay(attemptNumber)`: Calculates the delay before the next retry attempt.
- `onRetry(error, attemptNumber)`: Hook called before each retry attempt, for logging or state updates.
Implementation Requirements
- Exponential Backoff: Calculate the delay as min(initialDelay × backoffMultiplier^attemptNumber, maxDelay)
- Jitter: Add randomization to prevent a thundering herd: actualDelay = delay × (0.5 + random(0 to 0.5))
- Respect Server Hints: Honor "Retry-After" headers when present
- Circuit Breaker: Implement the circuit breaker pattern to prevent cascading failures
- Retry Budget: Abort retries once the per-operation retry budget is exhausted, even if `maxAttempts` has not been reached
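The backoff and jitter formulas above translate directly into code. A Python sketch for concreteness (values in milliseconds, defaults taken from the Retry Configuration section; the half-to-full jitter window follows the formula in the list):

```python
import random

def calculate_delay(attempt, initial_delay=1000, multiplier=2, max_delay=30000):
    """Exponential backoff capped at max_delay, then half-width jitter."""
    delay = min(initial_delay * (multiplier ** attempt), max_delay)
    # actualDelay = delay × (0.5 + random(0 to 0.5)), i.e. in [0.5·delay, delay]
    return delay * (0.5 + random.random() * 0.5)
```

Jitter keeps a fleet of clients that failed at the same moment from retrying at the same moment.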
Hook System
Hooks provide extension points for customizing connector behavior without modifying core logic:
Hook Structure
- name - Unique identifier for the hook
- priority - Execution order (lower numbers execute first)
- execute(context) - The hook's main function
Hook Context
Each hook receives a context object containing:
- type - The hook type: beforeRequest, afterResponse, onError, or onRetry
- request - The request options (when applicable)
- response - The response object (when applicable)
- error - The error object (when applicable)
- metadata - Additional context data
Context Methods
- modifyRequest(updates) - Modify the outgoing request
- modifyResponse(updates) - Modify the incoming response
- abort(reason) - Cancel the request with a reason
Middleware Pipeline (conceptual)
Hooks/middleware execute in a well‑defined order around the core request execution:
PSEUDOCODE pipeline:

```text
1. Build request (defaults → per-call options → auth → user hooks)
2. Rate limiter: waitForSlot()
3. beforeRequest hooks (ordered by priority)
4. Execute (with timeout + cancellation token)
5. afterResponse hooks (transform/validate)
6. onError hooks (map/enrich), possibly shouldRetry → backoff
7. Metrics/logging at each stage
```
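The ordered-hook stage of the pipeline can be sketched as a priority-sorted dispatch (Python for concreteness; hook objects carry the `name`, `priority`, and `execute` fields defined in the Hook Structure section):

```python
def run_hooks(hooks, context):
    """Execute hooks in priority order (lower priority number runs first)."""
    for hook in sorted(hooks, key=lambda h: h["priority"]):
        hook["execute"](context)
    return context
```

Sorting at dispatch time keeps registration order irrelevant, which makes hook execution deterministic regardless of how modules were loaded.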
Common Hook Use Cases
- Adding authentication headers
- Request/response logging
- Metrics collection
- Request signing
- Response transformation
- Error enrichment
Type and Data Model Management
Response Structure
All responses should be wrapped in a consistent structure containing:
- data - The actual response payload
- status - HTTP status code
- headers - Response headers as key-value pairs
- meta - Optional metadata including:
- timestamp - When the response was received
- duration - Request duration in milliseconds
- retryCount - Number of retry attempts made
- rateLimit - Current rate limit status
- requestId - Correlation identifier echoed by server or generated by client
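A minimal wrapper shape matching the fields above, as a Python sketch (the `ok` helper is a convenience addition, not part of the spec):

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    data: object          # the actual response payload
    status: int           # HTTP status code
    headers: dict         # response headers as key-value pairs
    meta: dict = field(default_factory=dict)  # timestamp, duration, retryCount, ...

    @property
    def ok(self):
        # Convenience helper: true for any 2xx status
        return 200 <= self.status < 300
```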
Data Transformation
The connector should provide methods for data transformation:
- `deserialize(data, schema)`: Transform API response data into internal application models
- `serialize(data, schema)`: Transform internal models into an API-compatible format
- `validate(data, schema)`: Validate data against a schema definition
Schema Definition
Schemas should support:
- type - Data type: object, array, string, number, or boolean
- properties - For objects, defines nested properties
- items - For arrays, defines the schema of array elements
- required - List of required property names
- format - Specific format constraints (e.g., date-time, email, uri)
- transform - Custom transformation function
Error Handling
Error Structure
All connector errors should include:
- message - Human-readable error description
- code - Machine-readable error code
- statusCode - HTTP status code (if applicable)
- details - Additional error context or data
- retryable - Boolean indicating if the request can be retried
- requestId - Correlation identifier if available
- source - Subsystem where the error occurred (transport, auth, rateLimit, deserialize, userHook, unknown)
Standard Error Codes
Connectors should use these standardized error codes:
- NETWORK_ERROR - Network connectivity issues
- TIMEOUT - Request exceeded timeout limit
- AUTH_FAILED - Authentication or authorization failure
- RATE_LIMIT - Rate limit exceeded
- INVALID_REQUEST - Malformed or invalid request
- SERVER_ERROR - Server-side error (5xx status codes)
- PARSING_ERROR - Failed to parse response
- VALIDATION_ERROR - Data validation failed
- CANCELLED - Request was cancelled by caller
- UNSUPPORTED - Operation not supported by target API
Error Handling Best Practices
- Preserve original error information for debugging
- Provide actionable error messages
- Include request context in error details
- Differentiate between retryable and non-retryable errors
- Log errors with appropriate severity levels
PSEUDOCODE error enrichment:

```text
IF transport error THEN code = NETWORK_ERROR, retryable = true
ELSE IF status in [408, 425, 429, 5xx] THEN retryable = true
ELSE retryable = false
Attach requestId, endpoint, method, attemptNumber, duration
```
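The enrichment rule can be written as a small classifier (a Python sketch; the code names come from the Standard Error Codes list, and the status-to-code mapping beyond the retryable flag is an illustrative choice):

```python
def classify(status_code=None, transport_error=False):
    """Map a failure to (code, retryable) per the enrichment rule above."""
    if transport_error:
        return "NETWORK_ERROR", True
    if status_code == 429:
        return "RATE_LIMIT", True
    if status_code is not None and 500 <= status_code <= 599:
        return "SERVER_ERROR", True
    if status_code == 408:
        return "TIMEOUT", True
    if status_code == 425:
        return "INVALID_REQUEST", True  # 425 Too Early: safe to retry later
    return "INVALID_REQUEST", False     # other 4xx: retrying will not help
```

Centralizing this mapping keeps `shouldRetry` trivial: it just reads the `retryable` flag off the enriched error.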
Pagination Support
Pagination Configuration
The paginate method should accept options including:
- pageSize - Number of items per page
- startCursor - Initial cursor for cursor-based pagination
- startPage - Initial page number for page-based pagination
- strategy - Pagination type: cursor, offset, page, or link-header
- params - Strategy‑specific parameter names (e.g., pageParam, perPageParam, cursorParam, offsetParam, limitParam)
Custom Extraction Functions
Allow customization of pagination logic through:
- extractNextCursor(response) - Extract the next page cursor from response
- extractItems(response) - Extract items array from response
- hasNextPage(response) - Determine if more pages exist
Pagination Implementation
The paginate method should:
- Return an iterator for memory-efficient processing
- Automatically fetch subsequent pages as needed
- Handle different pagination strategies transparently
- Yield arrays of items for each page
- Stop when no more pages are available
PSEUDOCODE for the paginate method:

```text
1. Initialize cursor/page from options
2. Set hasMore = true
3. WHILE hasMore:
   a. Make request with current cursor/page
   b. Extract items from response
   c. Yield items to caller
   d. Extract next cursor/page
   e. Check if more pages exist
   f. Update hasMore flag
4. End iteration when no more pages
```
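This loop maps naturally onto a generator (a Python sketch; `fetch`, `extract_items`, and `extract_next_cursor` correspond to the custom extraction functions defined above):

```python
def paginate(fetch, extract_items, extract_next_cursor, start_cursor=None):
    """Yield one page of items at a time until no next cursor is reported."""
    cursor = start_cursor
    while True:
        response = fetch(cursor)
        items = extract_items(response)
        if items:
            yield items          # one array of items per page
        cursor = extract_next_cursor(response)
        if cursor is None:
            break                # no more pages
```

Because pages are yielded lazily, callers can stop early without fetching the remaining pages, which keeps memory use bounded.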
Concurrency, Cancellation, and Timeouts
- Cancellation token: All operations accept a caller‑provided token to cancel in‑flight work.
- Per-call timeout: Enforced at the transport layer; must trigger cancellation and error with `TIMEOUT`.
- Global shutdown: The connector supports graceful shutdown, draining in-flight requests.
- Max concurrency: Enforced independent of rate limits; bounded work queue to avoid unbounded memory growth.
PSEUDOCODE request with cancellation and timeout:

```text
1. IF !canProceed() THEN waitForSlot()
2. START timer(timeout)
3. TRY execute
4. IF cancelled OR timer expired → abort transport → raise TIMEOUT/CANCELLED
5. ALWAYS release slot
```
Streaming and Large Payloads
- Support reading streaming responses (SSE/chunked) with backpressure.
- Support large uploads/downloads with chunking, multi‑part, or resumable mechanisms when available.
- Apply checksum/ETag validation when provided by the server.
- Surface progress events via hooks or callbacks where relevant.
PSEUDOCODE streaming read:

```text
open stream
FOR EACH chunk IN stream:
  emit chunk to caller
ON error → map to NETWORK_ERROR (retryable if partial/transient)
```
Rate Limiting
Rate Limiter Methods
The rate limiter should implement:
- `canProceed()`: Returns true if a request can be made immediately without exceeding rate limits
- `waitForSlot()`: Blocks/waits until a request slot becomes available
- `updateFromResponse(headers)`: Updates rate limit state based on response headers (e.g., X-RateLimit-Remaining)
- `getStatus()`: Returns current rate limit status information
Rate Limit Status
Status information should include:
- limit - Maximum requests allowed in the window
- remaining - Requests remaining in current window
- reset - Timestamp when the limit resets
- retryAfter - Seconds to wait before retrying (if provided)
Implementation Strategies
- Token Bucket - Smooth rate limiting with burst capacity
- Sliding Window - Precise rate limiting over time windows
- Fixed Window - Simple reset at specific intervals
- Adaptive - Adjust based on server feedback
PSEUDOCODE adaptive update:

```text
IF headers contain rate-limit info THEN update limiter state
IF Retry-After present THEN sleep per hint
```
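A token-bucket limiter, the first strategy listed above, can be sketched as follows (Python for concreteness; `requestsPerSecond` and `burstCapacity` follow the Rate Limiting Configuration keys, and an explicit clock parameter is used here for determinism):

```python
class TokenBucket:
    """requestsPerSecond as refill rate, burstCapacity as bucket size."""

    def __init__(self, rate_per_second, burst_capacity):
        self.rate = rate_per_second
        self.capacity = burst_capacity
        self.tokens = burst_capacity  # start full: bursts allowed immediately
        self.last = 0.0

    def _refill(self, now):
        # Accrue tokens for elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def can_proceed(self, now):
        """Consume one token if available; False means the caller must wait."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In production the `now` argument would come from a monotonic clock, and `waitForSlot()` would sleep until the next token accrues.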
Authentication Strategies
Authentication Methods
Each authentication strategy should implement:
- `authenticate(request)`: Apply authentication credentials to the outgoing request
- `refresh()`: Refresh expired credentials (optional, for token-based auth)
- `isValid()`: Check whether the current authentication credentials are still valid
Required Authentication Types
- API Key: Support for API keys in headers, query parameters, or custom locations
- Bearer Token: JWT or opaque tokens with an optional refresh mechanism
- Basic Authentication: Username and password encoded in the Authorization header
- OAuth 2.0: Full OAuth flow with token refresh support
- Custom Authentication: Signature-based auth, HMAC, or other custom schemes
Authentication Best Practices
- Store credentials securely (never in plain text)
- Implement automatic token refresh before expiration
- Handle authentication failures gracefully
- Support multiple authentication methods per connector
- Allow authentication method switching at runtime
PSEUDOCODE auth application:

```text
credentials = load from secure store
IF credentials expiring → refresh()
add auth to request (header/query/signature)
```
Idempotency
- For unsafe methods (e.g., POST), support idempotency keys when the API allows, to safely retry.
- Generate a stable key per logical operation; store it in a header or agreed field.
- Avoid silent replays when idempotency is not supported (surface clear warnings).
PSEUDOCODE idempotency key:

```text
key = hash(operationName + stableInputs)
set header "Idempotency-Key" = key
```
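A stable key derivation, sketched with a SHA-256 digest (Python for concreteness; encoding the inputs as canonical sorted JSON is an assumption — any stable serialization works):

```python
import hashlib
import json

def idempotency_key(operation_name, inputs):
    """Derive a stable key from the operation name and its logical inputs."""
    # Canonical JSON: sorted keys and fixed separators, so that logically
    # identical inputs always hash to the same key
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{operation_name}:{canonical}".encode()).hexdigest()
```

The connector would then set `Idempotency-Key` to this value on every attempt of the same logical operation, making retries of unsafe methods safe where the API supports it.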
Webhooks and Async Jobs (if applicable)
- Verify webhook signatures and timestamps; reject stale or invalid deliveries.
- Support async job polling patterns (create → poll status → fetch result), with backoff.
- De‑duplicate webhook events using delivery IDs or replay IDs.
PSEUDOCODE async job:

```text
jobId = POST /jobs
REPEAT until done:
  status = GET /jobs/{jobId}
  IF status == done → break
  sleep(backoff)
result = GET /jobs/{jobId}/result
```
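The create → poll → fetch pattern can be sketched against a hypothetical client object (Python for concreteness; the `create_job`, `job_status`, and `job_result` method names and the backoff parameters are illustrative assumptions, not a real API):

```python
import time

def run_async_job(client, payload, max_polls=10, base_delay=1.0, sleep=time.sleep):
    """Create a job, poll its status with exponential backoff, fetch the result."""
    job_id = client.create_job(payload)
    for attempt in range(max_polls):
        if client.job_status(job_id) == "done":
            return client.job_result(job_id)
        # Back off between polls, capped at 30s
        sleep(min(base_delay * 2 ** attempt, 30.0))
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `sleep` keeps the polling loop testable without real delays.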
Best Practices
- Connection Pooling: Reuse connections when possible
- Request Deduplication: Prevent duplicate requests for the same resource
- Caching: Respect cache headers (ETag, Last-Modified)
- Compression: Support gzip/deflate compression
- Logging: Structured logging with request IDs for tracing
- Metrics: Track request count, latency, error rates
- Graceful Shutdown: Complete in-flight requests before disconnecting
- Resource Cleanup: Properly clean up timers, connections, and listeners
Observability
- Logging: Structured logs with a correlation `requestId`, redaction of secrets, and consistent fields.
- Metrics: Counters (requests, errors, retries), distributions (latency, payload sizes), gauges (in-flight, rate limits).
- Tracing: Span per request with attributes for method, path, status, retryCount, rateLimit.
Security and Compliance
- Redact secrets in logs, metrics, and errors.
- Validate inputs and outputs; reject malformed data early.
- Use TLS by default; support custom CA bundles and optional mTLS where required.
- Clock‑skew aware signature validation when needed.
- Respect data residency and minimization; avoid storing payloads unless explicitly enabled.
Versioning and Compatibility
- Use the upstream/source version identifiers for organizing connector variants (e.g., v4, dates, API versions). SemVer is not required for registry entries.
- Backward‑compatible changes preferable; document breaking changes clearly.
- Feature flags or capability negotiation for optional features (e.g., streaming, webhooks).
Testing Requirements
Connectors must include:
- Unit tests for all public methods
- Integration tests with mock servers
- Retry logic testing with various failure scenarios
- Rate limit testing
- Authentication flow testing
- Error handling and recovery testing
- Performance benchmarks
Conformance Checklist
- Implements lifecycle: initialize, connect, disconnect, isConnected
- Provides request primitives, optional stream/upload/download when applicable
- Config supports baseUrl, timeouts, proxy/tls, auth, retry, rate limit, defaults, hooks
- Retry with backoff + jitter, honors Retry‑After, has circuit breaker and retry budget
- Hook pipeline before/after/error/retry; deterministic order and cancellation
- Response wrapper with data/status/headers/meta including requestId and rateLimit
- Structured errors with code/status/retryable/details and correlation id
- Pagination supports cursor/offset/page/link‑header with pluggable extractors
- Concurrency limits, cancellation, graceful shutdown
- Observability: logs/metrics/traces with redaction
- Security controls for credentials, TLS, validation, and redaction
Blob Storage Connectors
Connectors for cloud storage services like S3, Azure Blob Storage, and Google Cloud Storage.
Goals
- Provide consistent configuration across providers (credentials, region, bucket/container)
- Normalize listing, reading, writing, and deleting objects
- Support pagination and streaming for large files
Core Operations
- List objects with prefix and pagination support
- Get object metadata (size, content-type, etag, last-modified)
- Read object (buffer or stream)
- Write object (buffer or stream) with content-type and ACL options
- Delete single object or batch delete
Error Handling
- Missing object → NotFound error
- Permission issues → Authorization error
- Transient network failures → Retries with backoff
Observability
- Emit metrics for request counts, latency, size transferred
- Structured logs with request IDs and provider operation names
Database Connectors
Connectors for various database systems including SQL and NoSQL databases.
Goals
- Uniform connection configuration (host, port, database, credentials, SSL)
- Pooled connections with sane defaults
- Unified query/command execution with typed results where applicable
Core Capabilities
- Health check and version info
- Query execution (parameterized) with streaming for large result sets
- Transaction support with commit/rollback
- Schema inspection (tables, columns, indexes) where supported
Error Handling
- Syntax/constraint errors surfaced with provider codes
- Connection errors retried with backoff, transparent pool recovery
Observability
- Metrics for query counts, latency, rows, errors
- Trace spans for query execution and transactions
SaaS Connectors
Connectors for Software-as-a-Service platforms and third-party services.
Goals
- Normalized auth (API key, OAuth2, custom token schemes)
- Consistent pagination, rate limiting, and retry behavior
- Standardized error taxonomy (auth, validation, rate-limit, server)
Core Capabilities
- Authentication helpers and token refresh (where applicable)
- Resource listing with cursor/page-based pagination
- Webhook subscription/verification (if supported)
- Idempotent write operations with conflict handling
Rate Limiting & Retries
- Respect provider headers (X-RateLimit-*)
- Exponential backoff with jitter for 429/5xx
Observability
- Request/response logging with PII scrubbing
- Metrics: request counts, latency, rate-limit hits