Specifications
We provide extensive specifications for each connector type, which you can feed into the LLM of your choice to have a connector built for you.
Analytical Connectors Common Specification
Purpose-built integrations that extract data from source systems and load it into analytical stores (data warehouses or lakes) for reporting, modeling, and BI. They prioritize correctness, incremental delivery, and schema stability.
Data Model
- Favor a normalized schema: entities (e.g., users, accounts), events (e.g., pageview, charge), and reference tables
- Required columns per table: primary key, `updated_at` (source clock), `ingested_at` (connector clock, UTC)
- Prefer scalar columns; keep nested payloads in a `_raw` JSON column when needed
Sync Semantics
- Support both initial full sync and ongoing incremental syncs
- Use a deterministic cursor (e.g., `updated_at`, `event_timestamp`, or a CDC offset); CDC is preferred when available
- Perform idempotent loads via MERGE/UPSERT on the primary key (and cursor where relevant)
- Chunk and paginate reads; stream writes to avoid unbounded memory
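The cursor-plus-upsert loop above can be sketched as follows. The spec is language-agnostic; Python is used here purely for concreteness, and the `fetch_page` helper and in-memory `target` dict are hypothetical stand-ins for a real source reader and a warehouse MERGE.

```python
def incremental_sync(fetch_page, target, cursor=None):
    """Idempotent incremental sync: upsert by primary key, advance a cursor.

    fetch_page(cursor) -> (rows, next_cursor); rows carry 'id' and 'updated_at'.
    target is a dict keyed by primary key, standing in for a warehouse MERGE.
    """
    while True:
        rows, next_cursor = fetch_page(cursor)
        if not rows:
            return cursor  # final checkpoint for the next run
        for row in rows:
            existing = target.get(row["id"])
            # MERGE semantics: only overwrite with data at least as new
            if existing is None or row["updated_at"] >= existing["updated_at"]:
                target[row["id"]] = row
        cursor = next_cursor  # checkpoint after each page so reruns resume here
```

Because the load is keyed by primary key, replaying a page (e.g., after a crash between pages) produces the same final state.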
Schema Evolution
- Backwards-compatible, additive by default; avoid breaking renames/drops
- Use stable, documented naming conventions (snake_case; UTC timestamps)
- Emit clear migration notes when columns are added or semantics change
Data Quality and Deletes
- Deduplicate by primary key and the latest `updated_at`/version
- Represent soft deletes with `is_deleted` and `deleted_at`; propagate hard deletes when the source exposes them
- Validate basic types and required fields; route malformed rows to a dead-letter path/table
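Dedup-by-latest-version can be sketched in a few lines (a Python sketch; the `id` and `updated_at` field names follow the required-columns convention above):

```python
def dedupe(rows, key="id", version="updated_at"):
    """Keep only the latest version of each row, grouped by primary key."""
    latest = {}
    for row in rows:
        seen = latest.get(row[key])
        # Keep the row with the greatest version/updated_at per key
        if seen is None or row[version] >= seen[version]:
            latest[row[key]] = row
    return list(latest.values())
```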
Performance and Limits
- Respect source rate limits; use concurrency controls and adaptive backoff with jitter
- Use incremental checkpoints after each page/batch so jobs can resume safely
- Prefer server-side filtering and projection to minimize transfer size
Observability
- Logs: structured, with job/run IDs and page/batch numbers; never log secrets
- Metrics: `rows_read`, `rows_written`, `lag_seconds`, `duplicate_rows`, `retries`, and `duration_seconds`
- Optional tracing spans around source fetch, transform, and load
Security
- TLS by default; least-privilege access to sources and targets
- PII handling: configurable field redaction/masking; scrub sensitive data from logs and metrics
Documentation
- List covered entities and their cursors, limitations/quotas, and expected sync cadences
- Provide example schemas, sample queries, and recovery steps for common failures
API Connector Specification
This specification defines the requirements for implementing a robust, production‑ready API connector. The connector must be language‑agnostic. Any illustrative snippets must be treated as pseudocode, not tied to a specific language or framework.
Scope and Principles
- Language‑agnostic: The spec describes behaviors, contracts, and data shapes, not language constructs.
- Separation of concerns: Request execution, authentication, retries, rate limits, and pagination are composable, swappable modules.
- Deterministic, observable, testable: Deterministic defaults, structured logs/metrics/traces, and clear test surfaces.
- Secure by default: Credentials are redacted, transport is encrypted where applicable, and inputs/outputs are validated.
- Resilient: Backoff with jitter, circuit breaking, idempotency, and graceful degradation built in.
- Extensible: Hooks/middleware enable customization without forking core.
Core Methods
Every API connector must implement the following core functionality:
Initialization and Lifecycle
- `initialize(configuration)`: Sets up the connector with the provided configuration. Should validate the configuration and prepare any internal state.
- `connect()`: Establishes a connection to the API service. May include authentication, session creation, or connection pooling.
- `disconnect()`: Gracefully closes the connection and cleans up resources. Should complete any pending requests before disconnecting.
- `isConnected()`: Returns true if the connector is currently connected and ready to make requests, false otherwise.
Request Methods
- `request(options)`: Core method for making HTTP requests; all other HTTP methods should use it internally. Options should include: method, path, headers, query parameters, body, timeout, and any method-specific settings.
- `get(path, options)`: Performs an HTTP GET request to the specified path.
- `post(path, data, options)`: Performs an HTTP POST request with the provided data payload.
- `put(path, data, options)`: Performs an HTTP PUT request to update a resource.
- `patch(path, data, options)`: Performs an HTTP PATCH request for partial updates.
- `delete(path, options)`: Performs an HTTP DELETE request to remove a resource.
Advanced Operations
- `batch(requests)`: Executes multiple requests in a single operation where the API supports it. Should handle partial failures gracefully.
- `paginate(options)`: Returns an iterator that automatically handles pagination, fetching subsequent pages as needed. Should support different pagination strategies.
Optional Operations (if applicable)
- `stream(options)`: Reads streaming responses (e.g., chunked transfer, SSE) with backpressure and cancellation.
- `upload(options)` / `download(options)`: Handle large payload transfers, with multi-part or resumable strategies when supported.
Configuration Structure
The connector configuration should support the following settings:
Base Configuration
- baseUrl - The base URL for all API requests
- timeout - Request timeout in milliseconds (default: 30000)
- userAgent - Identifier for outbound requests (include app version/commit when available)
- proxy - Optional proxy configuration (host, port, protocol, credentials)
- tls - TLS options (verify, min version, CA bundle, mTLS certificates) where applicable
- pooling - Connection pooling/keep‑alive settings
Authentication Configuration
Support for multiple authentication types:
- type - One of: api_key, bearer, basic, oauth2, or custom
- credentials - Authentication credentials specific to the chosen type
Retry Configuration
- maxAttempts - Maximum number of retry attempts (default: 3)
- initialDelay - Initial retry delay in milliseconds (default: 1000)
- maxDelay - Maximum retry delay in milliseconds (default: 30000)
- backoffMultiplier - Multiplier for exponential backoff (default: 2)
- retryableStatusCodes - HTTP status codes that trigger retries (default: [429, 500, 502, 503, 504])
- retryableErrors - Error types/codes that should trigger retries
- retryBudgetMs - Hard cap on total time spent retrying a single logical operation
- respectRetryAfter - Whether to honor server Retry‑After hints (default: true)
- idempotency - Enable idempotency key strategy for unsafe methods (default: enabled)
Rate Limiting Configuration
- requestsPerSecond - Maximum requests per second
- requestsPerMinute - Maximum requests per minute
- requestsPerHour - Maximum requests per hour
- concurrentRequests - Maximum concurrent requests (default: 10)
- burstCapacity - Allowed burst above steady rate (token bucket)
- adaptiveFromHeaders - Update limits from response headers when available (default: true)
Default Settings
- defaultHeaders - Headers to include with every request
- defaultQueryParams - Query parameters to include with every request
Hooks Configuration
Arrays of hooks to execute at different stages:
- beforeRequest - Executed before sending a request
- afterResponse - Executed after receiving a response
- onError - Executed when an error occurs
- onRetry - Executed before retrying a request
Retry Mechanism
The connector must implement a robust retry strategy with the following requirements:
Retry Strategy Methods
- `shouldRetry(error, attemptNumber)`: Determines whether a request should be retried based on the error and the current attempt count.
- `calculateDelay(attemptNumber)`: Calculates the delay before the next retry attempt.
- `onRetry(error, attemptNumber)`: Hook called before each retry attempt, for logging or state updates.
Implementation Requirements
- Exponential Backoff: Calculate the delay as min(initialDelay × backoffMultiplier^attemptNumber, maxDelay)
- Jitter: Add randomization to prevent a thundering herd: actualDelay = delay × (0.5 + random(0 to 0.5))
- Respect Server Hints: Honor "Retry-After" headers when present
- Circuit Breaker: Implement the circuit breaker pattern to prevent cascading failures
- Retry Budget: Abort retries once the per-operation retry budget is exhausted, even if `maxAttempts` has not been reached
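The backoff and jitter formulas above translate directly into code. A Python sketch for concreteness (values in milliseconds, defaults taken from the Retry Configuration section; the half-to-full jitter window follows the formula in the list):

```python
import random

def calculate_delay(attempt, initial_delay=1000, multiplier=2, max_delay=30000):
    """Exponential backoff capped at max_delay, then half-width jitter."""
    delay = min(initial_delay * (multiplier ** attempt), max_delay)
    # actualDelay = delay × (0.5 + random(0 to 0.5)), i.e. in [0.5·delay, delay]
    return delay * (0.5 + random.random() * 0.5)
```

Jitter keeps a fleet of clients that failed at the same moment from retrying at the same moment.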
Hook System
Hooks provide extension points for customizing connector behavior without modifying core logic:
Hook Structure
- name - Unique identifier for the hook
- priority - Execution order (lower numbers execute first)
- execute(context) - The hook's main function
Hook Context
Each hook receives a context object containing:
- type - The hook type: beforeRequest, afterResponse, onError, or onRetry
- request - The request options (when applicable)
- response - The response object (when applicable)
- error - The error object (when applicable)
- metadata - Additional context data
Context Methods
- modifyRequest(updates) - Modify the outgoing request
- modifyResponse(updates) - Modify the incoming response
- abort(reason) - Cancel the request with a reason
Middleware Pipeline (conceptual)
Hooks/middleware execute in a well‑defined order around the core request execution:
PSEUDOCODE pipeline:

```text
1. Build request (defaults → per-call options → auth → user hooks)
2. Rate limiter: waitForSlot()
3. beforeRequest hooks (ordered by priority)
4. Execute (with timeout + cancellation token)
5. afterResponse hooks (transform/validate)
6. onError hooks (map/enrich), possibly shouldRetry → backoff
7. Metrics/logging at each stage
```
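The ordered-hook stage of the pipeline can be sketched as a priority-sorted dispatch (Python for concreteness; hook objects carry the `name`, `priority`, and `execute` fields defined in the Hook Structure section):

```python
def run_hooks(hooks, context):
    """Execute hooks in priority order (lower priority number runs first)."""
    for hook in sorted(hooks, key=lambda h: h["priority"]):
        hook["execute"](context)
    return context
```

Sorting at dispatch time keeps registration order irrelevant, which makes hook execution deterministic regardless of how modules were loaded.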
Common Hook Use Cases
- Adding authentication headers
- Request/response logging
- Metrics collection
- Request signing
- Response transformation
- Error enrichment
Type and Data Model Management
Response Structure
All responses should be wrapped in a consistent structure containing:
- data - The actual response payload
- status - HTTP status code
- headers - Response headers as key-value pairs
- meta - Optional metadata including:
- timestamp - When the response was received
- duration - Request duration in milliseconds
- retryCount - Number of retry attempts made
- rateLimit - Current rate limit status
- requestId - Correlation identifier echoed by server or generated by client
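A minimal wrapper shape matching the fields above, as a Python sketch (the `ok` helper is a convenience addition, not part of the spec):

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    data: object          # the actual response payload
    status: int           # HTTP status code
    headers: dict         # response headers as key-value pairs
    meta: dict = field(default_factory=dict)  # timestamp, duration, retryCount, ...

    @property
    def ok(self):
        # Convenience helper: true for any 2xx status
        return 200 <= self.status < 300
```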
Data Transformation
The connector should provide methods for data transformation:
- `deserialize(data, schema)`: Transform API response data into internal application models
- `serialize(data, schema)`: Transform internal models into an API-compatible format
- `validate(data, schema)`: Validate data against a schema definition
Schema Definition
Schemas should support:
- type - Data type: object, array, string, number, or boolean
- properties - For objects, defines nested properties
- items - For arrays, defines the schema of array elements
- required - List of required property names
- format - Specific format constraints (e.g., date-time, email, uri)
- transform - Custom transformation function
Error Handling
Error Structure
All connector errors should include:
- message - Human-readable error description
- code - Machine-readable error code
- statusCode - HTTP status code (if applicable)
- details - Additional error context or data
- retryable - Boolean indicating if the request can be retried
- requestId - Correlation identifier if available
- source - Subsystem where the error occurred (transport, auth, rateLimit, deserialize, userHook, unknown)
Standard Error Codes
Connectors should use these standardized error codes:
- NETWORK_ERROR - Network connectivity issues
- TIMEOUT - Request exceeded timeout limit
- AUTH_FAILED - Authentication or authorization failure
- RATE_LIMIT - Rate limit exceeded
- INVALID_REQUEST - Malformed or invalid request
- SERVER_ERROR - Server-side error (5xx status codes)
- PARSING_ERROR - Failed to parse response
- VALIDATION_ERROR - Data validation failed
- CANCELLED - Request was cancelled by caller
- UNSUPPORTED - Operation not supported by target API
Error Handling Best Practices
- Preserve original error information for debugging
- Provide actionable error messages
- Include request context in error details
- Differentiate between retryable and non-retryable errors
- Log errors with appropriate severity levels
PSEUDOCODE error enrichment:

```text
IF transport error THEN code = NETWORK_ERROR, retryable = true
ELSE IF status in [408, 425, 429, 5xx] THEN retryable = true
ELSE retryable = false
Attach requestId, endpoint, method, attemptNumber, duration
```
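The enrichment rule can be written as a small classifier (a Python sketch; the code names come from the Standard Error Codes list, and the status-to-code mapping beyond the retryable flag is an illustrative choice):

```python
def classify(status_code=None, transport_error=False):
    """Map a failure to (code, retryable) per the enrichment rule above."""
    if transport_error:
        return "NETWORK_ERROR", True
    if status_code == 429:
        return "RATE_LIMIT", True
    if status_code is not None and 500 <= status_code <= 599:
        return "SERVER_ERROR", True
    if status_code == 408:
        return "TIMEOUT", True
    if status_code == 425:
        return "INVALID_REQUEST", True  # 425 Too Early: safe to retry later
    return "INVALID_REQUEST", False     # other 4xx: retrying will not help
```

Centralizing this mapping keeps `shouldRetry` trivial: it just reads the `retryable` flag off the enriched error.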
Pagination Support
Pagination Configuration
The paginate method should accept options including:
- pageSize - Number of items per page
- startCursor - Initial cursor for cursor-based pagination
- startPage - Initial page number for page-based pagination
- strategy - Pagination type: cursor, offset, page, or link-header
- params - Strategy‑specific parameter names (e.g., pageParam, perPageParam, cursorParam, offsetParam, limitParam)
Custom Extraction Functions
Allow customization of pagination logic through:
- extractNextCursor(response) - Extract the next page cursor from response
- extractItems(response) - Extract items array from response
- hasNextPage(response) - Determine if more pages exist
Pagination Implementation
The paginate method should:
- Return an iterator for memory-efficient processing
- Automatically fetch subsequent pages as needed
- Handle different pagination strategies transparently
- Yield arrays of items for each page
- Stop when no more pages are available
PSEUDOCODE for the paginate method:

```text
1. Initialize cursor/page from options
2. Set hasMore = true
3. WHILE hasMore:
   a. Make request with current cursor/page
   b. Extract items from response
   c. Yield items to caller
   d. Extract next cursor/page
   e. Check if more pages exist
   f. Update hasMore flag
4. End iteration when no more pages
```
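This loop maps naturally onto a generator (a Python sketch; `fetch`, `extract_items`, and `extract_next_cursor` correspond to the custom extraction functions defined above):

```python
def paginate(fetch, extract_items, extract_next_cursor, start_cursor=None):
    """Yield one page of items at a time until no next cursor is reported."""
    cursor = start_cursor
    while True:
        response = fetch(cursor)
        items = extract_items(response)
        if items:
            yield items          # one array of items per page
        cursor = extract_next_cursor(response)
        if cursor is None:
            break                # no more pages
```

Because pages are yielded lazily, callers can stop early without fetching the remaining pages, which keeps memory use bounded.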
Concurrency, Cancellation, and Timeouts
- Cancellation token: All operations accept a caller‑provided token to cancel in‑flight work.
- Per-call timeout: Enforced at the transport layer; must trigger cancellation and error with `TIMEOUT`.
- Global shutdown: The connector supports graceful shutdown, draining in-flight requests.
- Max concurrency: Enforced independent of rate limits; bounded work queue to avoid unbounded memory growth.
PSEUDOCODE request with cancellation and timeout:

```text
1. IF !canProceed() THEN waitForSlot()
2. START timer(timeout)
3. TRY execute
4. IF cancelled OR timer expired → abort transport → raise TIMEOUT/CANCELLED
5. ALWAYS release slot
```
Streaming and Large Payloads
- Support reading streaming responses (SSE/chunked) with backpressure.
- Support large uploads/downloads with chunking, multi‑part, or resumable mechanisms when available.
- Apply checksum/ETag validation when provided by the server.
- Surface progress events via hooks or callbacks where relevant.
PSEUDOCODE streaming read:

```text
open stream
FOR EACH chunk IN stream:
  emit chunk to caller
ON error → map to NETWORK_ERROR (retryable if partial/transient)
```
Rate Limiting
Rate Limiter Methods
The rate limiter should implement:
- `canProceed()`: Returns true if a request can be made immediately without exceeding rate limits
- `waitForSlot()`: Blocks/waits until a request slot becomes available
- `updateFromResponse(headers)`: Updates rate limit state based on response headers (e.g., X-RateLimit-Remaining)
- `getStatus()`: Returns current rate limit status information
Rate Limit Status
Status information should include:
- limit - Maximum requests allowed in the window
- remaining - Requests remaining in current window
- reset - Timestamp when the limit resets
- retryAfter - Seconds to wait before retrying (if provided)
Implementation Strategies
- Token Bucket - Smooth rate limiting with burst capacity
- Sliding Window - Precise rate limiting over time windows
- Fixed Window - Simple reset at specific intervals
- Adaptive - Adjust based on server feedback
PSEUDOCODE adaptive update:

```text
IF headers contain rate-limit info THEN update limiter state
IF Retry-After present THEN sleep per hint
```
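A token-bucket limiter, the first strategy listed above, can be sketched as follows (Python for concreteness; `requestsPerSecond` and `burstCapacity` follow the Rate Limiting Configuration keys, and an explicit clock parameter is used here for determinism):

```python
class TokenBucket:
    """requestsPerSecond as refill rate, burstCapacity as bucket size."""

    def __init__(self, rate_per_second, burst_capacity):
        self.rate = rate_per_second
        self.capacity = burst_capacity
        self.tokens = burst_capacity  # start full: bursts allowed immediately
        self.last = 0.0

    def _refill(self, now):
        # Accrue tokens for elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def can_proceed(self, now):
        """Consume one token if available; False means the caller must wait."""
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In production the `now` argument would come from a monotonic clock, and `waitForSlot()` would sleep until the next token accrues.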
Authentication Strategies
Authentication Methods
Each authentication strategy should implement:
- `authenticate(request)`: Apply authentication credentials to the outgoing request
- `refresh()`: Refresh expired credentials (optional, for token-based auth)
- `isValid()`: Check whether the current authentication credentials are still valid
Required Authentication Types
- API Key: Support for API keys in headers, query parameters, or custom locations
- Bearer Token: JWT or opaque tokens with an optional refresh mechanism
- Basic Authentication: Username and password encoded in the Authorization header
- OAuth 2.0: Full OAuth flow with token refresh support
- Custom Authentication: Signature-based auth, HMAC, or other custom schemes
Authentication Best Practices
- Store credentials securely (never in plain text)
- Implement automatic token refresh before expiration
- Handle authentication failures gracefully
- Support multiple authentication methods per connector
- Allow authentication method switching at runtime
PSEUDOCODE auth application:

```text
credentials = load from secure store
IF credentials expiring → refresh()
add auth to request (header/query/signature)
```
Idempotency
- For unsafe methods (e.g., POST), support idempotency keys when the API allows, to safely retry.
- Generate a stable key per logical operation; store it in a header or agreed field.
- Avoid silent replays when idempotency is not supported (surface clear warnings).
PSEUDOCODE idempotency key:

```text
key = hash(operationName + stableInputs)
set header "Idempotency-Key" = key
```
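A stable key derivation, sketched with a SHA-256 digest (Python for concreteness; encoding the inputs as canonical sorted JSON is an assumption — any stable serialization works):

```python
import hashlib
import json

def idempotency_key(operation_name, inputs):
    """Derive a stable key from the operation name and its logical inputs."""
    # Canonical JSON: sorted keys and fixed separators, so that logically
    # identical inputs always hash to the same key
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{operation_name}:{canonical}".encode()).hexdigest()
```

The connector would then set `Idempotency-Key` to this value on every attempt of the same logical operation, making retries of unsafe methods safe where the API supports it.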
Webhooks and Async Jobs (if applicable)
- Verify webhook signatures and timestamps; reject stale or invalid deliveries.
- Support async job polling patterns (create → poll status → fetch result), with backoff.
- De‑duplicate webhook events using delivery IDs or replay IDs.
PSEUDOCODE async job:

```text
jobId = POST /jobs
REPEAT until done:
  status = GET /jobs/{jobId}
  IF status == done → break
  sleep(backoff)
result = GET /jobs/{jobId}/result
```
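The create → poll → fetch pattern can be sketched against a hypothetical client object (Python for concreteness; the `create_job`, `job_status`, and `job_result` method names and the backoff parameters are illustrative assumptions, not a real API):

```python
import time

def run_async_job(client, payload, max_polls=10, base_delay=1.0, sleep=time.sleep):
    """Create a job, poll its status with exponential backoff, fetch the result."""
    job_id = client.create_job(payload)
    for attempt in range(max_polls):
        if client.job_status(job_id) == "done":
            return client.job_result(job_id)
        # Back off between polls, capped at 30s
        sleep(min(base_delay * 2 ** attempt, 30.0))
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting `sleep` keeps the polling loop testable without real delays.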
Best Practices
- Connection Pooling: Reuse connections when possible
- Request Deduplication: Prevent duplicate requests for the same resource
- Caching: Respect cache headers (ETag, Last-Modified)
- Compression: Support gzip/deflate compression
- Logging: Structured logging with request IDs for tracing
- Metrics: Track request count, latency, error rates
- Graceful Shutdown: Complete in-flight requests before disconnecting
- Resource Cleanup: Properly clean up timers, connections, and listeners
Observability
- Logging: Structured logs with a correlation `requestId`, redaction of secrets, and consistent fields.
- Metrics: Counters (requests, errors, retries), distributions (latency, payload sizes), gauges (in-flight, rate limits).
- Tracing: Span per request with attributes for method, path, status, retryCount, rateLimit.
Security and Compliance
- Redact secrets in logs, metrics, and errors.
- Validate inputs and outputs; reject malformed data early.
- Use TLS by default; support custom CA bundles and optional mTLS where required.
- Clock‑skew aware signature validation when needed.
- Respect data residency and minimization; avoid storing payloads unless explicitly enabled.
Versioning and Compatibility
- Use the upstream/source version identifiers for organizing connector variants (e.g., v4, dates, API versions). SemVer is not required for registry entries.
- Backward‑compatible changes preferable; document breaking changes clearly.
- Feature flags or capability negotiation for optional features (e.g., streaming, webhooks).
Testing Requirements
Connectors must include:
- Unit tests for all public methods
- Integration tests with mock servers
- Retry logic testing with various failure scenarios
- Rate limit testing
- Authentication flow testing
- Error handling and recovery testing
- Performance benchmarks
Conformance Checklist
- Implements lifecycle: initialize, connect, disconnect, isConnected
- Provides request primitives, optional stream/upload/download when applicable
- Config supports baseUrl, timeouts, proxy/tls, auth, retry, rate limit, defaults, hooks
- Retry with backoff + jitter, honors Retry‑After, has circuit breaker and retry budget
- Hook pipeline before/after/error/retry; deterministic order and cancellation
- Response wrapper with data/status/headers/meta including requestId and rateLimit
- Structured errors with code/status/retryable/details and correlation id
- Pagination supports cursor/offset/page/link‑header with pluggable extractors
- Concurrency limits, cancellation, graceful shutdown
- Observability: logs/metrics/traces with redaction
- Security controls for credentials, TLS, validation, and redaction
Blob Storage Connectors
Connectors for cloud storage services like S3, Azure Blob Storage, and Google Cloud Storage.
Goals
- Provide consistent configuration across providers (credentials, region, bucket/container)
- Normalize listing, reading, writing, and deleting objects
- Support pagination and streaming for large files
Core Operations
- List objects with prefix and pagination support
- Get object metadata (size, content-type, etag, last-modified)
- Read object (buffer or stream)
- Write object (buffer or stream) with content-type and ACL options
- Delete single object or batch delete
Error Handling
- Missing object → NotFound error
- Permission issues → Authorization error
- Transient network failures → Retries with backoff
Observability
- Emit metrics for request counts, latency, size transferred
- Structured logs with request IDs and provider operation names
Database Connectors
Connectors for various database systems including SQL and NoSQL databases.
Goals
- Uniform connection configuration (host, port, database, credentials, SSL)
- Pooled connections with sane defaults
- Unified query/command execution with typed results where applicable
Core Capabilities
- Health check and version info
- Query execution (parameterized) with streaming for large result sets
- Transaction support with commit/rollback
- Schema inspection (tables, columns, indexes) where supported
Error Handling
- Syntax/constraint errors surfaced with provider codes
- Connection errors retried with backoff, transparent pool recovery
Observability
- Metrics for query counts, latency, rows, errors
- Trace spans for query execution and transactions
SaaS Connectors
Connectors for Software-as-a-Service platforms and third-party services.
Goals
- Normalized auth (API key, OAuth2, custom token schemes)
- Consistent pagination, rate limiting, and retry behavior
- Standardized error taxonomy (auth, validation, rate-limit, server)
Core Capabilities
- Authentication helpers and token refresh (where applicable)
- Resource listing with cursor/page-based pagination
- Webhook subscription/verification (if supported)
- Idempotent write operations with conflict handling
Rate Limiting & Retries
- Respect provider headers (X-RateLimit-*)
- Exponential backoff with jitter for 429/5xx
Observability
- Request/response logging with PII scrubbing
- Metrics: request counts, latency, rate-limit hits