AI Agent Platform Engineer (MCP / Plugins / Skills)
About the role
DogeSMS sits at a particular intersection: we are an OTP and verification primitive, and the fastest-growing consumers of OTP primitives in 2026 are AI agents. Autonomous agents need throwaway numbers to sign up for SaaS on behalf of users, headless browsers need SMS verification to complete account creation in sandboxed runs, and developer-tooling agents (Claude Code, Codex, Cursor, Devin, and their successors) increasingly want to script entire account-onboarding flows as part of larger workflows. Today they all reinvent the integration. They should not have to. This role exists to make DogeSMS the default verification primitive across the agent ecosystem.
Concretely, that means owning the surface area where agents meet our API:
- a first-class MCP (Model Context Protocol) server that any MCP-compatible client can mount and immediately have tool calls for "request a number, wait for SMS, return the OTP";
- a Claude Code plugin and a Codex plugin published in their respective registries;
- a published Skill that drops into the Anthropic Skills marketplace and Codex's equivalent;
- an opinionated CLI that agents can shell out to inside a sandbox (Modal / E2B / Cloudflare Sandboxes / Anthropic Code Execution);
- a sandbox-safe API key model that lets an agent get short-lived scoped credentials without compromising the user's main account;
- the developer relations work to keep every one of these surfaces current as the protocols evolve weekly.
The hard problem is not "wire up tool calls"; that is a weekend. The hard problem is the long tail: making the MCP server graceful when the model misuses it, defining the sandbox-safe credential lifecycle so agents cannot exfiltrate balance, instrumenting every surface so we can tell which clients (Claude Code vs Codex vs raw Anthropic SDK vs ChatGPT desktop) are actually generating revenue, and shipping breaking-change migration tooling when the MCP / Skills / plugin specs themselves break (they will). This is a hire for someone who has actually built against MCP, Claude Code plugins, and/or the Skills system in production, not someone who has only read the docs.
What you'll do
- Own the DogeSMS MCP (Model Context Protocol) server end-to-end: tool schema design, transport choice (stdio vs HTTP+SSE vs streamable HTTP), authentication flow, error surface, and the long-tail UX of "what happens when an LLM calls our tool with a malformed argument" (see the sketch after this list).
- Ship and maintain the Claude Code plugin and the Codex plugin in their respective registries — including the marketplace listing, version-pinning strategy, and the on-call rotation for when an upstream protocol bump breaks distribution.
- Publish a DogeSMS Skill that drops into Anthropic Skills and the equivalent Codex skill registry; design the skill's invocation surface so an LLM picks it up reliably without prompt-engineering theatre.
- Build a sandbox-friendly CLI (`dogesms`) and matching SDKs that agents can shell out to inside E2B / Modal / Cloudflare Sandboxes / Anthropic Code Execution sandboxes, with sensible defaults for non-interactive environments (machine-readable output, deterministic error codes, no TTY assumptions).
- Design the sandbox-safe credential model: short-lived scoped API keys (think GitHub fine-grained tokens, but for agents), per-task budget caps, automatic key rotation, and the audit trail that lets a user understand what their agent did with their account. A sketch of the shape we have in mind follows this list.
- Internal agent platform: standardized tool schemas, OpenTelemetry instrumentation per LLM call and tool invocation, eval harnesses (deterministic cases + LLM-as-judge), and regression tracking across model versions for our own agent-assisted workflows (incident triage, log analysis, carrier-side monitoring).
- Public-facing developer relations: be the human who shows up in MCP / Anthropic / Codex GitHub issues and Discord servers when DogeSMS comes up, write the canonical "here is how to verify accounts from a Claude Code session" walkthrough, and keep our docs current as the protocols evolve.
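To make "tool schema design" and the malformed-argument problem concrete, here is a minimal sketch of one tool on such a server, using the official TypeScript MCP SDK over stdio. The tool name, argument shape, and the `requestNumber` helper are illustrative assumptions rather than our shipped API; the point is the shape of the error surface.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Hypothetical upstream client; stands in for the real DogeSMS API call.
async function requestNumber(country: string, service: string) {
  return { numberId: "num_123", e164: "+6580000000", country, service };
}

const server = new McpServer({ name: "dogesms", version: "0.1.0" });

// Zod validates arguments before the handler runs, so a malformed tool call
// comes back to the client as a schema error the model can read and correct.
server.tool(
  "request_number",
  "Rent a temporary phone number for SMS verification.",
  {
    country: z.string().length(2).describe("ISO 3166-1 alpha-2 country code"),
    service: z.string().describe("Service being verified, e.g. 'telegram'"),
  },
  async ({ country, service }) => {
    try {
      const rental = await requestNumber(country, service);
      return { content: [{ type: "text", text: JSON.stringify(rental) }] };
    } catch (err) {
      // isError: true returns the failure as tool output the model can retry
      // on, instead of aborting the whole agent run with an exception.
      return {
        content: [{ type: "text", text: `request_number failed: ${String(err)}` }],
        isError: true,
      };
    }
  },
);

// stdio is the simplest transport for local clients like Claude Code;
// streamable HTTP is the remote-hosted alternative.
const transport = new StdioServerTransport();
await server.connect(transport);
```

A real surface would add the wait-for-SMS and release tools plus the auth handshake; the graceful-failure behaviour above is the part that separates a production server from a weekend one.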
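Likewise, for the credential model in the bullet above, this is the rough contract shape we have in mind, sketched in TypeScript. Every field name, scope string, and the `issueScopedKey` signature are assumptions for illustration; the real issuance flow lives in the Go backend.

```typescript
import { randomUUID } from "node:crypto";

// Illustrative shape: what a short-lived scoped key handed to a sandboxed
// agent might carry. Scopes deliberately exclude balance and account
// management, so a compromised sandbox cannot drain the parent account.
interface ScopedAgentKey {
  key: string;                             // opaque bearer token for the sandbox
  parentAccount: string;                   // user account that authorized it
  scopes: ("number:rent" | "sms:read")[];  // narrow, task-shaped permissions
  budgetCapUsd: number;                    // per-task spend ceiling, enforced server-side
  expiresAt: Date;                         // minutes, not days; rotation is automatic
}

// Hypothetical issuance call the MCP auth handshake and the CLI would wrap.
// A real implementation would authenticate the parent key and persist the
// grant to the PostgreSQL audit log before returning the credential.
async function issueScopedKey(opts: {
  scopes: ScopedAgentKey["scopes"];
  budgetCapUsd: number;
  ttlSeconds: number;
}): Promise<ScopedAgentKey> {
  return {
    key: `dsms_scoped_${randomUUID()}`,
    parentAccount: "acct_demo",
    scopes: opts.scopes,
    budgetCapUsd: opts.budgetCapUsd,
    expiresAt: new Date(Date.now() + opts.ttlSeconds * 1000),
  };
}

// Example: a five-minute key that can rent one number and read its messages.
const sandboxKey = await issueScopedKey({
  scopes: ["number:rent", "sms:read"],
  budgetCapUsd: 1,
  ttlSeconds: 300,
});
```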
What we expect
- Real production experience with at least two of: MCP servers (TypeScript or Python SDK), Anthropic Skills, Claude Code plugins, Codex plugins / extensions, or the OpenAI Assistants / Realtime tool-calling surfaces. You can talk about what is awkward in the current specs, not just what is in them.
- Demonstrated track record building developer-facing surfaces: at minimum a plugin, CLI, or SDK you shipped that other engineers actually used. GitHub stars are not the bar; a credible "here is the user complaint thread, here is how I responded" is.
- 3+ years building production LLM-powered systems with real failure modes you have debugged. Notebooks and demos do not count — we will ask about the most embarrassing way one of your agents has failed in production and what you changed.
- Practical comfort with sandbox runtimes (E2B, Modal, Cloudflare Sandboxes, Anthropic Code Execution, Daytona, or equivalent) — including the credential-handling and network-egress quirks that distinguish them.
- Strong TypeScript at a senior level; Python at a working level. Most MCP / plugin / Skill ecosystems are TypeScript-primary right now and we follow the ecosystem.
- Working knowledge of OpenAI, Anthropic, and at least one other LLM API including tool use, function calling, structured output, and prompt caching. Practical fluency with their latency and cost shapes, not just the marketing pages.
Stack we use
- TypeScript-primary: Node.js runtime, official MCP SDK, Anthropic Skills SDK, Claude Code plugin tooling, Codex extension tooling. Python for eval scripts and any plugin surfaces that demand it.
- Sandbox runtimes (E2B / Modal / Cloudflare Sandboxes / Anthropic Code Execution) for both internal agent workflows and the user-facing "run inside a sandbox" examples.
- OpenTelemetry per LLM call and per tool invocation, with SigNoz dashboards segmented by client (Claude Code vs Codex vs raw SDK); a sketch of the span wrapper we mean follows below.
- PostgreSQL for sandbox-safe credential issuance and audit logs.
- Go backend: you will read it when designing the credential issuance flow; you will not own it day-to-day.
- No proprietary agent frameworks that lock us into a single foundation model, and no opaque "agent platform" abstractions that hide latency or prompt content from observability.
The design constraint: any agent calling DogeSMS via MCP / plugin / Skill / CLI / SDK should be fully traceable end-to-end, with every LLM call, tool invocation, and credential issuance visible in SigNoz. The specs change weekly, so the platform is built to evolve.
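For the span wrapper mentioned above, a minimal sketch using the standard `@opentelemetry/api` package; the `dogesms.*` attribute names are our own assumptions, not an OTel semantic convention, and SigNoz exporter setup is omitted.

```typescript
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("dogesms-agent-surfaces");

// Illustrative wrapper: every tool invocation becomes one span, tagged with
// the calling client so dashboards can segment Claude Code vs Codex vs raw SDK.
async function withToolSpan<T>(
  toolName: string,
  client: string,
  fn: () => Promise<T>,
): Promise<T> {
  return tracer.startActiveSpan(`tool.${toolName}`, async (span) => {
    span.setAttribute("dogesms.client", client);
    span.setAttribute("dogesms.tool", toolName);
    try {
      return await fn();
    } catch (err) {
      // Failures are recorded on the span, so a misbehaving client shows up
      // in the dashboards rather than disappearing into a retry loop.
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

The client attribute is the dimension that makes the "which surfaces actually generate revenue" question answerable per client rather than in aggregate.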
Compensation
$90,000–$180,000 USD annually. Senior engineers in Singapore with shipped MCP / Skills / Claude Code plugin experience are a tight labor market right now, and the upper band reflects that. We adjust within the range based on demonstrated production agent and plugin work: someone who has shipped a plugin that real engineers use sits at the upper end regardless of years of experience. Equity is available for full-time hires. We cover the contractor premium (CPF equivalent) for Singapore-resident contractors and conference travel to relevant ecosystem events (Anthropic dev days, OpenAI DevDay, MCP working group meetings).
Hiring process
- Submit via the talent pool form at /careers/openings/ai-agent-platform-engineer. Include links to: (a) any MCP server, Claude Code plugin, Codex extension, Anthropic Skill, or comparable plugin you have shipped; (b) one paragraph on the most embarrassing way one of your agents has failed in production and what you changed in response.
- Async technical exercise (paid, $300 USD): sketch the tool surface for the DogeSMS MCP server (5-6 tools max), explain the auth handshake, and identify three failure modes you would instrument first. Scoped to 4-5 hours.
- 60-minute Zoom call with a co-founder: a deep dive into your exercise submission, your views on where the MCP / Skills / plugin ecosystem is going next, and an honest conversation about what we are building.
- 2-week paid trial project ($1,000 USD): real backlog item — ship a working prototype of either the MCP server or the Claude Code plugin against a sandbox account. We care about how you scope the problem as much as the code you produce.
Apply
Submit the form below to apply for this open role.