AI engineer portfolio
Freelance AI engineer shipping production AI-native products end-to-end, not just bolting on an OpenAI call. Voice agents on OpenAI Realtime + Twilio with sub-second latency budgets, RAG systems with proper chunking and eval loops, tool-calling architectures, and the eval discipline to know when the model is failing in production. Currently building two AI-native products of my own.
- OpenAI Realtime API with Twilio for voice agents targeting sub-700ms turn latency
- RAG: chunking strategies, embeddings, retrieval tuning, eval sets
- Tool-calling architectures and structured outputs
- Multi-model routing across Claude, GPT, and open-source
- Prompt design, prompt caching, and cost optimization
- Eval loops with golden sets, regression testing, and live monitoring
- Voice agent handoff to humans, with context preservation
- Vector stores (pgvector, Pinecone) and hybrid search
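On tool-calling and structured outputs: the part that matters in production is never executing a tool call the model got wrong. Below is a minimal sketch of a validating dispatcher. It assumes the call arrives as a tool name plus a JSON-encoded arguments string, which is the shape OpenAI's chat completions API returns in tool_calls. The create_ticket tool and the TOOLS registry are hypothetical. A real build would more likely lean on the provider's strict structured-output mode or a JSON Schema validator than on hand-rolled type checks.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """A callable exposed to the model, plus the argument types it requires."""
    fn: Callable[..., Any]
    params: dict[str, type]  # argument name -> expected Python type


def create_ticket(summary: str, priority: int) -> dict[str, Any]:
    # Hypothetical tool: a real one would write to the ticketing system.
    return {"ticket": f"[P{priority}] {summary}"}


TOOLS = {"create_ticket": Tool(create_ticket, {"summary": str, "priority": int})}


def dispatch(name: str, raw_args: str) -> dict[str, Any]:
    """Validate a model-proposed tool call before executing it.

    Returns {"ok": True, "result": ...} or {"ok": False, "error": ...}. The error
    text can be sent back to the model so it can retry with corrected arguments.
    """
    tool = TOOLS.get(name)
    if tool is None:
        return {"ok": False, "error": f"unknown tool {name!r}"}
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"arguments are not valid JSON: {exc}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}
    for key, expected in tool.params.items():
        if key not in args:
            return {"ok": False, "error": f"missing argument {key!r}"}
        value = args[key]
        # bool is a subclass of int in Python, so reject it explicitly for int params.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            return {"ok": False, "error": f"{key!r} must be {expected.__name__}"}
    extra = set(args) - set(tool.params)
    if extra:
        return {"ok": False, "error": f"unexpected arguments {sorted(extra)}"}
    return {"ok": True, "result": tool.fn(**args)}


print(dispatch("create_ticket", '{"summary": "Refund request", "priority": 2}'))
print(dispatch("create_ticket", '{"summary": "Refund request", "priority": "high"}'))
```

The second call comes back as an error instead of a half-executed ticket. That error is exactly what gets fed back to the model for a retry.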
Shipped work for the same brief.
- 2026 · Phone Assistant: AI voice agent for L1 support. Runs on OpenAI Realtime + Twilio and handles L1 customer support: call routing, FAQ resolution, ticket creation, and clean human handoff. Turn latency: 640ms median, 880ms P95. Stack: OpenAI Realtime, Twilio, Next.js, Postgres.
- 2026 · AI-first Scheduler with inline copilot. Social scheduler with an inline copilot that drafts posts, reformats them per channel, and learns the user's voice from past content. Includes cost-aware model routing and per-tenant token budgets. Stack: Next.js, OpenAI, Supabase, Vercel Cron.
- 2025 · CVLeap AI: AI-powered resume editor. Full SaaS with an AI-powered resume editor: streaming responses, structured outputs for ATS-safe formatting, and per-user usage tracking. Shipped in 8 weeks. Stack: Next.js, OpenAI, Supabase, TypeScript.
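Per-tenant token budgets, like the scheduler's, can start as a ledger. Check it before each model call, then update it from the provider's reported usage afterwards. Below is a minimal in-memory sketch. The TokenBudget class and the 10,000-token limit are hypothetical. A real multi-tenant app would persist the counters, for example in its Postgres/Supabase database, and reset them each billing period.

```python
from collections import defaultdict


class TokenBudget:
    """Per-tenant token ledger: check before a model call, record after it."""

    def __init__(self, limit_per_tenant: int):
        self.limit = limit_per_tenant
        self.used: dict[str, int] = defaultdict(int)

    def remaining(self, tenant: str) -> int:
        return max(self.limit - self.used[tenant], 0)

    def can_spend(self, tenant: str, estimated_tokens: int) -> bool:
        # Gate on an estimate up front; the real count is recorded afterwards.
        return self.used[tenant] + estimated_tokens <= self.limit

    def record(self, tenant: str, prompt_tokens: int, completion_tokens: int) -> None:
        # Use the usage numbers the provider reports, not the estimate.
        self.used[tenant] += prompt_tokens + completion_tokens


budget = TokenBudget(limit_per_tenant=10_000)  # hypothetical limit
if budget.can_spend("acme", 1_500):
    budget.record("acme", prompt_tokens=1_200, completion_tokens=400)
print(budget.remaining("acme"))  # 8400
```

When can_spend returns False, the caller decides what happens next: block the request, or route it to a cheaper model. That second option is where cost-aware routing and budgets meet.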
What founders ask before reaching out.
What does 'AI engineer' actually mean in your portfolio?
Shipping AI-native products end-to-end. That means model selection with a tradeoff brief, prompt design, tool-calling architecture, RAG when retrieval matters, voice latency tuning when speech is the surface, evals so you know when it breaks, and cost monitoring so it doesn't blow up your budget. Not just calling chat.completions.create.
Can you build a voice agent on OpenAI Realtime?
Yes — currently shipping one. The voice-agents-latency-budget article on this site walks through the architecture, the 700ms target, why P95 matters more than median, and the Twilio + Node mediator pattern that hits it.
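Here is why P95 beats median, in numbers. This sketch computes both, using the nearest-rank percentile method, and checks each against the 700ms target. The timings are illustrative, not Phone Assistant measurements.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_report(turn_ms: list[float], target_ms: float = 700) -> dict:
    """Compare median and P95 turn latency against the same target."""
    median = statistics.median(turn_ms)
    p95 = percentile(turn_ms, 95)
    return {
        "median_ms": median,
        "p95_ms": p95,
        "median_ok": median <= target_ms,
        "p95_ok": p95 <= target_ms,
    }


# Illustrative timings: 18 fast turns and 2 slow outliers.
turns = [600] * 18 + [1200, 1500]
report = latency_report(turns)
print(report)
```

The median (600ms) passes, but the P95 (1200ms) fails. One caller turn in ten feels broken, and the median never shows it.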
Do you build RAG systems?
Yes. RAG done right is mostly retrieval engineering, not LLM engineering: chunking strategy for your specific data shape, embedding model selection, hybrid search when keyword recall matters, reranking, and an eval set with golden answers so you know retrieval quality before it ships.
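Hybrid search means running keyword and vector retrieval side by side and merging the results. Reciprocal rank fusion (RRF) is one common way to do the merge, because it needs only ranks, not comparable scores. This is a sketch of that one technique, not a claim that every engagement uses it. It assumes each retriever returns an ordered list of document IDs. The IDs are made up, and k = 60 is the constant usually quoted for RRF. A reranker would typically run on the fused top results afterwards.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Larger k flattens the advantage of the top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["refund-policy", "shipping-faq", "returns-form"]  # e.g. from BM25
vector_hits = ["returns-form", "refund-policy", "warranty"]       # e.g. from pgvector
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['refund-policy', 'returns-form', 'shipping-faq', 'warranty']
```

Documents that both retrievers agree on float to the top, even when neither ranked them first.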
Which models do you work with?
Claude (Anthropic), GPT (OpenAI), and open-source via Together/Groq when latency or cost demands it. Model choice gets a written tradeoff brief on every engagement: capability, latency, cost, and failure modes for the specific task.
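Part of the tradeoff brief can be mechanized. Once capability, latency, and cost are written down per model, routing becomes "cheapest model that clears the bar." This sketch uses placeholder model names and numbers, not real benchmarks or prices. The capability tiers are an assumed simplification. Failure modes stay in the written brief, because they don't reduce to a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    capability: int       # assumed tiers: 1 = simple extraction ... 3 = hard multi-step reasoning
    p95_latency_ms: int   # ideally measured on your own traffic
    usd_per_mtok: float   # blended input/output price per million tokens


def route(models: list[ModelProfile], min_capability: int, max_latency_ms: int) -> ModelProfile:
    """Cheapest model that clears the task's capability bar and latency budget."""
    eligible = [m for m in models
                if m.capability >= min_capability and m.p95_latency_ms <= max_latency_ms]
    if not eligible:
        raise LookupError("no model meets the constraints; relax one or revisit the task")
    return min(eligible, key=lambda m: m.usd_per_mtok)


# Placeholder profiles: names and numbers are illustrative only.
MODELS = [
    ModelProfile("frontier-model", capability=3, p95_latency_ms=2500, usd_per_mtok=10.0),
    ModelProfile("mid-model", capability=2, p95_latency_ms=900, usd_per_mtok=2.0),
    ModelProfile("small-open-model", capability=1, p95_latency_ms=300, usd_per_mtok=0.2),
]

print(route(MODELS, min_capability=1, max_latency_ms=500).name)   # small-open-model
print(route(MODELS, min_capability=2, max_latency_ms=3000).name)  # mid-model
```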
How do you handle evals and quality?
Golden set of inputs and expected outputs per task, scored by either heuristic checks or a judge LLM. Regression-tested on every prompt change. For voice, latency P95 and turn-success rate. For RAG, retrieval precision and answer faithfulness. No 'looks fine' shipping.
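Here is a golden-set regression check in its simplest form. Run every case through the candidate prompt and score it. If the pass rate drops below the last accepted baseline, block the change. This sketch uses a heuristic scorer: the required facts must appear in the answer. GoldenCase, candidate_prompt_system, and the canned answers are hypothetical stand-ins for a real prompt and model call. The scorer argument is where a judge-LLM call would plug in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    prompt: str
    must_contain: list[str]  # heuristic check: facts the answer has to mention


class RegressionError(Exception):
    """Raised when a prompt change scores below the accepted baseline."""


def heuristic_score(answer: str, case: GoldenCase) -> bool:
    text = answer.lower()
    return all(fact.lower() in text for fact in case.must_contain)


def run_evals(system: Callable[[str], str], cases: list[GoldenCase],
              scorer: Callable[[str, GoldenCase], bool] = heuristic_score) -> float:
    """Fraction of golden cases the system passes."""
    passed = sum(scorer(system(c.prompt), c) for c in cases)
    return passed / len(cases)


def check_regression(pass_rate: float, baseline: float, tolerance: float = 0.0) -> None:
    """Block the prompt change if it scores worse than the last accepted version."""
    if pass_rate + tolerance < baseline:
        raise RegressionError(f"regression: {pass_rate:.0%} < baseline {baseline:.0%}")


GOLDEN = [
    GoldenCase("What are your opening hours?", ["9am", "5pm"]),
    GoldenCase("How do I reset my password?", ["reset link"]),
]


def candidate_prompt_system(prompt: str) -> str:
    # Stand-in for "new prompt + model"; a real run would call the model here.
    canned = {
        "What are your opening hours?": "We're open 9am to 5pm, Monday to Friday.",
        "How do I reset my password?": "Click 'Forgot password' and we'll email you a reset link.",
    }
    return canned[prompt]


rate = run_evals(candidate_prompt_system, GOLDEN)
check_regression(rate, baseline=1.0)  # passes: both cases still pass
print(rate)  # 1.0
```

Wired into CI, check_regression is what turns "looks fine" into a failing build.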
Will you train a custom model?
Usually no — fine-tuning is rarely the right answer for product work. Better prompts, better retrieval, and tool-calling solve most problems faster and cheaper. I'll fine-tune when there's a clear quality or latency reason and the training data exists.
Let's see if it's a fit.
30-minute call. No pitch, no slides. Tell me what you're building, including the AI parts, and the constraints. I'll tell you if I can help, and who else to call if I can't.