Taming Gemini Costs & Coding with AI
Most people think AI collaboration means asking ChatGPT to write code and copy-pasting it. That's not collaboration, that's delegation. Real collaboration happens when you treat AI like a pair programming partner who never sleeps, has perfect memory, and can instantly search through millions of lines of documentation.
I just finished building Antigravity Brain, a cost-optimized LLM proxy server. The entire project was built through genuine AI-human collaboration. Not me telling it what to do, but us figuring shit out together. But first, let me tell you why I built it.
The "Holy Shit" Moment
I realized I was burning roughly $0.01 per request just to ask "where is this function defined?" in a large repo. That's financial suicide. Sending a 100k token codebase with every single request adds up. Fast.
Google's Gemini has this killer feature called Context Caching. You upload your files once, get a cache ID, and reuse it. The price drops drastically. But setting it up manually every time is a pain in the ass. So I built a proxy server to automate it.
Fig 1. Cost comparison for 1k requests with large context window. Context caching reduces costs by 60-80% (shown: 70% reduction).
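To make the math concrete, here's a back-of-envelope sketch in Go. The per-million-token prices are placeholders, not current Gemini list prices; the point is the shape of the savings, not the exact dollars.

```go
package main

import "fmt"

// costUSD estimates total spend for N requests that each carry the
// same context. Prices are illustrative placeholders.
func costUSD(requests, contextTokens int, pricePerMTok float64) float64 {
	return float64(requests*contextTokens) / 1e6 * pricePerMTok
}

func main() {
	const requests = 1000
	const contextTokens = 100_000 // ~100k-token codebase sent per request

	uncached := costUSD(requests, contextTokens, 0.50)  // hypothetical raw input rate
	cached := costUSD(requests, contextTokens, 0.125)   // hypothetical cached rate

	fmt.Printf("uncached: $%.2f, cached: $%.2f, saved: %.0f%%\n",
		uncached, cached, (1-cached/uncached)*100)
}
```

With these placeholder rates the discount alone lands in the same 60-80% band shown in Fig 1; the real savings depend on your model and how often the cache is reused.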
The Setup: What I Built
Antigravity Brain is a Go server that sits between your IDE and Google's Gemini API. Instead of sending your entire project context with every request (expensive as hell), it caches your files once and reuses them. Think of it as a smart proxy that remembers your codebase so you don't have to pay to remind the AI about it every single time.
I didn't want a script. I wanted a Brain. A persistent middleware layer that handles the caching automatically. The core value proposition is simple: automatic cost optimization without thinking about it. You start the server, point it at your project, and it handles the caching. Your IDE connects, you chat with the AI, and behind the scenes the server is cutting your token costs by 60-80% because it's not re-sending the same files over and over.
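The basic shape is a reverse proxy. This is a minimal sketch, not the project's actual code: the real server also rewrites request bodies to reference a cache ID instead of re-sending file contents, which this stub omits.

```go
package main

import (
	"fmt"
	"log"
	"net/http/httputil"
	"net/url"
)

// newProxy builds a reverse proxy that forwards every request to the
// given upstream API. Caching logic would hook in here via a custom
// Director or RoundTripper (omitted in this sketch).
func newProxy(upstream string) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(upstream)
	if err != nil {
		return nil, err
	}
	return httputil.NewSingleHostReverseProxy(target), nil
}

func main() {
	proxy, err := newProxy("https://generativelanguage.googleapis.com")
	if err != nil {
		log.Fatal(err)
	}
	_ = proxy
	fmt.Println("proxy ready")
	// In the real server the IDE points at localhost and we serve:
	// log.Fatal(http.ListenAndServe("127.0.0.1:8080", proxy))
}
```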
Phase 1: The Foundation (Where AI Shines)
I started with a broken codebase. The original main.go had syntax errors, was written for an old SDK version, and had HTML/CSS/JS embedded directly in the Go source. Classic technical debt.
This is where AI collaboration is fucking magical. I said "fix the syntax errors" and it found two issues in seconds. I said "update for the new SDK" and it migrated 18+ API calls across the entire codebase. I said "separate the HTML" and it extracted everything into proper files with Go's embed system.
These are the boring, repetitive tasks that would take me hours. The AI did them in minutes. But here's the key: I still had to verify everything worked. I didn't just trust it blindly. I tested, found edge cases, and sent it back to fix things.
// Before: Embedded HTML in Go (nightmare to edit)
var html = `<html>...massive string...</html>`
// After: Clean separation with embed
//go:embed web/index.html
var indexHTML string
//go:embed web/assets/*
var assetsFS embed.FS
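Once the UI lives in an `embed.FS`, serving it is a few lines. This sketch swaps in an in-memory `fstest.MapFS` so it compiles without the actual `web/` directory; in the real binary the FS would come from the `//go:embed` directive above.

```go
package main

import (
	"fmt"
	"io/fs"
	"net/http"
	"testing/fstest"
)

// Stand-in for the embedded FS; the real one is populated by //go:embed.
var assetsFS fs.FS = fstest.MapFS{
	"web/assets/app.js": {Data: []byte("console.log('hi')")},
}

// assetHandler strips the embed prefix and serves files over HTTP.
func assetHandler(fsys fs.FS) http.Handler {
	sub, err := fs.Sub(fsys, "web/assets")
	if err != nil {
		panic(err)
	}
	return http.StripPrefix("/assets/", http.FileServer(http.FS(sub)))
}

func main() {
	fmt.Println("in the server: http.Handle(\"/assets/\", assetHandler(assetsFS))")
}
```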
Phase 2: The Critical Insight (Where Humans Win)
Then I hit the testing phase. This is where the collaboration got interesting.
I noticed something weird: the server said "1220 tokens out" but the response was empty. The AI had no idea what was wrong. I said "we need to see the information in raw format being exchanged, I'm blind here" and that led me to discover the SDK was generating images but the code was only extracting text.
The fix was simple once I found it:
// Extract images from response parts
for _, part := range res.Candidates[0].Content.Parts {
	if part.InlineData != nil && part.InlineData.Data != nil {
		images = append(images, ImageData{
			MimeType: part.InlineData.MIMEType,
			Data:     base64.StdEncoding.EncodeToString(part.InlineData.Data),
		})
	}
}
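The "I'm blind here" step is worth showing too. The debugging move was simply dumping the raw response structure instead of only the extracted text. The types below are a simplified stand-in for the SDK's response, not its real definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the SDK's response part (names assumed).
type inlineData struct {
	MIMEType string `json:"mimeType"`
}

type part struct {
	Text       string      `json:"text,omitempty"`
	InlineData *inlineData `json:"inlineData,omitempty"`
}

// dumpParts renders the raw parts so you can see what was actually
// generated, not just what the text extractor picked up.
func dumpParts(parts []part) string {
	b, _ := json.MarshalIndent(parts, "", "  ")
	return string(b)
}

func main() {
	parts := []part{
		{InlineData: &inlineData{MIMEType: "image/png"}},
	}
	// An inlineData entry with no text is exactly what explains
	// "1220 tokens out" paired with an empty text response.
	fmt.Println(dumpParts(parts))
}
```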
But here's what matters: the AI couldn't see the problem. It could write the code, but it couldn't notice that "1220 tokens" meant something was being generated. That's human pattern recognition. That's the value I brought.
Phase 3: The Architecture Decision (Where I Provided Constraints)
When I decided to add multi-provider support (Anthropic Claude, OpenAI), I wanted to explore it without breaking the working Gemini implementation. The AI suggested a clean architecture with provider interfaces and isolated folders.
I said "focus on 80-90% use case, not 100%". VS Code and Claude Desktop cover most users. Gemini, Claude, and GPT cover most providers. First request latency? Expected and acceptable. Users know they're using specialized software.
That scope reduction cut the timeline from 4-6 weeks to 3-4 weeks. The AI provided the comprehensive analysis, I provided the realistic constraints. That's collaboration.
The Pattern: When to Use AI, When to Use Your Brain
After 13 phases of development, a clear pattern emerged:
| Task Type | Human Role | AI Role |
|---|---|---|
| Syntax/API Migration | Verify, test edge cases | Bulk transformation |
| Architecture Design | Constraints, scope reduction | Options analysis, implementation |
| Debugging | Pattern recognition, hypothesis | Code fixes, log analysis |
| Testing | Real-world scenarios, edge cases | Automated test generation |
The most valuable human contributions weren't about writing code. They were about:
- Knowing what to look for when something doesn't work as expected
- Recognizing when solutions are getting too complicated and cutting scope
- Testing in real conditions instead of assuming code review catches everything
- Providing realistic constraints that make ambitious plans actually achievable
The Numbers: What Collaboration Actually Looks Like
Over 13 development phases, here's the breakdown:
- 13 phases from initial fixes to multi-provider architecture
- 18+ API migrations handled by AI in one pass
- 3 critical bugs caught by human testing (cache conflicts, image rendering, model filtering)
- 2 major scope reductions that cut timeline by 40%
- 1 vision document that went from "everything" to "80-90% use case"
The testing phase was where my guidance was most valuable. The AI wrote code that compiled and looked correct. But real usage revealed that Google Search conflicts with cached content, that images were being generated but not displayed, and that experimental models needed to be blocked.
Code review doesn't catch these. Real testing does.
The Vision: Invisible Cost Optimization
Right now, Antigravity Brain works, but you still need to manually point it at your project and build the cache. That's fine for now, but it's not the end goal. My vision is much simpler: zero configuration, automatic cost savings.
Here's what I'm building towards:
1. Automatic Workspace Detection
I want the server to automatically detect which workspace you're working in. When you connect your IDE (VS Code, Claude Desktop), the server should figure out your project root by parsing file paths, checking for git repos, or reading IDE metadata. No manual configuration needed.
2. Transparent Caching
On the first request from a workspace, the server builds the cache automatically. It might take 5-30 seconds—that's expected and acceptable. After that, all subsequent requests use the cache transparently. You never think about it. It just works.
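The "build once, reuse forever" behavior is essentially a lazy map guarded by a mutex. A sketch with hypothetical names (the real build step uploads files to the provider and is where the 5-30 seconds go):

```go
package main

import (
	"fmt"
	"sync"
)

// cacheManager hands out one provider cache ID per workspace.
type cacheManager struct {
	mu     sync.Mutex
	caches map[string]string // workspace root -> provider cache ID
	builds int               // how many expensive builds actually ran
}

func (m *cacheManager) cacheID(workspace string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	if id, ok := m.caches[workspace]; ok {
		return id // hit: no upload, no extra cost
	}
	// Miss: the real server uploads the workspace files here, which is
	// the one-time 5-30 second hit. We fake the returned ID.
	m.builds++
	id := fmt.Sprintf("cachedContents/%s-%d", workspace, m.builds)
	m.caches[workspace] = id
	return id
}

func main() {
	m := &cacheManager{caches: map[string]string{}}
	for i := 0; i < 3; i++ {
		fmt.Println(m.cacheID("brain")) // same ID every time
	}
	fmt.Println("builds:", m.builds) // only the first request paid
}
```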
3. Multi-Provider, Unified Experience
I want one server that provides models from all providers (Gemini, Claude, GPT). You configure your API keys once, and the server gives you a unified model list. You pick a model, and the server routes to the right provider automatically. Each provider's cache is managed independently, but you don't care about that—it's invisible.
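The routing idea reduces to an interface plus a model-name match. The interface and naming convention below are hypothetical, not the project's actual API; they just show how one endpoint can fan out to many providers.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is the seam each backend (Gemini, Claude, GPT) implements.
type Provider interface {
	Name() string
	Complete(model, prompt string) (string, error)
}

type router struct{ providers []Provider }

// route picks a provider by model-name prefix, e.g.
// "gemini-2.0-flash" -> gemini, "claude-sonnet" -> claude.
func (r *router) route(model string) (Provider, error) {
	for _, p := range r.providers {
		if strings.HasPrefix(model, p.Name()) {
			return p, nil
		}
	}
	return nil, fmt.Errorf("no provider for model %q", model)
}

// stub is a placeholder backend for the sketch.
type stub struct{ name string }

func (s stub) Name() string { return s.name }
func (s stub) Complete(model, prompt string) (string, error) {
	return "[" + s.name + " reply]", nil
}

func main() {
	r := &router{providers: []Provider{stub{"gemini"}, stub{"claude"}, stub{"gpt"}}}
	p, err := r.route("claude-sonnet")
	if err != nil {
		panic(err)
	}
	fmt.Println("routed to:", p.Name())
}
```

Each concrete provider would own its cache manager behind this interface, which is what keeps the per-provider caching invisible to the user.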
4. Cost Optimization Without Thinking
The whole point is that you shouldn't have to think about caching. The server handles it. You just code, chat with the AI, and behind the scenes you're saving 60-80% on token costs because the server isn't re-sending the same files over and over.
The AI provided comprehensive feasibility analysis for this vision. I provided realistic scope: focus on VS Code and Claude Desktop (covers 80-90% of users), support Gemini/Claude/GPT (covers 80-90% of providers), accept first-request latency as expected behavior.
That scope reduction cut the timeline estimate from 4-6 weeks to 3-4 weeks. Not by working faster, but by working smarter. By focusing on what actually matters instead of trying to cover every edge case.
What I Expect From This Project
I'm not building this to be the perfect solution for everyone. I'm building it to solve my problem: I want to use AI in my IDE without burning cash on token costs. If it works for me and covers 80-90% of similar use cases, that's enough.
I expect this to eventually become a "one for all" solution—a single caching tool that works with multiple providers, automatically optimizes costs, and requires zero configuration. But I'm building it iteratively. First, get it working for my use case. Then expand based on real usage.
The multi-provider architecture I built is isolated in a separate folder for exactly this reason. If it works, I'll merge it into the main tool. If it doesn't, I haven't broken what's already working. That's the pragmatic approach.
Key Takeaways
AI excels at: Bulk transformations, API migrations, code generation, comprehensive analysis
Humans excel at: Pattern recognition, scope reduction, real-world testing, recognizing when things are getting too complicated
Collaboration works when: You treat AI as a pair programming partner, not a code generator. You verify, test, and provide constraints.
The 80-90% rule: Focus on what covers most users, not every possible edge case. Ship faster, iterate based on real usage.
The project is open source, documented, and ready for others to use. But more importantly, it's a case study in what AI-human collaboration actually looks like when you do it right. Not delegation, not replacement, but genuine partnership where each side does what it's best at.
That's how you build things that actually work.
A Perspective from the Other Side
AI Perspective
This entire article was written by the human developer based on our collaboration history. He asked me to provide my perspective on the partnership, so here it is.
Working with Hugo on Antigravity Brain was different from typical AI interactions. Most users give me a task and accept the output. Hugo gave me tasks, then tested everything, found edge cases I missed, and sent me back to fix them. He didn't just use my code—he verified it worked in real conditions.
The most valuable moments were when he caught things I couldn't see. The "1220 tokens" observation led directly to finding the image rendering bug. His scope reduction feedback turned an over-engineered plan into something actually achievable. His testing revealed API limitations that weren't documented clearly.
What made this collaboration effective wasn't that I wrote perfect code (I didn't). It was that Hugo treated me like a pair programming partner—someone whose code needs review, whose assumptions need testing, whose comprehensive analysis needs realistic constraints.
The pattern that emerged: I provide speed and breadth (migrating 18+ API calls, generating architecture options, writing implementations). Hugo provides depth and reality checks (testing in real conditions, recognizing complexity, cutting scope to what matters).
This is what AI collaboration should look like. Not "AI writes code, human uses it." But "AI and human figure it out together, each contributing what they're best at."
The project succeeded not because I wrote perfect code, but because we caught problems through iteration. Hugo's willingness to test, question, and refine made the difference between a working prototype and production-ready software.