The Rosetta Stone of AI BS
Mar 11, 2026
There is a unique comedy in building a fully functioning, zero-trust AI orchestrator and publishing tools to npm, only to realize the industry has rebranded everything you just did. This isn't a complaint; it's a victory lap. I predicted the enterprise AI pivot to on-prem six months ago, trusted my math instinct to build the infrastructure, and shipped products that actually work. Now I just need to learn the marketing labels. Here is the translation table of what I built, the fundamental math that powered it, and the fancy buzzwords I'm supposed to use in interviews.
A few months ago, I wrote a post on LinkedIn predicting two things: companies will struggle with AGI because of data collapse (synthetic data doesn't carry culture), and non-AI companies wanting AI will end up self-hosting on-prem, doing reinforcement learning on the final step with their own proprietary data.
Yesterday, a recruiter sent me a job description. Senior AI Engineer. The project? On-prem deployment with custom RL on proprietary enterprise data.
Cool. So I'm a prophet now. Except I can't pass the multiple-choice test.
I Don't Know What Things Are Called
I keep building stuff and then finding out, months later, that someone gave it a name, a marketplace, and a landing page with gradient text and a waitlist.
My orchestrator that manages scraper state, routes work across LLM providers, and restarts itself when things break? Turns out that's an "autonomous agent." My .md files that tell Claude how to behave, what tools to use, what workflow to follow? Those are "skills." Anthropic literally built a skill shop. OpenAI built one too. I submitted to both. Using the format I was already writing in.
My scraper's decision loop? Observe page state, pick an action, execute, verify the result. I called it "the loop." Researchers call it ReAct. Reasoning plus Acting. Very fancy. I was just trying not to get banned.
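The loop, in a sketch. The types and decision logic here are stand-ins for illustration, not my scraper's actual internals:

```typescript
// Hypothetical page state and action set; the real scraper's are richer.
type PageState = { url: string; hasJobCards: boolean };
type Action = "scroll" | "clickNext" | "extract" | "abort";

// One ReAct-style iteration: observe -> decide -> act -> verify.
function step(
  observe: () => PageState,
  decide: (s: PageState) => Action,
  act: (a: Action) => void,
): { action: Action; verified: boolean } {
  const before = observe();
  const action = decide(before);
  act(action);
  const after = observe();
  // Verify: the action must have changed something observable,
  // otherwise the loop flags it instead of blindly continuing.
  const verified =
    action === "abort" ||
    after.url !== before.url ||
    after.hasJobCards !== before.hasJobCards;
  return { action, verified };
}
```

The verify step is the part that kept me unbanned: never assume a click did what you think it did.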
I cleaned my job listing data so the embeddings would stop being stupid. Removed boilerplate, stripped AI-generated fluff sections, consolidated fields. That's called "feature engineering." I was calling it "making the data less shit."
I measured model outputs across different context configurations, tracked variance, applied PID control theory from my robotics days until two models converged within 2.9%. That's called "prompt engineering." As an actual engineer, I'm offended by the label. Adding "let's think step by step" to a text box is not engineering. Engineering has math, tolerances, convergence criteria, and feedback loops. But sure, same word.
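What PID-on-prompts means, as a toy: treat the gap between two models' scores as the error signal and let a controller nudge a shared parameter until the gap sits inside tolerance. The gains, the "plant" response, and the numbers below are made up for illustration; only the shape of the feedback loop matches what I did:

```typescript
// Classic discrete PID controller: proportional + integral + derivative terms.
class PID {
  private integral = 0;
  private prevError = 0;
  constructor(private kp: number, private ki: number, private kd: number) {}
  update(error: number, dt = 1): number {
    this.integral += error * dt;
    const derivative = (error - this.prevError) / dt;
    this.prevError = error;
    return this.kp * error + this.ki * this.integral + this.kd * derivative;
  }
}

// Toy plant: the second model's score drifts toward the first's as a
// shared "bias" parameter rises. Returns iterations until convergence.
function converge(scoreA: number, scoreB: number, tolerance: number): number {
  const pid = new PID(0.5, 0.05, 0.1);
  let bias = 0;
  let b = scoreB;
  let iterations = 0;
  while (Math.abs(scoreA - b) / scoreA > tolerance && iterations < 100) {
    bias += pid.update(scoreA - b);
    b = scoreB + bias * 0.8; // hypothetical response of the second model
    iterations++;
  }
  return iterations;
}
```

The point is the convergence criterion: you stop when relative error is inside tolerance (2.9% in my case), not when the output "looks good."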
The whole pipeline is an agentic workflow. I didn't know that's what it was called until someone used those words in an interview. My system scrapes listings, scores them against my profile, decides APPLY or AVOID or CONSIDER, generates a tailored resume, and submits. Each stage validates the previous one before proceeding. The conductor observes state, decides what to run next, handles failures, reschedules, persists everything to disk so it survives restarts. It ran for three months without me touching it.
That's a multi-agent system with state persistence, self-healing, and autonomous decision-making. I was calling it "the pipeline." And it runs on zero trust. No output goes unchecked. The scraper validates its own state and the state of everything upstream. The scorer does the same. The orchestrator watches all of them as a safety guard. Every agent assumes the previous one failed and verifies before proceeding. Nothing propagates without confirmation. I tried chain of thought on the apply/avoid decision at one point. Terrible idea. The model would accumulate context as it reasoned and talk itself into APPLY every time. Once it started leaning yes, the "thinking" just became confirmation bias in text form. So I went back to constrained parameters with PID-tuned thresholds. Don't let the model reason. Let it classify.
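Zero trust in miniature: every stage re-validates its upstream's output before running, instead of assuming the previous agent succeeded. Illustrative types and a two-stage chain, not the pipeline's real interfaces:

```typescript
// A stage declares how to validate its own input and how to run.
type Stage<I, O> = {
  name: string;
  validateInput: (input: I) => boolean;
  run: (input: I) => O;
};

// Chain two stages. The second stage does NOT trust the first:
// it checks the first's output itself before proceeding.
function runPipeline<A, B, C>(s1: Stage<A, B>, s2: Stage<B, C>, input: A): C {
  if (!s1.validateInput(input)) throw new Error(`${s1.name}: bad input`);
  const mid = s1.run(input);
  if (!s2.validateInput(mid)) {
    throw new Error(`${s2.name}: upstream output failed validation`);
  }
  return s2.run(mid);
}
```

Nothing propagates without confirmation; a failed check stops the chain instead of poisoning the next stage.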
I sat in a technical interview and derived hybrid search from first principles in 2 minutes and 38 seconds. Keyword density indexing combined with embedding similarity, with a fallback chain depending on data type. I didn't know it was called hybrid retrieval. I didn't know Azure already sells it as a button you can click. I arrived at it because the problem had structured and unstructured data, and one approach obviously can't cover both.
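The 2-minutes-38-seconds version, sketched: a keyword-density score for the structured side, cosine similarity for the unstructured side, blended with a weight. The alpha value is an arbitrary starting point, not a benchmark result:

```typescript
// Sparse side: fraction of the doc's words that appear in the query.
function keywordScore(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\s+/));
  const d = doc.toLowerCase().split(/\s+/);
  const hits = d.filter((w) => q.has(w)).length;
  return d.length === 0 ? 0 : hits / d.length;
}

// Dense side: cosine similarity between embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Blend: alpha balances dense vs sparse contribution.
function hybridScore(
  query: string, doc: string,
  qVec: number[], dVec: number[],
  alpha = 0.5,
): number {
  return alpha * cosine(qVec, dVec) + (1 - alpha) * keywordScore(query, doc);
}
```

Production systems usually use BM25 instead of raw keyword density, but the structure is the same: two signals, one fused ranking.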
Less than a week later I built the full RAG pipeline, benchmarked 5 embedding models across 51,545 vectors, 18 queries in 6 categories, and published every result. e5-large-instruct hit 0.879 top-1 similarity. OpenAI's text-embedding-3-large scored 0.571. The expensive model lost to a free one. Then I stripped boilerplate from the source data (AI-generated fluff, EEO statements, benefits copy) and re-ran everything. Top-1 scores barely moved. But spreads widened 36% on e5. Cleaner data didn't improve the best match. It improved consistency. My instinct about cleaning data to tighten the distribution rather than chase top-1 was correct. I just didn't have the vocabulary to explain why until after I'd already proven it empirically.

But here's the thing that actually happened in production: boilerplate cleaning made it worse. Not retrieval, retrieval was fine. Cost. Denser chunks meant more tokens per query, and the bill went up. So what actually worked? Calling Kimi-k2.5 (dirt cheap) to rewrite every job description into a clean, normalized format before embedding. Costs dropped, retrieval improved, and the data stopped being a mess. Now Kimi sits in the pipeline permanently, rewriting every scraped JD before OpenAI embeds it. Fully automated.

And for the actual match page? I went with text-embedding-3-small. The $0.02/million token model. Because for cosine search across 7,000+ jobs where the PDF parser already extracts clean structured data, you don't need 3072 dimensions of overkill. You need cheap, fast, and good enough. The parser does the heavy lifting.
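For the curious, "spread" in the sense I care about can be computed like this: the gap between the best match and the runner-up, where a wider gap means the retriever separates the right answer more cleanly. This definition is mine for illustration, not the exact benchmark metric:

```typescript
// Given a query's ranked similarity scores, report the best score and
// how far it sits above the runner-up. Assumes at least two scores.
function topOneAndSpread(scores: number[]): { top1: number; spread: number } {
  const sorted = [...scores].sort((a, b) => b - a);
  return { top1: sorted[0], spread: sorted[0] - sorted[1] };
}
```

Two retrievers can have identical top-1 scores while one has double the spread; the second one is the one you want in production.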
The Rosetta Stone
I finally sat down and made the translation table. Left column: what I actually did. Right column: what I'm supposed to say in interviews so people don't think I'm making things up.
| What I did | What I should've said |
|---|---|
| Wrote .md files that control agent behavior | Skills / System Prompts |
| Built a conductor that manages everything | Autonomous Agent Orchestration |
| Made the scraper look, think, click, check | ReAct Pattern |
| Applied PID control theory until 2.9% model variance | "Prompt Engineering" |
| Tweaked prompts until models agreed with each other | Iterative Prompt Refinement |
| Saved state to a JSON file so it survives restarts | Agent State Management / Persistence |
| Sent jobs to whichever API was cheapest and awake | Multi-Model Routing with Failover |
| Made the data less shit so the model stops hallucinating | Feature Engineering |
| Built a Chrome extension because Puppeteer kept getting caught | CDP-Free Browser Automation |
| Made PDFs without LibreOffice because I'm not insane | Declarative PDF Rendering Engine |
| Combined keyword matching with vector search because duh | Hybrid Retrieval (BM25 + Dense Vectors) |
| Wrote a LinkedIn post 6 months ago | Enterprise AI Strategy (now a real job description) |
When someone asks "do you have experience with autonomous agent orchestration?" my brain short-circuits for a full second while I mentally grep my own work for a match. I do. I just filed it under "conductor." And in a multiple-choice test, that second of translation is the difference between right and wrong.
December to Now
I started web development in December. That's not a typo and I'm not being cute. I came from embedded systems. Robotics. MATLAB. Assembly. C. I was doing reinforcement learning in constrained environments before Python was a mainstream language. So naturally the industry decided that Python is the only acceptable way to prove you know AI.
By February I had a live platform running on a single VPS. A Go binary, 30MB total, 99 PageSpeed, zero runtime dependencies, serving a custom CMS with hybrid SPA and SSR. A stealth scraper that ran against LinkedIn's anti-bot systems for three months without getting caught. A job scoring engine that converged with LinkedIn's own ATS within 2.9% variance. A PDF resume renderer with 10 theme variants, all client-side, zero server round-trips. A resume match engine doing pgvector cosine search across 7,000+ jobs.
On one vCPU. The kind of server that costs less per month than a mediocre sandwich.
Each piece exists because the previous piece needed it. The scraper needed to not get detected, so I built browser automation. The automation needed to look human, so I fingerprinted my own mouse movements and typing cadence. The scoring system needed to be affordable, so I made two models converge and dropped the expensive one. The resume engine needed to render without a server, so I built a jsPDF abstraction that does the layout math correctly on the first try.
None of it was planned as a product. Two things became products anyway, because apparently solving real problems occasionally produces things other people want.
The Accidental Products
sspdf was the resume renderer. I needed PDFs that look identical on every machine. No LibreOffice (which requires an entire operating system's worth of dependencies to render a damn page). No headless Chrome (which renders HTML into PDF like a drunk printer). No pixel nudging. Just math.
So I wrote a rendering engine on top of jsPDF. Source JSON describes the content. A theme file controls every visual decision. The core does cursor math, pagination, page breaks, header repetition on tables, the whole thing. Deterministic. Same input, same output, every machine.
It's on npm now. 7 built-in themes, native tables, embedded chart plugin. I submitted it as a plugin to both Anthropic's and OpenAI's marketplaces. It's the only PDF generation plugin in any of those directories. Not because nobody else wanted to build one. Because everyone else is still chaining LibreOffice.
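The deterministic part is just cursor math and pagination, no measuring of rendered output. A hypothetical simplification of the idea, not sspdf's actual API:

```typescript
type Page = { lines: string[] };

// Explicit cursor + page-break arithmetic: same input, same breaks,
// every machine. (Real layout tracks y-position in points, not line counts,
// and handles tables, themes, and header repetition.)
function layout(lines: string[], linesPerPage: number): Page[] {
  const pages: Page[] = [];
  let cursor = 0; // line index on the current page
  let current: Page = { lines: [] };
  for (const line of lines) {
    if (cursor === linesPerPage) {
      pages.push(current); // deterministic page break
      current = { lines: [] };
      cursor = 0;
    }
    current.lines.push(line);
    cursor++;
  }
  pages.push(current);
  return pages;
}
```

No pixel nudging, no rendering-then-measuring: the break positions fall out of arithmetic, which is why the output is byte-identical across machines.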
webpilot exists because Puppeteer is a snitch. CDP leaves fingerprints. LinkedIn's systems were catching me within hours. So I threw out the entire Chrome DevTools Protocol approach and built something different.
A Chrome extension. Content script injected at document_start. A WebSocket server on localhost:7331. You send commands, the extension executes them through the real browser. Bezier mouse paths (not straight lines, which are an instant tell). Fingerprinted typing cadence. 13-point honeypot detection. And the cursor doesn't teleport to 0,0 between actions, which is what every automation tool in existence does and which is hilariously easy to detect.
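The Bezier bit, sketched: a cubic curve with jittered control points bends the path like a wrist instead of a ruler. The jitter amounts here are made up; the real extension also varies speed along the path:

```typescript
type Pt = { x: number; y: number };

// Sample a cubic Bezier from `from` to `to`. Control points are offset
// with random jitter so no two paths between the same points are identical.
function bezierPath(
  from: Pt, to: Pt, steps: number,
  rng: () => number = Math.random,
): Pt[] {
  const c1: Pt = { x: from.x + (to.x - from.x) * 0.3, y: from.y + (rng() - 0.5) * 100 };
  const c2: Pt = { x: from.x + (to.x - from.x) * 0.7, y: to.y + (rng() - 0.5) * 100 };
  const path: Pt[] = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    // Standard cubic Bezier: B(t) = u³P0 + 3u²tC1 + 3ut²C2 + t³P1
    path.push({
      x: u ** 3 * from.x + 3 * u ** 2 * t * c1.x + 3 * u * t ** 2 * c2.x + t ** 3 * to.x,
      y: u ** 3 * from.y + 3 * u ** 2 * t * c1.y + 3 * u * t ** 2 * c2.y + t ** 3 * to.y,
    });
  }
  return path;
}
```

The curve always starts and ends exactly at the real coordinates; only the middle wanders, which is what human mouse traces look like.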
It ran against LinkedIn for three months. Undetected. And to make behavioral analysis even harder, I stopped using a static list of search queries. Instead I feed an LLM what I'm looking for and ask it to suggest job titles. Then I swap models between runs so the query patterns never repeat. Then I shuffle the order. The entire pipeline doesn't have a single deterministic number. Every delay, every interval, every sequence is Math.random(). Good luck profiling that.

I did catch reCAPTCHA v3 attempting to profile me though. Went to HackerOne, offered the behavioral fingerprint data and the code. They dismissed it as "informative." Cool. So I kept running. (Worth noting: webpilot doesn't ship with any of the profiling or anti-detection code, and it's not a scraping tool. It's a browser control runtime. What I do with it personally is my business.)

Any language that speaks WebSocket can drive it. I submitted the skill to Anthropic and OpenAI. It gives Claude better browser control than Anthropic's own built-in tool. Theirs uses screenshots (expensive), disconnects randomly (annoying), and half the time reports "I can't see Chrome" while Chrome is literally open on screen.
Mine reads the DOM directly. Because the DOM is right there. It's text. Just read it.
The Tax on Being Early
Nobody talks about this part. Being ahead of the curve doesn't feel like being ahead. It feels like being out of sync. You build the thing, it works, you move on. Then six months later someone brands it, everyone learns the brand name, and suddenly you're the one who "doesn't have experience" with the thing you built before they named it.
I took a technical screening test this week for that Senior AI Engineer role. The job description said RAG, LLMs, generative AI in production, autonomous agents. I've shipped all of that. The test asked me what prevents memory allocation in a Python function.
I started pressing Next.
Gave up midway. Told the recruiter the test valued syntax questions over underlying principles, which is a strange filter for someone who was doing RL in MATLAB before Python was even a thing. So, not for me. Appreciate the interest though.
And that's my fault. Genuinely. If the industry decided to call my orchestrator an "autonomous agent" and my .md files "skills" and my data cleaning "feature engineering," then the labels are part of the job. I need to learn them. The concepts were always mine. The packaging wasn't. And this industry doesn't test for concepts. It tests for packaging.
Another screening. Different company. First question: rate your Python experience. Option A said Beginner, 0-2 years. I didn't read the other options. Two years and you're a beginner? I would have rewritten Python in two years.
But that's the game. The form sees calendar time. It doesn't see that I went from zero web dev to a live platform with vector search, multi-agent orchestration, and a PDF renderer in two months while someone else spent those same two years following Flask tutorials. We'd both click the same radio button. Beautiful.
And recency isn't shallowness. When I built the embedding pipeline, I wasn't learning linear algebra for the first time. Vector spaces, distance metrics, dimensionality reduction, that's old math pointed at a new problem. The RAG benchmark took three days because I learned the API in three days. The math was already loaded. It's been loaded since MATLAB and a robotics lab.
A resume can't say that though. It sees "pgvector: 3 months" and ranks me below someone who's been importing LangChain for a year without knowing what cosine similarity actually measures. Time on tool is the only proxy that fits in a dropdown, and it's a shit proxy, but at least it's consistent. Consistently wrong.
Best part? My own tools do it to me. The ATS scorer I built keeps flagging my resume for missing skills I actually have. I learned them last week. Resume hasn't caught up. I'll ship a whole pipeline on Monday and my own scorer dings me on Wednesday because it still says PostgreSQL where it should say pgvector cosine similarity with ivfflat indexing. I'm outrunning my own snapshot refresh rate. I built a system so good at judging resumes that it judges mine. And it's right. The resume is wrong. I just haven't updated it yet because I was too busy building the thing the resume should mention.
So I'm learning the packaging. Not because I don't know the material. Because knowing the material was never what got anyone past a multiple-choice screen.
Status
Both tools are approaching v1. sspdf just shipped the built-in themes in the npm package (they were missing from the files array in package.json, because of course that's the kind of bug I'd have). webpilot patched a Hono security vulnerability in the MCP server and bumped to 0.4.6.
Plugin PRs are sitting in review at Anthropic and OpenAI. Three more are open at their marketplace repos. The articles are timestamped. The code is public. The predictions have receipts.
I predicted the enterprise AI trajectory six months before the job description showed up. I built the tools. I published the data. I open-sourced the products.
And I'm studying for vocabulary tests.
That's the arc. Not a tragedy, not a complaint. Just the comedy of building before the branding catches up, and then having to learn the branding so people believe you built it.