Technology
- Technology / AI for Developers
AI Builds the Code. You Still Have to Drive.
Article: Why autonomous AI development still breaks down without human engineering judgment, architectural ownership, and quality
- Technology / AI for Developers
The Testing Strategy for LLM-Backed Systems That Nobody Seems to Actually Run
Article: A software platform team was shipping LLM features under a test suite that asserted on exact output strings; after three
- Technology / Research & Papers
I Implemented the Self-Consistency Paper From Scratch. Here Is Where It Helps and Where It Does Not.
Article: Self-consistency (Wang et al., 2022) is cited in 8,000 papers and used in almost zero production systems I know. I imple
- Technology / Libraries & Frameworks
Cursor vs. Windsurf vs. Aider: 30 Days of Real Work With Each
Article: I used Cursor for 10 days, Windsurf for 10 days, and Aider for 10 days (same actual work) and logged every prompt and
- Technology / AI Security & Governance
Prompt Injection Is Not the Biggest LLM Security Risk. Your Tool-Calling Permissions Model Is.
Article: During a red-team exercise against a banking agent with read and write permissions to customer accounts, an indirect pro
- Technology / Prompt Engineering
Structured Outputs vs. Function Calling vs. JSON Mode: A Benchmark With Actual Production Data
Article: I had three ways to get structured output from an LLM. I had actual production data to test against. I benchmarked all t
- Technology / Prompt Engineering
Stop Calling It Prompt Engineering. Call It What It Is: Interface Design.
Article: A health-tech team shipped an AI clinical-note summarizer with a plaintext prompt exposed directly to clinicians; daily
- Technology / AI Engineering
I Built a Multi-Agent System With LangGraph in a Weekend. Here Is What Broke and What Held.
Article: I rebuilt a workflow I had been running manually for six months as a three-agent LangGraph system. Two of the three agen
- Technology / Models & Benchmarks
The MMLU Trap: Why Your Benchmark-Topping Model Is Failing in Production
Article: A Fortune 100 insurer selected a model ranked first on MMLU for an adjudication assistant, and within six weeks p95 late
- Technology / AI Security & Governance
I Ran OWASP's LLM Top 10 Against My Own App: The Vulnerabilities That Actually Hit
Article: I systematically tested my RAG-powered support bot against every item in the OWASP LLM Top 10 (2025 edition). Three of t
- Technology / Research & Papers
From Leaderboard to Latency: I Turned a Research-Grade Model Into a Service and Measured Everything
Article: I took a newly released research model, deployed it in the cloud, and benchmarked real-world latency, cost, and reliabil
- Technology / AI for Developers
I Replaced Half My Boilerplate With AI: What Actually Stuck After 30 Days of Cursor and Copilot
Article: I ran a month-long experiment building real features with AI coding tools, tracking test coverage, bug rate, and time-to