Disclaimer
The generation pipeline is admittedly quite slow. A full run — crawling, generating, and evaluating — can take several minutes depending on the size of the site.
This was a deliberate choice. The focus of this project was correctness over speed. Every stage prioritizes thoroughness: the MMR crawler explores intelligently rather than greedily, the generator agent reads and re-reads pages to extract specific facts, and the evaluation framework runs a full QA loop with a judge model to produce a meaningful score. None of these steps are cheap, and I chose not to cut corners on any of them.
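To make the "intelligent rather than greedy" distinction concrete, here is a minimal sketch of MMR-style (maximal marginal relevance) frontier selection. This is an illustration of the general technique, not the project's actual crawler code; the function names, the tuple layout, and the `lam` weight are all assumptions.

```python
# Hypothetical MMR selection sketch. Each candidate page carries a relevance
# score and an embedding; MMR trades relevance off against redundancy with
# the pages already selected, rather than greedily taking the top-k by relevance.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(candidates, k, lam=0.7):
    """Pick k pages maximizing lam*relevance - (1-lam)*max_sim_to_selected.

    candidates: list of (url, relevance, embedding) tuples.
    """
    selected, pool = [], list(candidates)
    while pool and len(selected) < k:
        def score(c):
            _, rel, emb = c
            # Penalize similarity to anything we already committed to crawl.
            redundancy = max((cosine(emb, s[2]) for s in selected), default=0.0)
            return lam * rel - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return [url for url, _, _ in selected]
```

With a near-duplicate pair in the pool, this picks the most relevant page first and then prefers a less relevant but novel page over the duplicate, which is the behavior a greedy top-k crawler lacks.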
That preference was probably too strong. In a production system you would want faster feedback loops and more aggressive caching. But the explorations in the approach — the evaluation framework, the MMR-based crawling, the self-optimization loop — are, I think, quite interesting regardless.
More importantly, correctness and speed are not fundamentally at odds here. The architecture has clear levers to rebalance the tradeoff: smaller crawl budgets, cached embeddings, lighter evaluation passes, parallel agent execution. By adjusting these knobs we can shift the balance toward speed without abandoning the quality signal — and, ideally, stop treating the two as a strict trade-off at all.
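The levers above could be gathered into a single configuration object, making the speed/quality balance an explicit, tunable choice per run. This is a hypothetical sketch; the field names and default values are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical config sketch exposing the speed/quality knobs in one place.
from dataclasses import dataclass

@dataclass
class PipelineConfig:
    crawl_budget: int = 200        # max pages the MMR crawler may visit
    cache_embeddings: bool = True  # reuse embeddings across runs
    eval_questions: int = 25       # size of the judge-model QA loop
    parallel_agents: int = 1       # concurrent generator agents

# Two illustrative presets: a fast feedback loop vs. a thorough run.
FAST = PipelineConfig(crawl_budget=50, eval_questions=5, parallel_agents=4)
THOROUGH = PipelineConfig(crawl_budget=500, eval_questions=50)
```

A preset like `FAST` trades crawl depth and evaluation coverage for turnaround time while keeping the same quality signal, just at lower resolution.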