Why we built a structured web-scraping API

If you’ve ever shipped a scraper to production, you know the rhythm: it works on Tuesday, breaks on Wednesday, you spend Thursday rewriting selectors, and by Friday it’s broken somewhere else.

We did this for two years across hundreds of sites. We’re done.

The bet

Instead of telling the scraper where to look, you tell it what you want. Send a URL and a schema describing the fields you need; get back validated JSON. No selectors. No XPath. No nth-child workarounds at 2am.

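import { z } from "zod";

// `client` is an already-initialized API client.
const product = await client.extract({
  url: "https://www.example.com/p/123",
  // Describe the fields you want; no selectors involved.
  schema: {
    title: z.string(),
    price: z.number(),
  },
});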

That’s the whole API.

What it took

Three things had to be true for this to work in production:

Reliability. LLMs hallucinate, so schema validation at the API boundary is non-negotiable. We refuse to return data that doesn’t match your schema, even if that means retrying the extraction.
Speed. The first version took 12 seconds per page. That’s unusable. We’re at 1.8s p50 now, with aggressive caching and a smaller specialized model for the extraction step.
Cost. Our target was $5 per 1K pages; anything above that is just a tax on your bad day. We hit it by doing the heavy lifting once per page and caching aggressively.
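
The reliability point can be sketched as a validate-then-retry loop. This is a minimal illustration, not the service's actual implementation: `extractValidated`, `validateProduct`, and the three-attempt limit are all assumptions made for the example, and the hand-written validator stands in for a schema library's parse step.

```typescript
// Hypothetical names throughout; none of these are the service's real internals.

// A validator returns a typed value when the data matches, or null when it doesn't.
type Validator<T> = (data: unknown) => T | null;

interface Product {
  title: string;
  price: number;
}

// Stand-in for a schema library's parse step.
const validateProduct: Validator<Product> = (data) => {
  if (typeof data !== "object" || data === null) return null;
  const { title, price } = data as Record<string, unknown>;
  if (typeof title !== "string") return null;
  if (typeof price !== "number" || !Number.isFinite(price)) return null;
  return { title, price };
};

// Calls `attempt` until its output validates, up to `maxAttempts` times.
// Throws rather than returning data that failed validation.
async function extractValidated<T>(
  attempt: () => Promise<unknown>,
  validate: Validator<T>,
  maxAttempts = 3,
): Promise<T> {
  for (let i = 1; i <= maxAttempts; i++) {
    const result = validate(await attempt());
    if (result !== null) return result;
  }
  throw new Error(`no schema-valid output after ${maxAttempts} attempts`);
}
```

If the first attempt comes back with the price as the string "12.00", it fails validation and the loop tries again. The caller only ever sees a value that passed the check, or an error.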

What’s next

Streaming responses for long pages
Multi-page crawl with schema validation across pages
Browser sessions (auth-protected content)

If you want to play with it, grab an API key. It’s free up to 100 pages a month.