Scrape → CSV
AI Workflow
Scrape a website, extract structured data with an LLM, and export the results as a clean CSV. A three-step data pipeline with no manual parsing.
Prerequisites
Environment variables
OPENAI_KEY
Install
npx radzor@latest recipe add scrape-to-csv
AI Prompt
“Run `npx radzor@latest add web-scraper structured-output csv-export` to install 3 Radzor components. Then read components/radzor/web-scraper/radzor.manifest.json, components/radzor/structured-output/radzor.manifest.json, components/radzor/csv-export/radzor.manifest.json and each component's llm/integration.md. Wire them together to scrape a website, extract structured data with an LLM, and export the results as a clean CSV. A three-step data pipeline with no manual parsing. Use the manifest's inputs (check envVar for required environment variables), outputs (check fields for object shapes), composability (check mapField for field extraction), and actions — don't invent custom interfaces.”
Paste this into Claude Code, Cursor, Windsurf, or any AI coding agent.
Pipeline
WebScraper
Fetches raw HTML from target URLs
StructuredOutput
Extracts structured data via LLM
CsvExport
Generates the final CSV file
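To make the last step concrete, here is a minimal sketch of the kind of serialization a CSV exporter performs. It is not the CsvExport implementation: `toCsv` and `escapeField` are hypothetical helpers, and the quoting rules follow the common RFC 4180 convention (quote a field when it contains the delimiter, a double quote, or a line break).

```typescript
// Hypothetical helpers for illustration only; not part of any Radzor component.
type Row = Record<string, unknown>;

function escapeField(value: unknown, delimiter: string): string {
  const text = value === null || value === undefined ? "" : String(value);
  // Quote when the field contains the delimiter, a double quote, or a line break.
  const needsQuotes =
    text.includes(delimiter) ||
    text.includes('"') ||
    text.includes("\n") ||
    text.includes("\r");
  // Inside a quoted field, each double quote is doubled.
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows: Row[], delimiter = ",", includeHeaders = true): string {
  const first = rows[0];
  if (first === undefined) return "";
  // Column order is taken from the keys of the first row.
  const headers = Object.keys(first);
  const lines: string[] = [];
  if (includeHeaders) {
    lines.push(headers.map((h) => escapeField(h, delimiter)).join(delimiter));
  }
  for (const row of rows) {
    lines.push(headers.map((h) => escapeField(row[h], delimiter)).join(delimiter));
  }
  return lines.join("\n");
}

const sampleCsv = toCsv([
  { name: "Desk, oak", price: 129.5, inStock: true },
  { name: 'Lamp 12" arm', price: 40, inStock: false },
]);
console.log(sampleCsv);
```

The two sample rows exercise both quoting cases: a name containing the delimiter and a name containing a double quote.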
Scaffolded Code
// npx radzor@latest add web-scraper structured-output csv-export
import { WebScraper } from "./components/radzor/web-scraper"
import { StructuredOutput } from "./components/radzor/structured-output"
import { CsvExport } from "./components/radzor/csv-export"

const scraper = new WebScraper({ timeout: 15000, rateLimit: 2000 })
const extractor = new StructuredOutput({
  provider: "openai",
  apiKey: process.env.OPENAI_KEY!,
  model: "gpt-4o",
  temperature: 0,
})
const csv = new CsvExport({ delimiter: ",", includeHeaders: true })

const urls = [
  "https://example.com/products/1",
  "https://example.com/products/2",
  "https://example.com/products/3",
]

const schema = { name: "string", price: "number", inStock: "boolean" }
const rows: Record<string, unknown>[] = []

// Fetch each page, extract one product per page, and collect the rows
for (const url of urls) {
  const html = await scraper.fetchHtml(url)
  const product = await extractor.extract(html, schema)
  rows.push(product)
}

// Export to CSV file
await csv.toFile("./products.csv", rows)
Components used
LLM tip
Pass all three radzor.manifest.json files to your agent at once. It can then read the outputs of each step and match them against the inputs of the next, wiring the full pipeline without extra instructions.
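The match between one step's outputs and the next step's inputs can also be checked at runtime, which is useful because LLM-extracted rows may not always fit the schema. Below is a minimal sketch, assuming the simple schema shape from the scaffolded code (field name mapped to "string", "number", or "boolean"). `validateRow` is a hypothetical helper, not part of any Radzor component.

```typescript
// Hypothetical helper for illustration only; not part of any Radzor component.
type FieldType = "string" | "number" | "boolean";
type Schema = Record<string, FieldType>;

// Returns a list of problems; an empty list means the row matches the schema.
function validateRow(row: unknown, schema: Schema): string[] {
  if (typeof row !== "object" || row === null) {
    return ["row is not an object"];
  }
  const record = row as Record<string, unknown>;
  const problems: string[] = [];
  for (const [field, expected] of Object.entries(schema)) {
    // A missing field reports as "undefined" here.
    const actual = typeof record[field];
    if (actual !== expected) {
      problems.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}

const productSchema: Schema = { name: "string", price: "number", inStock: "boolean" };

const goodResult = validateRow({ name: "Desk", price: 129.5, inStock: true }, productSchema);
const badResult = validateRow({ name: "Lamp", price: "40" }, productSchema);

console.log(goodResult); // no problems reported
console.log(badResult); // price has the wrong type, inStock is missing
```

In the scaffolded loop, a check like this could run on each extracted product before `rows.push(product)`, so malformed rows are logged or skipped instead of reaching the CSV.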