dataset-foundry
Framework + 2 starter datasets for selling niche data on Datarade / Snowflake. Ship a $5-50K dataset in a weekend.
Launch kit
dataset-foundry — launch kit
1-liner
Framework + 2 starter datasets for selling niche data on Datarade / Snowflake. Ship a $5-50K dataset in a weekend.
Tweet hook
The most underrated revenue stream for indie devs in 2026: niche data.
Datarade buyers pay $1-50K for clean, refreshed lists nobody else has.
I built the framework + 2 example schemas to ship them fast.
Open code 🧵
Reddit posts
- r/datasets: "Open-source framework for packaging niche datasets for sale"
- r/dataisbeautiful: cross-promote a polished sample
- r/sideproject: "$5K dataset → first sale on Datarade in 30 days"
Cold-email ICP
- AI training-data buyers (Scale, Surge subcontractors needing niche corpora)
- Industry analysts who'd buy ready-to-go niche feeds
- Boutique consulting firms
Cold-email template
Subject: niche data feed for {their domain}
Hi {first} — your post on {topic} mentioned needing {data type}.
I run a small data-product practice. We assemble + QC + license niche
datasets (sample at link). I've worked on:
- US craft breweries with weekly tap lists
- US podcasts with verified-active RSS
If you have a recurring data need, we could build it into a feed for you:
$5K-25K range, refreshed monthly. Reply with the spec and I'll come
back with a quote.
SEO content
- "How to sell data on Datarade: 2026 walkthrough"
- "Niche dataset selection: 50 ideas with TAM estimates"
- "QC + schema validation for sellable data"
- "Snowflake Marketplace vs Datarade vs AWS DX: where to list first"
- "From scraper to sellable: a 4-week workflow"
Documentation
dataset-foundry
Framework for building, QC'ing, and packaging niche datasets for sale on Datarade, Snowflake Marketplace, AWS Data Exchange, and direct-to-buyer.
What this gives you
- Schema framework — declarative YAML defining columns, types, validation rules, and documentation.
- QC pipeline — runs your dataset through completeness + correctness checks. Fails loudly when something's broken.
- Multi-format export — CSV, JSONL, Parquet from one source.
- Marketplace manifest — Datarade-compatible YAML manifest with schema docs + sample link.
- Two starter dataset schemas in examples/:
  - us-podcasts-with-rss: verified-active RSS feeds (AI training data buyers love this).
  - us-craft-breweries-with-tap-list: weekly-refresh beer lists (beer-rating apps, distribution, tourism).
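To make the schema-plus-QC idea concrete, here is a minimal sketch in Python. The schema layout (`columns`, `type`, `required`, `pattern`) and the `check_rows` function are hypothetical, chosen for illustration; they are not dataset-foundry's actual schema format or API. The schema is written as a Python dict standing in for what a YAML file might parse into.

```python
import re

# Hypothetical schema, shown as the dict a YAML file might parse into.
# Field names here are assumptions, not dataset-foundry's real format.
SCHEMA = {
    "name": "us-podcasts-with-rss",
    "columns": {
        "title": {"type": str, "required": True},
        "rss_url": {"type": str, "required": True, "pattern": r"^https?://"},
        "episode_count": {"type": int, "required": False},
    },
}


def check_rows(schema, rows, min_rows=1):
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col, rule in schema["columns"].items():
            value = row.get(col)
            # Completeness: required columns must be present and non-empty.
            if value is None or value == "":
                if rule.get("required"):
                    problems.append(f"row {i}: missing required column '{col}'")
                continue
            # Correctness: the value must convert to the declared type.
            try:
                typed = rule["type"](value)
            except (TypeError, ValueError):
                problems.append(f"row {i}: '{col}' is not a valid {rule['type'].__name__}")
                continue
            pattern = rule.get("pattern")
            if pattern and not re.search(pattern, str(typed)):
                problems.append(f"row {i}: '{col}' does not match {pattern}")
    return problems


rows = [
    {"title": "Example Show", "rss_url": "https://example.com/feed.xml", "episode_count": "12"},
    {"title": "", "rss_url": "ftp://example.com/feed", "episode_count": "many"},
]
for problem in check_rows(SCHEMA, rows, min_rows=2):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a single run report every broken row, which suits the "fails loudly" goal when a scrape has many small defects.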
Usage
```
cd C:\openclaw-products\dataset-foundry
python -m venv .venv
.\.venv\Scripts\activate
pip install -e ".[dev]"

# 1. Pick a schema (or write your own)
cd examples/podcasts-with-rss

# 2. Populate raw.csv with your scraped data (use whatever scraper)

# 3. QC it
foundry check schema.yaml raw.csv --min-rows 50000

# 4. Package for marketplace upload
foundry package schema.yaml raw.csv --out dist/

# Output:
# dist/us-podcasts-with-rss.csv
# dist/us-podcasts-with-rss.jsonl
# dist/us-podcasts-with-rss.parquet
# dist/us-podcasts-with-rss-sample.csv
# dist/manifest.yaml
```
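For readers curious what a packaging step like this involves, the following is a rough standard-library sketch of turning one raw CSV into several deliverables. The function `package_csv` and its `sample_rows` default are hypothetical and are not how `foundry package` is implemented. Parquet and the manifest are left out: Parquet needs a third-party library such as pyarrow, and the manifest format is not described here.

```python
import csv
import json
import tempfile
from pathlib import Path


def package_csv(raw_path, out_dir, name, sample_rows=100):
    """Write <name>.csv, <name>.jsonl, and <name>-sample.csv into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with open(raw_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    def write_csv(path, subset):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)

    write_csv(out / f"{name}.csv", rows)
    # The sample is simply the first N rows; a real sample might be randomized.
    write_csv(out / f"{name}-sample.csv", rows[:sample_rows])

    # JSONL: one JSON object per line, keyed by the CSV header names.
    with open(out / f"{name}.jsonl", "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

    return len(rows)


# Demo in a throwaway directory so nothing real is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    raw.write_text("title,rss_url\nExample Show,https://example.com/feed.xml\n", encoding="utf-8")
    count = package_csv(raw, Path(tmp) / "dist", "us-podcasts-with-rss", sample_rows=1)
    produced = sorted(p.name for p in (Path(tmp) / "dist").iterdir())
    print(count, produced)
```

Reading the source once and writing every format from the same in-memory rows keeps the deliverables consistent with each other, which matters when a buyer compares the sample against the full file.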
Pricing strategy by dataset type
| Dataset shape | Recurring price | One-time price | Best marketplace |
|---|---|---|---|
| Verified-active feeds (podcasts, RSS, sitemaps) | $1K-5K/yr | $499-1,999 | Datarade |
| Locational directory (breweries, restaurants, dentists) | $500/yr | $99-499 | Direct |
| Niche industry tracking (M&A, hiring, regulation) | $5K-20K/yr | $999-4,999 | Snowflake |
| Real-time price feeds | $10K-50K/yr | n/a | AWS Data Exchange |
| AI training corpora (text, audio, code) | $5K-100K | $1K-50K | Direct + Datarade |
Selecting a niche
Good dataset niches share three properties:
- Hard to assemble. If anyone can pull the data from a single API, it is a commodity. Look for data that requires scraping, normalization, or manual verification.
- Recurring buyer need. A snapshot sells once; a refreshed feed renews every year.
- Buyer has budget. B2B > consumer. Niche industry > generalist.
Bad niches:
- Anything available on Kaggle for free.
- Data pulled straight from a public API or a trivial scrape, with no added normalization or verification (commodity).
- Rarely updated reference data (one sale, then no repeat revenue).
Distribution channels
- Datarade — easiest onboarding, ~30% take, no exclusivity.
- Snowflake Marketplace — premium pricing; requires Snowflake account; longer onboarding.
- AWS Data Exchange — enterprise reach; slowest onboarding.
- Direct sale — best margin, highest sales effort. Pitch to: industry analysts, AI training labs, market research firms.
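The margin trade-off between channels can be put in numbers. This sketch uses the ~30% take quoted for Datarade above and assumes a direct sale carries no platform fee; the $5,000 list price is only an example, and the extra sales effort of going direct is not modeled. The function names are illustrative.

```python
def net_revenue(list_price, take_rate):
    """Seller's revenue after the marketplace keeps take_rate of the list price."""
    if not 0 <= take_rate < 1:
        raise ValueError("take_rate must be in [0, 1)")
    return list_price * (1 - take_rate)


def price_to_match(target_net, take_rate):
    """List price needed on a marketplace to net target_net after the take."""
    return target_net / (1 - take_rate)


price = 5_000  # example list price in USD, not a recommendation
datarade_net = net_revenue(price, 0.30)  # ~30% take, as quoted above
direct_net = net_revenue(price, 0.0)     # assumption: no platform fee on direct sales

print(f"Datarade: ${datarade_net:,.0f}  Direct: ${direct_net:,.0f}")
print(f"List price on Datarade to net ${price:,}: ${price_to_match(price, 0.30):,.0f}")
```

Note that matching a direct sale's net requires dividing by `1 - take_rate`, not adding 30% on top: a 30% take needs roughly a 43% higher list price to break even.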
Test
```
pytest tests/ -v
```
Roadmap
- Direct upload-to-Datarade automation
- Diff-and-update for delta refreshes (vs full snapshot)
- PII redaction utility
- Schema migration tool (when you renumber/rename columns)
- Buyer-facing data dictionary HTML generator
- License-text generator for common terms