Workshop Showcase · Half-Day Sprint
Bring Your Own Data

The Data Mining Sprint.

2.5 hours · 4 exercises · CSV in, intelligence out · Demo Day finale

Governance first: this workshop uses synthetic or anonymized data only. If you process real data through a cloud LLM, you need a data processing agreement (DPA) with no-training and no-retention terms. For healthcare/pharma data that includes protected health information (PHI), a HIPAA business associate agreement (BAA) is also required. Prompts are shown verbatim so you know exactly what gets sent.

01 / 11
The Opportunity

Your data is talking.

Few teams are listening.

Most organizations collect data they never act on. In workshop after workshop, teams discover that the insight they need already exists; it is just buried. The Data Mining Sprint surfaces it in under three hours.

02 / 11
Our Method

The Mining Loop.

01
Explore
Upload your CSV. Minimal pre-cleaning required. We audit quality first.
10 min
02
Spec
Define the question. What would you build if you knew the answer?
20 min
03
Plan
Map the 5-prompt pipeline. Clean → Classify → Extract → Synthesize → Recommend.
20 min
04
Build
Run the pipeline on your workshop data (synthetic or anonymized). ~8 minutes per step. JSON output. Structured insight.
45 min
05
Test
Spot-check top claims against a random 20-row sample. Target: >80% agreement between human and model. This is a workshop screening step, not a production validation protocol.
20 min
06
Iterate
Refine prompts, re-run clusters, sharpen the business case.
30 min
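On a 20-row sample the ">80%" target means at least 17 matching rows, because 16 of 20 is exactly 80% and does not clear a strict threshold. A minimal sketch of the check, assuming human and model labels are collected as two parallel lists; the function names and the fixed-seed sampling are illustrative choices, not part of the workshop materials:

```python
import random


def agreement_rate(human: list[str], model: list[str]) -> float:
    """Fraction of rows where the human label matches the model label."""
    if len(human) != len(model) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(h == m for h, m in zip(human, model))
    return matches / len(human)


def passes_screen(human: list[str], model: list[str], threshold: float = 0.80) -> bool:
    # Strictly greater than the threshold, matching the ">80%" target.
    return agreement_rate(human, model) > threshold


def draw_sample(row_ids: list[int], k: int = 20, seed: int = 0) -> list[int]:
    # Fixed seed so the same rows can be re-checked after a prompt change.
    return random.Random(seed).sample(row_ids, k)


# 17 of 20 labels agree (0.85, passes); 16 of 20 agree (0.80, does not pass).
human = ["Bug"] * 20
model_17 = ["Bug"] * 17 + ["Billing"] * 3
model_16 = ["Bug"] * 16 + ["Billing"] * 4
print(agreement_rate(human, model_17), passes_screen(human, model_17))
print(agreement_rate(human, model_16), passes_screen(human, model_16))
```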
03 / 11
Exercise 1 · Data Audit

Audit your data.

Completeness

% non-null cells
67%
example score

Consistency

Format standardization
B+
dates mixed · fixable

Structure

Row-level integrity
A-
one row per record

Freshness

Days since export
12
days old · still valid
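Two of the four audit scores are simple to compute, and a third can be approximated. A standard-library sketch, assuming nulls are empty strings or tokens like `N/A` and that the export date is known; the consistency check only counts which date shapes appear in a column, which is how a "dates mixed" finding would surface. The tiny CSV is invented for illustration:

```python
import csv
import io
import re
from datetime import date

# Cell values treated as null (assumption: adjust to your export).
NULL_TOKENS = {"", "na", "n/a", "null", "none"}

# Date shapes to recognize; anything else is counted as "other".
DATE_PATTERNS = {
    "iso": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "us_slash": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
}


def is_null(value) -> bool:
    return value is None or value.strip().lower() in NULL_TOKENS


def completeness(rows: list[dict[str, str]]) -> float:
    """Share of non-null cells across the whole table (0.0 to 1.0)."""
    cells = [v for row in rows for v in row.values()]
    if not cells:
        return 0.0
    return sum(not is_null(v) for v in cells) / len(cells)


def date_formats(rows: list[dict[str, str]], column: str) -> dict[str, int]:
    """Count which date shape each non-null value in `column` uses."""
    counts: dict[str, int] = {}
    for row in rows:
        value = row.get(column)
        if is_null(value):
            continue
        value = value.strip()
        kind = next((k for k, p in DATE_PATTERNS.items() if p.match(value)), "other")
        counts[kind] = counts.get(kind, 0) + 1
    return counts


def freshness_days(export_date: date, today: date) -> int:
    """Days elapsed since the export was taken."""
    return (today - export_date).days


sample = """id,created,text
1,2024-03-01,Login fails
2,03/02/2024,
3,2024-03-05,N/A
"""
rows = list(csv.DictReader(io.StringIO(sample)))
print(f"completeness: {completeness(rows):.0%}")
print(date_formats(rows, "created"))
print(freshness_days(date(2024, 3, 10), date(2024, 3, 22)))
```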
04 / 11
Exercise 2 · The Pipeline

Run the 5-prompt pipeline.

01 Clean: Strip HTML · dedupe · filter
02 Classify: Topic clusters · sentiment · urgency
03 Extract: Feature requests · churn signals · bugs
04 Synthesize: Pareto ranking · 2×2 matrix · dot vote
05 Recommend: 3 actions · ROI math · 30-day tracker

Prompt Chaining

Each prompt feeds the next. Clean output becomes Classify input. Classify output becomes Extract input. No prompt works alone. The pipeline is the product.
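Prompt chaining reduces to a loop that threads one stage's parsed output into the next call. A minimal sketch, assuming a hypothetical `call_llm(prompt, payload)` function that wraps whichever model API you use and returns the model's text; the stub below stands in for it so the example runs offline:

```python
import json
from typing import Callable

# Stage names in pipeline order, each with the opening sentence of its prompt
# (abbreviated here; the full text is on the prompts slide).
STAGES = [
    ("clean", "You are a data cleaner."),
    ("classify", "You are a topic classifier."),
    ("extract", "You are a signal extractor."),
    ("synthesize", "You are a strategic analyst."),
    ("recommend", "You are a consultant."),
]


def run_pipeline(rows: list[dict], call_llm: Callable[[str, str], str]) -> dict[str, object]:
    """Run every stage in order; each stage receives the previous stage's JSON."""
    outputs: dict[str, object] = {}
    payload = json.dumps(rows)
    for name, prompt in STAGES:
        raw = call_llm(prompt, payload)
        parsed = json.loads(raw)  # fail fast if a stage returns non-JSON
        outputs[name] = parsed
        payload = json.dumps(parsed)  # this stage's output is the next stage's input
    return outputs


# Stand-in for a real model client (hypothetical): records what it was sent
# and returns a small JSON object so the chaining can be observed.
calls: list[str] = []


def fake_llm(prompt: str, payload: str) -> str:
    calls.append(payload)
    return json.dumps({"role": prompt, "input_chars": len(payload)})


results = run_pipeline([{"id": 1, "text": "App crashes on login"}], fake_llm)
print(list(results))
```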

~8 minutes per step · JSON in, JSON out

Production note: a production pipeline also needs checkpointed stages, retry logic, and dead-letter handling.
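What those three production pieces can look like together, as a minimal standard-library sketch: retries with exponential backoff, a dead-letter list for batches that still fail, and a per-stage JSON checkpoint file so a rerun skips finished work. The stage function and file layout are hypothetical:

```python
import json
import tempfile
import time
from functools import partial
from pathlib import Path
from typing import Any, Callable


def with_retries(fn: Callable[[], Any], max_attempts: int = 3, base_delay: float = 1.0) -> Any:
    """Call fn, retrying with exponential backoff (base_delay, 2x, 4x, ...)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller dead-letter it
            time.sleep(base_delay * 2 ** (attempt - 1))


def run_stage(name: str, batches: list[list[dict]], stage_fn: Callable[[list[dict]], Any],
              checkpoint_dir: Path, dead_letter: list[dict], base_delay: float = 1.0) -> list:
    """Run one stage over all batches, or reload its saved output if a checkpoint exists."""
    checkpoint = checkpoint_dir / f"{name}.json"
    if checkpoint.exists():
        return json.loads(checkpoint.read_text())
    results = []
    for i, batch in enumerate(batches):
        try:
            results.append(with_retries(partial(stage_fn, batch), base_delay=base_delay))
        except Exception as exc:
            # Park the failing batch instead of aborting the whole run.
            dead_letter.append({"stage": name, "batch": i, "error": repr(exc)})
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    checkpoint.write_text(json.dumps(results))
    return results


def demo_stage(batch: list[dict]) -> dict:
    # Hypothetical stage: rejects empty batches so the dead-letter path is exercised.
    if not batch:
        raise ValueError("empty batch")
    return {"rows": len(batch)}


dead_letter: list[dict] = []
with tempfile.TemporaryDirectory() as tmp:
    batches = [[{"id": 1}], [], [{"id": 2}, {"id": 3}]]
    out = run_stage("classify", batches, demo_stage, Path(tmp), dead_letter, base_delay=0.0)
    # Second call finds the checkpoint and skips the work (no batches passed).
    resumed = run_stage("classify", [], demo_stage, Path(tmp), dead_letter)
print(out, dead_letter)
```

Because this sketch writes the checkpoint even when some batches were dead-lettered, a rerun will not retry them; replay dead-letter entries separately, or skip the checkpoint write when failures occurred, depending on which behavior you prefer.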

05 / 11
The Prompts · What Actually Gets Sent

The 5 prompts, verbatim.

01 Clean
"You are a data cleaner. Take this raw CSV. Remove duplicates, strip HTML, standardize dates to ISO 8601, filter rows with >30% nulls. Return cleaned JSON + change summary."
02 Classify
"You are a topic classifier. Assign each ticket one primary topic (Bug, Feature, Billing, Onboarding, Other) and one sentiment. Return JSON array."
03 Extract
"You are a signal extractor. Identify: (a) churn-risk phrases, (b) feature requests with verbatim evidence, (c) bug severity signals. Output raw excerpts only. Return structured JSON."
04 Synthesize
"You are a strategic analyst. Rank topics by frequency x business impact. Identify the vital few. Use code for exact counting. Return Pareto-ranked JSON."
05 Recommend
"You are a consultant. Recommend 3 actions with: expected impact, effort estimate, owner role, 30-day success metric. Return JSON only."
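The Classify prompt fixes the topic list but not the sentiment labels or field names. A dependency-free validation sketch, assuming the model returns objects with `topic` and `sentiment` keys and that sentiment is one of `positive`/`neutral`/`negative` (both assumptions). In practice a Pydantic model or a provider's structured-output mode would do this job; this version just shows what "enforce the schema in code" means:

```python
import json

TOPICS = {"Bug", "Feature", "Billing", "Onboarding", "Other"}
SENTIMENTS = {"positive", "neutral", "negative"}  # assumption: prompt doesn't fix these


def validate_classify(raw: str) -> tuple[list[dict], list[str]]:
    """Parse Classify output; return (valid_items, error_messages)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return [], ["expected a JSON array"]
    valid, errors = [], []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"item {i}: expected an object")
            continue
        topic, sentiment = item.get("topic"), item.get("sentiment")
        if not isinstance(topic, str) or topic not in TOPICS:
            errors.append(f"item {i}: unknown topic {topic!r}")
        elif not isinstance(sentiment, str) or sentiment not in SENTIMENTS:
            errors.append(f"item {i}: unknown sentiment {sentiment!r}")
        else:
            valid.append(item)
    return valid, errors


raw = (
    '[{"ticket_id": 1, "topic": "Bug", "sentiment": "negative"},'
    ' {"ticket_id": 2, "topic": "Pricing", "sentiment": "neutral"}]'
)
valid, errors = validate_classify(raw)
print(valid)
print(errors)
```

Items that fail validation can be re-sent to the model or routed to the dead-letter path rather than silently flowing into Extract.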
Production Notes

Deterministic first: Use pandas/polars for deduplication, date normalization, and null filtering before any LLM call.

Batching: Large CSVs can exceed context windows. Process them in chunks of 500–1000 rows with checkpointed stages.

Schema: Enforce JSON output with structured response modes (OpenAI response_format, Anthropic tool_use) and validate it in code, for example with Pydantic. Prompt instructions alone are not enforcement.

Determinism: temperature=0 reduces variance but does not guarantee identical outputs across separate API calls. Use code for operations that require exact reproducibility.
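The "deterministic first" and batching notes can be made concrete. This sketch covers the mechanical half of the Clean prompt (HTML stripping, ISO 8601 dates, the >30% null filter, deduplication) plus chunking. The slide recommends pandas/polars; this version uses only the standard library so it runs anywhere, and the accepted date formats are an assumption to adjust for your export:

```python
import html
import re
from datetime import datetime
from itertools import islice
from typing import Iterable, Iterator, Optional

TAG_RE = re.compile(r"<[^>]+>")
# Input date formats tried in order. Assumption: slash dates are US month/day;
# reorder for your export, since 03/02/2024 is ambiguous.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def strip_html(text: str) -> str:
    """Drop tags, then decode entities such as &amp;."""
    return html.unescape(TAG_RE.sub("", text)).strip()


def to_iso(value: str) -> Optional[str]:
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_rows(rows: list[dict], date_cols: Iterable[str], max_null_share: float = 0.30) -> list[dict]:
    date_set = set(date_cols)
    seen: set[tuple] = set()
    cleaned = []
    for row in rows:
        row = {k: strip_html(v or "") for k, v in row.items()}
        for col in row.keys() & date_set:
            if row[col]:
                # Keep the original text if no format matches, for human review.
                row[col] = to_iso(row[col]) or row[col]
        empty = sum(1 for v in row.values() if not v)
        if empty / max(len(row), 1) > max_null_share:
            continue  # drop rows whose empty-cell share exceeds max_null_share
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # exact duplicate once HTML and dates are normalized
        seen.add(key)
        cleaned.append(row)
    return cleaned


def chunks(rows: Iterable[dict], size: int = 500) -> Iterator[list[dict]]:
    """Yield successive batches of at most `size` rows, one LLM call each."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


raw = [
    {"id": "1", "date": "03/02/2024", "text": "<p>Login &amp; SSO broken</p>"},
    {"id": "1", "date": "2024-03-02", "text": "Login & SSO broken"},  # duplicate once normalized
    {"id": "2", "date": "", "text": ""},  # 2 of 3 cells empty, so dropped
]
cleaned = clean_rows(raw, date_cols=["date"])
print(cleaned)
print([len(b) for b in chunks([{"id": i} for i in range(1200)], size=500)])
```

Only the cleaned, chunked rows are sent to the Clean prompt, which then handles the judgment calls that code can't, and returns the change summary.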

06 / 11
Exercise 3 · Find the Vital Few

Find the vital few.

80% of volume is often driven by ~20% of topics (Pareto heuristic)

Most insight hides in a small slice.

We cluster, rank, and dot-vote. The topic with the most dots becomes the centerpiece of your business case. In many sessions, someone has a moment of recognition: "Wait, 30% of our tickets are about one bug we never prioritized."
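Exact counting belongs in code, as the Synthesize prompt itself says. A minimal sketch of a Pareto cut: count topic labels, optionally multiply by a business-impact weight (the weights are hypothetical inputs you would agree on with the team), and keep the smallest top-ranked set that reaches 80% of the total. The counts in the example are made up for illustration:

```python
from collections import Counter
from typing import Optional


def pareto_rank(labels: list[str], impact: Optional[dict[str, float]] = None) -> list[tuple[str, float]]:
    """Rank topics by exact count x impact weight (weight defaults to 1.0)."""
    weights = impact or {}
    counts = Counter(labels)
    scores = {topic: n * weights.get(topic, 1.0) for topic, n in counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def vital_few(ranked: list[tuple[str, float]], cutoff: float = 0.80) -> list[str]:
    """Smallest top-ranked set of topics whose cumulative score reaches `cutoff`."""
    total = sum(score for _, score in ranked)
    picked: list[str] = []
    running = 0.0
    for topic, score in ranked:
        picked.append(topic)
        running += score
        if running >= cutoff * total:
            break
    return picked


# Illustrative labels, e.g. the "topic" field from the Classify stage.
labels = ["Bug"] * 60 + ["Billing"] * 25 + ["Onboarding"] * 8 + ["Feature"] * 4 + ["Other"] * 3
ranked = pareto_rank(labels)
print(ranked)
print(vital_few(ranked))
```

Here two of five topics carry 85% of volume. With `impact={"Billing": 3.0}`, Billing would outrank Bug, which is the point of ranking by frequency × business impact rather than raw frequency.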

07 / 11
Exercise 4 · Business Case

Build the case.

Current cost

Hours/month × loaded hourly rate
[Input]
from your finance team

Pilot cost

Tooling + build time
[Input]
one-time · scoped together

Projected

Reduced hours + running costs
[Model]
deflection rate measured from your data

Payback

Net of ramp and tooling
[Output]
workshop summary brief

The "So What?" Rule

Every finding needs math. Not just what we found, but what it costs to ignore it. If we can't fill key variables with a real number from your company, the business case isn't ready. We stop and get the data.
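The four cards reduce to a few lines of arithmetic. A sketch under stated assumptions: "projected" means remaining hours at the same loaded rate plus running costs, and "net of ramp" is modeled as a number of months before savings start. Every figure in the example is a placeholder, not a benchmark; under the "So What?" rule each one must come from your finance team or your own data:

```python
from dataclasses import dataclass


@dataclass
class BusinessCase:
    hours_per_month: float         # current manual hours (from your data)
    loaded_rate: float             # loaded hourly rate (from finance)
    pilot_cost: float              # one-time tooling + build time
    deflection_rate: float         # share of hours removed, measured from your data
    running_cost_per_month: float  # ongoing tooling/API cost
    ramp_months: float = 0.0       # months before savings start (assumption)

    def current_cost(self) -> float:
        return self.hours_per_month * self.loaded_rate

    def projected_cost(self) -> float:
        remaining_hours = self.hours_per_month * (1 - self.deflection_rate)
        return remaining_hours * self.loaded_rate + self.running_cost_per_month

    def monthly_savings(self) -> float:
        return self.current_cost() - self.projected_cost()

    def payback_months(self) -> float:
        savings = self.monthly_savings()
        if savings <= 0:
            return float("inf")  # the pilot never pays back
        return self.ramp_months + self.pilot_cost / savings


case = BusinessCase(
    hours_per_month=200,         # placeholder
    loaded_rate=50,              # placeholder
    pilot_cost=12_000,           # placeholder
    deflection_rate=0.30,        # placeholder
    running_cost_per_month=500,  # placeholder
    ramp_months=1,               # placeholder
)
print(case.current_cost(), case.projected_cost(), case.monthly_savings(), round(case.payback_months(), 2))
```

With these placeholders, 10,000 current minus 7,500 projected leaves 2,500 per month, so the 12,000 pilot pays back in 4.8 months after a 1-month ramp, 5.8 months in total.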

08 / 11
The Finale · 20 Minutes

Demo Day.

1 Present

3 minutes per team

Top insight, business case, and one surprise. No slides required. Whiteboard + voice.

2 Score

Peer scoring

Clarity (1-5) and Conviction (1-5). Was the insight clear? Did the math hold up?

3 Challenge

The hard question

"What would make this wrong?" Teams defend assumptions. The room learns from pushback.

4 Select

First sprint priority

Selection weights: business impact (40%), feasibility (30%), clarity of presentation (30%). The team with the highest composite score nominates the pilot candidate. Not a popularity contest.
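The selection weights translate directly into a weighted sum. A sketch, assuming each criterion is scored on the same 1–5 scale used for peer scoring (the slide doesn't say); team names and scores are illustrative:

```python
# Selection weights from the slide; scores assumed to use a 1-5 scale.
WEIGHTS = {"impact": 0.40, "feasibility": 0.30, "clarity": 0.30}


def composite(scores: dict[str, float]) -> float:
    """Weighted sum of one team's criterion scores."""
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())


def pick_pilot(teams: dict[str, dict[str, float]]) -> str:
    """Name of the team with the highest composite (first listed wins a tie)."""
    return max(teams, key=lambda name: composite(teams[name]))


teams = {
    "Team A": {"impact": 5, "feasibility": 2, "clarity": 4},
    "Team B": {"impact": 4, "feasibility": 4, "clarity": 3},
}
for name, scores in teams.items():
    print(name, round(composite(scores), 2))
print("pilot nominee:", pick_pilot(teams))
```

Here Team A edges out Team B (3.8 vs 3.7) on impact despite lower feasibility, which is the trade-off the 40% weight encodes. Because `max` keeps the first team on an exact tie, agree on a tie-break rule before scoring.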

09 / 11
Quality Gates

Analysis you can trust.

These gates apply to every workshop dataset. The 80% threshold is a workshop screening step. Full validation requires a separate protocol.

01
Data Validation
Pre-flight check on every CSV. PII/PHI scan before processing. If hits are found, quarantine the file and switch to synthetic data.
Check
02
Spot-Checking
Top 3 claims verified by human review. Random 20-row sample. Target >80% agreement. Workshop screening only, not a production validation protocol.
Screen
03
Human Gate
Named owner attests: "I have reviewed this and stand behind it." All outputs require professional review before production use.
Attest
04
32-Point Checklist
A 32-point checklist covering pre-flight, data integrity, analysis rigor, report completeness, and validation. Score before moving to production.
≥27/32
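A pre-flight PII scan can start as a handful of regular expressions run over every cell. This is a deliberately narrow sketch: the patterns (email, US SSN, North American phone numbers) are assumptions, and real PHI screening needs much broader coverage such as names, addresses, and record numbers. Any hit sends the file to quarantine, mirroring the Data Validation gate; the sample rows are synthetic:

```python
import re

# Illustrative patterns only; real PII/PHI detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]\d{4}\b"),
}


def scan_rows(rows: list[dict[str, str]]) -> list[tuple[int, str, str]]:
    """Return (row_index, column, pattern_name) for every suspected hit."""
    hits = []
    for i, row in enumerate(rows):
        for col, value in row.items():
            for name, pattern in PII_PATTERNS.items():
                if value and pattern.search(value):
                    hits.append((i, col, name))
    return hits


def preflight(rows: list[dict[str, str]]) -> str:
    """Gate decision: any suspected hit quarantines the whole file."""
    return "quarantine" if scan_rows(rows) else "proceed"


rows = [
    {"id": "1", "text": "Refund please, reach me at jane.doe@example.com"},
    {"id": "2", "text": "App crashes on login"},
    {"id": "3", "text": "SSN 123-45-6789 on file"},
]
print(scan_rows(rows))
print(preflight(rows))
```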
10 / 11
Thank you

That's the sprint.

Under three hours. One CSV. A reusable prototype pipeline, a draft business case, and a team that knows how to mine its own data. I facilitate this for leadership offsites, client workshops, and team intensives.

Alex Guyenne · AI Solutions Consultant
11 / 11