Workshop Showcase · Half-Day Sprint

Bring Your Own Data

The Data Mining Sprint.

2.5 hours · 4 exercises · CSV in, intelligence out · Demo Day finale

Governance first: the workshop uses synthetic or anonymized data only. If you process real data through a cloud LLM, you need a no-training, no-retention DPA. For healthcare/pharma data, a HIPAA BAA is also required. Prompts are shown verbatim so you know exactly what gets sent.

01 / 11

The Opportunity

Your data is talking.

Few teams are listening.

Most organizations collect data they never act on. In workshop after workshop, teams discover the insight they need already exists. It is just buried. The Data Mining Sprint surfaces it in under three hours.

02 / 11

Our Method

The Mining Loop.

01

Explore

Upload your CSV. Minimal pre-cleaning required. We audit quality first.

10 min

02

Spec

Define the question. What would you build if you knew the answer?

20 min

03

Plan

Map the 5-prompt pipeline. Clean → Classify → Extract → Synthesize → Recommend.

20 min

04

Build

Run the pipeline on your own (synthetic or anonymized) data. ~8 minutes per step. JSON output. Structured insight.

45 min

05

Test

Spot-check the top claims against a random sample of 20 rows. Target: >80% agreement between human and model labels. This is a workshop screening step, not a production validation protocol.

20 min

06

Iterate

Refine prompts, re-run clusters, sharpen the business case.

30 min
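The spot-check in the Test step can be sketched in a few lines of Python. This is a minimal illustration, not the workshop's actual tooling: the function names, the fixed seed, and the dict-of-labels shape are all assumptions made for the example.

```python
import random

def sample_rows(row_ids, k=20, seed=0):
    """Draw a reproducible random sample of row ids for human review."""
    rng = random.Random(seed)  # fixed seed (assumption) so the sample can be re-drawn
    ids = list(row_ids)
    return rng.sample(ids, min(k, len(ids)))

def agreement_rate(human_labels, model_labels):
    """Share of reviewed rows where the human label matches the model label."""
    if not human_labels:
        raise ValueError("no reviewed rows")
    matches = sum(
        1 for row_id, label in human_labels.items()
        if model_labels.get(row_id) == label
    )
    return matches / len(human_labels)

def passes_screen(rate, threshold=0.80):
    """The target is '>80%', so the comparison is strict."""
    return rate > threshold
```

On a 20-row sample the strict threshold means at least 17 matching rows; 16 of 20 is exactly 80% and does not pass.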

03 / 11

Exercise 1 · Data Audit

Audit your data.

Completeness

% non-null cells

67%

example score

Consistency

Format standardization

B+

dates mixed · fixable

Structure

Row-level integrity

A-

one row per record

Freshness

Days since export

12

days old · still valid
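Two of the audit metrics above are simple enough to compute directly. The sketch below uses only the Python standard library; the list of tokens treated as null and the function names are assumptions, so adjust them to your export.

```python
import csv
import io
from datetime import date

# Assumption: these strings (case-insensitive) count as empty cells.
NULL_TOKENS = {"", "na", "n/a", "null", "none"}

def completeness(csv_text):
    """Percent of non-null cells across all data rows (header excluded)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if len(rows) < 2:
        return 0.0
    header, data = rows[0], rows[1:]
    width = len(header)
    filled = 0
    for row in data:
        # Pad short rows so missing trailing cells count as null.
        padded = (row + [""] * width)[:width]
        filled += sum(1 for cell in padded if cell.strip().lower() not in NULL_TOKENS)
    return 100.0 * filled / (width * len(data))

def freshness_days(export_date, today):
    """Days elapsed between the export date and today."""
    return (today - export_date).days
```

Consistency and Structure need judgment about your specific columns, so they are left out of this sketch.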

04 / 11

Exercise 2 · The Pipeline

Run the 5-prompt pipeline.

01 Clean: Strip HTML · dedupe · filter

02 Classify: Topic clusters · sentiment · urgency

03 Extract: Feature requests · churn signals · bugs

04 Synthesize: Pareto ranking · 2×2 matrix · dot vote

05 Recommend: 3 actions · ROI math · 30-day tracker

Prompt Chaining

Each prompt feeds the next. Clean output becomes Classify input. Classify output becomes Extract input. No prompt works alone. The pipeline is the product.

~8 minutes per step · JSON in, JSON out

Production note: Add checkpointed stages, retry logic, and dead-letter handling in production.
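As a rough sketch of what chaining with checkpoints, retries, and dead-letter handling could look like: `call_model` below is a hypothetical callable you would supply to wrap your own LLM client, and the stage names, retry count, and return shape are assumptions for illustration only.

```python
import json
import time

STAGES = ["clean", "classify", "extract", "synthesize", "recommend"]

def run_pipeline(raw_input, call_model, checkpoints=None, max_retries=2, backoff_s=0.0):
    """Run the stages in order, feeding each stage's parsed JSON to the next.

    call_model(stage, payload_json) -> str is hypothetical: it sends the
    stage's prompt plus the payload to a model and returns the reply text.
    """
    checkpoints = {} if checkpoints is None else checkpoints
    dead_letters = []
    payload = raw_input
    for stage in STAGES:
        if stage in checkpoints:
            # Resume: reuse the stored output instead of calling the model again.
            payload = checkpoints[stage]
            continue
        last_error = None
        for attempt in range(max_retries + 1):
            try:
                reply = call_model(stage, json.dumps(payload))
                payload = json.loads(reply)  # JSON in, JSON out
                last_error = None
                break
            except (ValueError, RuntimeError) as exc:  # invalid JSON or client error
                last_error = exc
                time.sleep(backoff_s * (attempt + 1))
        if last_error is not None:
            # Park the failure for later inspection and stop the chain.
            dead_letters.append({"stage": stage, "error": str(last_error)})
            break
        checkpoints[stage] = payload
    return payload, checkpoints, dead_letters
```

Passing the returned `checkpoints` into a later call skips the stages that already succeeded, so a failure in Extract does not force Clean and Classify to run again.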

05 / 11

The Prompts · What Actually Gets Sent

The 5 prompts, verbatim.

01 Clean

"You are a data cleaner. Take this raw CSV. Remove duplicates, strip HTML, standardize dates to ISO 8601, filter rows with >30% nulls. Return cleaned JSON + change summary."

02 Classify

"You are a topic classifier. Assign each ticket one primary topic (Bug, Feature, Billing, Onboarding, Other) and one sentiment. Return JSON array."

03 Extract

"You are a signal extractor. Identify: (a) churn-risk phrases, (b) feature requests with verbatim evidence, (c) bug severity signals. Output raw excerpts only. Return structured JSON."

04 Synthesize

"You are a strategic analyst. Rank topics by frequency x business impact. Identify the vital few. Use code for exact counting. Return Pareto-ranked JSON."

05 Recommend

"You are a consultant. Recommend 3 actions with: expected impact, effort estimate, owner role, 30-day success metric. Return JSON only."

Production Notes

Deterministic first: Use pandas/polars for dedup, date normalization, and null filtering before any LLM call.

Batching: Large CSVs exceed context windows. Process in chunks of 500–1000 rows with checkpointed stages.

Schema: Enforce JSON output via structured response modes (OpenAI response_format, Anthropic tool_use, or Pydantic). Prompt instructions alone are not enforcement.

Determinism: temperature=0 reduces variance but does not guarantee identical outputs across separate API calls. Use code for operations requiring exact reproducibility.
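A minimal sketch of the deterministic-first and batching notes follows. It uses only the Python standard library so it runs anywhere, rather than pandas/polars; the same steps map onto those libraries. The date formats, the `created_at` field name, and the function names are assumptions for illustration.

```python
from datetime import datetime

# Assumption: the date formats present in your export; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")

def to_iso(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def clean_rows(rows, date_field="created_at", max_null_share=0.30):
    """Drop rows with >30% nulls, normalize dates, then drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        values = list(row.values())
        if not values:
            continue
        nulls = sum(1 for v in values if v is None or str(v).strip() == "")
        if nulls / len(values) > max_null_share:
            continue
        row = dict(row)  # copy so the caller's data is not mutated
        if row.get(date_field):
            # Keep the original string when no known format matches.
            row[date_field] = to_iso(str(row[date_field])) or row[date_field]
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

def chunked(rows, size=500):
    """Yield consecutive slices of at most `size` rows for separate LLM calls."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Normalizing dates before deduplicating matters: two rows that differ only in date format collapse into one, which a dedup pass on the raw strings would miss.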

06 / 11

Exercise 3 · Find the Vital Few

Find the vital few.

80% of volume is often driven by ~20% of topics (Pareto heuristic)

Most insight hides in a small slice.

We cluster, rank, and dot-vote. The topic with the most dots becomes the centerpiece of your business case. In many sessions, someone has a moment of recognition: "Wait, 30% of our tickets are about one bug we never prioritized."
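The counting behind the ranking is best done in code, not by the model. The sketch below ranks topics by frequency only and returns the smallest top slice that reaches a cumulative cutoff; the business-impact weighting from the Synthesize prompt is left out, and the function name and 80% default are assumptions.

```python
from collections import Counter

def vital_few(topics, cutoff=0.80):
    """Top-ranked topics, in order, until their cumulative share reaches the cutoff."""
    counts = Counter(topics)
    total = sum(counts.values())
    ranked, cumulative = [], 0
    for topic, n in counts.most_common():
        cumulative += n
        ranked.append({
            "topic": topic,
            "count": n,
            "share": n / total,
            "cumulative_share": cumulative / total,
        })
        if cumulative / total >= cutoff:
            break
    return ranked
```

Feed it the list of topic labels from the Classify step, one label per ticket. Whether your data follows the 80/20 pattern is something the output shows you, not something the code assumes.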

07 / 11

Exercise 4 · Business Case

Build the case.

Current cost

Hours/month × loaded hourly rate

[Input]

from your finance team

Pilot cost

Tooling + build time

[Input]

one-time · scoped together

Projected

Reduced hours + running costs

[Model]

deflection rate measured from your data

Payback

Net of ramp and tooling

[Output]

workshop summary brief

The "So What?" Rule

Every finding needs math. Not just what we found, but what it costs to ignore it. If we can't fill key variables with a real number from your company, the business case isn't ready. We stop and get the data.
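The four boxes above reduce to a short calculation. This is one possible reading of them, sketched in Python: the parameter names are assumptions, ramp time is not modeled, and every input must come from your own finance and ticket data.

```python
def monthly_cost(hours_per_month, loaded_hourly_rate):
    """Current cost: hours/month x loaded hourly rate."""
    return hours_per_month * loaded_hourly_rate

def payback_months(hours_per_month, loaded_hourly_rate, deflection_rate,
                   running_cost_per_month, pilot_cost):
    """Months for monthly savings to repay the one-time pilot cost.

    Returns None when the projected monthly cost is not lower than the
    current one, in which case the pilot never pays back.
    """
    current = monthly_cost(hours_per_month, loaded_hourly_rate)
    reduced_hours = hours_per_month * (1 - deflection_rate)
    projected = monthly_cost(reduced_hours, loaded_hourly_rate) + running_cost_per_month
    monthly_saving = current - projected
    if monthly_saving <= 0:
        return None
    return pilot_cost / monthly_saving
```

For example, with placeholder inputs of 100 hours/month, a loaded rate of 50, a 25% deflection rate, running costs of 250/month, and a pilot cost of 3000, the current cost is 5000, the projected cost is 4000, and payback is 3 months. These figures are arithmetic placeholders only, not benchmarks.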

08 / 11

The Finale · 20 Minutes

Demo Day.

1 Present

3 minutes per team

Top insight, business case, and one surprise. No slides required. Whiteboard + voice.

2 Score

Peer scoring

Clarity (1-5) and Conviction (1-5). Was the insight clear? Did the math hold up?

3 Challenge

The hard question

"What would make this wrong?" Teams defend assumptions. The room learns from pushback.

4 Select

First sprint priority

Selection weights: business impact (40%), feasibility (30%), clarity of presentation (30%). The team with the highest composite score nominates the pilot candidate. Not a popularity contest.
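The selection weights translate into a simple weighted sum. In this sketch the criterion keys and the idea that each criterion is scored on the same 1-5 scale are assumptions; only the 40/30/30 weights come from the slide.

```python
# Weights from the selection step: impact 40%, feasibility 30%, clarity 30%.
WEIGHTS = {"impact": 0.40, "feasibility": 0.30, "clarity": 0.30}

def composite(scores):
    """Weighted sum of a team's criterion scores (assumed to share one scale)."""
    return sum(weight * scores[criterion] for criterion, weight in WEIGHTS.items())

def pick_winner(teams):
    """Name of the team with the highest composite score."""
    return max(teams, key=lambda name: composite(teams[name]))
```

With hypothetical scores of impact 5, feasibility 3, clarity 4 for one team and 4, 5, 4 for another, the composites are 4.1 and 4.3, so the second team nominates the pilot even though the first scored higher on impact.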

09 / 11

Quality Gates

Analysis you can trust.

These gates apply to every workshop dataset. The 80% agreement target is a workshop screening threshold. Full validation requires a separate protocol.

01

Data Validation

Pre-flight check on every CSV. PII/PHI scan before processing. If hits are found, quarantine and switch to synthetic data.

Check

02

Spot-Checking

Top 3 claims verified by human review. Random 20-row sample. Target >80% agreement. Workshop screening only, not a production validation protocol.

Screen

03

Human Gate

Named owner attests: "I have reviewed this and stand behind it." All outputs require professional review before production use.

Attest

04

32-Point Checklist

A 32-point checklist covering pre-flight, data integrity, analysis rigor, report completeness, and validation. Score before moving to production.

≥27/32
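The pre-flight scan in gate 01 could start from something like the sketch below. It is a crude regex screen for a few obvious patterns (email addresses, US-style SSNs and phone numbers), not a PII/PHI detector: the patterns, names, and return values are assumptions, and a clean result does not prove a dataset is free of personal data.

```python
import re

# Assumption: a deliberately small set of patterns for illustration.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}

def scan_rows(rows):
    """Return (row_index, field, kind) for every cell matching a pattern."""
    hits = []
    for index, row in enumerate(rows):
        for field, value in row.items():
            for kind, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    hits.append((index, field, kind))
    return hits

def preflight(rows):
    """'quarantine' if any hit is found, otherwise 'proceed'."""
    return "quarantine" if scan_rows(rows) else "proceed"
```

Names, addresses, and free-text health details will pass straight through a screen like this, which is why the human gate and the synthetic-data fallback still apply.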

10 / 11

Thank you

That's the sprint.

Under three hours. One CSV. A reusable prototype pipeline, a draft business case, and a team that knows how to mine its own data. I facilitate this for leadership offsites, client workshops, and team intensives.

Alex Guyenne · AI Solutions Consultant

11 / 11