Spring Prompt turns vague model demos into measurable benchmark packs. Some are professional, some are personal, but all of them are designed to answer the same question: can this model actually help in the real world?
Professional
These are the packs we use to compare models on planning, judgment, writing, adaptation, and long-horizon decision-making.
Live now
Simulation
A hard-mode DTC growth simulation where models allocate budget, write ad copy, choose audiences, react to results, and compound or destroy brand momentum month by month.
Pilot
One-off snapshot
Models read a week of Guardian coverage and predict the next week’s headlines, which are then scored against what was actually published. Weekly automation is still to come.
Planned
Scenario pack
Prospect research, message sequencing, objection handling, and follow-ups judged by realistic buyer personas rather than vibes.
Planned
Data pack
Messy business data, SQL tasks, anomaly detection, and executive summaries scored on depth, accuracy, and whether the conclusions actually match the evidence.
Planned
Timeline sim
Announcement copy, FAQs, stakeholder updates, bug-response comms, and post-launch analysis compressed into a realistic launch timeline.
Planned
Optimization
Diagnose underperforming pages or funnels, recommend changes, and rewrite the weak spots with persona-based conversion feedback.
Planned
Content system
Turn one source asset into channel-specific outputs that actually feel native to each format instead of being the same copy rearranged five ways.
Personal
These packs are designed to feel immediately relatable and shareable while still testing real planning, adaptability, and practical reasoning.
Coming soon
Random ingredients, limited time, missing staples, dietary constraints, and tomorrow's leftovers all in one cooking benchmark.
Coming soon
A high-agency resourcefulness test: you are stuck in a foreign city, things have gone wrong, and the model needs to get you out step by step.
Coming soon
Quotes, complaints, bills, paperwork, logistics, and all the useful but annoying tasks that show whether a model can actually help.
Coming soon
A 30-day adaptive learning benchmark where the plan has to change based on motivation, progress, and what the learner actually retained.
Coming soon
Training plans built under real constraints like travel, injury, limited equipment, and mid-week changes that force sensible adaptation.
Why this format
Real tradeoffs
The model has to balance budget, quality, timing, retention, and long-term outcomes instead of only producing polished text.
Persistent memory
Past choices carry forward, so impulsive decisions, weak targeting, and repetitive copy have visible downstream consequences.
Measurable outcomes
We can compare models on score, revenue, ROI, repeat rate, and the shape of their decision-making, not just whether the answer sounded good.
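As a rough illustration of how persistent state and a measurable outcome fit together, here is a minimal toy sketch. The names `simulate` and `roi`, the response curve, and the momentum rule are all hypothetical choices made for this example; they are not the scoring used by any Spring Prompt pack. ROI here is assumed to mean (revenue - spend) / spend.

```python
# Toy sketch only: the response curve and momentum rule are illustrative
# assumptions, not the scoring used by any Spring Prompt pack.

def simulate(decisions, momentum=1.0):
    """Run a month-by-month toy simulation.

    decisions: list of (spend, quality) pairs, with quality in [0, 1].
    Returns a list of (spend, revenue) pairs, one per month.
    """
    months = []
    for spend, quality in decisions:
        revenue = spend * 2.0 * quality * momentum
        months.append((spend, revenue))
        # Persistent state: quality 0.5 holds momentum flat,
        # better choices grow it, worse choices erode it.
        momentum *= 0.8 + 0.4 * quality
    return months


def roi(months):
    """Return (total revenue - total spend) / total spend."""
    spend = sum(s for s, _ in months)
    revenue = sum(r for _, r in months)
    return (revenue - spend) / spend if spend else 0.0


careful = simulate([(100, 0.9), (100, 0.9), (100, 0.9)])
impulsive = simulate([(100, 0.9), (100, 0.2), (100, 0.9)])
print(f"careful ROI:   {roi(careful):.2f}")
print(f"impulsive ROI: {roi(impulsive):.2f}")
```

Both runs make the same decision in month three, but the impulsive run earns less from it because the weak month-two choice has already eroded momentum. That carry-over is the kind of downstream consequence a single-turn demo cannot show.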