A Troubleshooters School publication

How to test a new media source

Reading time: 16 min 40 sec
February 18, 2026. Views: 150

Marketing, Startups | Konstantin Alekseev, Oleg Braginsky

Founder of the Troubleshooters School Oleg Braginsky and his student Konstantin Alekseev tackled a fintech scaling question. A client needed help validating a new media source: the task was to enter a new geography and confirm the channel performed. The vendor claimed to be unique; we needed to confirm that claim in practice.

Why do tests drag on?

Set out to find and assess a vendor for a new region. Got a different reality: almost a year of scaling, several geo switches, constant technical troubleshooting. Took away one lesson: without structure, every problem adds weeks of uncertainty.

Spent months on legal approvals; the first geo underperformed for over a month; platform data and MMP numbers diverged badly. On day forty, the vendor asked about our business goals – a signal the methodology should have been in place from the start. Mapped the full integration path from first contact to scaling, analyzed every stage, and identified where things broke down.

Most delays don't come from bad results – they come from missing exit criteria. Vendors are happy to keep testing indefinitely, because that works in their favor. Developed a methodology that compresses timelines and protects against systematic mistakes.

Implemented a three-phase structure: each phase has objective continue-or-stop criteria based on real numbers. Locked in threshold values: this converted vague testing into a controlled experiment. Established measurable criteria for every stage of source validation.

Three-phase structure

Developed the methodology around three sequential steps:

  1. Pre-launch prep – are we technically ready to integrate?
  2. Data validation – can we trust the platform's numbers?
  3. Results check – can this source deliver at an acceptable cost?

Sequence is critical: skipping data validation for a faster start ruins the entire experiment. Allocated budgets for each phase – minimal for data validation, capped for results check, meaningful spend only after confirming the hypothesis. Set objective transition criteria: either the numbers clear the thresholds, or we stop.

Figured out early: without validating data first, looking at results is pointless – impossible to tell what's real and what's a tracking artifact. Ran quality checks before touching economics. Established clear thresholds for every go/no-go decision across the test.
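
To make the structure concrete, here is a minimal sketch of the three phases as a gated pipeline – assuming illustrative metric names and the threshold values from the checklist at the end of this article; none of it is the vendor's or the client's actual tooling.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Metrics = Dict[str, float]

@dataclass
class Phase:
    """One step of the test: a spend cap, a planned duration, a numeric gate."""
    name: str
    budget_usd: str                  # spend ceiling for the phase
    duration: str                    # planned duration, for reporting only
    gate: Callable[[Metrics], bool]  # True -> proceed, False -> stop

# Gates encode the continue-or-stop criteria as numbers, not opinions.
# Thresholds mirror the checklist at the end of this article; the metric
# names (click_discrepancy, attribution_rate, cpi_ratio) are illustrative.
def prep_gate(m: Metrics) -> bool:
    return m["questions_answered_in_writing"] == 1.0

def data_gate(m: Metrics) -> bool:
    return m["click_discrepancy"] <= 0.15 and m["attribution_rate"] >= 0.85

def results_gate(m: Metrics) -> bool:
    return m["cpi_ratio"] <= 1.2 and m["quality_vs_other_sources"] >= 1.0

PHASES = [
    Phase("Pre-launch prep", "$0 (questions only)", "before launch", prep_gate),
    Phase("Data validation", "$500-1,000", "3-5 days", data_gate),
    Phase("Results check", "$5,000-10,000", "2-3 weeks", results_gate),
]

def run_test(measurements: Dict[str, Metrics]) -> str:
    """Walk the phases in order; stop at the first failed gate."""
    for phase in PHASES:
        if not phase.gate(measurements[phase.name]):
            return f"STOP at '{phase.name}'"
    return "PROCEED to the scaling decision"

print(run_test({
    "Pre-launch prep": {"questions_answered_in_writing": 1.0},
    "Data validation": {"click_discrepancy": 0.12, "attribution_rate": 0.90},
    "Results check":   {"cpi_ratio": 1.10, "quality_vs_other_sources": 1.05},
}))  # -> PROCEED to the scaling decision
```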

Preparation before the first click

Built a structured conversation with the vendor covering everything that typically surfaces too late. Developed three question blocks to send before any budget is committed:

  1. Data:
  • How quickly do you resolve technical integration issues?
  • Which campaign parameters are guaranteed to pass through to the tracker?
  • What click discrepancy between platform and tracker is considered acceptable?
  2. Geo:
  • Do you have benchmark cases from similar clients in our vertical?
  • Are there platform-specific constraints for our target geographies?
  • Which countries have known traffic quality problems on your platform?
  3. Optimization:
  • What minimum budget does the platform need to run stably?
  • Which events can realistically be optimized – not just in theory?
  • How exactly does the algorithm learn from client conversion data?

Sent the list in advance so the vendor rep arrived with numbers, not talking points. Asked to see real MMP integration examples: revealed the actual problem landscape immediately. Reviewed comparable client cases: got a realistic read on how long issues actually take to resolve.

Identified a consistent pattern: vendor reps tend toward optimistic forecasts, and most specific questions require "checking with the team." Documented expected response timelines in writing – answers had to arrive before launch, not after. Found another signal: if half the questions are still vague after a week, the integration will stretch into months.

Recorded data quality thresholds in writing before launch – protection in any dispute. Created a formal agreement document to reference when issues surfaced. Secured the client's position against vendor promises that evaporate post-launch.

First budgets: from data validation to results check

Ran data validation first: a few hundred dollars over several days to confirm that clicks, installs, and parameters matched between platform and tracker. Then launched the results check phase: two to three weeks on a few thousand dollars, one question – can this platform deliver at an acceptable cost?

Checked the core performance metrics:

  • predictable budget pacing day over day without critical spikes
  • install cost against vertical benchmarks for target geographies
  • user quality via D1 and D7 retention and conversion to target in-app events.

Established a rule: optimization must show improvement over time. Flat results after two weeks signal either a scaling problem or the absence of real ML algorithms.
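
A minimal sketch of how these checks can be computed from a daily campaign export, assuming made-up column names, sample rows, and a benchmark value:

```python
# Daily export of the test campaign. Field names, the benchmark value and
# the sample rows are assumed purely for illustration.
days = [
    {"spend": 310, "installs": 95,  "d1_retained": 34, "d7_retained": 12},
    {"spend": 295, "installs": 102, "d1_retained": 39, "d7_retained": 15},
    # ... one row per day of the results-check phase
]

BENCHMARK_CPI = 3.00   # vertical benchmark for the target geo (assumed)

def pacing_is_stable(days, max_spike=0.5):
    """Day-over-day spend must not jump by more than max_spike (here 50%)."""
    spends = [d["spend"] for d in days]
    return all(abs(b - a) / a <= max_spike for a, b in zip(spends, spends[1:]))

def cpi(days):
    """Blended cost per install over the given days."""
    return sum(d["spend"] for d in days) / sum(d["installs"] for d in days)

def retention(days, key):
    """Share of installs retained at the given horizon (d1 or d7)."""
    return sum(d[key] for d in days) / sum(d["installs"] for d in days)

def trend_is_improving(days, window=7):
    """Optimization rule: the most recent window must beat the first one on CPI."""
    return cpi(days[-window:]) < cpi(days[:window])

print("pacing stable:", pacing_is_stable(days))
print("CPI vs benchmark:", round(cpi(days) / BENCHMARK_CPI, 2))
print("D1 retention:", round(retention(days, "d1_retained"), 2))
print("D7 retention:", round(retention(days, "d7_retained"), 2))
```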

Stop criteria for this phase:

  • no movement after two weeks of optimization
  • budget paces erratically with unpredictable daily spikes
  • cost exceeds benchmark by 2x even after two full weeks
  • quality underperforms other active sources on retention and conversions.
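
The four criteria collapse into one go/no-go check. A sketch with placeholder metric names; the thresholds come straight from the list above:

```python
def results_check_decision(m: dict) -> str:
    """Apply the stop criteria of the results-check phase.

    Assumed keys: weeks_without_improvement, pacing_stable (bool),
    cpi_ratio (CPI divided by the vertical benchmark),
    quality_vs_other_sources (retention/conversion index, 1.0 = parity).
    """
    reasons = []
    if m["weeks_without_improvement"] >= 2:
        reasons.append("no movement after two weeks of optimization")
    if not m["pacing_stable"]:
        reasons.append("erratic budget pacing")
    if m["cpi_ratio"] > 2.0:
        reasons.append("cost exceeds benchmark by 2x")
    if m["quality_vs_other_sources"] < 1.0:
        reasons.append("quality below other active sources")
    if reasons:
        return "STOP: " + "; ".join(reasons)
    return "CONTINUE"

# Example: everything healthy except cost, which is still a stop signal.
print(results_check_decision({
    "weeks_without_improvement": 0, "pacing_stable": True,
    "cpi_ratio": 2.3, "quality_vs_other_sources": 1.1,
}))
```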

When the platform manager starts optimizing manually, that's the signal the ML isn't quite ML. Confirmed in practice: the team asked for more time because "the algorithm is still learning" – pushed back with specific conversion targets and concrete improvement dates.

Identified a pattern: a geo can underperform not because the platform is broken, but because it has country-specific constraints the vendor never disclosed upfront. Established a rule: two weeks is enough to read the direction.

Tested multiple geos sequentially and built a repeatable evaluation system: the first geo came in below expectations.

Moved fast on decisions: either raise bids aggressively or exit; when an Asian cluster delivered almost no registrations, chose exit. Found the performing geo only after several months of iteration – each cycle shorter because of the exit criteria.

Without the methodology, we would have stayed stuck in every region, hoping it would eventually work.

Incrementality tests

Strong metrics don't guarantee every install is genuinely new: some users would have converted organically, but the last click went to the paid channel.

Incrementality shows how many installs a source drives beyond organic baseline – rather than just intercepting attribution. Without this check, scaling can simply mean redistributing budget while overpaying for what would have been free.

Identified four testing methodologies with different accuracy-to-cost tradeoffs:

  1. Geo holdout. Splits markets into test and control: traffic runs in test geos, nothing changes in control geos, the gap shows true incremental lift. Requires 3–10 test markets and 15+ control markets over 4–8 weeks. Most accurate method – and the most resource-intensive to execute.
  2. User-level Randomized Controlled Trial. Borrowed from medical research: divides users into groups – one sees ads, one doesn't – and the install differential shows incremental lift. Requires statistically significant sample sizes, delivers results in 2–3 weeks, but depends on platform and tracker capabilities.
  3. Platform conversion lift. Native tools from Meta, Google, TikTok – free, requires 100+ conversions per week. Shows the platform's own contribution but misses cross-channel effects on other active sources.
  4. Time-based pause. Pauses the channel for a defined period and monitors the install drop: if volume falls, the channel was incremental. Simple to run, but risky (lost volume) and less accurate due to organic seasonality.
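
The arithmetic behind the geo holdout is simple once test and control markets are split; every number in the sketch below is invented purely to show the calculation:

```python
# Geo holdout: run traffic in the test markets, keep control markets dark,
# compare each group against its own pre-test baseline.
test_markets    = {"baseline_installs": 10_000, "installs_during_test": 13_000}
control_markets = {"baseline_installs": 15_000, "installs_during_test": 15_300}

# Organic growth observed where no paid traffic ran.
organic_growth = (control_markets["installs_during_test"]
                  / control_markets["baseline_installs"])              # 1.02

# What the test markets would likely have done without the channel.
expected_organic = test_markets["baseline_installs"] * organic_growth  # 10,200

incremental_installs = test_markets["installs_during_test"] - expected_organic  # 2,800
paid_attributed = 4_000   # installs the MMP credited to the channel (assumed)

incrementality = incremental_installs / paid_attributed
print(f"Incrementality: {incrementality:.0%}")   # -> 70%
```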

Walked the client through all methodologies and recommended deferring the check: spending on incrementality while still validating data and results is premature.

Documented a forward action plan: select methodology based on available budget, confirm platform tooling, use MMP incrementality as a fallback. If the source doesn't clear the performance threshold first, the incrementality question never becomes relevant.

Scaling criteria

Formulated three criteria for the decision:

  1. Economics with headroom. CPA supports target payback accounting for the cost increase that comes with scale – costs almost always go up, the platform pulls lower-quality inventory as budgets grow. Established a rule: if current CPA sits at benchmark, need at least 20–30% headroom before hitting the profitability floor.
  2. Data stability. Tracking quality holds as spend scales – discrepancies stay within acceptable range week over week. Some platforms start dropping parameters or widening click gaps at higher volume.
  3. Confirmed incrementality. The channel drives installs beyond organic baseline – not just capturing last-click attribution. Ran geo holdout: showed 70% incrementality. At the proposed scale, the 30% overpayment would have meant tens of thousands of dollars monthly in organic installs dressed up as paid.
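
Two quick calculations sit behind criteria 1 and 3: the headroom check against the profitability floor and the monthly cost of scaling a channel that is only 70% incremental. All figures in the sketch are invented for illustration:

```python
# Criterion 1: economics with headroom.
current_cpa    = 10.0    # USD, roughly at the vertical benchmark (assumed)
max_viable_cpa = 13.0    # profitability floor implied by the target payback (assumed)

headroom = (max_viable_cpa - current_cpa) / current_cpa
print(f"Headroom before the profitability floor: {headroom:.0%}")   # -> 30%

# Criterion 3: what non-incremental installs cost at scale.
monthly_installs = 20_000   # planned volume after scaling (assumed)
incrementality   = 0.70     # result of the geo holdout
overpaid = monthly_installs * (1 - incrementality) * current_cpa
print(f"Monthly spend on installs that were organic anyway: ${overpaid:,.0f}")
# -> $60,000
```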

All three criteria must pass simultaneously: solid economics with unstable data makes optimization impossible, and low incrementality turns scaling into a budget reshuffling exercise.

Set the objective to scale gradually: 30–50% weekly increases give algorithms time to adapt without performance shock.

Locked in hard stop thresholds:

  • cost: up more than 40% from current level
  • retention: drops more than 20% versus other active channels
  • data quality: discrepancy exceeds 25% between platform and tracker.

Developed a three-month plan with weekly monitoring checkpoints and a documented rollback process if any threshold is breached.
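
A sketch of how the gradual ramp and the hard stop thresholds can be wired into the weekly checkpoint; the starting budget and the weekly readings are placeholders:

```python
STOP_THRESHOLDS = {
    "cost_increase":     0.40,   # CPI up more than 40% from the current level
    "retention_drop":    0.20,   # retention down more than 20% vs other channels
    "click_discrepancy": 0.25,   # platform vs tracker gap above 25%
}

def next_week_budget(current: float, step: float = 0.4) -> float:
    """Raise spend by 30-50% per week; 40% is used as the midpoint here."""
    return current * (1 + step)

def breached(week: dict) -> list:
    """Return the hard-stop thresholds breached by this week's readings."""
    readings = {
        "cost_increase":     week["cpi"] / week["baseline_cpi"] - 1,
        "retention_drop":    1 - week["retention"] / week["benchmark_retention"],
        "click_discrepancy": week["discrepancy"],
    }
    return [name for name, value in readings.items() if value > STOP_THRESHOLDS[name]]

# One weekly checkpoint with placeholder readings.
week = {"cpi": 12.1, "baseline_cpi": 10.0, "retention": 0.34,
        "benchmark_retention": 0.36, "discrepancy": 0.12}

issues = breached(week)
if issues:
    print("Rollback:", ", ".join(issues))
else:
    print(f"Scale next week's budget to ${next_week_budget(5_000):,.0f}")
```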

What long integrations actually look like

Completed the path from first contact to scaling in one year: five months of legal work, four geo iterations, persistent technical troubleshooting throughout.

Without the methodology, it would have taken longer: getting stuck on the first geo alone would have added months of waiting. Confirmed: every source requires its own configuration, the tooling stack is unique, no plug-and-play solutions exist.

After completing app campaign testing, shifted to web – which turned out to be its own challenge. The vendor needed a pixel on the client's website so its algorithms could see the funnel and optimize accordingly, but client restrictions made that impossible.

Chose a web tracker integrated with the client's postbacks instead. Not the most elegant solution: it meant manual optimization – watching conversions in the tracker and adjusting the platform based on what we saw externally.

Ran into complications: technical integration required changes on both sides, click discrepancy hit 45%, parameters weren't passing correctly from the start. Fixing it took another week on top.

Got below-expectation results on the first geo after six weeks: the methodology gave a clean basis to document the underperformance and move on. An Asian cluster delivered minimal registrations in three weeks – the fast exit decision saved a full month of spend.

Found the performing geo on the fourth iteration at seven months: each subsequent cycle shorter because of the exit criteria.

Developed practical guidelines for planning this kind of integration:

  • timeline: plan for six months to a year from first contact to scaling
  • budget: treat test spend as a dedicated line item – an investment in channel intelligence, not a pipeline guarantee
  • communication: report progress through methodology milestones, not just final outcomes.

Validation checklist

Compiled all questions and criteria into a single reference document for evaluating every new source against a consistent standard. Structured the checklist across three phases with explicit go/no-go criteria at each gate. Built a system: converts subjective "seems to be working" into objective numbers with clear decision thresholds.

Phase 1: Questions Before the First Dollar

Defined the objective: get written answers across three question blocks – data and tracking (acceptable discrepancy, parameter passing, issue resolution speed), geo and vertical (problem markets, benchmark cases, platform quirks), optimization (algorithm vs. manual, realistically optimizable events, minimum stable budget). Set the gate criteria: all questions answered, no blockers, vendor commits to quality thresholds in writing. Without documented commitments, challenging data quality issues post-launch is extremely difficult.

Phase 2: Data Validation ($500–1,000 / 3–5 days)

Defined what to check daily: click discrepancy rate, install attribution accuracy, parameters visible in analytics, conversion events firing, cost calculating correctly. Set STOP criteria: discrepancy >20%, attribution <80%, parameters not passing. Locked in PROCEED criteria: discrepancy 10–15%, attribution >85%, parameters consistent.
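
Phase 2 reduces to a handful of comparisons. A minimal sketch with the checklist thresholds and assumed field names for the daily reading:

```python
def data_validation_gate(day: dict) -> str:
    """Daily go/no-go for Phase 2. Assumed keys: discrepancy (platform vs
    tracker click gap), attribution (share of installs attributed correctly),
    params_ok (all campaign parameters visible in the tracker)."""
    if day["discrepancy"] > 0.20 or day["attribution"] < 0.80 or not day["params_ok"]:
        return "STOP"
    if day["discrepancy"] <= 0.15 and day["attribution"] > 0.85 and day["params_ok"]:
        return "PROCEED"
    return "KEEP WATCHING"   # grey zone between the stop and proceed thresholds

print(data_validation_gate({"discrepancy": 0.12, "attribution": 0.91, "params_ok": True}))
# -> PROCEED
```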

Phase 3: Results Check ($5,000–10,000 / 2–3 weeks)

Defined what to check weekly: CPI vs. benchmarks, user quality (retention, conversions), pacing stability, optimization trend. Set STOP criteria: CPI 2x above benchmark, quality below other sources, erratic pacing, no improvement after two weeks. Locked in PAUSE criteria: CPI 30–50% above benchmark but quality is solid, volume is low, a geo hypothesis worth testing.

Set scaling criteria: CPI within 20% of benchmark, quality comparable, pacing predictable, positive trend confirmed.
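
The Phase 3 STOP, PAUSE, and scaling criteria fold into one weekly decision. A sketch with placeholder input names; thresholds follow the checklist:

```python
def results_check_gate(week: dict) -> str:
    """Weekly decision for Phase 3. Assumed keys: cpi_ratio (CPI / benchmark),
    quality_ok, pacing_stable, trend_positive, weeks_flat (weeks with no
    optimization improvement)."""
    if (week["cpi_ratio"] > 2.0 or not week["quality_ok"]
            or not week["pacing_stable"] or week["weeks_flat"] >= 2):
        return "STOP"
    if 1.3 <= week["cpi_ratio"] <= 1.5 and week["quality_ok"]:
        return "PAUSE"   # cost is high but quality holds: park it, test a geo hypothesis
    if (week["cpi_ratio"] <= 1.2 and week["quality_ok"]
            and week["pacing_stable"] and week["trend_positive"]):
        return "SCALE"
    return "KEEP TESTING"

print(results_check_gate({"cpi_ratio": 1.15, "quality_ok": True,
                          "pacing_stable": True, "trend_positive": True,
                          "weeks_flat": 0}))   # -> SCALE
```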

Understanding how acquisition systems actually work – and identifying which elements drive performance – separates systematic execution from expensive guessing. Acting on individual validation elements shapes the entire system: each phase creates conditions for the next, filtering sources through progressively demanding criteria. Developed this methodology as a standalone algorithm: the client gets a repeatable tool for evaluating future sources independently – a system that works without the people who built it.