How to A/B Test Landing Pages and Measure the Winner in GA4
By Emily Redmond, Data Analyst at Emilytics · April 2026
TL;DR: A/B testing in GA4 comes down to this: define your hypothesis, split traffic 50/50, run for 2–4 weeks, then check statistical significance. Treat a result as real only if the p-value is below 0.05.
I watched a company declare victory after three days. The variant was up 25%, so they rolled it out to 100% of traffic. Then conversions dropped 8%.
Three days wasn't enough data. The 25% lift was random noise. They hurt their conversion rate by jumping the gun.
A/B testing is powerful. But only if you do it right.
The A/B Testing Framework
A/B testing has one rule: change one variable at a time.
If you change headline, image, and button color all at once, you won't know which one moved the needle.
The process:
- Make a hypothesis (specific, measurable)
- Change one variable
- Run it on 50% of traffic
- Keep the other 50% as control
- Measure for 2–4 weeks
- Calculate statistical significance
- Decide
Step 1: Form Your Hypothesis
A good hypothesis is specific and testable.
Bad hypothesis: "The form is probably too long."
Good hypothesis: "Our 5-field form has a 25% completion rate. Competitor forms with 2 fields have 40% completion. If we reduce our form to 2 fields (email + company), we'll increase completion by at least 15 percentage points."
The good hypothesis:
- Names the problem (5 fields)
- Has a benchmark (competitor data)
- Is measurable (15% improvement minimum)
- Has a reason (reduced friction)
Step 2: Decide What to Test
Common A/B tests:
| Element | Example |
|---|---|
| Headline | "Start Your Free Trial Today" vs. "Get Productivity Superpowers" |
| CTA text | "Submit" vs. "Get Started" vs. "Claim Your Free Trial" |
| CTA color | Blue vs. Orange vs. Green |
| Form fields | 5 fields vs. 3 fields vs. 1 field |
| Image | Stock photo vs. customer photo vs. no image |
| Copy length | 200 words vs. 500 words |
| Social proof | No testimonials vs. 3 testimonials vs. 5 testimonials |
Rule: Test the elements that drive conversion, not the ones that feel nice.
Changing button color is usually low impact (5–10% lifts at best). Changing the headline is usually high impact (15–30% lifts possible). Changing form length is also high impact (20–40% lifts possible).
Step 3: Set Up Your Test in GA4
GA4 has a native A/B testing tool: GA4 Experiments.
To set up:
- Go to GA4 Admin → Experiments
- Click "Create Experiment"
- Name it: "Homepage CTA Test" or similar
- Select your campaign: which traffic are you testing? (all traffic, or specific source)
- Choose variants:
- Control (original)
- Variant 1 (new version)
- Set traffic allocation: 50% control, 50% variant (see the sketch just below for how a split like this typically works)
- Choose your primary metric: Conversion rate
- Set your hypothesis: minimum detectable effect (e.g., 15% improvement)
GA4 will calculate sample size needed.
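A quick aside on that 50/50 allocation: under the hood, most testing tools bucket each visitor deterministically by hashing a stable identifier, so a returning visitor always sees the same version. GA4 Experiments (or any third-party tool) handles this for you; the sketch below is only an illustration of the idea, and the `assign_variant` helper and the experiment name are mine, not part of any GA4 API.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "homepage-cta-test") -> str:
    """Deterministically bucket a visitor into control or variant (50/50).

    Hashing (experiment name + visitor ID) means the same visitor always gets
    the same bucket, and different experiments split independently of each other.
    Illustration only; your testing tool does this for you.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # 0-99, roughly uniform
    return "control" if bucket < 50 else "variant"

# The same visitor ID always lands in the same bucket:
print(assign_variant("GA1.1.123456789.1712345678"))
```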
Alternative: Use a third-party tool
If you use Optimizely, VWO, or Unbounce, they handle the splitting and measurement. You don't need GA4 Experiments.
Advantage: easier to use, better reporting. Disadvantage: another tool to pay for.
For this guide, I'll assume GA4 Experiments.
Step 4: Calculate Your Sample Size
This is critical. Too few visitors and you're measuring noise. Too many and you're wasting time.
GA4 Experiments calculates this for you, but here's the math:
You need enough visitors to detect your target improvement with 95% confidence (and, typically, 80% statistical power).
Example:
- Current conversion rate: 2%
- Target improvement: 15% (to 2.3%)
- Confidence level: 95%
- Required sample size: ~33,000 visitors per variant
If you have 10,000 monthly visitors:
- 5,000 go to control and 5,000 to the variant each month
- Reaching ~33,000 per variant takes roughly 6–7 months
- That's too long for most teams, so you'd either test a bigger change (a larger minimum detectable effect) or accept that small lifts aren't detectable at your traffic level
Use an online calculator (Optimizely, VWO, or GCALC) to compute your specific sample size.
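If you'd rather script the calculation than trust a web form, here's a minimal sketch using statsmodels. It assumes the setup behind most calculators (a two-sided test at 95% confidence with 80% power); different calculators make slightly different assumptions, which is why you'll see anywhere from roughly 29,000 to 37,000 per variant quoted for this same example.

```python
# Sample size per variant for the example above:
# 2% baseline, 15% minimum detectable (relative) lift, 95% confidence, 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.02                      # current conversion rate
mde = 0.15                           # minimum detectable effect, relative
target = baseline * (1 + mde)        # 2.3%

effect = proportion_effectsize(target, baseline)   # Cohen's h for two proportions
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{round(n_per_variant):,} visitors per variant")
```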
Step 5: Run Your Test
Rules for running a test:
Rule 1: Don't peek at results before the test ends. Every time you look, you're tempted to stop early. Don't. Wait for the full duration. (There's a quick simulation after these rules showing what daily peeking does to your false-positive rate.)
Rule 2: Run for at least 2 weeks. Day-of-week variation is real. Monday ≠ Friday. Run two full weeks minimum.
Rule 3: Run for at least 1 full sales cycle (if applicable). If your sales cycle is 4 weeks, run for 4 weeks minimum. Otherwise you're comparing apples to oranges.
Rule 4: Don't change your hypothesis midway. You started by testing "form length." Don't switch to testing "button color" halfway through. Finish the test.
Rule 5: Track all conversions, not just the main one. If you're testing to increase form submissions, also track:
- Form completion rate
- Form abandonment
- Downstream conversions (did they actually buy?)
A test might increase form submissions but decrease form quality. You need to see both.
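One more thing before the analysis step: Rule 1 deserves a demonstration. Here's a quick simulation (my own illustration, nothing to do with GA4) of an A/A test, where both versions are identical. If you peek at the p-value every day and stop the moment it dips below 0.05, you declare a "winner" far more often than the 5% false-positive rate you think you're accepting.

```python
# Simulate daily peeking at an A/A test (both arms identical, so any "winner" is a false positive).
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(42)
true_rate, visitors_per_day, days, n_tests = 0.02, 500, 28, 2000

false_positives = 0
for _ in range(n_tests):
    conversions = np.zeros(2, dtype=int)
    visitors = np.zeros(2, dtype=int)
    stopped_early = False
    for _ in range(days):
        conversions += rng.binomial(visitors_per_day, true_rate, size=2)
        visitors += visitors_per_day
        _, p = proportions_ztest(conversions, visitors)
        if p < 0.05:                 # "peek" and see significance
            stopped_early = True     # we'd have stopped and shipped the variant
    if stopped_early:
        false_positives += 1

print(f"False-positive rate with daily peeking: {false_positives / n_tests:.0%}")
# Usually lands well above 5%, often in the 20-30% range,
# versus ~5% if you check only once at the end.
```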
Step 6: Analyze the Results
The test is done. Now you read the data.
Step 1: Check sample size. Do you have enough visitors to make a conclusion? Use the calculator again.
- If yes, proceed
- If no, you need more time or more traffic
Step 2: Calculate statistical significance. This is the most important check. Use GA4's built-in stats or an online significance calculator.
Statistical significance = how confident are we that this result is real (not random)?
You want 95% confidence minimum. In statistics-speak: p-value < 0.05.
Example:
- Control: 2% conversion rate
- Variant: 2.4% conversion rate (+20%)
- Sample size: 50,000 per variant
- Confidence: well above 95% (p-value < 0.001)
Interpretation: With a sample this large, a 20% lift is extremely unlikely to be random noise. You can declare a winner.
Counter-example:
- Control: 2% conversion rate
- Variant: 2.3% conversion rate (+15%)
- Sample size: 100 per variant
- Confidence: 60% (p-value = 0.40)
Interpretation: With a sample this small, a difference like this shows up by chance all the time. It could easily be luck. Keep the test running or abandon it.
GA4 Experiments does this calculation for you automatically. It'll tell you "This result is 87% statistically significant" or "95% statistically significant." Only act on 95%+.
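You don't have to take the tool's word for it. Here's a minimal sanity check of the winning example above using a two-proportion z-test in statsmodels (my choice of library, not something GA4 exposes); the conversion counts are just the rates multiplied by the sample size.

```python
# Two-proportion z-test for the example: 2.0% vs 2.4% at 50,000 visitors per variant.
from statsmodels.stats.proportion import proportions_ztest

conversions = [1000, 1200]      # control: 2.0% of 50,000; variant: 2.4% of 50,000
visitors = [50000, 50000]

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.5f}")
# With a sample this large the p-value comes out far below 0.05,
# so the lift clears the 95% bar comfortably.
```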
Step 3: Check secondary metrics. Did the variant improve:
- Conversion rate? Yes ✓
- Revenue per conversion? Did it go up or down?
- Bounce rate? Did it get worse?
- Form completion rate? Did more people finish?
A test might be statistically significant but hurt other metrics. Check.
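If you send the assigned variant to GA4 as an event parameter and register it as a custom dimension, you can pull the primary and secondary metrics per variant with the GA4 Data API instead of eyeballing reports. A sketch with the official Python client is below; the property ID, the `exp_variant_string` dimension, and the metric names are placeholders for whatever your own setup uses, so check the Data API dimensions and metrics reference before running it.

```python
# Sketch: pull primary + secondary metrics per experiment variant from GA4.
# Assumes the variant label is an event-scoped custom dimension (exp_variant_string)
# and that 123456789 is your GA4 property ID; both are placeholders.
from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.analytics.data_v1beta.types import (
    DateRange, Dimension, Metric, RunReportRequest,
)

client = BetaAnalyticsDataClient()  # authenticates via GOOGLE_APPLICATION_CREDENTIALS

request = RunReportRequest(
    property="properties/123456789",
    dimensions=[Dimension(name="customEvent:exp_variant_string")],
    metrics=[
        Metric(name="sessions"),
        Metric(name="conversions"),   # your primary metric; the exact name depends on your property
        Metric(name="bounceRate"),    # one secondary metric worth watching
    ],
    date_ranges=[DateRange(start_date="28daysAgo", end_date="yesterday")],
)

for row in client.run_report(request).rows:
    variant = row.dimension_values[0].value
    sessions, conversions, bounce = (v.value for v in row.metric_values)
    print(f"{variant}: {sessions} sessions, {conversions} conversions, bounce rate {bounce}")
```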
Declare a Winner
If variant wins (95%+ confidence):
- Roll it out to 100% of traffic
- Document the result (what changed, what improved, by how much)
- Move to next test
If control wins:
- Keep the original
- Go back to the drawing board
- What went wrong with the hypothesis?
If no winner (below 95% confidence):
- Option 1: Run the test longer (another 2 weeks)
- Option 2: Accept there's no meaningful difference and keep the original
- Option 3: Change your hypothesis and test something different
Don't keep a "losing" variant just because you like it. Data wins.
Common A/B Testing Mistakes
Mistake 1: Testing too many things at once. If you change headline and button color, you won't know which worked. Test one variable.
Mistake 2: Stopping the test early. You hit statistical significance on day 10 and want to call it. Don't. You need the full 2–4 weeks to account for weekly variation.
Mistake 3: Measuring the wrong metric. Testing form length but measuring form submissions, not form completion rate. Measure what matters.
Mistake 4: Not accounting for seasonal variation. Testing your homepage during Black Friday? Results won't apply to normal traffic. Test during "normal" periods.
Mistake 5: Ignoring variant quality. A test might increase conversions but convert low-quality customers. Check downstream metrics (refunds, support tickets, LTV).
💡 Emily's take: I once ran a test that increased free trial signups by 40%. Looked amazing. Then I realized the conversion rate from free trial to paid actually dropped by 20%, because the new variant was attracting "freebie seekers," not serious prospects. We reverted. Measure what matters, not just what converts at the funnel stage you're testing.
Running Multiple Tests
Once you have a system, run concurrent tests:
Timeline example:
- Test 1: Headline (weeks 1–4)
- Test 2: Form length (weeks 1–4, running simultaneously)
- Test 3: CTA color (weeks 5–8)
This requires enough traffic to split 4 ways (control + 3 variants), but if you have it, you can move faster.
Don't test 10 things at once. That's chaos. Test 2–3 concurrent tests max.
Frequently Asked Questions
Q: How long should I run a test? A: Minimum 2 weeks (to account for day-of-week variation), better 4 weeks (to account for weekly patterns). See How Long Should Your Analytics Observation Period Be?
Q: What sample size do I need? A: Depends on your current conversion rate and target improvement. Use a calculator. Typical range: 1,000–50,000 per variant.
Q: Can I keep a variant if it's 85% statistically significant? A: Technically, maybe. But I recommend waiting for 95%. The extra week of confidence prevents false winners. Plus, the cost of a false positive (rolling out a bad variant) is usually higher than the benefit of speed.
Q: What if I have very low traffic? A: Run tests longer (6–8 weeks instead of 4). Or test bigger changes (completely new copy vs. small tweaks). Or use qualitative feedback (user interviews) to validate before testing.
Q: Should I test on mobile and desktop separately? A: Yes, if you have enough traffic. Mobile and desktop often behave differently. If traffic is low, test both together first, then segment after you see a winner.
Q: Can I A/B test organic search traffic? A: Yes, using GA4 Experiments. But note: organic traffic is self-selected (they searched for you). Changes that work for organic might not work for PPC.
The Bottom Line
A/B testing is how you turn hunches into knowledge.
But only if you're disciplined: one variable, 2–4 weeks, 95% significance. Rush it and you'll ship bad changes.
Go slow. Test small. Learn fast.
Emily Redmond is a data analyst at Emilytics — AI analytics agent watching your GA4, Search Console, and Bing data around the clock. 8 years experience. Say hi →