Run an A/B experiment

To learn how new changes to your game affect user behavior, run an A/B experiment. For your experiment, choose all or part of your users and divide them into groups. Each group will be shown a unique experimental variant of your app. All variants are created based on a single version in the Games Console using flags, which are key-value pairs.

Monitor how your metrics change over time in the report to determine which changes are successful and improve your game's performance.

Before creating an experiment:

  1. Plan your experiment:
    • What hypothesis do you want to test?
    • What is the difference between the variants?
    • What targets do you want to achieve?
  2. Implement the functionality using conditions and flags retrieved with the getFlags() method, then upload a new app version.
  3. If needed, add the new flags to a configuration, and publish changes.
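The flag-retrieval step above can be sketched as follows. This is an illustrative pattern, not the real SDK surface: the `sdk` parameter, the flag names, and the defaults are assumptions; only `getFlags()` returning string key-value pairs comes from this article.

```javascript
// Default flag values used until the SDK responds (all flag values are strings).
// DEFAULT_FLAGS and the `sdk` parameter are illustrative, not a documented API.
const DEFAULT_FLAGS = {
  buttonColor: 'blue',
  rewardDropRate: '0.05',
};

async function loadFlags(sdk) {
  try {
    // getFlags() is assumed to resolve to an object of string key-value pairs.
    const remote = await sdk.getFlags();
    return { ...DEFAULT_FLAGS, ...remote };
  } catch (err) {
    // Fall back to defaults if the flags can't be fetched.
    return { ...DEFAULT_FLAGS };
  }
}
```

Keeping defaults on the client means the game still behaves sensibly for users outside the experiment or when the request fails.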

Creating an experiment

To create an experiment:

  1. Go to the Games Console.
  2. Select an app.
  3. Go to the Experiments tab and click Create experiment.

Warning

You can't run more than two experiments at a time.

Step 1. Name and description

Add a name for your experiment. You can also fill in the optional Description field, indicating what exactly you're testing (for example, button colors or valuable reward drop rates), your expected outcome, and the metrics you want to improve.

Step 2. Conditions

Date range

By default, the experiment is only limited by time. You can define the start date and duration of the experiment.

Warning

The maximum experiment duration is 30 days.

Audience share

Set the share of your audience to participate in the experiment.

Warning

This is not a share of the total number of users. The parameter is calculated based on the number of users who meet all the conditions you specified.

Example

If your audience share is 60%, the remaining 40% of users don't participate in the experiment. The participating 60% is divided into several groups: the number of groups corresponds to the number of experiment variants, and each group gets one variant. For example, if your experiment encompasses 60% of your audience and has three variants, each variant gets 20% of your overall audience. This share is shown to the right of each variant.
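The arithmetic in this example is a simple even split; a one-line helper (illustrative, not part of the console) makes it explicit:

```javascript
// Share of the *overall* audience that each variant receives: the audience
// share entered in the console is split evenly across all variants.
function perVariantShare(audienceSharePercent, variantCount) {
  return audienceSharePercent / variantCount;
}
```

Here `perVariantShare(60, 3)` gives 20, matching the example above.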

Conditions

Use conditions to limit the experiment audience. After you add them, your experiment audience will only include users who meet all these conditions. Your Audience share will be calculated based on this audience.

To limit the audience, click Add condition, select all the necessary conditions, and fill in the fields that appear.

Available conditions:

  • Platforms: Mobile, desktop, or TV devices.
  • Languages: The language on the devices of users who will be shown your configuration.
  • Regions: The region set on the user's device.
  • Client features: You can set your own parameters as a key-value pair. For example: param=value. To specify multiple values combined with the "AND" operator, list the parameters separated by commas. Example: aparam=avalue, bparam=bvalue.
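Client feature pairs like `aparam=avalue, bparam=bvalue` could be assembled from a plain object before being sent with the flags request. The array-of-`{name, value}` target shape below is an assumption for illustration; check your SDK reference for the exact format it expects.

```javascript
// Turn { aparam: 'avalue', bparam: 'bvalue' } into a list of name/value
// pairs. The output shape is an assumption, not a documented API contract.
function buildClientFeatures(pairs) {
  return Object.entries(pairs).map(([name, value]) => ({ name, value }));
}
```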

Step 3. Metrics

There are four metrics available for experimenting, all of which will be reflected in the results report:

  • Play time per player: Average time in minutes that a player spent in the game in one day.
  • Interstitial impressions per player: Average number of interstitial unit impressions per player per day.
  • Rewarded impressions per player: Average number of rewarded video impressions per player per day.
  • In-app purchases per player: Average number of in-game purchases per player per day.

Step 4. Set up variants

You can set up multiple variants to be shown to users as part of the experiment. We recommend using the current version of your app without changes as a control variant, but you can set up changes in that version as well.

Warning

One experiment can have a maximum of 26 variants.

Set up changes in the control and experimental variants using flags, which you can get with the getFlags() method. You don't need to make changes to the app itself. Flags take the String value type.

  1. Select the block with the variant to which you want to apply changes in the experiment.
  2. Set flags with changed parameters.

Warning

You can't add or change more than two flags in a single variant.

Your audience will be divided into equal shares, one for each variant you created, so each variant is shown to approximately the same number of users.
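Because flag values are always strings, variant code typically parses them into typed values before use. The flag names and defaults below are illustrative assumptions:

```javascript
// Flags arrive as strings, so numeric and boolean flags must be parsed
// explicitly. Flag names and defaults here are illustrative.
function applyVariant(flags) {
  return {
    buttonColor: flags.buttonColor ?? 'blue',
    rewardDropRate: Number.parseFloat(flags.rewardDropRate ?? '0.05'),
    showTutorial: (flags.showTutorial ?? 'true') === 'true',
  };
}
```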

Step 5. Save and run the experiment

You can start your experiment right away or save it as a draft so you can edit and run it later. You can check your experiment at any time.

Checking the experiment

You can check the variants you have and see the applied changes:

  1. Make sure that you have chosen experimental flags for the variant you want to check.
  2. In the Experiment testing block next to the variant name, click the link or copy it to open on another device.
  3. Test that the app works with the selected experimental flag values.

The experiment conditions are not taken into account when testing variants.

Sample size calculator

In the Sample size calculator block, you can check whether the experiment conditions you chose will yield statistically significant results.

This tool calculates the minimum detectable effect (MDE) that shows the smallest change in a metric that an experiment can detect for the given data with the specified error levels. If the MDE is low, you'll be able to see even the smallest change in the metric. To make your MDE lower, increase the sample size or experiment duration. If your MDE is high, you'll only be able to see significant changes. An experiment like that can be run with a small audience.
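For intuition, a rough MDE for a mean metric can be sketched with the standard normal-approximation formula below (two-sided alpha = 0.05, power = 0.80). This is a textbook approximation, not the calculator's actual implementation:

```javascript
// Approximate minimum detectable effect for comparing two variant means,
// in absolute units of the metric. Uses fixed z-scores for
// alpha = 0.05 two-sided (z = 1.96) and power = 0.80 (z = 0.84).
function approximateMde(metricStdDev, usersPerVariant) {
  const zAlpha = 1.96;
  const zBeta = 0.84;
  return (zAlpha + zBeta) * Math.sqrt((2 * metricStdDev ** 2) / usersPerVariant);
}
```

Doubling the sample size shrinks the MDE by a factor of √2, which is why longer experiments or larger audience shares can detect smaller effects.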

Fill in the calculator fields, providing the information about your app and the experiment:

  • Total players per day: Number of daily users accounting for the experiment conditions but not the sample size (the latter should be indicated in the Audience share block). To calculate a value for the Total players per day field, use the Players product metric.
  • Duration: Duration of the experiment in days. This corresponds to the value in the Date range field from Step 2. Conditions.
  • Audience share: Share of your audience that will participate in the experiment. This corresponds to the value in the Audience share field from Step 2. Conditions.
  • Variants: Number of variants in the experiment (from 2 to 26).
  • Number of events per player: Number of target events per average player. To calculate this value, divide the number of such events for a period by the number of unique users for this period with consideration to the experiment conditions. The target event you choose here should be based on your priority metric. If you want to assess multiple metrics in a single experiment, make separate calculations for each of them.
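The calculator inputs above combine into an approximate per-variant sample size. The helper below ignores players who return on several days, so it is an upper-bound sketch rather than the calculator's real formula:

```javascript
// Rough number of players per variant: daily players matching the conditions,
// over the whole duration, scaled by the audience share and split evenly
// across variants. Ignores the same player returning on multiple days.
function samplePerVariant(playersPerDay, durationDays, audienceSharePercent, variantCount) {
  return (playersPerDay * durationDays * audienceSharePercent) / 100 / variantCount;
}
```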

Borders of the detected effect: This parameter helps you see which metrics can be considered statistically significant. Those will lie outside the specified borders: smaller than the red one and bigger than the green one. Metrics that lie in between the borders may be random variations from the control variant conversion. If the range is too broad and you're looking for less prominent metric changes, try adjusting the experiment conditions. For example, increase the duration or audience share.

Running the experiment

Warning

After you launch the experiment, you won't be able to change the selected conditions, flags, and variants.

To start the experiment, click Save and run. Read a brief experiment overview and click Run if everything is fine.

Once the experiment has started, you'll be able to see which flags are used in experiments on the Flags tab. You'll also be able to view a preliminary report on the Experiments tab.

Report on the experiment results

In this report, you can see statistically significant metric changes in the experimental variant compared with the control variant.

How to read the report

To view the report:

  1. Go to the Games Console.
  2. Select an app.
  3. Open the Experiments tab and click View results under the experiment name.

Below the brief description of the experiment, you can select any of the available metrics and a time range for the report. The graph will show values of the selected metric for all tested variants as the experiment runs.

Under the graph, you'll see a table with the following numeric values:

  • Auxiliary metrics like Number of unique players.
  • The main metric that can be selected in the menu above the graph.
  • Δ: Difference between the metric values in the experimental and control variants.
  • Δ, %: Difference between the metric values in the experimental and control variants expressed as a percentage of the control variant metric value.
  • P-value: Main numeric characteristic of a statistical criterion testing outcome. For more information about this indicator, see What problems the report can solve.
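The Δ and Δ, % columns follow directly from the two metric values; an illustrative helper:

```javascript
// Absolute and relative difference between the experimental and control
// metric values, as shown in the Δ and Δ, % report columns.
function deltas(experimentValue, controlValue) {
  const delta = experimentValue - controlValue;
  return { delta, deltaPercent: (delta / controlValue) * 100 };
}
```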

If the metrics are colored in:

  • Gray, the results do not demonstrate any definite effect on the user.
  • Green, the results are positive and statistically significant.
  • Red, the results are negative and statistically significant.

There are three gradations of color intensity that are used depending on the P-value. The threshold values are 0.05, 0.01, and 0.001.
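The highlighting logic implied by these thresholds can be sketched as follows (an illustration of the rule above, not the console's actual code):

```javascript
// Map a P-value to the report's color intensity: 0 means the row is not
// highlighted (no definite effect), 1-3 are increasing intensity.
// Thresholds follow the text above: 0.05, 0.01, 0.001.
function highlightIntensity(pValue) {
  if (pValue <= 0.001) return 3;
  if (pValue <= 0.01) return 2;
  if (pValue <= 0.05) return 1;
  return 0;
}
```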

What problems the report can solve

The experiment tests the following hypotheses:

  • H0: Value of the metric hasn't changed in the given variant compared with the control variant.
  • H1: Value of the metric has changed in the given variant compared with the control variant.

The statistical criterion employed is the Mann–Whitney U test with bucketing methods. If data quantity is low, corrections for discreteness are implemented.
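For intuition, the plain (unbucketed) U statistic counts how often values from one variant exceed values from the other. The sketch below omits bucketing, discreteness corrections, and the P-value computation that the actual report applies:

```javascript
// Minimal Mann-Whitney U statistic for two samples: counts pairs where a
// value from `a` exceeds a value from `b`, with ties counted as 0.5.
// A teaching sketch only: no bucketing and no normal approximation.
function mannWhitneyU(a, b) {
  let u = 0;
  for (const x of a) {
    for (const y of b) {
      if (x > y) u += 1;
      else if (x === y) u += 0.5;
    }
  }
  return u;
}
```

A U close to 0 or close to `a.length * b.length` suggests the two distributions differ; the P-value comes from comparing U with its distribution under H0.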

P-value serves as the main characteristic for assessing the experiment results. It helps determine the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the metric value hasn't actually changed (the H0 hypothesis in the example). For more information, see the Wikipedia article on P-value.

If the H0 hypothesis isn't rejected, the row isn't highlighted in the report.

It's important to bear in mind that accepting the H0 hypothesis doesn't mean that the metric hasn't changed. You can only be sure that the effect isn't larger than the MDE. To detect smaller changes in the metric, increase the experiment duration or audience size. To determine the new values, use the Sample size calculator.

If the H1 hypothesis is confirmed with a significance level of P-value <= 0.05, the row in the report is highlighted.

A hypothesis is accepted by comparing the P-value with the significance level: the change is considered significant when P-value <= alpha. The default alpha threshold is 0.05.

It's important to understand that alpha is the probability of a type I error (a false positive). On the other hand, it's not reasonable to make alpha too low, because that increases the probability of type II errors (false negatives) and raises the MDE.

Accept the experiment

  1. Go to the Games Console.
  2. Select an app.
  3. Open the Experiments tab and click View results under the experiment name. Analyze the testing results for all variants and decide if the changes you made have been successful.

To use the experimental variant as your main one, click Add flags to config. The new values will be available in the latest version of your app.

If you can't draw a firm conclusion, try changing the experiment conditions. To choose a new experiment duration or audience share, use the Sample size calculator.