Experiment results report
To view a report:
- Go to AppMetrica.
- Go to Varioqub → A/B experiments and click View result under the name of your experiment.
Ways to use this report
With this report, you can see whether the experimental variant shows statistically significant changes in the primary metrics compared to the control variant.
The experiment tests hypotheses:
- H0: The metric value for this variant doesn't differ from the control one.
- H1: The metric value for this variant differs from the control one.
The reports use the Mann-Whitney U test combined with bucketing as the test statistic. If there is too little data, discreteness corrections are applied.
When the test outcome for a hypothesis is H0, the corresponding row isn't highlighted in the AppMetrica report.
Keep in mind that an H0 outcome doesn't mean that the metric didn't change. The only conclusion that can be made with sufficient certainty is that the effect is smaller than the MDE (minimum detectable effect). You can detect smaller changes in metrics by increasing the audience size and the duration of the experiment.
When the test outcome for a hypothesis is H1 with P-value ≤ 0.05, the corresponding row in the AppMetrica report is highlighted in color. The highlight has three possible levels of intensity determined by the P-value, with threshold values of 0.05, 0.01, and 0.001.
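As a rough illustration of this approach (not AppMetrica's actual implementation), the sketch below applies a bucketed Mann-Whitney U test to two hypothetical samples and maps the resulting P-value to the report's highlight intensity levels; the bucket count, random bucketing, and sample data are assumptions made for the example.

```python
# Illustrative sketch only; AppMetrica's internal computation may differ.
import numpy as np
from scipy.stats import mannwhitneyu

def bucketed_mannwhitney(control, variant, n_buckets=100, seed=0):
    """Split each sample into buckets, average the metric per bucket,
    then compare the bucket means with the Mann-Whitney U test."""
    rng = np.random.default_rng(seed)
    control_means = [b.mean() for b in np.array_split(rng.permutation(control), n_buckets)]
    variant_means = [b.mean() for b in np.array_split(rng.permutation(variant), n_buckets)]
    _, p_value = mannwhitneyu(control_means, variant_means, alternative="two-sided")
    return p_value

def highlight_level(p_value):
    """Map the P-value to the report's three highlight intensity thresholds."""
    if p_value <= 0.001:
        return "strongest"
    if p_value <= 0.01:
        return "medium"
    if p_value <= 0.05:
        return "weakest"
    return "none"  # H0: the row isn't highlighted

# Hypothetical per-user metric values for the control and experimental variants.
rng = np.random.default_rng(1)
p = bucketed_mannwhitney(rng.normal(4.2, 2.5, 20_000), rng.normal(4.4, 2.5, 20_000))
print(p, highlight_level(p))
```

Bucketing replaces heavy-tailed per-user values with approximately independent bucket averages, which is what makes the rank-based comparison stable on large audiences.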
Report structure and settings
The report data is grouped by the variants included in the experiment. The control variant is placed in the first row.
Metrics
When creating a new experiment, you can select its primary and secondary metrics. During the course of the experiment, these metrics become available in the report:
Primary metrics
- Users count: Number of app users for the specified period.
- Sessions count: Number of sessions divided by the number of users with at least one session for the specified period.
- Average session duration: Total duration of all sessions divided by their number (sessions with an undefined duration aren't taken into account).
- Timespent per user: Time a single user spends in the app.
- Conversion to event "A"(c): Ratio of the number of users with event A containing selected parameters c to all users.
- Conversion from event "A"(c) to event "B"(d): Number of users with event B containing selected parameters d divided by the number of users with event A containing selected parameters c.
- Step-by-step conversion from event "A"(c) to event "B"(d): Ratio of users with event B containing selected parameters d that is preceded by event A to users with event A containing parameters c (see the sketch after the note below).
Note
In the Conversion to event "A"(c), Conversion from event "A"(c) to event "B"(d), and Step-by-step conversion from event "A"(c) to event "B"(d) reports, parameters (c, d) are optional and may be omitted.
If you add multiple parameters to an event, they are shown next to the event name in the report, for example: Conversion to event "A"(b→c→d). In this example, A is the event, b is the first-level parameter, c is the second-level parameter, and d is the third-level parameter.
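To make the conversion definitions above concrete, here's a minimal sketch that computes all three metrics from a hypothetical per-user event log; the log format, events, and numbers are made up for illustration, and event parameters are omitted for brevity.

```python
# Hypothetical event log: (user_id, event_name, timestamp) tuples.
events = [
    (1, "A", 10), (1, "B", 20),
    (2, "A", 15),
    (3, "B", 5), (3, "A", 30),
    (4, "C", 12),
]

all_users = {u for u, _, _ in events}
users_a = {u for u, e, _ in events if e == "A"}
users_b = {u for u, e, _ in events if e == "B"}

# Conversion to event "A": users with event A / all users
conversion_to_a = len(users_a) / len(all_users)            # 3 / 4

# Conversion from event "A" to event "B": users with event B / users with event A
conversion_a_to_b = len(users_b) / len(users_a)            # 2 / 3

# Step-by-step conversion: users whose event B happens after event A / users with event A
def b_after_a(user):
    a_times = [t for u, e, t in events if u == user and e == "A"]
    b_times = [t for u, e, t in events if u == user and e == "B"]
    return bool(a_times) and any(b > min(a_times) for b in b_times)

step_by_step = sum(b_after_a(u) for u in users_a) / len(users_a)  # 1 / 3

print(conversion_to_a, conversion_a_to_b, step_by_step)
```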
E-commerce metrics
- In-app purchase revenue per user.
Advertising metrics
- Ad revenue per user: Revenue generated by in-app ad impressions per user.
In addition, the report includes:
- Delta: Difference between the metric values in the experimental and control variants.
- Delta (%): Delta expressed as a percentage of the control variant's metric value.
- Confidence interval (±2σ): Visualization of the confidence intervals of the experimental and control variants on a number line.
- P-value: Main quantifier of the test result. It represents the probability of getting results as extreme as, or more extreme than, the observed ones under the assumption that the metric value doesn't change (hypothesis H0 above). To learn more, see this article.
To decide between the hypotheses, the P-value is compared with the significance level: if P-value ≤ alpha, H1 is accepted. The default threshold is alpha = 0.05.
Keep in mind that alpha sets the probability of type I errors (false positives). It's impractical to set a very low alpha value, as it increases the probability of type II errors (false negatives) and raises the MDE.
- MDE (%): Minimum detectable effect. This is the smallest change in the metric that can be detected with the existing amount of data and with the probability of type I errors alpha = 0.05 and type II errors beta = 0.2. The MDE is expressed as a percentage of the metric value in the control variant. You can lower the MDE by increasing the audience size and the duration of the experiment (see the worked example after this list).
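As a worked illustration of Delta, Delta (%), and MDE (%), the sketch below uses made-up numbers and a standard two-sample normal approximation for the MDE; the report's exact formula may differ.

```python
# Made-up numbers for illustration; not taken from a real report.
from math import sqrt
from scipy.stats import norm

control_mean, variant_mean = 4.20, 4.41        # e.g. average session duration, minutes
control_std = 2.5                              # assumed standard deviation in the control variant
users_per_variant = 10_000                     # assumed audience size per variant

delta = variant_mean - control_mean                        # Delta
delta_pct = 100 * delta / control_mean                     # Delta (%)

# Standard two-sample MDE approximation with alpha = 0.05 and beta = 0.2
alpha, beta = 0.05, 0.2
z = norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)           # ≈ 1.96 + 0.84
mde_abs = z * control_std * sqrt(2 / users_per_variant)
mde_pct = 100 * mde_abs / control_mean                     # MDE (%)

print(f"Delta: {delta:.2f}, Delta (%): {delta_pct:.1f}%, MDE (%): {mde_pct:.1f}%")
```

In this made-up example the observed change (5%) exceeds the estimated MDE (about 2.4%), so a change of this size is large enough for the test to detect with the assumed audience.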
Other metrics
Note
This option is available for paid AppMetrica plans.
You can expand reports on ongoing or stopped experiments with additional metrics.
This is useful if you selected an irrelevant metric when creating the experiment or if you need to analyze the experiment's impact on other metrics.
- In AppMetrica, go to Varioqub → A/B experiments.
- Under the name of the experiment, click View result.
- In the menu that appears, click the selected metric and choose Other metric.
A list of available metrics opens, excluding those you've already selected. Select the desired metric to see how it was affected by the experiment.
Filters
If you selected multiple metrics when creating an experiment, you can view a chart for each of them. To do this, select a metric name from the dropdown list above the chart.
By default, complete data for each metric is shown. You can filter the data displayed in the chart and table.
Currently, the following filters are available:
Time period
- Entire period (selected by default): The entire period of the experiment.
- Exact period: Choose specific dates for which you want to view data on the chart in more detail. To do this, select Exact period in the dropdown above the chart and then select the dates to show results for. After that, click Apply.
Tip
If you want to view the results for a specific day, select Exact period and double-click a date in the date selection window. After that, click Apply.
Additional filters
- Triggering experiment: Only display results for events that the experiment was triggered for.
- Only full days: Show results only for the days when the experiment ran for the full day.