Aggregation of results
 Open the pool.
 Click next to the Download results button.
 Choose the aggregation method:
Aggregation takes from several minutes to several hours. Track the progress on the Operations page. When aggregation is complete, download the TSV file with the results.
To receive notifications and emails when results aggregation is completed, set up notifications:
 Log in to your account.
 Go to
 Choose the notification method:
 Email: Messages will be sent to your email address.
 Messages: Notifications will be displayed under Messages in your account. Apart from you, those who set up shared access to your account can see them.
 Browser: Notifications will be sent to the devices that you logged in to your account from.
DawidSkene aggregation model
Analyzes all performers' responses and returns the final response and its statistical significance.
The DawidSkene aggregation model automatically evaluates L² parameters for each performer, where L is the number of different aggregation values.
Note that these parameters are determined automatically and are only used in calculations.
Because the DawidSkene method evaluates L² parameters for each performer, we don't recommend using it when a performer labels fewer than L² tasks. In this case, the quality of aggregation may be poor.
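The recommendation above can be checked before you run aggregation. The following is a minimal sketch, assuming the responses are available as (performer, task, label) tuples. The function name and the data layout are hypothetical and are not part of Toloka.

```python
from collections import Counter


def performers_below_threshold(responses):
    """Return performers who labeled fewer than L**2 tasks.

    responses: iterable of (performer_id, task_id, label) tuples
    (a hypothetical layout chosen for this sketch).
    L is the number of different label values seen in the data.
    """
    responses = list(responses)
    num_labels = len({label for _, _, label in responses})
    threshold = num_labels ** 2

    # Count distinct tasks per performer.
    tasks_per_performer = Counter()
    for performer, _task in {(p, t) for p, t, _ in responses}:
        tasks_per_performer[performer] += 1

    return {p: n for p, n in tasks_per_performer.items() if n < threshold}
```

With two label values, L² is 4, so a performer with only two labeled tasks would be reported by this check.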
The result of aggregation is a TSV file with responses. CONFIDENCE: <field name output> indicates the response significance as a percentage.
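The downloaded file can be post-processed to find responses with low confidence. The following is a minimal sketch, assuming the confidence column is named CONFIDENCE:<field name> and that its value is a percentage, with or without a trailing "%". The field name result, the other column names in the usage note, and the function name are hypothetical.

```python
import csv
import io


def low_confidence_rows(tsv_text, field, min_confidence):
    """Return rows whose CONFIDENCE:<field> value is below min_confidence (in percent)."""
    column = f"CONFIDENCE:{field}"
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        # Assumption: the value is a percentage, with or without a trailing "%".
        value = float(row[column].strip().rstrip("%"))
        if value < min_confidence:
            flagged.append(row)
    return flagged
```

For example, low_confidence_rows(text, "result", 90) would return the rows whose CONFIDENCE:result value is below 90, which you might then send for relabeling.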
 Benefits

 Tasks can be uploaded any way you want.
 Features

If your project contains an output data field marked with "required": false that's not filled in by performers, then this field won't be included in aggregation.
For example, you have 1000 tasks; in 999 of them, performers didn't fill in the label field, and in one task a performer labeled it as label=x. As a result of aggregation, this data field will have CONFIDENCE = 100%, since only one task out of a thousand falls under the aggregation conditions.
How it's calculated
The DawidSkene method builds an error matrix and response popularity for each performer. It uses the EM algorithm.
The idea is to find the most probable aggregated response for each task together with the error matrices and response popularities that best explain all the responses. The process is iterative. Initially, the majority opinion is used as the estimate of the correct response.
Description of the DawidSkene method.
If you want to learn how the DawidSkene method is implemented in Toloka, check out the open code.
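To illustrate the general idea, here is a minimal sketch of the textbook Dawid-Skene EM procedure in plain Python. It is not Toloka's implementation. The function name, the (task, performer, label) data layout, the fixed number of iterations, and the small smoothing value are assumptions made for this example.

```python
from collections import defaultdict


def dawid_skene(responses, labels, iterations=20, smoothing=1e-6):
    """Return {task: {label: probability}} for (task, performer, label) responses."""
    by_task = defaultdict(list)
    for task, performer, label in responses:
        by_task[task].append((performer, label))
    performers = {p for _, p, _ in responses}

    # Initialization: the share of votes each label received in a task.
    posterior = {
        task: {l: sum(1 for _, v in votes if v == l) / len(votes) for l in labels}
        for task, votes in by_task.items()
    }

    for _ in range(iterations):
        # M-step: response popularity and per-performer error matrices.
        popularity = {
            l: sum(post[l] for post in posterior.values()) / len(posterior)
            for l in labels
        }
        # error[performer][true label][observed label]
        error = {
            p: {t: {o: smoothing for o in labels} for t in labels}
            for p in performers
        }
        for task, votes in by_task.items():
            for performer, observed in votes:
                for true in labels:
                    error[performer][true][observed] += posterior[task][true]
        for rows in error.values():
            for row in rows.values():
                total = sum(row.values())
                for observed in row:
                    row[observed] /= total

        # E-step: recompute the probability of each label for each task.
        for task, votes in by_task.items():
            score = dict(popularity)
            for performer, observed in votes:
                for true in labels:
                    score[true] *= error[performer][true][observed]
            norm = sum(score.values())
            posterior[task] = {l: s / norm for l, s in score.items()}

    return posterior
```

The label with the highest probability for a task plays the role of the aggregated response, and that probability plays the role of the confidence.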
Note. Aggregation only includes accepted tasks.
The main requirement for this aggregation is the output data fields:
 Fields that can be aggregated

 Strings and numbers with allowed values. The allowed value must match the value parameter in the corresponding interface element.
 Boolean.
 Integers with minimum and maximum values. The maximum difference between them is 32.
If there are too many possible responses in the output field, the dynamic overlap mechanism won't be able to aggregate the data.
 Fields that can't be aggregated

 Array.
 File.
 Coordinates.
 JSON object.
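The rules in the two lists above can be expressed as a small check. This is a minimal sketch, assuming each output field is described by a dictionary with the keys type, allowed_values, min_value, and max_value. These key names, the type names, and the function name are assumptions for illustration, so adapt them to your actual output specification.

```python
def can_be_aggregated(field_spec):
    """Check a field description against the rules listed above.

    field_spec is a hypothetical dict, for example
    {"type": "string", "allowed_values": ["OK", "BAD"]}.
    """
    field_type = field_spec.get("type")
    if field_type == "boolean":
        return True
    if field_type in ("string", "float"):
        # Strings and numbers need a list of allowed values.
        return bool(field_spec.get("allowed_values"))
    if field_type == "integer":
        if field_spec.get("allowed_values"):
            return True
        low = field_spec.get("min_value")
        high = field_spec.get("max_value")
        # Integers need both bounds, no more than 32 apart.
        return low is not None and high is not None and high - low <= 32
    # Arrays, files, coordinates, and JSON objects can't be aggregated.
    return False
```

A string field without allowed values, or an integer field with a range wider than 32, would fail this check.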
Aggregation by skill
Analyzes responses based on the level of confidence in the performer. The confidence level is determined by the skill you choose. Skills measure the probability of the performer completing the task correctly.
 Benefits

 If your project processes a large amount of data, the aggregation results will be more accurate compared to the DawidSkene method.
 You can choose the output data fields you want to aggregate.
 Features

Each performer skill has a “weight”. The higher the skill, the more we trust the performer and believe that their responses are correct.
The result of aggregation is a TSV file with responses. CONFIDENCE: <field name output> indicates the confidence in the aggregated response. In this case, it shows the probability that the response is correct.
Example
Tasks were labeled by three performers with different “My skill” values: the first performer has a skill of 70, the second has 80, and the third has 90.
All three performers responded to the first task with OK. In this case, we are 100% sure that OK is the correct response.
On the second task, the first and third performers responded with OK, and the second performer responded with BAD. In this case, we'll compare the performers' skills and determine the confidence based on the result.
How it's calculated
Terms:
 q[i]: a performer's accuracy
 smoothing constant: a constant used when calculating a performer's accuracy
 z[j]: the most popular response
 eps: the probability that the estimate is correct
A performer's accuracy q[i] is calculated from the performer's skill with a smoothing constant (starting from 0.5), which is used if there are not enough responses to control tasks.
If there are several estimates, the most popular response is determined by adding together the accuracies q[i] of the performers who selected each response option. The response with the largest total is considered more correct. Let's call this estimate z[j].
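The selection of the most popular response can be sketched as follows. This is a minimal example, assuming each performer's accuracy is simply their skill divided by 100 (no smoothing). The function and variable names are hypothetical.

```python
from collections import defaultdict


def most_popular_response(answers, accuracy):
    """Sum performers' accuracies per response and return the top response.

    answers: {performer: response}
    accuracy: {performer: accuracy between 0 and 1}
    """
    totals = defaultdict(float)
    for performer, response in answers.items():
        totals[response] += accuracy[performer]
    best = max(totals, key=totals.get)
    return best, dict(totals)


# Second task from the example: skills 70, 80, and 90.
answers = {"p1": "OK", "p2": "BAD", "p3": "OK"}
accuracy = {"p1": 0.7, "p2": 0.8, "p3": 0.9}
best, totals = most_popular_response(answers, accuracy)
```

Here OK collects 0.7 + 0.9 and BAD collects 0.8, so OK is selected as the estimate.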
Using Bayes' theorem, we calculate the posterior probability that the estimate z[j] is correct.
A uniform distribution of estimates is assumed a priori. For the estimate z[j], the a priori probability is calculated as
P(z[j]) = 1 / Y,
where:
 Y is the number of response options.
Next, we calculate the probability that the estimate z[j] is correct.
If the performer responded z[j], then the probability of this is equal to the performer's accuracy q[i]. If they responded differently, then the probability of this is:
(1 - q[i]) / (Y - 1),
where:
 1 - q[i] is the remaining probability;
 Y - 1 is the number of remaining responses.
This ensures that the probability of an error is distributed evenly among the remaining estimates.
We take all performers' responses and a response option, for example z[x], and calculate the probability that the performers would give these responses, provided that the correct response is z[x]:
func z_prob(x int) : float {
    d = 1.0
    for w[i] : workers
        if answers[w[i]] == z[x]
            d *= q[i]
        else
            d *= (1 - q[i]) / (Y - 1)
    return d
}
Next, using Bayes' theorem, we calculate the probability that the response z[j] is correct:
r = 0
for z[i] : answer_options
    r += z_prob(i) * (1 / Y)
eps = z_prob(j) * (1 / Y) / r
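The pseudocode above can be turned into a short runnable Python sketch. It assumes that a performer's accuracy equals their skill divided by 100, without the smoothing constant, and applies the calculation to the second task from the example. The function names and the data layout are hypothetical.

```python
def z_prob(option, answers, accuracy, num_options):
    """Probability of the observed responses if `option` is the correct response."""
    d = 1.0
    for performer, response in answers.items():
        q = accuracy[performer]
        if response == option:
            d *= q
        else:
            # The error probability is split evenly among the other options.
            d *= (1 - q) / (num_options - 1)
    return d


def confidence(estimate, options, answers, accuracy):
    """Posterior probability that `estimate` is correct, with a uniform prior."""
    prior = 1 / len(options)
    r = sum(z_prob(o, answers, accuracy, len(options)) * prior for o in options)
    return z_prob(estimate, answers, accuracy, len(options)) * prior / r


# Second task from the example: skills 70, 80, and 90.
answers = {"p1": "OK", "p2": "BAD", "p3": "OK"}
accuracy = {"p1": 0.7, "p2": 0.8, "p3": 0.9}
eps = confidence("OK", ["OK", "BAD"], answers, accuracy)
print(round(eps, 2))  # 0.84
```

Under these assumptions, the likelihood is 0.7 × 0.2 × 0.9 = 0.126 for OK and 0.3 × 0.8 × 0.1 = 0.024 for BAD, so the confidence in OK is 0.126 / (0.126 + 0.024) = 0.84.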
Note. Aggregation only includes accepted tasks.
Aggregation requirements:
 Pool with dynamic overlap

To run aggregation, you must correctly set up dynamic overlap. To do this:
 Select a skill. We recommend selecting a skill calculated as the percentage of correct responses in control tasks. This will give you the most accurate aggregation results.
 Select the output data fields.
Output data fields that can be aggregated:
 Strings and numbers with allowed values. The allowed value must match the value parameter in the corresponding interface element.
 Boolean.
 Integers with minimum and maximum values. The maximum difference between them is 32.
If there are too many possible responses in the output field, the dynamic overlap mechanism won't be able to aggregate the data.
 Pools without dynamic overlap

You can run aggregation by skill if the pool meets the following requirements:
 You set a skill that defines the level of confidence in the performer's responses. We recommend using a skill calculated as the percentage of correct responses in control tasks.
 The output data fields have allowed values.
Output data fields that can be aggregated:
 Strings and numbers with allowed values. The allowed value must match the value parameter in the corresponding interface element.
 Boolean.
 Integers with minimum and maximum values. The maximum difference between them is 32.
If there are too many possible responses in the output field, the dynamic overlap mechanism won't be able to aggregate the data.
 The tasks were uploaded to the pool with “smart mixing”.
Troubleshooting
In the way it's calculated. In both aggregations, confidence means the same thing.
Click List of Operations on the pool page.
You cannot aggregate by project fields that have no valid values. Specify the possible values for all the fields of all types.
You need to use smart mixing.