The field in which you work determines the acceptable level of agreement. For a sporting competition, 60% agreement may be enough to declare a winner. However, for data from oncologists choosing a treatment, much higher agreement is needed, above 90%. In general, agreement above 75% is considered acceptable in most fields. In this competition, the judges agreed on 3 of 5 ratings, a percent agreement of 3/5 = 60%. The interrater analysis yields kappa = 0.676, p < .001. While this level of agreement is statistically significant, it is only marginally convincing. Kappa values of 0.40 to 0.59 are generally considered moderate, 0.60 to 0.79 substantial, and 0.80 and above outstanding (Landis & Koch, 1977).
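The two quantities above, percent agreement and Cohen's kappa, can be computed directly from paired ratings. The sketch below uses a small set of hypothetical ratings (not the competition data, whose individual ratings are not given here) to show how observed agreement and chance-expected agreement combine into kappa:

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Share of observations on which the two coders give the same rating."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohen_kappa(r1, r2):
    """Cohen's (1960) kappa: agreement corrected for chance,
    (p_o - p_e) / (1 - p_e), for two coders and nominal categories."""
    n = len(r1)
    po = percent_agreement(r1, r2)  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement expected from each coder's marginal proportions
    pe = sum(c1[c] / n * c2[c] / n for c in set(r1) | set(r2))
    return (po - pe) / (1 - pe)

# Hypothetical ratings from two coders over four subjects
coder1 = ["A", "A", "B", "B"]
coder2 = ["A", "A", "B", "A"]
print(percent_agreement(coder1, coder2))  # 0.75
print(cohen_kappa(coder1, coder2))        # 0.5
```

Note that kappa (0.5) is well below the raw percent agreement (0.75) because part of that agreement is expected by chance alone.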
Most statisticians prefer kappa values of at least 0.6, and more often above 0.7, before claiming a good level of agreement. Although it is not displayed in the output, a 95% confidence interval can be obtained with the generic formula for 95% confidence intervals: kappa ± 1.96 × SE(kappa).

Cohen (1968) offers an alternative, weighted kappa, that allows researchers to penalize disagreements according to the magnitude of the discrepancy. Cohen's weighted kappa is typically used for categorical data with an ordinal structure, for example a rating system that categorizes the presence of a particular attribute as high, medium, or low. In that case, a subject rated high by one coder and low by the other should lower the estimate of IRR more than a subject rated high by one coder and medium by the other. Norman and Streiner (2008) show that using weighted kappa with quadratic weights for ordinal scales is identical to a single-measures, two-way mixed ICC, and the two may be substituted for each other. This interchangeability is a particular advantage when three or more coders are used in a study, as ICCs can accommodate three or more coders, whereas weighted kappa can accommodate only two (Norman & Streiner, 2008).

Although it has been definitively rejected as an adequate measure of IRR (Cohen, 1960; Krippendorff, 1980), many researchers continue to report the percentage of ratings on which coders agree as an index of coder agreement. For nominal data, this may be expressed as the number of agreements divided by the total number of observations. For ordinal, interval, or ratio data, where close but not perfect agreement may be acceptable, percent agreement is sometimes expressed as the percentage of ratings that agree within a specified interval.
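Both ideas from the passage above, the generic confidence interval and quadratic weighting, can be sketched in a few lines. This is an illustrative implementation of quadratic-weighted kappa under the common convention that disagreement weights grow with the squared distance between ordinal categories; the SE value passed to the interval helper is hypothetical, since the text does not report one:

```python
def kappa_ci(kappa, se, z=1.96):
    """Generic 95% confidence interval: kappa +/- 1.96 * SE(kappa)."""
    return (kappa - z * se, kappa + z * se)

def quadratic_weighted_kappa(r1, r2, categories):
    """Cohen's (1968) weighted kappa with quadratic weights:
    disagreements are penalized by (i - j)^2 / (k - 1)^2."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # Observed joint proportion matrix
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # Marginal proportions for each coder
    p1 = [sum(obs[i]) for i in range(k)]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # Quadratic disagreement weights: zero on the diagonal
    w = [[(i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    wo = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    we = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - wo / we

# Ordinal ratings (0 = low, 1 = medium, 2 = high) from two coders
print(quadratic_weighted_kappa([0, 0, 1, 2], [0, 1, 1, 2], [0, 1, 2]))  # 0.8
print(kappa_ci(0.676, 0.10))  # interval for kappa = 0.676 with a hypothetical SE of 0.10
```

Because the low/medium disagreement sits only one category apart, its quadratic penalty is small; a low/high disagreement on the same data would pull the estimate down much further, which is exactly the behavior the ordinal weighting is meant to capture.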
Perhaps the biggest criticism of percent agreement is that it does not correct for agreements that would be expected by chance, and it therefore overestimates the level of agreement.
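This inflation is easy to demonstrate with a made-up example: when both coders use one category most of the time, a high percent agreement can coexist with a kappa near or below zero, because chance alone predicts most of the matches. The data below are illustrative, not from the study:

```python
# Two coders rating 10 cases, almost always "yes"; they match on 8 of 10
coder1 = ["yes"] * 9 + ["no"]
coder2 = ["yes"] * 8 + ["no", "yes"]

n = len(coder1)
# Observed agreement
po = sum(a == b for a, b in zip(coder1, coder2)) / n          # 0.80
# Chance agreement from the marginal "yes"/"no" proportions
p_yes = (coder1.count("yes") / n) * (coder2.count("yes") / n)  # 0.9 * 0.9
p_no = (coder1.count("no") / n) * (coder2.count("no") / n)     # 0.1 * 0.1
pe = p_yes + p_no                                              # 0.82
kappa = (po - pe) / (1 - pe)                                   # about -0.11
print(po, pe, kappa)
```

Here percent agreement reports a respectable 80%, yet kappa is slightly negative: the coders agree less often than chance would predict given how heavily both favor "yes".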