Critical values of the Kolmogorov criterion. The Kolmogorov-Smirnov goodness-of-fit criterion as a way of assessing the distribution of a population. Using the criterion to test for normality

Criterion assignment

The criterion is designed to compare two distributions:

a) empirical with theoretical, for example, uniform or normal;

b) one empirical distribution with another empirical distribution.

The criterion allows you to find the point at which the sum of the accumulated discrepancies between the two distributions is the greatest, and to assess the reliability of this discrepancy.

Description of the criterion

Whereas in the χ² method we compared the frequencies of the two distributions separately for each category, here we first compare the frequencies for the first category, then for the sum of the first and second categories, then for the sum of the first, second and third categories, and so on. Thus, each time we compare the frequencies accumulated up to a given category.

If the differences between the two distributions are significant, then at some point the difference in the accumulated frequencies will reach a critical value, and we will be able to recognize the differences as statistically significant. This difference is included in the criterion formula. The greater the empirical value, the more significant the differences.

Hypotheses

H0: The differences between the two distributions are not significant (judging by the point of maximum accumulated discrepancy between them).

H1: The differences between the two distributions are significant (judging by the point of maximum accumulated discrepancy between them).

To apply the Kolmogorov-Smirnov criterion, the following conditions must be met:

1. Measurements must be made on an interval or ratio scale.

2. Samples must be random and independent.

3. It is desirable that the total volume of two samples is ≥ 50. With an increase in the sample size, the accuracy of the test increases.

4. The empirical data must allow ordering in ascending or descending order of some attribute and must reflect a unidirectional change in it. If it is difficult to order the categories in this way, it is better to use the chi-square (χ²) criterion.

This criterion is used to solve the same problems as the chi-square criterion. In other words, it can be used to compare an empirical distribution with a theoretical one, or two empirical distributions with each other. However, whereas the chi-square criterion compares the frequencies of the two distributions category by category, this criterion compares the accumulated (cumulative) frequencies for each category. If the difference between the accumulated frequencies of the two distributions turns out to be large, the differences between the two distributions are significant.

Task 8.12. Suppose that in an experiment a psychologist needs to use a six-sided die with the numbers 1 to 6 on its faces. For the purity of the experiment, the die must be "ideal", i.e. such that over a sufficiently large number of rolls each of its faces comes up approximately the same number of times. The task is to find out whether a given die is close to ideal.

Solution. Let's roll the die 120 times and compare the resulting empirical distribution with the theoretical one. Since the theoretical distribution is uniform, each theoretical frequency equals 120 / 6 = 20. The empirical and theoretical frequencies are presented together in Table 8.15:

To calculate by the Kolmogorov – Smirnov criterion, it is necessary to carry out a number of transformations with the data in Table 8.15. We present these transformations in Table 8.16 and explain how they are obtained:

In Table 8.16 the symbol FE denotes the accumulated theoretical frequencies. They are obtained as follows: the second theoretical frequency, also equal to 20, is added to the first theoretical frequency 20, giving 20 + 20 = 40, and the number 40 is written in place of the second frequency. Then the next theoretical frequency is added to 40, the resulting value 60 is written in place of the third theoretical frequency, and so on.

The symbol FB in Table 8.16 denotes the accumulated empirical frequencies. To calculate them, the empirical frequencies are arranged in ascending order (15, 18, 18, 21, 23, 25) and then added in order. First comes the first frequency, 15; the second frequency is added to it and the sum 15 + 18 = 33 is written in place of the second frequency; then 18 is added to 33 (33 + 18 = 51) and the number 51 is written in place of the third frequency, and so on.

The symbol |FE − FB| in Table 8.16 denotes the absolute values of the difference between the accumulated theoretical and empirical frequencies, for each column separately.

The empirical value of the criterion, denoted $D_{emp}$, is obtained using formula (8.13):

$D_{emp} = \frac{\max |FE - FB|}{n}$   (8.13)

To obtain it, find the largest of the numbers |FE − FB| (in our case it is 9) and divide it by the sample size $n$. In our case $n = 120$, therefore

$D_{emp} = 9 / 120 = 0.075$

For this criterion, a table of critical values is given in Appendix 1 under No. 13. It follows from Table 13 of Appendix 1, however, that if the number of elements in the sample is more than 100, the critical values are calculated by formula (8.14).
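The transformations of Table 8.16 can be sketched in a few lines of Python. This is only an illustration that follows the procedure described in the text (including the arrangement of the empirical frequencies in ascending order); the variable names are hypothetical, and only the frequencies quoted above are used.

```python
from itertools import accumulate

# Empirical frequencies quoted in the text (120 rolls), in ascending order
empirical = sorted([15, 18, 18, 21, 23, 25])
n = sum(empirical)                      # sample size, 120
theoretical = [n // 6] * 6              # equally probable faces: 20 each

fe = list(accumulate(theoretical))      # accumulated theoretical frequencies (FE)
fb = list(accumulate(empirical))        # accumulated empirical frequencies (FB)
abs_diff = [abs(t - e) for t, e in zip(fe, fb)]

d_emp = max(abs_diff) / n               # largest |FE - FB| divided by n
print(fe, fb, abs_diff, d_emp)
```

The largest accumulated difference is 9, which reproduces the value 9 / 120 = 0.075 obtained by hand.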

Kolmogorov-Smirnov criterion. Testing the Hypothesis of Sample Homogeneity

Sample homogeneity hypotheses are hypotheses that the samples in question are drawn from the same general population.

Let there be two independent samples drawn from populations with unknown theoretical distribution functions $F_1(x)$ and $F_2(x)$.

The null hypothesis to be tested is $H_0: F_1(x) = F_2(x)$ against the competing hypothesis $H_1: F_1(x) \neq F_2(x)$. We will assume that the functions $F_1(x)$ and $F_2(x)$ are continuous and use the Kolmogorov-Smirnov statistic for the test.

Kolmogorov-Smirnov criterion uses the same idea as Kolmogorov's criterion. However, the difference lies in the fact that the Kolmogorov test compares the empirical distribution function with the theoretical one, and the Kolmogorov-Smirnov test compares two empirical distribution functions.

The statistics of the Kolmogorov-Smirnov test is as follows:

, (9.1)

where and are empirical distribution functions constructed from two samples with volumes and.

The hypothesis is rejected if the actually observed value of the statistic is greater than the critical one, i.e. , and is accepted otherwise.

For small sample sizes, the critical values ​​for the given significance levels of the criterion can be found in special tables. At (and practically at) the distribution of statistics is reduced to the Kolmogorov distribution for statistics. In this case, the hypothesis is rejected at the significance level if the actually observed value is greater than the critical one, i.e. , and is accepted otherwise.

Example 1. CHECKING THE HOMOGENEITY OF TWO SAMPLES

Two inspections of retail outlets were carried out in order to detect underweight. The results are summarized in the table:

Interval number   Underweight interval, g   Frequency, sample 1   Frequency, sample 2
1                 0 – 10                    3                     5
2                 10 – 20                   10                    12
3                 20 – 30                   15                    8
4                 30 – 40                   20                    25
5                 40 – 50                   12                    10
6                 50 – 60                   5                     8
7                 60 – 70                   25                    20
8                 70 – 80                   15                    7
9                 80 – 90                   5                     5

The size of the first sample was $n_1 = 110$, and that of the second was $n_2 = 100$.

Solution:

For each sample we compute the accumulated frequencies and the corresponding values of the empirical distribution function. The processed results are summarized in the table:

Upper bound, g   Accumulated frequency, sample 1   Accumulated frequency, sample 2   Empirical distribution function, sample 1   Empirical distribution function, sample 2   Absolute difference
10               3                                 5                                 0.027                                       0.050                                       0.023
20               13                                17                                0.118                                       0.170                                       0.052
30               28                                25                                0.254                                       0.250                                       0.004
40               48                                50                                0.436                                       0.500                                       0.064
50               60                                60                                0.545                                       0.600                                       0.055
60               65                                68                                0.591                                       0.680                                       0.089
70               90                                88                                0.818                                       0.880                                       0.062
80               105                               95                                0.955                                       0.950                                       0.005
90               110                               100                               1.000                                       1.000                                       0.000

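The last column of the table shows that the largest absolute difference between the two empirical distribution functions is 0.089. The value of the statistic calculated by formula (9.1) does not exceed the critical value found from statistical tables, so the null hypothesis is accepted, i.e. the underweight amounts found in the two inspections are described by the same distribution function.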
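The processing of the two samples can be reproduced with a short Python sketch. It uses only the frequencies from the table of Example 1; the helper name `ecdf_from_counts` is hypothetical, and the sketch stops at the largest difference because the scaling of formula (9.1) and the critical value are not reproduced in the text.

```python
from itertools import accumulate

sample1 = [3, 10, 15, 20, 12, 5, 25, 15, 5]   # frequencies, first inspection
sample2 = [5, 12, 8, 25, 10, 8, 20, 7, 5]     # frequencies, second inspection

def ecdf_from_counts(counts):
    """Empirical distribution function at the upper bound of each interval."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

f1 = ecdf_from_counts(sample1)
f2 = ecdf_from_counts(sample2)
diffs = [abs(a - b) for a, b in zip(f1, f2)]

d_max = max(diffs)
worst_interval = diffs.index(d_max) + 1        # 1-based interval number
print(round(d_max, 3), worst_interval)
```

The maximum, about 0.089, falls on the sixth interval (upper bound 60 g), in agreement with the last column of the table.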


STATISTICAL INDEPENDENCE AND TREND IDENTIFICATION


When analyzing random data, situations often arise when it is necessary to find out whether observations or estimates of parameters are statistically independent or they are subject to a trend. This is especially important when analyzing non-stationary data.

Such studies are usually carried out on the basis of distribution-free or nonparametric methods, in which no assumptions are made regarding the distribution function of the data under study.

Series criterion


Consider a sequence of observed values of a random variable, each observation being assigned to one of two mutually exclusive classes, which can be denoted simply (+) or (−). Let's look at a number of examples:

In each of these examples, a sequence of the form is formed:

A series is a sequence of observations of the same type, preceded and followed by observations of the opposite type or by no observations at all.

In the given sequence, the number of observations is; and the number of series is equal.

If a sequence of observations consists of independent outcomes of the same random variable, i.e. if the probability of the individual outcomes [(+) or (−)] does not change from observation to observation, then the number of series $r$ in the sequence is a random variable with mean and variance:

$\mu_r = \frac{2 N_1 N_2}{N} + 1$   (9.2)

$\sigma_r^2 = \frac{2 N_1 N_2 (2 N_1 N_2 - N)}{N^2 (N - 1)}$   (9.3)

Here $N_1$ is the number of (+) outcomes and $N_2$ is the number of (−) outcomes, so that $N = N_1 + N_2$. In the particular case $N_1 = N_2 = N/2$:

$\mu_r = \frac{N}{2} + 1, \quad \sigma_r^2 = \frac{N (N - 2)}{4 (N - 1)}$.   (9.4)

Suppose there is reason to suspect a trend in the sequence of observations, i.e. there is reason to believe that the probability of (+) or (−) changes from observation to observation. The existence of a trend can be checked as follows. Take as the null hypothesis that there is no trend, i.e. that the observations are independent outcomes of the same random variable. Then, to test the hypothesis at any required significance level α, compare the observed number of series with the lower and upper boundaries of the hypothesis acceptance region, which are taken from statistical tables for the chosen α.

If the observed number of series falls outside the acceptance region, the null hypothesis should be rejected at significance level α. Otherwise, the null hypothesis can be accepted.

Example 2. APPLICATION OF THE SERIES CRITERION

There is a sequence of independent observations:


5.5  5.1  5.7  5.2  4.8  5.7  5.0  6.5  5.4  5.8
6.8  6.6  4.9  5.4  5.9  5.4  6.8  5.8  6.9  5.5

Let us check the independence of the observations by counting the number of series in the sequence obtained by comparing the observations with the median. We apply the criterion at a chosen significance level α.

From the data we find that the median is 5.6. We then write (+) for observations above the median and (−) for observations below it, and obtain:

− − + − − + − + − + + + − − + − + + + −

In our example the sequence has 20 observations, 10 of each sign, and the observed number of series is 13. The boundaries of the acceptance region of the hypothesis are found from statistical tables for the chosen significance level.
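The counting in this example can be checked with a small Python sketch. It is an illustration under stated assumptions: the sign convention ((+) at or above the median, (−) below it) and the mean and variance expressions are the standard ones for the runs test, and all variable names are hypothetical.

```python
from statistics import median

data = [5.5, 5.1, 5.7, 5.2, 4.8, 5.7, 5.0, 6.5, 5.4, 5.8,
        6.8, 6.6, 4.9, 5.4, 5.9, 5.4, 6.8, 5.8, 6.9, 5.5]

med = median(data)
signs = ["+" if x >= med else "-" for x in data]

# A new series starts wherever the sign differs from the previous one
runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

n_plus, n_minus = signs.count("+"), signs.count("-")
n_total = n_plus + n_minus

# Mean and variance of the number of series under the no-trend hypothesis
mean_runs = 2 * n_plus * n_minus / n_total + 1
var_runs = (2 * n_plus * n_minus * (2 * n_plus * n_minus - n_total)
            / (n_total ** 2 * (n_total - 1)))
print(med, runs, mean_runs, round(var_runs, 3))
```

For these data the sketch gives a median of 5.6, 13 series, and an expected number of series of 11 under the hypothesis of independence.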

In practice, in addition to the χ² criterion, the Kolmogorov criterion is often used. As a measure of the discrepancy between the theoretical and empirical distributions it takes the maximum absolute difference between the empirical distribution function $F_n(x)$ and the corresponding theoretical distribution function $F(x)$,

$D = \max_x |F_n(x) - F(x)|$,

called the statistic of the Kolmogorov test.

Given the significance level α, one can find the corresponding critical value $\lambda_\alpha$.

The table shows the critical values of the Kolmogorov criterion for some α.

Table 4.2.

Scheme of application of the Kolmogorov criterion

1. Construct the empirical distribution function $F_n(x)$ and the assumed theoretical distribution function $F(x)$.

2. Determine the Kolmogorov statistic $D$, the measure of the discrepancy between the theoretical and empirical distributions, and calculate the value $\lambda = D \sqrt{n}$.

3. If the calculated value λ is greater than the critical value $\lambda_\alpha$, the null hypothesis $H_0$ that the random variable X has the given distribution law is rejected.

If $\lambda \le \lambda_\alpha$, the hypothesis $H_0$ is considered not to contradict the experimental data.
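The scheme above can be sketched in Python for an ungrouped sample. This is a minimal illustration, not the procedure of the worked example that follows: it assumes a fully specified normal law, the sample values and the parameters 10.0 and 1.0 are hypothetical, and the function names are invented for the sketch. The critical value 1.36 is the one quoted later in the text for the 0.05 level.

```python
import math

def normal_cdf(x, mu, sigma):
    """Theoretical distribution function F(x) of a normal law."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kolmogorov_lambda(sample, cdf):
    """Return (D, lambda) for a sample and a fully specified continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical function jumps from (i - 1) / n to i / n at x,
        # so both sides of the jump are compared with F(x)
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d, d * math.sqrt(n)

# Hypothetical data, used only to show the call
sample = [9.2, 10.1, 10.4, 9.7, 11.3, 10.0, 9.5, 10.8]
d, lam = kolmogorov_lambda(sample, lambda x: normal_cdf(x, 10.0, 1.0))
reject = lam > 1.36        # critical value for the 0.05 level
print(round(d, 3), round(lam, 3), reject)
```

Here λ stays well below 1.36, so the hypothetical sample does not contradict the assumed normal law.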

Example. Using the Kolmogorov criterion at the significance level α = 0.05, test the hypothesis H 0 that the random variable X - the output of the workers of the enterprise - has a normal distribution law.

Solution... 1. Let's construct an empirical and theoretical distribution function.

The empirical distribution function is plotted according to the relative accumulated frequencies.

We construct the theoretical distribution function according to the formula

where

We will summarize the calculation results in a table:

Table 4.3.

Question 3

λ - Kolmogorov-Smirnov test

Criterion assignment

The λ criterion is designed to compare two distributions:

a) an empirical distribution with a theoretical one, e.g. uniform or normal;

b) one empirical distribution with another empirical distribution.

The criterion allows you to find the point at which the sum of the accumulated discrepancies between the two distributions is the greatest, and to assess the reliability of this discrepancy.

Description of the criterion

Whereas in the χ² method we compared the frequencies of the two distributions separately for each category, here we first compare the frequencies for the first category, then for the sum of the first and second categories, then for the sum of the first, second and third categories, and so on. Thus, each time we compare the frequencies accumulated up to a given category.

If the differences between the two distributions are significant, then at some point the difference in the accumulated frequencies will reach a critical value, and we will be able to recognize the differences as statistically significant. This difference enters the formula of the λ criterion. The greater the empirical value of λ, the more significant the differences.

Hypotheses -

H 0: Differences between the two distributions are not significant (judging by the point of maximum accumulated difference between them).

H 1: Differences between the two distributions are significant (judging by the point of maximum accumulated difference between them).

Graphical representation criterion

Let us consider, for illustration, the distribution of yellow (No. 4) color in M. Luscher's 8-color test. If the subjects chose colors at random, then yellow, like all the others, would equally likely occupy any of the 8 positions of the choice. In practice, however, most subjects place this color, the "color of expectation and hope," at the top of the row.

In Fig. 4.9 the columns represent the relative frequencies [8] with which yellow falls first on the 1st position (the first column on the left), then on the 1st and 2nd positions (the second column), then on the 1st, 2nd and 3rd positions, and so on. The height of the columns increases steadily, since they reflect the relative frequencies accumulated up to a given position. For example, the column at the 3rd position has a height of 0.51. This means that 51% of the subjects put yellow in one of the first three positions.

[8] Relative frequency is the frequency divided by the total number of observations; in this case it is the frequency with which yellow falls on a given position, divided by the number of subjects. For example, the frequency of yellow on the 1st position is f = 24; the number of subjects is n = 102; the relative frequency is f* = f / n = 0.235.

The broken line in Fig. 4.9 connects the points reflecting the accumulated relative frequencies that would be observed if yellow fell on each of the 8 positions with equal probability. The solid lines indicate the discrepancies between the empirical and theoretical accumulated relative frequencies. These discrepancies are denoted d.

Fig. 4.9. Comparisons in the λ criterion: arrows indicate the discrepancies between the empirical and theoretical accumulated relative frequencies for each category

The maximum discrepancy in Fig. 4.9 is denoted d max. It is this third position of the color that is the turning point, determining whether the empirical distribution differs significantly from the uniform one. We will check this in Example 1.

Limitations of the criterionλ

1. The criterion requires a sufficiently large sample. When comparing two empirical distributions, it is necessary that n1, n2 > 50. Comparison of an empirical distribution with a theoretical one is sometimes allowed when n > 5 (Van der Waerden B.L., 1960; Gubler E.V., 1978).

2. The categories must be ordered in ascending or descending order of some attribute. They must reflect a unidirectional change in it. For example, we can take as categories the days of the week, the 1st, 2nd and 3rd months after a course of therapy, an increase in body temperature, an increasing feeling of insufficiency, etc. If, however, we take categories that merely happened to be lined up in a given sequence, the accumulation of frequencies will reflect only this accidental adjacency of the categories. For example, if the six stimulus pictures in Heckhausen's technique are presented to different subjects in different orders, we have no right to speak of an accumulation of reactions in passing from picture No. 1 of the standard set to picture No. 2, and so on. Nor can we speak of a unidirectional change in an attribute when comparing categories such as "birth order", "nationality", "type of education received", etc. These data are measured on nominal scales: they show no unambiguous unidirectional change in the attribute.

So, we cannot accumulate frequencies over categories that differ only qualitatively and do not form an ordinal scale. In all cases where the categories are not ordered in ascending or descending order of some attribute, we should apply the χ² method.

Example 1:Comparison of empirical and theoretical distribution

In a sample of healthy males, students of technical and military-technical universities between the ages of 19 and 22, with an average age of 20, the Luscher test was carried out in an 8-color version. It was found that the yellow color is preferred by the subjects more often than rejected (Table 4.16). Can it be argued that the distribution of the yellow color over 8 positions in healthy subjects differs from the uniform distribution?

Table 4.16

Empirical hit rates of yellow for each of the 8 positions (n ​​= 102)

Yellow positions

Empirical frequencies

Let us formulate hypotheses.

H 0: The empirical distribution of yellow over the eight positions does not differ from the uniform distribution.

H 1: The empirical distribution of yellow over the eight positions is different from the uniform distribution.

Now let's start the calculations, gradually filling in the results with the table for calculating the criterion λ . It is better to track all operations according to Tab. 4.17, then they will be clearer.

Let's enter into the table the names (numbers) of the digits and the empirical frequencies corresponding to them (the first column of Table 4.17).

Then we calculate the relative empirical frequencies f* by the formula:

$f^*_j = f_j / n$

where $f_j$ is the frequency with which yellow falls on the given position; $n$ is the total number of observations;

$j$ is the ordinal number of the position.

Let's write down the results in the second column (see Table 4.17).

Now we need to calculate the accumulated relative empirical frequencies $\sum f^*$. To do this, we sum the relative empirical frequencies f*. For example, for the 1st category the accumulated frequency equals the relative frequency of the 1st category, $\sum f^*_1 = 0.235$ [9].

For the 2nd category, the accumulated frequency is the sum of the relative frequencies of the 1st and 2nd categories:

$\sum f^*_{1+2} = 0.235 + 0.147 = 0.382$

For the 3rd category, the accumulated frequency is the sum of the relative frequencies of the 1st, 2nd and 3rd categories:

$\sum f^*_{1+2+3} = 0.235 + 0.147 + 0.128 = 0.510$

We see that the task can be simplified by adding the relative frequency of a category to the frequency accumulated up to the previous category, for example, for the 4th category:

$\sum f^*_{1+2+3+4} = 0.510 + 0.078 = 0.588$

Let's write the results of this work in the third column.

Let's write the results of this work in the third column.

Now we need to compare the accumulated empirical frequencies with the accumulated theoretical frequencies. For the 1st category, the theoretical relative frequency is determined by the formula:

$f^*_{theor} = 1 / k$

where $k$ is the number of categories (in this case, color positions).

[9] All formulas are given for discrete attributes that can be expressed in whole numbers, for example: ordinal number, number of subjects, quantitative composition of a group, etc.

For the example in question:

$f^*_{theor} = 1 / 8 = 0.125$

This theoretical frequency applies to all 8 categories. Indeed, with a random choice, the probability of yellow (or any other color) falling on each of the 8 positions is 1/8, i.e. 0.125.

The accumulated theoretical frequencies for each category are determined by summation.

For the 1st category, the accumulated theoretical frequency equals the theoretical frequency of falling into that category:

$f^*_{T 1} = 0.125$

For the 2nd category, the accumulated theoretical frequency is the sum of the theoretical frequencies of the 1st and 2nd categories:

$f^*_{T 1+2} = 0.125 + 0.125 = 0.250$

For the 3rd category, the accumulated theoretical frequency is the sum of the theoretical frequency accumulated up to the previous category and the theoretical frequency of this category:

$f^*_{T 1+2+3} = 0.250 + 0.125 = 0.375$

The accumulated theoretical frequencies can also be determined by multiplication:

$\sum f^*_{T j} = f^*_{theor} \cdot j$

where $f^*_{theor}$ is the theoretical frequency;

$j$ is the ordinal number of the category.

Let's enter the calculated accumulated theoretical frequencies in the fourth column of the table (Table 4.17).

Now it remains for us to calculate the differences between the empirical and theoretical accumulated frequencies (columns 3 and 4). The fifth column records the absolute values ​​of these differences, denoted as d.

Let us determine by column 5 which of the absolute values ​​of the difference is the greatest. It will be called d max. In this case, d max = 0.135.
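The accumulation and comparison steps can be sketched in Python. The sketch uses only the relative frequencies of the first four positions, which are the ones quoted in the text; the remaining positions are not reproduced here, and the text states that the maximum falls on the third position. The variable names are hypothetical.

```python
from itertools import accumulate

# Relative empirical frequencies of positions 1-4 quoted in the text
rel_emp = [0.235, 0.147, 0.128, 0.078]
k = 8                                    # number of positions (categories)

cum_emp = list(accumulate(rel_emp))                        # accumulated empirical
cum_theor = [(j + 1) / k for j in range(len(rel_emp))]     # accumulated theoretical, j / k
d = [abs(e - t) for e, t in zip(cum_emp, cum_theor)]

d_max = max(d)
position = d.index(d_max) + 1            # 1-based position number
print([round(x, 3) for x in d], round(d_max, 3), position)
```

The largest difference among these positions is 0.135 at the third position, matching the value of d max given above.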

Now we need to turn to Tab. X Annex 1 for the determination of critical values d max at n = 102.

Table 4.17

Calculation of the criterion when comparing the distribution of yellow choices with a uniform distribution (n = 102)

Yellow position

Empirical frequency

Empirical frequency

Accumulated empirical frequency

Accumulated theoretical frequency

Difference

For this case, therefore,

Obviously, the more the distributions differ, the greater the differences in the accumulated frequencies. Therefore, it is not difficult to lay out the zones of significance and insignificance along the corresponding axis:

d emp - d cr

Answer: H0 is rejected at p = 0.05. The distribution of yellow over the eight positions differs from the uniform distribution. We present all the actions performed in the form of an algorithm.

ALGORITHM 14

Calculation of the absolute value of the difference d between an empirical and a uniform distribution

1. Enter in a table the names of the categories and the corresponding empirical frequencies (first column).

2. Calculate the relative empirical frequency for each category by the formula:

$f^*_{emp} = f_{emp} / n$

where $f_{emp}$ is the empirical frequency for the given category;

$n$ is the total number of observations.

Enter the results in the second column.

3. Calculate the accumulated relative empirical frequencies by the formula:

$\sum f^*_j = \sum f^*_{j-1} + f^*_j$

where $\sum f^*_{j-1}$ is the relative frequency accumulated over the previous categories;

$j$ is the ordinal number of the category;

$f^*_j$ is the relative empirical frequency of the given j-th category.

Enter the results in the third column of the table.

4. Calculate the accumulated theoretical relative frequencies for each category by the formula:

$\sum f^*_{T j} = \sum f^*_{T j-1} + f^*_{T j}$

where $\sum f^*_{T j-1}$ is the theoretical frequency accumulated over the previous categories;

$j$ is the ordinal number of the category;

$f^*_{T j}$ is the theoretical frequency of the given category.

Enter the results in the fourth column of the table.

5. Calculate the differences between the empirical and theoretical accumulated frequencies for each category (between the values of the 3rd and 4th columns).

6. Write down in the fifth column the absolute values of the obtained differences, without their sign. Denote them d.

7. Determine from the fifth column the largest absolute value of the difference, d max.

8. According to Table X of Appendix 1, determine or calculate the critical values of d max for the given number of observations n.

If d max equals the critical value of d or exceeds it, the differences between the distributions are significant.

Example 2: Comparison of two empirical distributions

It is interesting to compare the data obtained in the previous example with the data from H. Klar's survey of 800 subjects (Klar H., 1974, p. 67). Klar showed that yellow is the only color whose distribution over the 8 positions does not differ from the uniform one. For the comparisons he used the χ² method. The empirical frequencies he obtained are presented in Table 4.18.

Table 4.18

Empirical frequencies of yellow falling on each of the 8 positions in H. Klar's study (after: Klar H., 1974) (n = 800)

Digits-positions of yellow color

Empirical frequencies

Let us formulate hypotheses.

H0: The empirical distributions of yellow over the 8 positions in the domestic sample and in H. Klar's sample do not differ.

H1: The empirical distributions of yellow over the 8 positions in the domestic sample and in H. Klar's sample differ from each other.

Since in this case we will compare the accumulated empirical frequencies for each category, we are not interested in theoretical frequencies.

All calculations will be carried out in the table according to Algorithm 15.

ALGORITHM 15

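Calculation of the criterion λ when comparing two empirical distributions

1. Enter in the table the names of the categories and the corresponding empirical frequencies obtained in distribution 1 (first column) and in distribution 2 (second column).

2. Calculate the relative empirical frequencies for each category of distribution 1 by the formula:

$f^*_e = f_e / n_1$

where $f_e$ is the empirical frequency in the given category;

$n_1$ is the number of observations in the 1st sample.

Enter the relative empirical frequencies of distribution 1 in the third column.

3. Calculate the relative empirical frequencies for each category of distribution 2 by the formula:

$f^*_e = f_e / n_2$

where $f_e$ is the empirical frequency in the given category;

$n_2$ is the number of observations in the 2nd sample.

Enter the relative empirical frequencies of distribution 2 in the fourth column of the table.

4. Calculate the accumulated relative empirical frequencies for distribution 1 by the formula:

$\sum f^*_j = \sum f^*_{j-1} + f^*_j$

where $\sum f^*_{j-1}$ is the relative frequency accumulated over the previous categories;

$j$ is the ordinal number of the category;

$f^*_j$ is the relative frequency of the given category.

Record the results in the fifth column.

5. Calculate the accumulated relative empirical frequencies for distribution 2 by the same formula and record the results in the sixth column.

6. Calculate the differences between the accumulated frequencies for each category and record their absolute values, without sign, in the seventh column. Denote them d.

7. Determine from the seventh column the greatest absolute value of the difference, d max.

8. Calculate the value of the criterion λ by the formula:

$\lambda = d_{max} \sqrt{\frac{n_1 n_2}{n_1 + n_2}}$

where $n_1$ is the number of observations in the first sample;

$n_2$ is the number of observations in the second sample.

9. According to Table XI of Appendix 1, determine which level of statistical significance the obtained value of λ corresponds to.

If λ emp > 1.36, the differences between the distributions are significant.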

The order of the samples can be chosen arbitrarily, since the differences between them are estimated by the absolute value of the differences. In our case, we will consider the domestic sample as the first and Klar's sample as the second.

Table 4.19

Calculation of the criterion when comparing the empirical distributions of yellow in the domestic sample (n1 = 102) and in Klar's sample (n2 = 800)

Columns: Yellow position; Empirical frequencies (f1, f2); Relative empirical frequencies (f*1, f*2); Accumulated relative empirical frequencies (∑f*1, ∑f*2); Difference |∑f*1 − ∑f*2|

The maximum difference between the accumulated empirical frequencies is 0.118 and falls on the second digit.

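In accordance with point 8 of Algorithm 15, we calculate the value of λ:

$\lambda_{emp} = d_{max} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = 0.118 \cdot \sqrt{\frac{102 \cdot 800}{102 + 800}} \approx 1.12$

According to Table XI of Appendix 1, we determine the level of statistical significance of the obtained value: p = 0.16.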

Let's build an axis of significance for clarity.

The axis shows the critical values ​​of λ corresponding to the accepted levels of significance: λ 0.05 = 1.36, λ 0.01 = 1.63.

The zone of significance extends to the right, from 1.63 onwards, and the zone of insignificance extends to the left, from 1.36 to lower values.
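The calculation of λ and of its significance level can be sketched in Python. This is an illustration under assumptions: the p-value is taken from the standard asymptotic Kolmogorov series rather than from Table XI, which is not reproduced here, and the function names are hypothetical.

```python
import math

def ks_lambda(d_max, n1, n2):
    """lambda = d_max * sqrt(n1 * n2 / (n1 + n2))."""
    return d_max * math.sqrt(n1 * n2 / (n1 + n2))

def kolmogorov_p(lam, terms=100):
    """Asymptotic probability of exceeding lam (Kolmogorov series)."""
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

lam = ks_lambda(0.118, 102, 800)     # values from the example
p = kolmogorov_p(lam)
print(round(lam, 2), round(p, 2))
```

For the data of this example the sketch gives λ ≈ 1.12 and p ≈ 0.16. The same function returns approximately 0.05 for λ = 1.36 and 0.01 for λ = 1.63, in line with the critical values shown on the axis.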

λ emp < λ cr

Answer: H0 is accepted. The empirical distributions of yellow over the 8 positions in the domestic sample and in H. Klar's sample coincide. Thus, the distributions of yellow in the two samples do not differ, yet they relate differently to the uniform distribution: in Klar's sample no differences from the uniform distribution were found, whereas in the domestic sample differences were found (p < 0.05). Perhaps applying another method could clarify the picture?

E.V. Gubler (1978) proposed to combine the use of the λ criterion with the φ * criterion (angular Fisher transform).

We will talk about these possibilities of combining the λ and φ * methods in the next lecture.

.5. Algorithm for choosing a criterion for comparing distributions


This criterion also allows us to assess the significance of the differences between the two samples, including its possible use to compare the empirical distribution with the theoretical one.

The criterion allows you to find the point at which the accumulated discrepancy between the two distributions is greatest, and to assess the reliability of this discrepancy. Null hypothesis H0: the differences between the two distributions are not significant (judging by the point of maximum accumulated divergence between them).

Schematically, the algorithm for applying the Kolmogorov-Smirnov criterion can be represented as follows:

Let us illustrate the use of the Kolmogorov-Smirnov criterion with an example.

When studying the creative activity of students, results were obtained for the experimental and control groups (see table). Are the differences between the experimental and control groups significant?

Assimilation level   Frequency in the experimental group   Frequency in the control group
Good                 172 people                            120 people
Approximate          36 people                             49 people
Bad                  15 people                             36 people
Sample size          n1 = 172 + 36 + 15 = 223              n2 = 120 + 49 + 36 = 205

We calculate the relative frequencies f, equal to the frequencies divided by the sample size, for the two samples.

As a result, the original table takes the following form:

Relative frequency, experimental group (f exp)   Relative frequency, control group (f contr)   Absolute difference |f exp − f contr|
172/223 ≈ 0.77                                   120/205 ≈ 0.59                                0.18
36/223 ≈ 0.16                                    49/205 ≈ 0.24                                 0.08
15/223 ≈ 0.07                                    36/205 ≈ 0.17                                 0.10

Among the obtained absolute differences of relative frequencies we select the largest, denoted d max. In the example under consideration 0.18 > 0.10 > 0.08, therefore d max = 0.18.

The empirical value of the criterion λ emp is determined using the formula:

$\lambda_{emp} = d_{max} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} = 0.18 \cdot \sqrt{\frac{223 \cdot 205}{223 + 205}} \approx 1.86$

To decide whether the two groups are similar with respect to the attribute under consideration, we compare the empirical value of the criterion with its critical value, determined from a special table according to the significance level. As the null hypothesis we take the statement that the compared groups differ insignificantly in the level of assimilation. The null hypothesis should be accepted if the observed value of the criterion does not exceed its critical value.

Taking the significance level 0.05, we determine the critical value of the criterion from the table: λ cr(0.05) = 1.36.

Thus, λ emp = 1.86> 1.36 = λ cr. Consequently, the null hypothesis is rejected, and the groups differ significantly according to the considered attribute.
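The whole example can be retraced with a short Python sketch. It follows the text's own procedure (differences of the per-level relative frequencies, rounded to two decimals); the dictionary and variable names are hypothetical. Note that rounding 36/205 gives 0.18 rather than the 0.17 shown in the table, which does not affect the maximum.

```python
import math

experimental = {"Good": 172, "Approximate": 36, "Bad": 15}
control = {"Good": 120, "Approximate": 49, "Bad": 36}

n1, n2 = sum(experimental.values()), sum(control.values())

# Relative frequencies rounded to two decimals, as in the text
diffs = {
    level: abs(round(experimental[level] / n1, 2) - round(control[level] / n2, 2))
    for level in experimental
}
d_max = max(diffs.values())

lam_emp = d_max * math.sqrt(n1 * n2 / (n1 + n2))
lam_cr = 1.36                      # critical value for the 0.05 level, from the text
print(round(d_max, 2), round(lam_emp, 2), lam_emp > lam_cr)
```

The sketch gives d max = 0.18 and λ emp ≈ 1.86, which exceeds 1.36, so the null hypothesis is rejected as in the text.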

Note that the volumes of the samples under consideration should be large enough: n 1 ≥50, n 2 ≥50.



 