Fundamentals of mathematical statistics. Mathematical statistics for specialists in various fields

The data obtained as a result of the experiment is characterized by variability, which can be caused by a random error: the error of the measuring device, the heterogeneity of the samples, etc. After collecting a large amount of homogeneous data, the experimenter needs to process it to extract the most accurate information possible about the quantity under consideration. To process large amounts of measurement data, observations, etc., which can be obtained during an experiment, it is convenient to use methods of mathematical statistics.

Mathematical statistics is inextricably linked with probability theory, but there is a significant difference between these sciences. Probability theory uses already known distributions of random variables, from which the probabilities of events, mathematical expectations, etc. are calculated. The task of mathematical statistics is to obtain the most reliable information possible about the distribution of a random variable from experimental data.

Typical areas of mathematical statistics:

  • sampling theory;
  • estimation theory;
  • testing of statistical hypotheses;
  • regression analysis;
  • analysis of variance.

Methods of mathematical statistics

Estimation and hypothesis-testing methods are based on probabilistic and hyper-random models of how the data originate.

Mathematical statistics estimates parameters, and functions of them, that represent important characteristics of distributions (median, expected value, standard deviation, quantiles, etc.), as well as densities, distribution functions, etc. Both point and interval estimates are used.

Modern mathematical statistics contains a large section, statistical sequential analysis, in which the array of observations may be formed one observation at a time, so that the sample size is not fixed in advance.

Mathematical statistics also contains a general theory of hypothesis testing and a large number of methods for testing specific hypotheses (for example, about the symmetry of a distribution, about the values of parameters and characteristics, about the agreement of the empirical distribution function with a given distribution function, or the hypothesis of homogeneity, i.e. the coincidence of characteristics or distribution functions in two samples).

The theory of sample surveys, which studies the properties of different sampling schemes and constructs adequate estimation and hypothesis-testing methods for them, is a branch of mathematical statistics of great importance. The methods of mathematical statistics rely directly on the following basic concepts.

Sample

Definition 1

A sample is the set of data obtained during an experiment.

For example, the ranges of bullets fired from the same gun or from a group of similar guns.

Empirical distribution function

Note 1

The distribution function makes it possible to express all the most important characteristics of a random variable.

Mathematical statistics distinguishes between the theoretical distribution function (not known in advance) and the empirical distribution function.

The empirical distribution function is determined from experimental (empirical) data, i.e. from the sample.
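The empirical distribution function is the share of sample values not exceeding x: $F_n(x) = (\text{number of } x_i \le x)/n$. The Python sketch below builds it from a small hypothetical sample; the function name `empirical_cdf` is illustrative, not taken from the source.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n(x): the fraction of sample values that are <= x."""
    data = sorted(sample)
    n = len(data)

    def F(x):
        # bisect_right on the sorted data counts the values <= x
        return bisect_right(data, x) / n

    return F

F = empirical_cdf([3, 1, 2, 2, 5])   # hypothetical sample
print(F(0), F(2), F(4), F(5))        # 0.0 0.6 0.8 1.0
```

$F_n$ is a step function: it jumps by $1/n$ at each observed value (by $2/n$ at the repeated value 2 here).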

Histogram

Histograms are used for a visual, but rather approximate, representation of an unknown distribution.

A histogram is a graphical representation of the data distribution.

To obtain a good histogram, follow these rules:

  • The number of intervals must be significantly less than the sample size.
  • Each interval must contain a sufficient number of sample elements.

If the sample is very large, the range of sample values is usually divided into intervals of equal length.

Sample mean and sample variance

Using these concepts, you can obtain an estimate of the necessary numerical characteristics of an unknown distribution without resorting to constructing a distribution function, histogram, etc.

3.1.1 Problems and methods of mathematical statistics

Mathematical statistics is a branch of mathematics devoted to methods of collecting, analyzing and processing statistical observational data for scientific and practical purposes. Methods of mathematical statistics are used when the distribution of mass phenomena is studied, i.e. of a large collection of objects or phenomena distributed according to some attribute.

Suppose a set of homogeneous objects, united by a common feature or property of a qualitative or quantitative nature, is studied. The individual elements of such a collection are called its members, and the total number of members is the size (volume) of the collection. The set of all objects united by some characteristic will be called the general population. Examples of characteristics studied in this way are the income of the population, the market value of shares, or deviations from State Standards in the quality assessment of manufactured products.

Mathematical statistics is closely related to probability theory and is based on its results. In particular, the concept of the general population in mathematical statistics corresponds to the concept of the space of elementary events in probability theory.

Studying the entire population is most often impossible or impractical because of significant material costs or because the research object is damaged or destroyed. For example, it is practically impossible to obtain objective and complete information about the income of every inhabitant of an entire region. Likewise, when quality testing damages the object, as with some medicines or food products, reliable information cannot be obtained by testing the entire population.

The main task of mathematical statistics is to study a general population from sample data according to the goal of the study, i.e. to study the probabilistic properties of the population (distribution law, numerical characteristics, etc.) in order to make management decisions under conditions of uncertainty.

3.1.2 Types of sampling

One of the methods of mathematical statistics is the sampling method. In practice, it is usually not the entire population that is studied, but a limited sample from it.

A sample (sample population) is a collection of randomly selected objects. With the sampling method, not the entire population is studied, but a sample $(X_1, X_2, \dots, X_n)$ obtained from a limited number of observations. Then, based on the probabilistic properties of this sample, a judgment is made about the entire population. Various selection methods are used to obtain a sample. If the objects are returned to the general population after being studied, the sampling is called repeated sampling (sampling with replacement).
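A minimal Python sketch of the two ways of drawing a sample, assuming a hypothetical population of 100 numbered objects: `random.Random.sample` draws without replacement (each object at most once), while `random.Random.choices` draws with replacement, as when studied objects are returned to the population.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 numbered objects
rng = random.Random(42)           # fixed seed so the run is reproducible

# Without replacement: each object can be selected at most once
without_repl = rng.sample(population, k=10)

# With replacement: objects are "returned", so the same object may appear again
with_repl = rng.choices(population, k=10)

print(without_repl)
print(with_repl)
```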

A sample is called representative if it reproduces the general population well, that is, if the probabilistic properties of the sample coincide with or are close to those of the general population itself.

The effectiveness of the sampling method increases when a number of conditions are met, including the following:

    The number of sample elements studied must be sufficient for drawing conclusions, i.e. the sample must be representative.

For example, the sufficient number of parts to check for defects in a batch is established using the laws of probability theory and mathematical statistics.

    Sample items must be diverse and taken at random, i.e. the principle of randomization must be respected.

    The characteristic being studied must be typical of all elements of the set of objects under study, i.e. of the entire population.

    The characteristic being studied must be significant for all elements of this class.

A change in a characteristic of a statistical population studied by the sampling method is called variation, and the observed values $x_i$ of the characteristic are called variants. The absolute frequency (or simply frequency) of a variant $x_i$ is the number of members of the population (general or sample) that have the value $x_i$, i.e. the number of elements of the $i$-th kind.

A grouping of variants ranked by individual values of the characteristic (or by intervals of its change), i.e. a sequence of variants arranged in ascending order, is called a variation series. Any function $f(X_1, X_2, \dots, X_n)$ of the observation results $X_1, X_2, \dots, X_n$ of the random variable under study is called a statistic.

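The size of the general population is usually denoted $N$ and its absolute frequencies $N_i$; the sample size is denoted $n$ and its absolute frequencies $n_i$. Obviously, for $k$ distinct variants,

$\sum_{i=1}^{k} N_i = N, \qquad \sum_{i=1}^{k} n_i = n.$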

The ratio of a frequency to the size of the population is called the relative frequency, or statistical probability, and is denoted $W_i$:

$W_i = \dfrac{n_i}{n}$ (for the general population, $W_i = N_i / N$).

If the number of variants is large or close to the sample size (for a discrete distribution), or if the sample is taken from a continuous population, the variation series is compiled not from individual (point) values but from intervals of population values. A variation series presented as a table and constructed by such grouping is called an interval series. When compiling an interval variation series, the first row of the table is filled with equal-length intervals of values of the population under study, and the second with the corresponding absolute or relative frequencies.

Let a sample of size $n$ be drawn from some general population as a result of $n$ observations. The statistical distribution of a sample is the list of variants and their corresponding absolute or relative frequencies. A point variation series of absolute frequencies can be represented by a table:

$x_i$ | $x_1$ | $x_2$ | … | $x_k$
$n_i$ | $n_1$ | $n_2$ | … | $n_k$

where $\sum_{i=1}^{k} n_i = n$.

A point variation series of relative frequencies is presented by the table:

$x_i$ | $x_1$ | $x_2$ | … | $x_k$
$W_i$ | $W_1$ | $W_2$ | … | $W_k$

where $\sum_{i=1}^{k} W_i = 1$.
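A point variation series with absolute frequencies $n_i$ and relative frequencies $W_i = n_i / n$ can be computed with `collections.Counter`; the sample and the function name `variation_series` are hypothetical.

```python
from collections import Counter

def variation_series(sample):
    """Point variation series: distinct variants x_i in ascending order,
    their absolute frequencies n_i and relative frequencies W_i = n_i / n."""
    n = len(sample)
    counts = Counter(sample)
    variants = sorted(counts)
    abs_freq = [counts[x] for x in variants]
    rel_freq = [counts[x] / n for x in variants]
    return variants, abs_freq, rel_freq

x, n_i, w_i = variation_series([2, 3, 3, 5, 2, 3, 7, 5])
print(x)    # [2, 3, 5, 7]
print(n_i)  # [2, 3, 2, 1]
print(w_i)  # [0.25, 0.375, 0.25, 0.125]
```

The absolute frequencies sum to the sample size (8) and the relative frequencies sum to 1, as the tables above require.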

When constructing an interval distribution, there are rules for choosing the number of intervals or the length of each interval. The criterion is an optimal balance: as the number of intervals increases, the representation becomes more detailed, but the volume of data and the processing time increase. The difference $x_{max} - x_{min}$ between the largest and smallest variants is called the range of the sample.

To determine the number of intervals $k$, the empirical Sturges formula is usually used:

$k = 1 + 3.322 \lg n \qquad (3.1)$

(with rounding to the nearest integer). Accordingly, the length of each interval $h$ can be calculated by the formula:

$h = \dfrac{x_{max} - x_{min}}{k}. \qquad (3.2)$

The beginning of the first interval is usually taken to be $x_{start} = x_{min} - 0.5h$.

Each interval must contain at least five variants. If an interval contains fewer than five variants, adjacent intervals are usually combined.
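Formulas (3.1) and (3.2) can be sketched in Python as follows, applied to the 20 disease durations of Example 1 in the guidelines below. Two conventions are assumptions of this sketch: the first interval starts at $x_{min} - 0.5h$ and further intervals of length $h$ are added until $x_{max}$ is covered, and intervals are half-open, $[a, a + h)$. The function name `interval_series` is hypothetical.

```python
import math
from bisect import bisect_right

def interval_series(sample):
    """Group a sample into equal-length intervals using formulas (3.1) and (3.2)."""
    n = len(sample)
    x_min, x_max = min(sample), max(sample)
    if x_min == x_max:
        raise ValueError("the sample needs at least two distinct values")
    k = round(1 + 3.322 * math.log10(n))  # Sturges formula (3.1), rounded
    h = (x_max - x_min) / k               # interval length (3.2)
    start = x_min - 0.5 * h               # assumed start of the first interval
    # Half-open intervals [a, a + h) are appended until x_max is covered
    edges = [start]
    while edges[-1] <= x_max:
        edges.append(edges[-1] + h)
    counts = [0] * (len(edges) - 1)
    for v in sample:
        counts[bisect_right(edges, v) - 1] += 1  # index of the interval containing v
    return k, h, edges, counts

data = [10, 11, 6, 16, 7, 13, 15, 8, 9, 10, 11, 13, 7, 8, 13, 15, 16, 13, 14, 15]
k, h, edges, counts = interval_series(data)
print(k, h)    # 5 2.0
print(edges)   # [5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0]
print(counts)  # [1, 4, 3, 2, 5, 5]
```

Starting half a step below $x_{min}$ pushes $x_{max} = 16$ past the fifth interval, so a sixth one is added. Several intervals hold fewer than five values, so by the rule above adjacent intervals would be combined; with only 20 observations that is to be expected.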

Odessa National Medical University Department of Biophysics, Informatics and Medical Equipment Guidelines for 1st year students on the topic “Fundamentals of Mathematical Statistics” Odessa 2009

1. Topic: “Fundamentals of mathematical statistics.”

2. Relevance of the topic.

Mathematical statistics is a branch of mathematics that studies methods of collecting, systematizing and processing the results of observations of mass random events in order to clarify and practically apply existing patterns. Methods of mathematical statistics have found wide application in clinical medicine and healthcare. They are used, in particular, in the development of mathematical methods of medical diagnostics, in the theory of epidemics, in planning and processing the results of a medical experiment, in the organization of healthcare. Statistical concepts are used, consciously or unconsciously, in decision making in such matters as clinical diagnosis, predicting the course of disease in an individual patient, predicting the likely outcome of programs in a given population, and selecting the appropriate program in particular circumstances. Familiarity with the ideas and methods of mathematical statistics is an essential element of the professional education of every health care worker.

3. Goals of the lesson. The general goal of the lesson is to teach students to use mathematical statistics consciously when solving biomedical problems. Specific goals of the lesson:
  1. to acquaint students with the basic ideas, concepts and methods of mathematical statistics, paying attention mainly to issues related to processing the results of observations of mass random events in order to clarify and practically apply existing patterns;
  2. to teach students to consciously apply the basic concepts of mathematical statistics when solving simple problems that arise in the professional activity of a doctor.
The student must know (level 2):
  1. definition of class frequency (absolute and relative)
  2. definition of the general population and the sample, sample size
  3. point and interval estimation
  4. reliable interval and reliability
  5. definition of mode, median and sample mean
  6. definition of range, interquartile range, quartile deviation
  7. definition of mean absolute deviation
  8. definition of sample covariance and variance
  9. definition of sample standard deviation and coefficient of variation
  10. definition of sample regression coefficients
  11. empirical linear regression equations
  12. definition of the sample correlation coefficient.
The student must master the basic calculation skills (level 3):
  1. mode, median and sample mean
  2. range, interquartile range, quartile deviation
  3. mean absolute deviation
  4. sample covariance and variance
  5. sample standard deviation and coefficient of variation
  6. reliable intervals for the expectation and variance
  7. sample regression coefficients
  8. sample correlation coefficient.
4. Ways to achieve the goals of the lesson. To achieve the goals of the lesson, you need the following background knowledge:
  1. Definition of distribution, distribution series and distribution polygon of a discrete random variable
  2. Definition of functional dependence between random variables
  3. Definition of correlation between random variables
You also need to be able to calculate the probabilities of incompatible and compatible events using the appropriate rules.
5. A task for students to test their initial level of knowledge. Control questions:
  1. Definition of a random event, its relative frequency and probability.
  2. Addition theorem for the probabilities of incompatible events
  3. Addition theorem for the probabilities of compatible events
  4. Multiplication theorem for the probabilities of independent events
  5. Multiplication theorem for the probabilities of dependent events
  6. Total probability theorem
  7. Bayes' theorem
  8. Definition of random variables: discrete and continuous
  9. Definition of distribution, distribution series and distribution polygon of a discrete random variable
  10. Definition of the distribution function
  11. Definition of measures of the position of the distribution center
  12. Definition of measures of variability of random variable values
  13. Definition of the probability density and the distribution curve of a continuous random variable
  14. Definition of functional dependence between random variables
  15. Definition of correlation between random variables
  16. Definition of regression, regression equation and regression lines
  17. Definition of covariance and correlation coefficient
  18. Definition of the linear regression equation.
6. Information for strengthening initial knowledge and skills can be found in the manuals:
  1. Zhumatiy P.G. Lecture “Probability Theory”. Odessa, 2009.
  2. Zhumatiy P.G. “Fundamentals of probability theory.” Odessa, 2009.
  3. Zhumatiy P.G., Senitska Y.R. Elements of probability theory. Guidelines for medical institute students. Odessa, 1981.
  4. Chaly O.V., Agapov B.T., Tsekhmister Y.V. Medical and biological physics. Kyiv, 2004.
7. Contents of the educational material on this topic, highlighting the key issues.

Mathematical statistics is a branch of mathematics that studies methods of collecting, systematizing, processing, depicting, analyzing and interpreting observational results in order to identify existing patterns.

The use of statistics in health care is necessary at both the community and the individual patient level. Medicine deals with individuals who differ from each other in many characteristics, and the values by which a person can be considered healthy vary from one individual to another. No two patients or groups of patients are exactly alike, so decisions that affect individual patients or populations must be based on experience gained from other patients or populations with similar biological characteristics. Given these differences, such decisions cannot be absolutely accurate: they always involve some uncertainty. This is precisely the probabilistic nature of medicine.

Some examples of the application of statistical methods in medicine:

interpretation of variation (the variability of an organism's characteristics makes it necessary to use appropriate statistical methods when deciding which value of a characteristic is ideal, normal, average, etc.).

diagnosis of diseases in individual patients and assessment of the health status of a population group.

predicting the outcome of a disease in individual patients or the possible outcome of a control program for a particular disease in a population group.

selecting an appropriate intervention for a patient or population group.

planning and conducting medical research, analyzing and publishing results, reading and critically evaluating them.

health care planning and management.

Useful health information is usually hidden in masses of raw data. It is necessary to concentrate the information contained in them and present the data so that the structure of variation is clearly visible, and then select specific methods of analysis.

Data presentation involves the following concepts and terms:

variation series (ordered arrangement) - an arrangement of individual observations of a quantity in ascending order.

class - one of the intervals into which the entire range of values of a random variable is divided.

class limits - the values that bound a class; for example, 2.5 and 3.0 are the lower and upper limits of the class 2.5 - 3.0.

(absolute) class frequency - the number of observations in a class.

relative class frequency - the absolute frequency of a class, expressed as a fraction of the total number of observations.

cumulative (accumulated) frequency of a class - the number of observations that is equal to the sum of the frequencies of all previous classes and this class.

bar chart (column diagram) - a graphical representation of data frequencies for nominal classes using columns whose heights are directly proportional to the class frequencies.

pie chart - a graphical representation of data frequencies for nominal classes using sectors of a circle, the areas of which are directly proportional to the class frequencies.

histogram - a graphical representation of the frequency distribution of quantitative data with areas of rectangles directly proportional to the class frequencies.

frequency polygon - a graph of the frequency distribution of quantitative data; the point corresponding to the class frequency is located above the middle of the interval, each two adjacent points are connected by a straight line segment.

ogive (cumulative curve) - a graph of the distribution of cumulative relative frequencies.
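Given class limits and absolute class frequencies, the relative and cumulative relative frequencies follow directly; the class midpoints give the points of the frequency polygon and the cumulative column gives the points of the ogive. The classes below reuse the 2.5 - 3.0 style of limits from the text, but the frequencies and the function name `frequency_table` are hypothetical.

```python
from itertools import accumulate

def frequency_table(edges, counts):
    """Return rows (lower, upper, midpoint, absolute, relative, cumulative relative)."""
    n = sum(counts)
    cumulative = list(accumulate(counts))  # running totals of class frequencies
    rows = []
    for i, c in enumerate(counts):
        lower, upper = edges[i], edges[i + 1]
        midpoint = (lower + upper) / 2     # x-coordinate of the polygon point
        rows.append((lower, upper, midpoint, c, c / n, cumulative[i] / n))
    return rows

rows = frequency_table([2.5, 3.0, 3.5, 4.0], [2, 5, 3])  # hypothetical frequencies
for row in rows:
    print(row)
# (2.5, 3.0, 2.75, 2, 0.2, 0.2)
# (3.0, 3.5, 3.25, 5, 0.5, 0.7)
# (3.5, 4.0, 3.75, 3, 0.3, 1.0)
```

The last cumulative relative frequency is always 1, which is where the ogive ends.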

All medical data have inherent variability, so the analysis of measurement results is based on studying which values the random variable under study took.

The set of all possible values of a random variable is called the general population.

The part of the general population registered as a result of tests is called a sample.

The number of observations included in the sample is called the sample size, or volume (usually denoted n).

The task of the sampling method is to use the resulting sample to make a correct estimate of the characteristics of the random variable being studied. Therefore, the main requirement for a sample is that it reflect all the features of the general population as fully as possible. A sample that satisfies this requirement is called representative. The representativeness of the sample determines the quality of the estimate, that is, the degree to which the estimate corresponds to the parameter it characterizes.

When estimating population parameters from a sample (parametric estimation), the following concepts are used:

point estimation - an estimate of a population parameter in the form of a single value that it can take with the highest probability.

interval estimation - estimation of a population parameter in the form of an interval of values that has a given probability of covering its true value.

Interval estimation uses the following concepts:

reliable interval (confidence interval) - an interval of values that has a given probability of covering the true value of the population parameter during interval estimation.

reliability (reliable, or confidence, probability) - the probability with which the reliable interval covers the true value of the population parameter.

reliable limits - the lower and upper limits of the reliable interval.

Conclusions obtained by methods of mathematical statistics are always based on a limited number of observations (a sample), so it is natural that another sample may give different results. This circumstance determines the probabilistic nature of the conclusions of mathematical statistics and, as a consequence, the widespread use of probability theory in statistical research.

A typical statistical research path is:

after estimating quantities, or the relationships between them, from observational data, one assumes that the phenomenon being studied can be described by a particular stochastic model;

using statistical methods, this assumption can be confirmed or rejected; upon confirmation, the goal has been achieved - a model has been found that describes the patterns under study; otherwise, work continues, putting forward and testing a new hypothesis.

Definitions of sample statistical estimates:

mode - the value that occurs most often in the sample,

median - central (average) value of the variation series

range R - the difference between the largest and smallest values in a series of observations

percentiles - values in a variation series that divide the distribution into 100 equal parts (thus, the median is the fiftieth percentile)

first quartile - 25th percentile

third quartile - 75th percentile

interquartile range - the difference between the third and first quartiles (covers the central 50% of observations)

quartile deviation - half of the interquartile range

sample mean $\bar{x}$ - the arithmetic mean of all sample values (sample estimate of the mathematical expectation):

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

mean absolute deviation - the sum of the absolute values of the deviations from a chosen reference point (without regard to sign), divided by the sample size

the mean absolute deviation from the sample mean is calculated by the formula

$\bar{d} = \dfrac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$

sample variance $s^2(X)$ (sample estimate of the variance) is given by

$s^2(X) = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

sample covariance $k(X,Y)$ (sample estimate of the covariance $K(X,Y)$) equals

$k(X,Y) = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

the sample regression coefficient of Y on X (sample estimate of the regression coefficient of Y on X) is equal to

$b_{YX} = \dfrac{k(X,Y)}{s^2(X)}$

the empirical linear regression equation of Y on X has the form

$y - \bar{y} = b_{YX}(x - \bar{x})$

the sample regression coefficient of X on Y (sample estimate of the regression coefficient of X on Y) is equal to

$b_{XY} = \dfrac{k(X,Y)}{s^2(Y)}$

the empirical linear regression equation of X on Y has the form

$x - \bar{x} = b_{XY}(y - \bar{y})$

sample standard deviation $s(X)$ (sample estimate of the standard deviation) equals the square root of the sample variance: $s(X) = \sqrt{s^2(X)}$

sample correlation coefficient $r$ (sample estimate of the correlation coefficient) equals

$r = \dfrac{k(X,Y)}{s(X)\,s(Y)}$

sample coefficient of variation $V$ (sample estimate of the coefficient of variation CV) is equal to

$V = \dfrac{s(X)}{\bar{x}} \cdot 100\%.$
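The covariance, regression and correlation formulas can be sketched in Python. This version uses the n - 1 divisor for both variances and the covariance; the divisor cancels in the regression and correlation coefficients as long as it is the same in each. The data and function names are hypothetical. The sketch also illustrates that $r^2 = b_{YX}\, b_{XY}$.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sample_cov(x, y):
    """Sample covariance k(X, Y) with the n - 1 divisor; sample_cov(x, x) is s^2(X)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def regression_summary(x, y):
    sxx = sample_cov(x, x)          # s^2(X)
    syy = sample_cov(y, y)          # s^2(Y)
    kxy = sample_cov(x, y)          # k(X, Y)
    b_yx = kxy / sxx                # regression coefficient of Y on X
    b_xy = kxy / syy                # regression coefficient of X on Y
    r = kxy / math.sqrt(sxx * syy)  # correlation coefficient
    return b_yx, b_xy, r

x = [1, 2, 3, 4, 5]                 # hypothetical data
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r = regression_summary(x, y)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4))  # 0.6 1.0 0.7746
# Regression of Y on X rewritten as y = a + b*x, with a = mean(y) - b_yx * mean(x)
print(f"y = {mean(y) - b_yx * mean(x):.1f} + {b_yx:.1f}x")  # y = 2.2 + 0.6x
```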

8. Task for independent preparation of students. 8.1 Task for independent study of material from the topic.

8.1.1 Practical calculation of sample estimates

Practical calculation of sample point estimates

Example 1.

The duration of the disease (in days) in 20 cases of pneumonia was:

10, 11, 6, 16, 7, 13, 15, 8, 9, 10, 11, 13, 7, 8, 13, 15, 16, 13, 14, 15

Determine the mode, median, range, interquartile range, sample mean, mean absolute deviation from the sample mean, sample variance, sample standard deviation and sample coefficient of variation.

Solution.

The variation series for this sample is

6, 7, 7, 8, 8, 9, 10, 10, 11, 11, 13, 13, 13, 13, 14, 15, 15, 15, 16, 16

Mode

The most frequent value in the sample is 13, so the mode is 13.

Median

When a variation series contains an even number of observations, the median is the average of the two central terms of the series, here 11 and 13, so the median is 12.

Range

The minimum value in the sample is 6 and the maximum is 16, so R = 10.

Interquartile range, quartile deviation

In the variation series, a quarter of all data have values less than or equal to 8, so the first quartile is 8, and 75% of all data have values less than or equal to 14, so the third quartile is 14. Hence the interquartile range is 6 and the quartile deviation is 3.

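Sample mean

The arithmetic mean of all sample values is

$\bar{x} = \dfrac{1}{20}\sum_{i=1}^{20} x_i = \dfrac{230}{20} = 11.5.$

Mean absolute deviation from the sample mean

$\bar{d} = \dfrac{1}{20}\sum_{i=1}^{20} |x_i - 11.5| = \dfrac{56}{20} = 2.8.$

Sample variance

$s^2 = \dfrac{1}{19}\sum_{i=1}^{20} (x_i - 11.5)^2 = \dfrac{199}{19} \approx 10.47.$

Sample standard deviation

$s = \sqrt{199/19} \approx 3.24.$

Sample coefficient of variation

$V = \dfrac{s}{\bar{x}} \cdot 100\% = \dfrac{3.236}{11.5} \cdot 100\% \approx 28.1\%.$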
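The point estimates of Example 1 can be checked with Python's standard `statistics` module; `statistics.variance` and `statistics.stdev` use the n - 1 divisor.

```python
import statistics as st

days = [10, 11, 6, 16, 7, 13, 15, 8, 9, 10, 11, 13, 7, 8, 13, 15, 16, 13, 14, 15]
n = len(days)
mean = st.mean(days)                        # sample mean
mode = st.mode(days)                        # most frequent value
median = st.median(days)                    # average of the two central terms for even n
value_range = max(days) - min(days)         # range R
mad = sum(abs(x - mean) for x in days) / n  # mean absolute deviation from the mean
var = st.variance(days)                     # sample variance, n - 1 divisor
sd = st.stdev(days)                         # sample standard deviation
cv = sd / mean * 100                        # coefficient of variation, %
print(mean, mode, median, value_range)                           # 11.5 13 12.0 10
print(round(mad, 2), round(var, 2), round(sd, 2), round(cv, 1))  # 2.8 10.47 3.24 28.1
```

Note that `statistics.quantiles(days, n=4)` interpolates between neighbouring terms by default and returns quartiles 8.25 and 14.75 rather than 8 and 14; quartile conventions differ between textbooks and software.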

In the following example, we will consider the simplest means of studying the stochastic dependence between two random variables.

Example 2.

When examining a group of patients, data were obtained on height H (cm) and circulating blood volume V (l):

Find empirical linear regression equations.

Solution.

First, calculate the sample means

$\bar{H} = \dfrac{1}{n}\sum_{i=1}^{n} H_i, \qquad \bar{V} = \dfrac{1}{n}\sum_{i=1}^{n} V_i.$

Second, calculate the sample variances $s^2(H)$ and $s^2(V)$ and the sample covariance $k(H,V)$.

Third, calculate the sample regression coefficients:

sample regression coefficient of V on H: $b_{VH} = \dfrac{k(H,V)}{s^2(H)}$

sample regression coefficient of H on V: $b_{HV} = \dfrac{k(H,V)}{s^2(V)}$

Fourth, write down the required equations:

the empirical linear regression equation of V on H has the form $V - \bar{V} = b_{VH}(H - \bar{H})$

the empirical linear regression equation of H on V has the form $H - \bar{H} = b_{HV}(V - \bar{V})$.

Example 3.

Using the conditions and results of Example 2, calculate the correlation coefficient and check, with a reliable probability of 95%, whether the existence of a correlation between human height and circulating blood volume is reliable.

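Solution.

The correlation coefficient is related to the regression coefficients by the practically useful formula

$\rho = \pm\sqrt{\beta_{YX}\,\beta_{XY}},$

where $\beta_{YX}$ and $\beta_{XY}$ are the regression coefficients of Y on X and of X on Y, and the sign is the same as that of the regression coefficients.

For sample estimates this formula takes the form

$r = \pm\sqrt{b_{VH}\, b_{HV}}.$

Substituting the values of the sample regression coefficients $b_{VH}$ and $b_{HV}$ from Example 2 gives the sample correlation coefficient $r$.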

Checking the reliability of the correlation between random variables (assuming each of them is normally distributed) is carried out as follows:

  • calculate the value $T = \dfrac{|r|\sqrt{n-2}}{\sqrt{1-r^2}}$;

  • find the coefficient $t_{\gamma,\,n-2}$ in the Student distribution table for the reliable probability $\gamma$ and $n - 2$ degrees of freedom;

  • the existence of a correlation between the random variables is confirmed if the inequality

$T > t_{\gamma,\,n-2}$

holds.

Since T = 3.5 > 2.26, the existence of a correlation between the patients' height and circulating blood volume can be considered established with a reliable probability of 95%.
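A sketch of this significance check, assuming both variables are normally distributed as the text requires. The Student coefficient is passed in rather than computed, since it is read from a table; 2.26 is the two-sided 95% value for 9 degrees of freedom (n = 11). The r values in the usage lines and the function names are hypothetical.

```python
import math

def correlation_t(r, n):
    """Test statistic T = |r| * sqrt(n - 2) / sqrt(1 - r^2)."""
    return abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def correlation_is_significant(r, n, t_crit):
    # t_crit: Student coefficient for the chosen reliable probability
    # and n - 2 degrees of freedom, taken from a table
    return correlation_t(r, n) > t_crit

print(round(correlation_t(0.8, 11), 3), correlation_is_significant(0.8, 11, 2.26))  # 4.0 True
print(round(correlation_t(0.5, 11), 3), correlation_is_significant(0.5, 11, 2.26))  # 1.732 False
```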

Interval estimates for mathematical expectation and variance

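If the random variable has a normal distribution, interval estimates of the mathematical expectation and the variance are calculated in the following sequence:

1. find the sample mean $\bar{x}$;

2. calculate the sample variance $s^2$ and the sample standard deviation $s$;

3. in the Student distribution table, using the reliable probability $\gamma$ and the sample size $n$, find the Student coefficient $t_{\gamma,n}$;

4. the reliable interval for the mathematical expectation is written as

$\bar{x} - \dfrac{t_{\gamma,n}\, s}{\sqrt{n}} < M(X) < \bar{x} + \dfrac{t_{\gamma,n}\, s}{\sqrt{n}};$

5. in the $\chi^2$ distribution table, using $\gamma$ and the sample size $n$ ($n - 1$ degrees of freedom), find the coefficients

$\chi^2_1 = \chi^2\!\left(\tfrac{1-\gamma}{2};\ n-1\right), \qquad \chi^2_2 = \chi^2\!\left(\tfrac{1+\gamma}{2};\ n-1\right),$

where $\chi^2(\alpha;\ m)$ is the value exceeded with probability $\alpha$ for $m$ degrees of freedom;

6. the reliable interval for the variance is written as

$\dfrac{(n-1)s^2}{\chi^2_1} < D(X) < \dfrac{(n-1)s^2}{\chi^2_2}.$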

The width of the reliable interval, the reliable probability and the sample size depend on each other. Indeed, the ratio

$t_{\gamma,n} / \sqrt{n}$

decreases with increasing $n$, so for a fixed width of the reliable interval the reliable probability $\gamma$ increases as $n$ increases. For a fixed reliable probability, the width of the reliable interval decreases as the sample size increases. When planning medical research, this relationship is used to determine the minimum sample size that provides the required width of the reliable interval and reliable probability under the conditions of the problem being solved.
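The dependence of the interval width on n can be illustrated with a few standard two-sided Student coefficients for a reliable probability of 0.95. The helper `smallest_tabulated_n` is a hypothetical planning aid that only searches these tabulated sample sizes, and it assumes the standard deviation s is known in advance (for example, from a pilot study).

```python
import math

# Two-sided Student coefficients for reliable probability 0.95, indexed by the
# sample size n (n - 1 degrees of freedom), as found in standard t tables
T_95 = {5: 2.776, 10: 2.262, 20: 2.093, 30: 2.045}

def relative_half_width(n):
    """Half-width of the reliable interval for M(X) in units of s: t / sqrt(n)."""
    return T_95[n] / math.sqrt(n)

def smallest_tabulated_n(s, d):
    """Smallest tabulated n whose half-width t * s / sqrt(n) does not exceed d."""
    for n in sorted(T_95):
        if relative_half_width(n) * s <= d:
            return n
    return None  # a larger sample than the table covers would be needed

for n in sorted(T_95):
    print(n, round(relative_half_width(n), 3))
# 5 1.241
# 10 0.715
# 20 0.468
# 30 0.373
print(smallest_tabulated_n(s=3.24, d=2.0))  # 20
```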

Example 5.

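Using the conditions and results of Example 1, find the interval estimates of the mathematical expectation and variance for a reliable probability of 95%.

Solution.

In Example 1 the point estimates of the mathematical expectation (sample mean $\bar{x} = 11.5$), the variance (sample variance $s^2 \approx 10.47$) and the standard deviation (sample standard deviation $s \approx 3.24$) were determined. The sample size is $n = 20$.

From the Student distribution table we find the coefficient

$t_{0.95;\,20} = 2.09.$

Next, we calculate the half-width $d$ of the reliable interval,

$d = \dfrac{t_{0.95;\,20}\, s}{\sqrt{n}} = \dfrac{2.09 \cdot 3.24}{\sqrt{20}} \approx 1.51,$

and write down the interval estimate of the mathematical expectation:

$9.99 < M(X) < 13.01$ at $\gamma = 95\%$.

From the Pearson $\chi^2$ ("chi-square") distribution table, for $n - 1 = 19$ degrees of freedom, we find the coefficients

$\chi^2_1 = 32.85, \qquad \chi^2_2 = 8.91,$

calculate the lower and upper reliable bounds

$\dfrac{(n-1)s^2}{\chi^2_1} = \dfrac{199}{32.85} \approx 6.06, \qquad \dfrac{(n-1)s^2}{\chi^2_2} = \dfrac{199}{8.91} \approx 22.3,$

and write the interval estimate for the variance as

$6.06 < D(X) < 22.3$ at $\gamma = 95\%$.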
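Steps 1-6 of the interval-estimation procedure can be sketched in Python for the raw data of Example 1. The Student and $\chi^2$ coefficients are passed in as table values for a reliable probability of 0.95 and 19 degrees of freedom ($t = 2.093$, $\chi^2_1 = 32.852$, $\chi^2_2 = 8.907$); if SciPy is available, `scipy.stats.t.ppf` and `scipy.stats.chi2.ppf` return the same quantiles. The function names are hypothetical.

```python
import math
import statistics as st

def mean_interval(sample, t):
    """Reliable interval for M(X): x_bar -/+ t*s/sqrt(n), with t from a Student table."""
    n = len(sample)
    x_bar, s = st.mean(sample), st.stdev(sample)  # stdev uses the n - 1 divisor
    d = t * s / math.sqrt(n)                      # half-width of the interval
    return x_bar - d, x_bar + d

def variance_interval(sample, chi2_1, chi2_2):
    """Reliable interval for D(X): ((n-1)s^2/chi2_1, (n-1)s^2/chi2_2), chi2_1 > chi2_2."""
    n = len(sample)
    ss = (n - 1) * st.variance(sample)            # (n - 1) * s^2
    return ss / chi2_1, ss / chi2_2

days = [10, 11, 6, 16, 7, 13, 15, 8, 9, 10, 11, 13, 7, 8, 13, 15, 16, 13, 14, 15]
# Table values for reliable probability 0.95 and 19 degrees of freedom
lo, hi = mean_interval(days, t=2.093)
v_lo, v_hi = variance_interval(days, chi2_1=32.852, chi2_2=8.907)
print(round(lo, 2), round(hi, 2))      # 9.99 13.01
print(round(v_lo, 2), round(v_hi, 2))  # 6.06 22.34
```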

8.1.2. Problems to solve independently

For independent solution, problems 5.4 C 1 – 8 are offered (P.G. Zhumatiy. “Mathematical processing of medical and biological data. Problems and examples.” Odessa, 2009, pp. 24-25)

8.1.3. Control questions
  1. Class frequency (absolute and relative).
  2. Population and sample, sample size.
  3. Point and interval estimation.
  4. Reliable interval and reliability.
  5. Mode, median and sample mean.
  6. Range, interquartile range, quartile deviation.
  7. Average absolute deviation.
  8. Sample covariance and variance.
  9. Sample standard deviation and coefficient of variation.
  10. Sample regression coefficients.
  11. Empirical regression equations.
  12. Calculation of the correlation coefficient and reliability of the correlation.
  13. Construction of interval estimates of normally distributed random variables.
8.2 Basic literature
  1. Zhumatiy P.G. “Mathematical processing of medical and biological data. Tasks and examples.” Odessa, 2009.
  2. Zhumatiy P.G. Lecture “Mathematical statistics”. Odessa, 2009.
  3. Zhumatiy P.G. “Fundamentals of mathematical statistics.” Odessa, 2009.
  4. Zhumatiy P.G., Senitska Y.R. Elements of probability theory. Guidelines for medical institute students. Odessa, 1981.
  5. Chaly O.V., Agapov B.T., Tsekhmister Y.V. Medical and biological physics. Kyiv, 2004.
8.3 Further reading
  1. Remizov O.M. Medical and biological physics. M., “Higher School”, 1999.
  2. Remizov O.M., Isakova N.Kh., Maksina O.G. Collection of problems in medical and biological physics. M., “Higher School”, 1987.
Methodological instructions compiled by Assoc. P. G. Zhumatiy.

RANDOM VARIABLES AND THE LAWS OF THEIR DISTRIBUTION.

A random variable is a quantity that takes values depending on a combination of random circumstances. Discrete and continuous random variables are distinguished.

A random variable is called discrete if it takes a countable set of values (examples: the number of patients at a doctor's appointment, the number of letters on a page, the number of molecules in a given volume).

A random variable is called continuous if it can take any value within a certain interval (examples: air temperature, body weight, human height, etc.).

The distribution law of a random variable is the set of possible values of this variable together with the corresponding probabilities (or frequencies of occurrence).

EXAMPLE:

$x$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | ... | $x_n$
$p$ | $p_1$ | $p_2$ | $p_3$ | $p_4$ | ... | $p_n$

$x$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | ... | $x_n$
$m$ | $m_1$ | $m_2$ | $m_3$ | $m_4$ | ... | $m_n$

NUMERICAL CHARACTERISTICS OF RANDOM VARIABLES.

In many cases, together with the distribution law of a random variable or instead of it, information about the variable can be given by numerical parameters called the numerical characteristics of the random variable. The most common of them are:

1. Expected value (mean value) of a random variable: the sum of the products of all its possible values and the probabilities of these values:

$$M(X) = \sum_{i=1}^{n} x_i p_i$$

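2. Dispersion (variance) of a random variable: the expected value of the squared deviation of the variable from M(X):

$$D(X) = \sum_{i=1}^{n} \left(x_i - M(X)\right)^2 p_i$$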


3. Standard deviation:

$$\sigma(X) = \sqrt{D(X)}$$
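A minimal Python sketch of the three numerical characteristics for a discrete random variable, assuming the distribution law is given as parallel lists of values $x_i$ and probabilities $p_i$. The function names are hypothetical, and the fair die is used only as a familiar illustration.

```python
import math


def expected_value(xs, ps):
    """M(X) = sum of x_i * p_i (hypothetical helper name)."""
    return sum(x * p for x, p in zip(xs, ps))


def dispersion(xs, ps):
    """D(X) = sum of (x_i - M(X))**2 * p_i."""
    m = expected_value(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))


def standard_deviation(xs, ps):
    """sigma(X) = sqrt(D(X))."""
    return math.sqrt(dispersion(xs, ps))


# Illustration: a fair six-sided die, x_i = 1..6, each p_i = 1/6.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
print(expected_value(xs, ps))      # approximately 3.5
print(dispersion(xs, ps))          # approximately 35/12, about 2.9167
print(standard_deviation(xs, ps))  # approximately 1.708
```

Floating-point output may show a few extra digits; exact fractions could be used instead if that matters.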

“THREE SIGMA” rule: if a random variable is distributed according to the normal law, then the absolute value of its deviation from the mean practically never exceeds three times the standard deviation: $P(|X - M(X)| \le 3\sigma) \approx 0.9973$.
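The probabilities behind the rule can be checked numerically. For a normal variable, $P(|X - M(X)| \le k\sigma) = \mathrm{erf}(k/\sqrt{2})$; the sketch below, using only Python's standard `math.erf`, evaluates this for k = 1, 2, 3 (the helper name is hypothetical).

```python
import math


def prob_within_k_sigma(k):
    """P(|X - M(X)| <= k * sigma) for a normally distributed X.

    For the normal law this probability equals erf(k / sqrt(2)),
    independent of M(X) and sigma. (Hypothetical helper name.)
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = prob_within_k_sigma(k)
    print(f"k = {k}: inside {inside:.4f}, outside {1 - inside:.4f}")
```

For k = 3 the loop prints an inside probability of 0.9973 and an outside probability of 0.0027, which is what "practically never exceeds" means in the rule.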

GAUSS LAW – NORMAL DISTRIBUTION LAW

Many quantities are distributed according to the normal law (Gauss's law). Its main feature is that it is the limiting law which other distribution laws approach under certain conditions.

A random variable is distributed according to the normal law if its probability density has the form:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - M(X))^2}{2\sigma^2}}$$

where M(X) is the mathematical expectation of the random variable;

σ is the standard deviation.
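The density formula translates directly into code. A minimal sketch with a hypothetical function name, using `mean` for M(X) and `sigma` for σ:

```python
import math


def normal_density(x, mean, sigma):
    """Gauss's law: f(x) = exp(-(x - M(X))**2 / (2 sigma**2)) / (sigma sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mean) / sigma  # deviation from the mean in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The curve peaks at x = M(X) with height 1 / (sigma sqrt(2 pi)) ...
print(normal_density(0.0, 0.0, 1.0))  # approximately 0.3989
# ... and is symmetric about M(X).
print(normal_density(-1.0, 0.0, 1.0) == normal_density(1.0, 0.0, 1.0))  # True
```

Python 3.8+ also offers `statistics.NormalDist(mean, sigma).pdf(x)`, which should give the same values; writing the formula out simply makes each factor visible.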

The probability density (distribution density) f(x) shows how the probability dP assigned to an interval dx of the random variable changes depending on the value of the variable itself:

$$f(x) = \frac{dP}{dx}$$
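The relation $f(x) = dP/dx$ can be seen numerically: the probability that a normal variable falls in a small interval $[x, x + dx]$, divided by dx, approaches the density. This sketch assumes a normal law (its distribution function is written with `math.erf`); the helper names and the step `dx = 1e-5` are arbitrary, hypothetical choices.

```python
import math


def normal_cdf(x, mean, sigma):
    """P(X < x) for a normal X, written with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sigma * math.sqrt(2))))


def density_from_probability(x, mean, sigma, dx=1e-5):
    """Approximate f(x) = dP/dx: probability of [x, x + dx] divided by dx."""
    dp = normal_cdf(x + dx, mean, sigma) - normal_cdf(x, mean, sigma)
    return dp / dx


approx = density_from_probability(0.0, 0.0, 1.0)
exact = 1 / math.sqrt(2 * math.pi)  # f(0) of the standard normal law
print(approx, exact)  # the two numbers agree to many decimal places
```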


BASIC CONCEPTS OF MATHEMATICAL STATISTICS

Mathematical statistics is a branch of applied mathematics closely adjacent to probability theory. The main difference between them is that mathematical statistics does not study operations on distribution laws and numerical characteristics of random variables; instead, it develops approximate methods for finding these laws and numerical characteristics from the results of experiments.

The basic concepts of mathematical statistics are:

1. general population;
2. sample;
3. variation series;
4. mode;
5. median;
6. percentile;
7. frequency polygon;
8. histogram.

General population: a large statistical population from which a part of the objects is selected for study (for example, the entire population of a region, or the university students of a given city).

Sample (sample population): a set of objects selected from the general population.

Variation series: a statistical distribution consisting of variants (values of a random variable) and their corresponding frequencies.

Example:

x, kg
m

Here x is the value of the random variable (the body mass of 10-year-old girls) and m is its frequency of occurrence.
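A variation series can be produced from raw observations by counting how often each value occurs and sorting by value. A minimal sketch using `collections.Counter`; the function name and the sample masses in the usage line are hypothetical.

```python
from collections import Counter


def variation_series(observations):
    """Return (value, frequency) pairs sorted by value (hypothetical helper)."""
    counts = Counter(observations)  # value -> number of occurrences
    return sorted(counts.items())


# Made-up body masses in kg, for illustration only.
print(variation_series([25, 24, 24, 23, 24, 25]))  # [(23, 1), (24, 3), (25, 2)]
```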

Mode: the value of the random variable that corresponds to the highest frequency of occurrence. (In the example above, the mode is 24 kg, which occurs more often than any other value: m = 20.)
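Finding the mode of a variation series amounts to picking the value with the largest frequency. The sketch below (hypothetical function name) takes (value, frequency) pairs and returns every value that attains the maximum, since a series may have more than one mode.

```python
def mode_of_series(series):
    """Return every value whose frequency is the largest (hypothetical helper).

    `series` is a list of (value, frequency) pairs, i.e. a variation series.
    A list is returned because several values may share the top frequency.
    """
    if not series:
        raise ValueError("the variation series is empty")
    top = max(m for _, m in series)
    return [x for x, m in series if m == top]


# Made-up series, for illustration only.
print(mode_of_series([(1, 2), (2, 5), (3, 1)]))  # [2]
```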

Median: the value of the random variable that divides the distribution in half: half of the values lie to the right of the median and half (no more) to the left.

Example:

1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 10, 10

The example contains 40 values of a random variable, arranged in ascending order with repetitions. The 20th value is 7, and the 20 values to its right make up half of the 40. Since the 20th and 21st values are both 7, the median is 7.

To characterize the scatter, we find the values that are not exceeded by 25% and by 75% of the measurement results. These values are called the 25th and 75th percentiles. Whereas the median divides the distribution in half, the 25th and 75th percentiles cut off a quarter of it from each end. (The median itself can be regarded as the 50th percentile.) In the example, the 10th and 30th values give the 25th and 75th percentiles: 3 and 8, respectively.
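The median and quartiles of the 40 observations above can be reproduced with Python's standard `statistics` module. `statistics.quantiles` uses its default "exclusive" interpolation method; other percentile conventions can give slightly different results on other data, but here it reproduces the values 3 and 8 from the text.

```python
import statistics

# The 40 observations from the example, in ascending order.
data = ([1] * 6 + [2] * 3 + [3] * 2 + [4] * 2 + [5] * 4
        + [6] * 2 + [7] * 6 + [8] * 6 + [9] * 3 + [10] * 6)
assert len(data) == 40

print(statistics.median(data))          # 7.0 (mean of the 20th and 21st values)
print(statistics.quantiles(data, n=4))  # [3.0, 7.0, 8.0]: 25th, 50th, 75th percentiles
```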

Statistical distributions can be discrete (point) or continuous (interval).

For clarity, statistical distributions are depicted graphically as a frequency polygon or a histogram.

Frequency polygon: a broken line whose segments connect the points $(x_1, m_1), (x_2, m_2), \ldots$; for the relative frequency polygon, the points $(x_1, p^*_1), (x_2, p^*_2), \ldots$, where $p^*_i = m_i/n$ (Fig. 1).


m m i /n f(x)

Fig.1 Fig.2
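The vertices of a frequency polygon (or relative frequency polygon) can be computed from a variation series; connecting consecutive points with straight segments gives the broken line. A sketch with a hypothetical function name and made-up frequencies; no plotting library is assumed.

```python
def polygon_points(series, relative=False):
    """Vertices of the frequency polygon (hypothetical helper).

    `series` holds (x_i, m_i) pairs. With relative=True the heights are
    relative frequencies p*_i = m_i / n instead of frequencies m_i.
    """
    n = sum(m for _, m in series)
    if relative:
        return [(x, m / n) for x, m in series]
    return list(series)


# Made-up series, for illustration only.
series = [(1, 2), (2, 5), (3, 3)]
print(polygon_points(series))                 # [(1, 2), (2, 5), (3, 3)]
print(polygon_points(series, relative=True))  # [(1, 0.2), (2, 0.5), (3, 0.3)]
```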

Frequency histogram: a set of adjacent rectangles built on one straight line (Fig. 2). The bases of the rectangles are all equal to dx, and their heights are equal to the ratio of the frequency to dx, or of the relative frequency $p^*$ to dx (the probability density).

Example:

x, kg | 2.7 | 2.8 | 2.9 | 3.0 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9 | 4.0 | 4.1 | 4.2 | 4.3 | 4.4
m

Frequency polygon

The ratio of the relative frequency to the interval width is called the probability density: $f(x) = \frac{m_i}{n\,dx} = \frac{p^*_i}{dx}$.

An example of constructing a histogram.

Let's use the data from the previous example.

1. Calculation of the number of class intervals

where n is the number of observations. In our case n = 100; hence:

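2. Calculation of the interval width dx:

$$dx = \frac{x_{max} - x_{min}}{k},$$

where $x_{max}$ and $x_{min}$ are the largest and smallest observed values and k is the number of class intervals found in step 1; in our case dx = 0.2 kg.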
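The formula for the number of class intervals has not survived in this text. Purely as an assumption, the sketch below uses Sturges' rule, $k = 1 + \log_2 n \approx 1 + 3.322 \lg n$, a common choice that may differ from the author's, together with the width $dx = (x_{max} - x_{min})/k$. The range 2.7 to 4.4 kg is taken from the table of masses above; the function names are hypothetical.

```python
import math


def sturges_intervals(n):
    """Number of class intervals by Sturges' rule: k = 1 + log2(n), rounded.

    ASSUMPTION: one common rule, not necessarily the one used in the text.
    """
    return round(1 + math.log2(n))


def interval_width(x_min, x_max, k):
    """dx = (x_max - x_min) / k."""
    return (x_max - x_min) / k


k = sturges_intervals(100)        # 1 + log2(100) = 7.64..., so k = 8
dx = interval_width(2.7, 4.4, k)  # about 0.21 kg
print(k, round(dx, 4))            # 8 0.2125
```

Rounding the width to a convenient 0.2 kg yields the nine intervals from 2.7 to 4.5 kg used in the interval series.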

3. Drawing up an interval series:

Interval, kg | 2.7-2.9 | 2.9-3.1 | 3.1-3.3 | 3.3-3.5 | 3.5-3.7 | 3.7-3.9 | 3.9-4.1 | 4.1-4.3 | 4.3-4.5
m | 6 | 15 | 25 | 17 | 11 | 12 | 8 | 5 | 1
f(x), 1/kg | 0.3 | 0.75 | 1.25 | 0.85 | 0.55 | 0.6 | 0.4 | 0.25 | 0.05
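The f(x) row of the interval series can be recomputed as a check. Since $f(x) = m_i/(n\,dx)$ with n = 100 and dx = 0.2 kg, each frequency is $m_i = 20 f(x)$; the sketch below assumes frequencies recovered this way from the f(x) row and confirms that the total area of the histogram is 1.

```python
# Interval series from the text: left edges of the classes (kg), width dx = 0.2 kg.
# ASSUMPTION: the frequencies m_i are recovered as f(x) * n * dx from the f(x) row.
edges = [2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9, 4.1, 4.3]
m = [6, 15, 25, 17, 11, 12, 8, 5, 1]
dx = 0.2
n = sum(m)  # 100 observations

f = [mi / (n * dx) for mi in m]  # histogram heights f(x) = m_i / (n dx)
area = sum(fi * dx for fi in f)  # total area of all rectangles

for left, fi in zip(edges, f):
    print(f"{left:.1f}-{left + dx:.1f} kg: f(x) = {fi:.2f}")
print(f"total area = {area:.3f}")  # equals 1 up to rounding
```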

Histogram

Mathematical statistics is one of the main branches of mathematics; it studies methods and rules for processing data. In other words, it explores ways of discovering, from samples, the patterns characteristic of large populations of similar objects.

Its objective is to construct methods for estimating probabilities, or for making decisions about the nature of the events under study, on the basis of the results obtained. Tables, charts, and correlation fields are used to describe the data.

Mathematical statistics is used in many fields of science. In economics, for example, it is important to process information about homogeneous sets of phenomena and objects, such as industrial products, personnel, or profit data. Depending on the mathematical nature of the observation results, one distinguishes the statistics of numbers, multivariate statistical analysis, and the analysis of functions and of objects of a non-numerical nature. In addition, both general and specific problems are considered (related to the recovery of dependencies, the use of classifications, and sample surveys).

Some textbook authors regard mathematical statistics as merely a section of probability theory; others consider it an independent science with its own goals, objectives and methods. In either case, it is very widely used.

Mathematical statistics is particularly visible in psychology. It allows a specialist to justify the relationships found between data correctly, to generalize the data, to avoid many logical errors, and much more. It is often simply impossible to measure a particular psychological phenomenon or personality trait without computational procedures, which is why a grounding in this science is necessary. In this sense, mathematical statistics serves as a basis of quantitative psychological research.

The research method based on statistical data is also used in other areas. Its features, however, are specific to each kind of object to which it is applied, so it makes little sense to merge these applications (in physics, for example) into a single science. What they have in common is counting the number of objects that belong to a particular group, studying the distribution of quantitative characteristics, and applying probability theory to draw conclusions.

Elements of mathematical statistics are used in physics, astronomy, and other fields. There one may examine the values of characteristics and parameters, test hypotheses about the coincidence of characteristics in two samples or about the symmetry of a distribution, and much more.

Mathematical statistics plays a major role in such research, where the goal is most often to construct adequate estimation methods and to test hypotheses. Computers are now of great importance in this science: they not only greatly simplify calculations but also make it possible to generate new samples by resampling and to study how well theoretical results hold in practice.

In general, the methods of mathematical statistics lead to one of two conclusions: either to accept the proposed judgment about the nature or properties of the data under study and their relationships, or to establish that the results obtained are not sufficient to draw a conclusion.