Statistical data processing. Definition

  • Slide 2

    • Statistics is an exact science that studies methods of collecting, analyzing and processing data that describe mass actions, phenomena and processes
    • Mathematical statistics is a branch of mathematics that studies methods of collecting, systematizing and processing the results of observations of random mass phenomena in order to identify existing patterns.
  • Slide 3

    Statistics studies

    • the number of individual population groups of the country and its regions,
    • production and consumption of various types of products,
    • transportation of goods and passengers by various modes of transport,
    • natural resources and much more.
    • The results of statistical studies are widely used for practical and scientific conclusions.
    • Statistics is now introduced in high school and is a compulsory subject at universities, because it is connected with many sciences and fields.
    • To increase the number of sales in a store, to improve the quality of knowledge in school, to move the country towards economic growth, it is necessary to conduct statistical studies and draw appropriate conclusions. And everyone should be able to do this.
  • Slide 4

    The main goals of studying the elements of statistics

    • forming skills in the primary processing of statistical data;
    • representing and analyzing quantitative information presented in different forms (tables, diagrams, graphs of real dependencies);
    • forming an understanding of key statistical ideas, namely estimation and the testing of statistical hypotheses;
    • developing the ability to compare the probabilities of random events with the results of specific experiments.
  • Slide 5

    • Data series
    • Data series volume
    • Range of data series
    • Data series mode
    • Median of the series
    • Average
    • Ordered data series
    • Data distribution table
    • Let's sum it up
    • Nominative data series
    • Result Frequency
    • Percent frequency
    • Grouping data
    • Data processing methods
    • Let's sum it up
  • Slide 6

    Definition

    • A data series is a series of results of some measurements.
    • For example:
    • 1) measurements of human height
    • 2) measurements of human (animal) weight
    • 3) meter readings (electricity, water, heat...)
    • 4) results in the 100-meter dash
    • etc.
  • Slide 7

    • The volume of a data series is the number of data items in it.
    • For example: given a series of numbers 1; 3; 6; -4; 0
    • its volume will be equal to 5. Why?
  • Slide 8

    Complete the task

    • Determine the volume of this series.
    • Answer: 10
  • Slide 9

    Definition

    • Range is the difference between the largest and smallest numbers in a data series.
    • For example: given the series of numbers 1; 3; 6; -4; 0; 2, the range of this data series is 10 (since 6 – (-4) = 10)
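    The two definitions so far (volume and range) can be sketched in a few lines of Python. This is only an illustration; the variable names are made up for the example.

```python
# Data series from the range example above
series = [1, 3, 6, -4, 0, 2]

# Volume: how many data items the series contains
volume = len(series)

# Range: largest value minus smallest value
data_range = max(series) - min(series)

print("volume =", volume)
print("range =", data_range)
```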
  • Slide 10

    Complete the task

    • At the institute we took a test in higher mathematics. There were 10 people in the group, and they received the following grades: 3, 5, 5, 4, 4, 4, 3, 2, 4, 5.
    • Determine the range of this series.
    • Answer: 3
  • Slide 11

    Definition

    • The mode of a data series is the number of the series that occurs most often in this series.
    • A data series may or may not have a mode.
    • Thus, in the data series 47, 46, 50, 52, 47, 52, 49, 45, 43, 53, each of the numbers 47 and 52 occurs twice, and the remaining numbers less than twice. In such cases, it was agreed that the series has two modes: 47 and 52.
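    A minimal sketch of finding the mode (or modes) in Python, assuming the convention that a series in which every value occurs equally often has no mode. The helper name `modes` is hypothetical.

```python
from collections import Counter

def modes(series):
    """Return the list of modes; an empty list means the series has no mode."""
    counts = Counter(series)
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    # If every distinct value is equally frequent, treat the series as having no mode
    if len(winners) == len(counts) and len(counts) > 1:
        return []
    return sorted(winners)

print(modes([47, 46, 50, 52, 47, 52, 49, 45, 43, 53]))
```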
  • Slide 12

    Complete the task

    • At the institute we took a test in higher mathematics. There were 10 people in the group, and they received the following grades:
    • 3, 5, 5, 4, 4, 4, 3, 2, 4, 5.
    • Determine the mode of this series.
    • Answer: 4
  • Slide 13

    Definition

    • The median of an ordered series with an odd number of terms is the number written in the middle.
    • The median of an ordered series with an even number of terms is the arithmetic mean of the two numbers written in the middle.
    • For example: determine the median of a series of numbers
    • 1) 6; -4; 5; -2; -3; 3; 3; -2; 3. Ordered: -4; -3; -2; -2; 3; 3; 3; 5; 6. Answer: 3
    • 2) -1; 0; 2; 1; -1; 0; 2; -1. Ordered: -1; -1; -1; 0; 0; 1; 2; 2. Answer: 0
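    The median rule can be sketched as a small Python function. The point it illustrates is that the series must be ordered first; the function name `median_of` is made up for this example.

```python
def median_of(series):
    """Median of a numeric series: order it, then take the middle."""
    ordered = sorted(series)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of terms: the single middle term
        return ordered[middle]
    # Even number of terms: mean of the two middle terms
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median_of([6, -4, 5, -2, -3, 3, 3, -2, 3]))
print(median_of([-1, 0, 2, 1, -1, 0, 2, -1]))
```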
  • Slide 14

    Complete the task

    • At the institute we took a test in higher mathematics. There were 10 people in the group, and they received the following grades: 3, 5, 5, 4, 4, 4, 3, 2, 4, 5.
    • Determine the median of this series.
    • Answer: 4
  • Slide 15

    Definition

    • The arithmetic mean is the quotient of dividing the sum of the numbers in a series by their number.
    • For example: given a series of numbers -1; 0; 2; 1; -1; 0; 2; -1. Then the arithmetic mean will be equal to: (-1+0+2+1+(-1)+0+2+(-1)):8 =2:8=0.25
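    The same calculation as a short Python sketch (variable names are illustrative):

```python
series = [-1, 0, 2, 1, -1, 0, 2, -1]

# Arithmetic mean: sum of the numbers divided by how many there are
total = sum(series)
arithmetic_mean = total / len(series)

print(total, ":", len(series), "=", arithmetic_mean)
```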
  • Slide 16

    Complete the task

    • At the institute we took a test in higher mathematics. There were 10 people in the group, and they received the following grades: 3, 5, 5, 4, 4, 4, 3, 2, 4, 5.
    • Determine the arithmetic mean of this series.
    • Answer: 3.9
  • Slide 17

    Practical work

    • Assignment: characterize student Ivanov’s performance in mathematics for the fourth quarter.
    • CARRYING OUT THE WORK:
    • 1. Collecting information:
    • The grades copied from the class register are: 5, 4, 5, 3, 3, 5, 4, 4, 4.
    • 2. Processing the data:
    • volume = 9
    • range = 5 - 3 = 2
    • mode = 4
    • median = 4 (ordered series: 3, 3, 4, 4, 4, 4, 5, 5, 5)
    • arithmetic mean = (5+4+5+3+3+5+4+4+4) : 9 ≈ 4
    • Characteristics of academic performance: the student is not always ready for the lesson.
    • He mostly earns grades of "4". The quarter grade comes out to "4".
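    As a rough cross-check, the five characteristics for these grades can be computed with Python's standard `statistics` module. This is a sketch, not part of the original assignment; the variable names are illustrative.

```python
from statistics import mean, median, mode

grades = [5, 4, 5, 3, 3, 5, 4, 4, 4]

volume = len(grades)                      # number of grades
grade_range = max(grades) - min(grades)   # largest minus smallest
grade_mode = mode(grades)                 # most frequent grade
grade_median = median(grades)             # middle of the ordered grades
grade_mean = mean(grades)                 # sum divided by volume

print(volume, grade_range, grade_mode, grade_median, round(grade_mean, 2))
```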
  • Slide 18

    Independent work

    • Find the volume of the series, the range of the series, the mode, the median and the arithmetic mean:
    • Card 1. 22.5; 23; 21.5; 22; 23.
    • Card 2. 6; -4; 5; -2; -3; 3; 3; -2; 3.
    • Card 3. 12.5; 12; 12; 12.5; 13; 12.5; 13.
    • Card 4. -1; 0; 2; 1; -1; 0; 2; -1.
    • Card 5. 125; 130; 124; 131.
    • Card 6. 120; 100; 110.
  • Slide 19

    Let's check

    • Card 1.
    • volume of the series = 5
    • range of the series = 1.5
    • mode = 23
    • median = 22.5
    • arithmetic mean = 22.4
    • Card 2.
    • volume of the series = 9
    • range of the series = 10
    • mode = 3
    • median = 3
    • arithmetic mean = 1
    • Card 3.
    • volume of the series = 7
    • range of the series = 1
    • mode = 12.5
    • median = 12.5
    • arithmetic mean = 12.5
    • Card 4.
    • volume of the series = 8
    • range of the series = 3
    • mode = -1
    • median = 0
    • arithmetic mean = 0.25
  • Slide 20

    • Card 5.
    • volume of the series = 4
    • range of the series = 7
    • mode = none
    • median = 127.5
    • arithmetic mean = 127.5
    • Card 6.
    • volume of the series = 3
    • range of the series = 20
    • mode = none
    • median = 110
    • arithmetic mean = 110
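    The card answers can be checked mechanically. The sketch below uses Python's `statistics` module; the helper `summarize` is hypothetical, and it assumes the convention that a series whose values all occur equally often has no mode.

```python
from statistics import mean, median, multimode

cards = {
    "Card 1": [22.5, 23, 21.5, 22, 23],
    "Card 2": [6, -4, 5, -2, -3, 3, 3, -2, 3],
    "Card 3": [12.5, 12, 12, 12.5, 13, 12.5, 13],
    "Card 4": [-1, 0, 2, 1, -1, 0, 2, -1],
    "Card 5": [125, 130, 124, 131],
    "Card 6": [120, 100, 110],
}

def summarize(series):
    """Volume, range, modes, median and arithmetic mean of one series."""
    modes = multimode(series)
    # multimode returns every value when all are equally frequent: report no mode
    if len(modes) == len(set(series)) and len(modes) > 1:
        modes = []
    return {
        "volume": len(series),
        "range": max(series) - min(series),
        "modes": modes,
        "median": median(series),
        "mean": mean(series),
    }

results = {name: summarize(series) for name, series in cards.items()}
for name, summary in results.items():
    print(name, summary)
```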
  • Slide 21

    Definition

    • An ordered data series is a series in which the data are arranged according to some rule.
    • How can a series of numbers be ordered? Write the numbers so that each subsequent number is no less (or no greater) than the previous one. Names, for example, can be written in alphabetical order.
  • Slide 22

    Complete the task

    • Given a series of numbers:
    • -1;-3;-3;-2;3;3;2;0;3;3;-3;-3;1;1;-3;-1
    • Sort it in ascending order.
    • Solution:
    • -3;-3;-3;-3;-3;-2;-1;-1;0;1;1;2;3;3;3;3
    • The result is an ordered series. The data itself has not changed, only the order in which they appear has changed.
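    In Python, ordering a series is a one-line operation. The sketch below also checks the remark above that ordering changes only the order, not the data; the variable names are illustrative.

```python
from collections import Counter

series = [-1, -3, -3, -2, 3, 3, 2, 0, 3, 3, -3, -3, 1, 1, -3, -1]

ascending = sorted(series)                 # each number is no less than the previous one
descending = sorted(series, reverse=True)  # each number is no greater than the previous one

# Same values with the same repetition counts, only the order differs
same_data = Counter(series) == Counter(ascending)

print(ascending)
print(same_data)
```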
  • Slide 23

    Definition

    • A data distribution table is a table of an ordered series in which, instead of repeating the same number, the number of repetitions is recorded.
    • Conversely, if the distribution table is known, then an ordered series of data can be compiled.
    • For example:
    • Value: -3 -1 5 7 8
    • Number of repetitions: 3 4 2 1 5
    • From it we get the following ordered series:
    • -3;-3;-3;-1;-1;-1;-1;5;5;7;8;8;8;8;8
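    Both directions (ordered series to distribution table, and table back to ordered series) can be sketched with `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

ordered = [-3, -3, -3, -1, -1, -1, -1, 5, 5, 7, 8, 8, 8, 8, 8]

# Distribution table: each distinct value paired with its number of repetitions
table = sorted(Counter(ordered).items())

# Going back: repeat each value as many times as the table says
rebuilt = [value for value, count in table for _ in range(count)]

for value, count in table:
    print(value, count)
```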
  • Slide 24

    Complete the task

    • In a women's shoe store, statistical research was carried out and a corresponding table was compiled for the price of shoes and the number of sales:
    • Price (RUB): 500 1200 1500 1800 2000 2500
    • Quantity: 8 9 14 15 3 1
    • For these data, find the statistical characteristics:
    • create an ordered data series
    • volume of the data series
    • range of the series
    • mode of the series
    • median of the series
    • arithmetic mean of the data series
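    One possible way to work from a distribution table in Python is to expand it into an ordered series and then apply the usual definitions. This is a sketch under that approach; the variable names are made up for the example.

```python
from statistics import median

prices = [500, 1200, 1500, 1800, 2000, 2500]   # RUB
quantities = [8, 9, 14, 15, 3, 1]              # pairs sold at each price

# Expand the table into an ordered series: each price repeated by its quantity
ordered = [p for p, q in zip(prices, quantities) for _ in range(q)]

volume = len(ordered)
price_range = max(ordered) - min(ordered)
mode_price = prices[quantities.index(max(quantities))]  # price with the most sales
median_price = median(ordered)
mean_price = sum(p * q for p, q in zip(prices, quantities)) / volume

print(volume, price_range, mode_price, median_price, mean_price)
```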
  • Slide 25

    And answer the following questions

    • Which of these price categories should the store stop selling?
    • Shoes at which price should be stocked most widely?
    • What price should the store aim for?
  • Slide 26

    Let's sum it up

    • We got acquainted with the initial concepts of statistical data processing:
    • 1) data are always the result of some measurement
    • 2) for a data series we can find the volume, range, mode, median and arithmetic mean
    • 3) any data series can be ordered, and a data distribution table can be compiled for it
  • Slide 27

    Definition

    • A nominative data series consists NOT of NUMERICAL DATA but of, for example, names, titles, nominations...
    • For example: the list of World Cup finalists since 1930: Argentina, Czechoslovakia, Hungary, Brazil, Hungary, Sweden, Czechoslovakia, Germany, Italy, Netherlands, Netherlands, Germany, Germany, Argentina, Italy, Brazil, Germany, France
  • Slide 28

    Complete the task

    • For the previous example, find:
    • 1) the volume of the series 2) the mode of the series
    • 3) create a distribution table
    • Solution: volume = 18; mode: the German team.
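    Volume, mode and a distribution table work the same way for non-numerical data. A minimal Python sketch using `collections.Counter` (variable names are illustrative):

```python
from collections import Counter

finalists = [
    "Argentina", "Czechoslovakia", "Hungary", "Brazil", "Hungary", "Sweden",
    "Czechoslovakia", "Germany", "Italy", "Netherlands", "Netherlands",
    "Germany", "Germany", "Argentina", "Italy", "Brazil", "Germany", "France",
]

volume = len(finalists)

# Distribution table: team name paired with its number of appearances
table = Counter(finalists)

# Mode: the name that occurs most often
mode_team, mode_count = table.most_common(1)[0]

for team, count in sorted(table.items()):
    print(team, count)
print("volume =", volume, "mode =", mode_team)
```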
  • Laboratory work No. 3. Statistical data processing in the MATLAB system

    General statement of the problem

    The main purpose of this laboratory work is to become familiar with the basics of statistical data processing in the MATLAB environment.

    Theoretical part

    Primary statistical data processing

    Statistical data processing is based on primary and secondary quantitative methods. The purpose of primary processing is to structure the information obtained, which involves grouping the data into summary tables according to various parameters. Primary data must be presented in a format that lets a person make an approximate assessment of the data set and see how the sample is distributed, for example how homogeneous or compact the data are. After the primary analysis, methods of secondary statistical processing are applied, which determine the statistical patterns in the data set.

    Carrying out primary statistical analysis on a data array allows you to gain knowledge about the following:

    Which value is most typical for the sample? To answer this question, measures of central tendency are defined.

    How large is the spread of data relative to this characteristic value, i.e., what is the “fuzziness” of the data? In this case, measures of variability are determined.

    It is worth noting the fact that statistical indicators of central tendency and variability are determined only on quantitative data.

    Measures of central tendency: a group of values around which the rest of the data cluster. They summarize the data array, which makes it possible both to draw conclusions about the sample as a whole and to compare different samples with each other.

    Suppose we have a data sample $X_1, X_2, \dots, X_n$. Measures of central tendency are then assessed by the following indicators:

    1. The sample mean is the result of dividing the sum of all sample values by their number. It is determined by formula (3.1).

    $$S = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{3.1}$$

    where $X_i$ is the i-th element of the sample;

    $n$ is the number of sample elements.

    The sample mean provides the greatest accuracy in the process of estimating central tendency.

    Let's say we have a sample of 20 people. The sample elements are the average monthly incomes of each person. Suppose 19 people each have an average monthly income of 20 thousand rubles and 1 person has an income of 300 thousand rubles. The total monthly income of the entire sample is 19 · 20 + 300 = 680 thousand rubles. The sample mean in this case is S = 680 / 20 = 34 thousand rubles.


    2. Median: the value above and below which the same number of sample values lie, i.e. the central value of the ordered data series. It is determined by formula (3.2) or (3.3), depending on whether the number of sample elements is odd or even. Algorithm for estimating the median of a data sample:

    First, the data are ranked (ordered) in ascending or descending order.

    If the ordered sample has an odd number of elements, the median coincides with the central value:

    $$Me = X_{(n+1)/2} \tag{3.2}$$

    where $X_{(n+1)/2}$ is the central element of the ordered sample;

    $n$ is the number of sample elements.

    If the number of elements is even, the median is the arithmetic mean of the two central values:

    $$Me = \frac{X_{n/2} + X_{n/2+1}}{2} \tag{3.3}$$

    where $X_{n/2}$ is the middle element of the ordered sample;

    $X_{n/2+1}$ is the element of the ordered sample next to $X_{n/2}$;

    $n$ is the number of sample elements.

    If all sample elements are different, the same number of elements lie above the median as below it. For example, for the sample (1, 5, 9, 15, 16), the median is the element 9.

    In statistical data analysis, the median helps identify sample elements that greatly influence the value of the sample mean.

    Consider again a sample of 20 people whose elements are the average monthly incomes of each person. Suppose 19 people each have an average monthly income of 20 thousand rubles and 1 person has an income of 300 thousand rubles. The total monthly income of the entire sample is 680 thousand rubles. After ordering the sample, the median is the arithmetic mean of the tenth and eleventh elements and equals Me = 20 thousand rubles. This result is interpreted as follows: the median divides the sample into two groups, so that in the first group each person has an average monthly income of no more than 20 thousand rubles, and in the second group no less than 20 thousand rubles. In this example the median shows how much the “average” person earns, whereas the sample mean S = 34 thousand rubles is noticeably higher, which makes it unsuitable for assessing typical earnings.

    Thus, the greater the difference between the median and the sample mean, the greater the spread of the sample data (in the example considered, the person with an income of 300 thousand rubles clearly differs from the typical person in this sample and has a significant impact on the estimate of average income). What to do with such elements is decided in each individual case. In general, however, they are removed to keep the sample reliable, because they strongly influence the statistical indicators.
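    The income example can be reproduced in a few lines of Python. This is a sketch using the standard `statistics` module; the variable names are illustrative, and incomes are expressed in thousands of rubles.

```python
from statistics import mean, median

# 19 people earning 20 thousand rubles and one earning 300 thousand rubles
incomes = [20] * 19 + [300]

sample_mean = mean(incomes)       # pulled upward by the single large income
sample_median = median(incomes)   # mean of the 10th and 11th ordered elements

# Removing the atypical element brings the mean back to the typical income
without_outlier = [x for x in incomes if x != max(incomes)]
trimmed_mean = mean(without_outlier)

print(sample_mean, sample_median, trimmed_mean)
```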

    3. Mode (Mo): the value that occurs most frequently in the sample, i.e. the value with the highest frequency. Mode estimation algorithm:

    If all elements of a sample occur equally often, the sample is said to have no mode.

    If two neighboring sample elements have the same frequency, which is greater than the frequency of the remaining elements of the sample, the mode is defined as the average of these two values.

    If two sample elements have the same frequency, which is greater than the frequency of the remaining sample elements, and these elements are not adjacent, the sample is said to have two modes.
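    The three rules above can be sketched as a single Python function. The name `lab_mode` is hypothetical, and the sketch assumes that "neighboring" means adjacent in the ordered list of distinct sample values; the text does not define this precisely.

```python
from collections import Counter

def lab_mode(sample):
    """Return a list of modes following the three rules; [] means no mode."""
    counts = Counter(sample)
    values = sorted(counts)                      # distinct values in order
    top = max(counts.values())
    winners = [v for v in values if counts[v] == top]

    # Rule 1: all elements equally frequent, so there is no mode
    if len(winners) == len(values) and len(values) > 1:
        return []
    # Rule 2: two neighboring values share the top frequency, so average them
    if len(winners) == 2 and values.index(winners[1]) - values.index(winners[0]) == 1:
        return [(winners[0] + winners[1]) / 2]
    # Rule 3 (and the ordinary single-mode case): report the winners as they are
    return winners

print(lab_mode([1, 2, 2, 3, 3, 4]))   # neighbors 2 and 3 tie
print(lab_mode([1, 1, 2, 3, 3]))      # 1 and 3 tie but are not neighbors
```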

    The mode is used in statistical analysis when a quick assessment of central tendency is needed and high accuracy is not required. For example, the mode (by size or brand) is a convenient way to determine which clothes and shoes are in greatest demand among customers.

    Measures of scatter (variability): a group of statistical indicators characterizing the differences between individual sample values. They make it possible to assess how homogeneous and compact the sample elements are. Measures of scatter are characterized by the following set of indicators:

    1. Range: the interval between the maximum and minimum values of the observation results (sample elements). The range indicates the spread of values in the data set. If the range is large, the values are widely scattered; if it is small, the values lie close to each other. The range is determined by formula (3.4).

    $$R = X_{\max} - X_{\min} \tag{3.4}$$

    where $X_{\max}$ is the maximum sample element;

    $X_{\min}$ is the minimum sample element.

    2. Average deviation: the arithmetic mean of the differences (in absolute value) between each value in the sample and the sample mean. The average deviation is determined by formula (3.5).

    $$d = \frac{1}{n}\sum_{i=1}^{n}\left|X_i - S\right| \tag{3.5}$$

    where $X_i$ is the i-th element of the sample;

    $S$ is the sample mean calculated using formula (3.1);

    $n$ is the number of sample elements.

    The absolute value is necessary because deviations from the mean can be both positive and negative for individual elements. If the absolute value is not taken, the positive and negative deviations cancel, their sum is zero, and it is impossible to judge the degree of data variability (how tightly the data cluster around the sample mean). When performing statistical analysis, the mode or the median may be taken instead of the sample mean.
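    A small Python sketch of the average deviation, which also shows why the absolute value matters. The function name and the sample values are hypothetical, chosen only for illustration.

```python
def mean_deviation(sample):
    """Average absolute deviation of the sample from its mean."""
    n = len(sample)
    s = sum(sample) / n                          # sample mean
    return sum(abs(x - s) for x in sample) / n

sample = [2, 4, 4, 6]                            # hypothetical data
s = sum(sample) / len(sample)

# Without the absolute value the deviations cancel out
signed_sum = sum(x - s for x in sample)

print(signed_sum, mean_deviation(sample))
```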

    3. Variance: a measure of spread that describes the deviation of the data values from the mean. It is calculated from the sum of the squared deviations of each sample element from the mean. Depending on the sample size, the variance is estimated in different ways:

    For large samples (n > 30), according to formula (3.6):

    $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - S\right)^2 \tag{3.6}$$

    For small samples (n < 30), according to formula (3.7):

    $$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - S\right)^2 \tag{3.7}$$

    where $X_i$ is the i-th sample element;

    $S$ is the sample mean;

    $n$ is the number of sample elements;

    $(X_i - S)$ is the deviation from the mean for each value of the data set.
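    The two variance estimates can be sketched in Python as follows. The function name is hypothetical, and since the text does not say which formula applies when n is exactly 30, the sketch assumes the small-sample formula is used for n ≤ 30.

```python
import math
import statistics

def sample_variance(sample):
    """Variance by formula (3.6) for n > 30, otherwise by formula (3.7)."""
    n = len(sample)
    s = sum(sample) / n
    squared = sum((x - s) ** 2 for x in sample)
    # Assumption: n == 30 is treated as a small sample
    return squared / n if n > 30 else squared / (n - 1)

small = [2, 4, 4, 6]          # hypothetical small sample
large = list(range(31))       # hypothetical sample with n = 31

print(sample_variance(small), statistics.variance(small))
print(sample_variance(large), statistics.pvariance(large))
```

    For comparison, `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n, so they correspond to the small-sample and large-sample formulas respectively.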

    4. Standard deviation: a measure of how widely the data points are scattered relative to their mean.

    Squaring the individual deviations when calculating the variance expresses the result in squared units and exaggerates large deviations compared with the original ones. To bring the estimate of spread back to the units of the data, and closer to the value of the average deviation, the square root of the variance is taken. The square root of the variance is the measure of variability called the root mean square deviation or standard deviation (3.8).

    $$\sigma = \sqrt{\sigma^2} \tag{3.8}$$

    where $\sigma^2$ is the variance calculated using formula (3.6) or (3.7).

    Let's say you are the manager of a software development project. You have five programmers under your command. By managing the project execution process, you distribute tasks among programmers. To simplify the example, we will proceed from the fact that the tasks are equal in complexity and completion time. You decided to analyze the work of each programmer (the number of completed tasks during the week) over the last 10 weeks, as a result of which you received the following samples:

    (Table: number of tasks completed per week by each of the five programmers over the 10 weeks.)

    By estimating the average number of completed tasks, you get the following result:

    Programmer: 1 2 3 4 5
    S: 22.3 22.4 22.2 22.1 22.5

    Based on the S indicator, all programmers work on average with the same efficiency (about 22 tasks per week). However, the variability indicator (range, R) differs greatly between them: from 5 tasks for the fourth programmer to 24 tasks for the fifth.

    Programmer: 1 2 3 4 5
    S: 22.3 22.4 22.2 22.1 22.5
    R: … … … 5 24

    Let's estimate the standard deviation (SD), which shows how the values in the samples are distributed relative to the mean. In our case it shows how much the number of completed tasks varies from week to week.

    Programmer: 1 2 3 4 5
    S: 22.3 22.4 22.2 22.1 22.5
    R: … … … 5 24
    SD: 1.56 1.8 2.84 1.3 5.3

    The resulting estimate of the standard deviation indicates the following (we will evaluate two extreme cases, programmers 4 and 5):

    Each value in programmer 4's sample deviates on average by 1.3 tasks from the mean.

    Each value in programmer 5's sample deviates on average by 5.3 tasks from the mean.

    The closer the standard deviation is to 0, the more reliable the mean, since it indicates that each value in the sample is almost equal to the mean (in our example, about 22 tasks). Therefore, programmer 4 is the most consistent, unlike programmer 5. The week-to-week variability of task completion for the 5th programmer is 5.3 tasks, which indicates a significant spread. For the 5th programmer the mean cannot be trusted, so it is difficult to predict the number of tasks completed next week, which in turn complicates planning and adherence to work schedules. Which management decision you make in this case is not important here. What matters is that you obtain an estimate on which an appropriate management decision can be based.

    Thus, a general conclusion can be drawn that the average does not always evaluate the data correctly. The correctness of the average estimate can be judged by the value of the standard deviation.
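    The effect described above can be illustrated with a short Python sketch. The weekly task counts below are hypothetical (the original weekly data are not reproduced in the text); they are chosen so that both programmers have the same mean but very different spread.

```python
from statistics import mean, stdev

# Hypothetical weekly task counts over 10 weeks for two programmers
steady = [22, 21, 23, 22, 22, 23, 21, 22, 22, 22]
erratic = [30, 14, 25, 18, 29, 15, 27, 16, 24, 22]

for name, tasks in (("steady", steady), ("erratic", erratic)):
    spread = max(tasks) - min(tasks)
    # stdev divides by n - 1, matching the small-sample case (n = 10)
    print(name, "mean =", mean(tasks), "range =", spread,
          "SD =", round(stdev(tasks), 2))
```

    Both means are equal, so the mean alone cannot distinguish the two; the range and the standard deviation can.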

    Methods for statistical processing of experimental results are mathematical techniques, formulas, methods of quantitative calculations, with the help of which indicators obtained during an experiment can be generalized, brought into a system, revealing hidden patterns in them.

    We are talking about patterns of a statistical nature that exist between the variables studied in the experiment.

    Data are the basic elements to be classified or categorized for the purpose of processing [26].

    Some of the methods of mathematical-statistical analysis make it possible to calculate the so-called elementary mathematical statistics that characterize the sample distribution of data, for example:

    Sample mean,

    Sample variance,

    Median and a number of others.

    Other methods of mathematical statistics make it possible to judge the dynamics of changes in individual sample statistics, for example:

    Analysis of variance,

    Regression analysis.

    A third group of methods makes it possible to judge reliably the statistical relationships between the variables studied in the experiment:

    Correlation analysis;

    Factor analysis;

    Comparison methods.

    All methods of mathematical and statistical analysis are conventionally divided into primary and secondary [27].

    Primary methods are those that can be used to obtain indicators that directly reflect the results of measurements made in an experiment.

    Secondary methods are statistical processing methods that reveal, on the basis of primary data, the statistical patterns hidden in them.

    Primary methods of statistical processing include, for example:

    Determination of the sample mean;

    Sample variance;

    Sample mode;

    Sample median.

    Secondary methods usually include:

    Correlation analysis;

    Regression analysis;

    Methods for comparing primary statistics in two or more samples.

    Let's consider methods for calculating elementary mathematical statistics, starting with the sample average.

    The arithmetic mean is the ratio of the sum of all data values to the number of terms [28].

    As a statistical indicator, the mean represents the average assessment of the psychological quality studied in the experiment.

    This assessment characterizes the overall degree of development of that quality in the group of subjects who underwent a psychodiagnostic examination. By directly comparing the means of two or more samples, we can judge the relative degree of development of the assessed quality in the people making up these samples.

    The sample mean is determined using the following formula [29]:

    $$\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

    where $\bar{x}$ is the sample mean, or the arithmetic mean of the sample;

    $n$ is the number of subjects in the sample, or of individual psychodiagnostic indicators from which the mean is calculated;

    $x_k$ are the individual values of the indicators for individual subjects. There are $n$ such indicators in total, so the index $k$ takes values from 1 to $n$;

    $\sum$ is the sign accepted in mathematics for summing the values of the variables to the right of this sign.

    Variance is a measure of the spread of data relative to the mean value [30].

    The greater the variance, the greater the deviation or spread of the data. It is calculated so that values with the same mean but different spread can be distinguished from one another.

    The variance is determined by the following formula:

    $$\sigma^2 = \frac{1}{n}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2$$

    where $\sigma^2$ is the sample variance, or simply variance;

    $\sum_{k=1}^{n}(x_k - \bar{x})^2$ is an expression meaning that for all $x_k$ from the first to the last in the sample, the differences between the individual and mean values are calculated, squared and summed;

    $n$ is the number of subjects in the sample, or of primary values from which the variance is calculated.

    The median is the value of the characteristic being studied that divides the sample, ordered by the value of this characteristic, in half.

    Knowing the median is useful for determining whether the distribution of individual values of the studied characteristic is symmetrical and approximates the so-called normal distribution. For a normal distribution the mean and median usually coincide or differ very little.

    If the sample distribution of features is normal, then methods of secondary statistical calculations based on the normal distribution of data can be applied to it. Otherwise this cannot be done, as serious errors may creep into the calculations.

    The mode is another elementary mathematical statistic and characteristic of the distribution of experimental data. The mode is the value of the characteristic being studied that occurs most often in the sample.

    For symmetric distributions with a single peak, including the normal distribution, the mode coincides with the mean and the median. For other, asymmetric distributions this is not typical.

    The method of secondary statistical processing that determines the connection or direct dependence between two series of experimental data is called correlation analysis. It shows how one phenomenon influences, or is related to, another in its dynamics. Dependencies of this kind exist, for example, between quantities that are in cause-and-effect relationships with each other. If two phenomena turn out to be statistically significantly correlated, and there are independent grounds to believe that one of them can cause the other, then a cause-and-effect relationship between them can be inferred; the correlation by itself does not establish it.

    There are several varieties of this method:

    Linear correlation analysis allows you to establish direct connections between variables based on their absolute values. These connections are graphically expressed by a straight line, hence the name “linear”.

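    The linear correlation coefficient is determined using the following formula [31]:

    $$r_{xy} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sigma_x^2\,\sigma_y^2}}$$

    where $r_{xy}$ is the linear correlation coefficient;

    $\bar{x}$, $\bar{y}$ are the sample means of the compared quantities;

    $x_i$, $y_i$ are the individual sample values of the compared quantities;

    $n$ is the total number of values in the compared series of indicators;

    $\sigma_x^2$, $\sigma_y^2$ are the variances, i.e. the deviations of the compared values from their means.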
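    A minimal Python sketch of the linear correlation coefficient, assuming the variances are computed with division by n as in the variance formula of this section. The function name and the two data series are hypothetical.

```python
import math

def linear_correlation(xs, ys):
    """Linear (Pearson) correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return covariance / math.sqrt(var_x * var_y)

# Hypothetical data lying exactly on a straight line
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

r_direct = linear_correlation(x, y)          # perfect direct linear relationship
r_inverse = linear_correlation(x, y[::-1])   # perfect inverse linear relationship
print(r_direct, r_inverse)
```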

    Rank correlation determines the dependence not between the absolute values of variables, but between the ordinal places, or ranks, that they occupy in a series ordered by value. The formula for the rank correlation coefficient is as follows [32]:

    $$R_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$

    where $R_s$ is the Spearman rank correlation coefficient;

    $d_i$ is the difference between the ranks of the indicators of the same subject in the ordered series;

    $n$ is the number of subjects or of numerical data (ranks) in the correlated series.
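    The rank correlation formula can be sketched in Python as follows. The helper names and the data are hypothetical, and the sketch assumes there are no tied values (ties need averaged ranks, which this simple version does not handle).

```python
import math

def ranks(values):
    """Rank of each value (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation coefficient for series without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical indicators for five subjects
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]

r_s = spearman(x, y)
print(ranks(y), r_s)
```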

The purpose of the lesson:
- to create conditions for mastering the topic at the level of comprehension and primary memorization;
- to develop the student's mathematical competence.

Lesson Objectives
Teaching: form an idea of statistics as a science; familiarize students with the basic statistical characteristics; develop the ability to find the range and mode of a series and to analyze data; improve mental calculation skills.
Developmental: promote mastery of concepts and their interpretation; develop cross-subject skills of analysis, comparison, systematization and generalization; continue building subject vocabulary; promote key competencies (cognitive, informational, communicative) at various stages of the lesson; help students form a unified scientific picture of the world by identifying interdisciplinary connections between statistics and other sciences.
Upbringing: cultivate interest in the subject being studied and information culture; readiness to follow generally accepted norms and rules; high efficiency and organization.

    Technologies used: Technology of student-centered learning, information and communication technologies.
    Necessary equipment, materials: multimedia projector, computer, interactive whiteboard.

    Lesson progress

    1. Organizational moment.

    Checking students' readiness for class

    Checking Attendance

    2. Goal setting.

    Rationale for the need to study this topic

    Involving students in the process of setting lesson goals

    From what sources do we receive and collect information? (Suggested answers: radio, television, newspapers, magazines, telephone, people, Internet, letters).

    Where do people store information? (Suggested answers: in memory and on external media.)
    Is studying at a technical school a way of obtaining information? At school you studied general education subjects, but what else do you gain by studying at a technical school? (Suggested answer: professional knowledge.) The more we learn, the more information our memory contains.

    Today I offer you another piece of information. You are training as mining machine operators and will work on EKG-8I excavators. What is the performance of this excavator? At my request, the plant provided me with the following information. (Excavator performance - table)

    By waste rock (thousand tons)

    Is a lot of information a good thing? Can all information be useful and of high quality? What should we be able to do so as not to get lost in the maze of information? (Students' expected answer: “We must be able to separate useful, high-quality information from low-quality information.”) That is, we must be able to process it.

    CONCLUSION: today in the lesson we will learn to process information.

    3. Organization of activities to study new material (students make notes in notebooks and complete assignments during the explanation).

    1. Definition of statistics

    What are statistics? It is said that the English Prime Minister Benjamin Disraeli (1804 - 1881) answered this question as follows: “There are three types of lies: lies, damned lies and statistics.”

    Statistics is an exact science that studies methods of collecting, analyzing and processing data that describe mass actions, phenomena and processes.

    (An excerpt from the novel “The Twelve Chairs” by Ilf and Petrov is read out.)

    “Statistics knows everything. It is known how much food the average citizen of the republic eats per year... It is known how many hunters, ballerinas... machines, bicycles, monuments, lighthouses and sewing machines there are in the country... How much life, full of ardor, passions and thoughts, looks at us from statistical tables!..”

    The name comes from the Latin word “status”, meaning state. The same root gives the words stato (Italian), Statistik (German) and state (English).

    Statistics studies:

    • the number of individual population groups of the country and its regions,
    • production and consumption of various types of products,
    • transportation of goods and passengers by various modes of transport,
    • natural resources and much more.

    Do you know in which country statistical practice began? (In China; the country's first population censuses date back to the 5th century of the 2nd millennium BC.)

    In the 19th century, it became possible to process data using formulas, mathematical laws and special characteristics. What is this called? (Mathematical statistics.)

    2. Mathematical statistics

    Mathematical statistics is a branch of mathematics that studies methods of collecting, systematizing and processing the results of observations of random mass phenomena in order to identify existing patterns.

    So why did Disraeli compare statistics to lies? (There was no rigorous scientific processing of information; everyone interpreted the data as they wished).

    Mathematical statistics has universal methods of information processing.
    This is what allowed the heroes of the film “Office Romance” to say the following words about statistics (a fragment of the film “Office Romance” is shown).
    CONCLUSION: Statistics organizes information into a system.

    3. Graphical representation of information

    Distribution polygon

    Distribution histogram

    Pie chart

    4. Measurement characteristics
    1. A series of data is a series of results of any measurements.

    For example: 1) measuring human height

    2) Human (animal) weight measurements

    3) Meter readings (electricity, water, heat...)

    4) Results in the 100-meter dash

    2. Volume of a data series - the volume of a data series is the number of values in the series.

    For example: given a series of numbers 1; 3; 6; -4; 0

    its volume will be equal to 5. Why?

    3. The range of a data series is the difference between the largest and smallest numbers in the series.

    For example: given the series of numbers 1; 3; 6; -4; 0; 2, the range of this data series is 10 (since 6 - (-4) = 10)

    4. Mode of a data series - the mode of a data series is the number that occurs most often in the series.

    A data series may or may not have a mode, and it may have more than one.

    For example, in the data series 47, 46, 50, 52, 47, 52, 49, 45, 43, 53, each of the numbers 47 and 52 occurs twice, and each of the remaining numbers occurs once. In such cases, by convention, the series has two modes: 47 and 52.
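
    The volume, range and modes of the series above can be checked with a short Python sketch. The helper name `modes` is hypothetical, and the choice to report "no mode" when every value occurs only once is an assumption made for this illustration.

```python
from collections import Counter


def modes(series):
    """Return every value that occurs most often in the series.

    Assumption for this sketch: if no value repeats (or the series is
    empty), the series is treated as having no mode and [] is returned.
    """
    counts = Counter(series)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


series = [47, 46, 50, 52, 47, 52, 49, 45, 43, 53]

volume = len(series)                     # number of values in the series
data_range = max(series) - min(series)   # largest minus smallest value

print(volume)          # 10
print(data_range)      # 10
print(modes(series))   # [47, 52]
```

    Returning a list rather than a single number keeps the two-mode case from the example visible.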

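    5. Median of the series

    To find the median, first arrange the series in ascending order.

    The median of a series with an odd number of terms is the number in the middle of the ordered series.

    The median of a series with an even number of terms is the arithmetic mean of the two numbers in the middle of the ordered series.

    For example: determine the median of a series of numbers

    1) 6; -4; 5; -2; -3; 3; 3; -2; 3. Answer: 3 (ordered series: -4; -3; -2; -2; 3; 3; 3; 5; 6)

    2) -1; 0; 2; 1; -1; 0; 2; -1. Answer: 0 (ordered series: -1; -1; -1; 0; 0; 1; 2; 2)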
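
    A minimal Python sketch of the median rule, which is only valid once the series has been sorted. The function name `median_of` and the three-number series are hypothetical illustrations; the eight-number series is example 2 above.

```python
def median_of(series):
    """Median of a numeric series: sort first, then take the middle."""
    if not series:
        raise ValueError("the median of an empty series is undefined")
    ordered = sorted(series)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of terms: the single middle value.
        return ordered[mid]
    # Even number of terms: arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([5, 1, 3]))                       # 3
print(median_of([-1, 0, 2, 1, -1, 0, 2, -1]))     # 0.0
```

    Taking the middle element without sorting gives the wrong answer for most series, which is why the sort comes first.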

    6. The arithmetic mean is the quotient of dividing the sum of the numbers in a series by their number.

    For example: given a series of numbers -1; 0; 2; 1; -1; 0; 2; -1. Then the arithmetic mean will be equal to: (-1+0+2+1+(-1)+0+2+(-1)): 8 = 2: 8 = 0.25

    4. Consolidation of the studied material.

    Practical work

    Exercise: characterize the performance of student Peter Ivanov in mathematics for the fourth quarter.

    Procedure:

    1. Collection of information:

    The grades copied from the class register are: 5, 4, 5, 3, 3, 5, 4, 4, 4.

    2. Processing of received data:
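
    One possible way to carry out this processing step is to compute the characteristics defined in the lesson for the grades listed above. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and the results are computed here rather than taken from the lesson materials.

```python
import statistics

grades = [5, 4, 5, 3, 3, 5, 4, 4, 4]

volume = len(grades)                          # number of grades
data_range = max(grades) - min(grades)        # highest minus lowest grade
grade_modes = statistics.multimode(grades)    # most frequent grade(s)
grade_median = statistics.median(grades)      # middle of the sorted grades
grade_mean = statistics.mean(grades)          # sum of grades / number of grades

print(volume)                  # 9
print(data_range)              # 2
print(grade_modes)             # [4]
print(grade_median)            # 4
print(round(grade_mean, 2))    # 4.11
```

    The mode and the median are both 4 and the mean is slightly above 4, so the grade 4 describes the student's typical result.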

    Lecture 12. Methods for statistical processing of results.

    Methods of statistical processing of results are the mathematical techniques, formulas and methods of quantitative calculation with which the indicators obtained during an experiment can be generalized and systematized, revealing the patterns hidden in them. These are patterns of a statistical nature that exist between the variables studied in the experiment.

    1. Methods for primary statistical processing of experimental results

    All methods of mathematical and statistical analysis are conventionally divided into primary and secondary. Primary methods are those used to obtain indicators that directly reflect the results of measurements made in an experiment. Accordingly, primary statistical indicators are those that are used in the psychodiagnostic methods themselves and result from the initial statistical processing of the psychodiagnostic results. Secondary methods are statistical processing methods that use the primary data to reveal the statistical patterns hidden in them.

    Primary methods of statistical processing include, for example, determining the sample mean, sample variance, sample mode and sample median. Secondary methods usually include correlation analysis, regression analysis, and methods for comparing primary statistics in two or more samples.

    Let's consider methods for calculating elementary mathematical statistics.

    The mode is the quantitative value of the characteristic being studied that occurs most often in the sample.

    The median is the value of the characteristic being studied that divides the sample, ordered by the value of this characteristic, in half.

    The sample mean (arithmetic mean), as a statistical indicator, represents the average assessment of the psychological quality studied in the experiment.

    The spread of the sample (sometimes called the range) is denoted by the letter R. This is the simplest indicator that can be obtained for a sample: the difference between the maximum and minimum values of the given variation series.

    The variance (dispersion) is the arithmetic mean of the squared deviations of the values of a variable from its mean value.
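
    The definition of variance given above can be written out directly. This is a minimal sketch: the function name and the sample values are hypothetical, and it divides by the number of values n, exactly as the definition states. Many textbooks and libraries instead divide by n - 1 for a sample estimate (in Python, `statistics.variance` does this, while `statistics.pvariance` divides by n).

```python
import statistics


def mean_squared_deviation(sample):
    """Variance as defined in the text: mean of squared deviations from the mean."""
    if not sample:
        raise ValueError("variance of an empty sample is undefined")
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


# Hypothetical sample, chosen only so the arithmetic is easy to follow:
# mean = 40 / 8 = 5, squared deviations sum to 32, variance = 32 / 8 = 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

spread = max(sample) - min(sample)   # R, the range of the sample
variance = mean_squared_deviation(sample)

print(spread)                          # 7
print(variance)                        # 4.0
print(statistics.pvariance(sample))    # same definition in the standard library
```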

    2. Methods for secondary statistical processing of experimental results

    With the help of secondary methods of statistical processing of experimental data, hypotheses associated with the experiment are directly tested, proven or disproved. These methods, as a rule, are more complex than methods of primary statistical processing, and require the researcher to be well trained in elementary mathematics and statistics.

    The group of methods under discussion can be divided into several subgroups:

    1. Regression calculus

    Regression calculus is a method of mathematical statistics that reduces individual, scattered data points to a line that approximately reflects their internal relationship, so that the probable value of one variable can be approximately estimated from the value of the other.
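
    As an illustration of reducing scattered data to a line, the sketch below fits a straight line by ordinary least squares, which is one common way to do this; the lecture does not name a specific fitting method, so this choice is an assumption. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept through paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted = slope * 6 + intercept   # estimate of y for a new x = 6

print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

    For these points the fitted line is approximately y = 0.6x + 2.2, so the estimate at x = 6 is about 5.8.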

    2. Correlation

    The next method of secondary statistical processing, which determines the connection or direct dependence between two series of experimental data, is called the method of correlations. It shows how one phenomenon influences or is related to another in its dynamics. Dependencies of this kind exist, for example, between quantities that are in cause-and-effect relationships with each other. If two phenomena turn out to be statistically significantly correlated, and if there is confidence that one of them can act as a cause of the other, then the conclusion that there is a cause-and-effect relationship between them naturally suggests itself.
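
    The strength of a linear connection between two series is commonly measured with the Pearson correlation coefficient; the lecture does not specify which coefficient is meant, so using Pearson's r here is an assumption. The function name `pearson_r` and the data are hypothetical. The coefficient lies between -1 and 1 and, on its own, says nothing about which variable is the cause.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))   # about 0.775: a fairly strong positive linear relationship
```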

    3. Factor analysis

    Factor analysis is a statistical method that is used when processing large amounts of experimental data. The objectives of factor analysis are: reducing the number of variables (data reduction) and determining the structure of relationships between variables, i.e. classification of variables, so factor analysis is used as a data reduction method or as a structural classification method.

    Review questions

    1. What are statistical processing methods?

    2. What subgroups are secondary methods of statistical processing divided into?

    3. Explain the essence of the correlation method.

    4. In what cases are statistical processing methods used?

    5. How effective do you think is the use of statistical processing methods in scientific research?

    2. Consider the features of statistical data processing methods.

    Literature

    1. Gorbatov D.S. Workshop on psychological research: Textbook. - Samara: "BAKHRAH - M", 2003. - 272 p.

    2. Ermolaev A.Yu. Mathematical statistics for psychologists. - M.: Moscow Psychological and Social Institute: Flinta, 2003. - 336 p.

    3. Kornilova T.V. Introduction to psychological experiment: Textbook for universities. - M.: CheRo Publishing House, 2001.