Lesson 2.5: Data Analysis: Descriptive Statistics
By analyzing students’ performance, you will obtain information to improve both tests and instruction, to determine grades or the passing score on the test, to determine whether the test was too easy or too hard, and to evaluate the effectiveness of instruction. For example, if many scores are low, those scores may reflect students’ effort and learning abilities. However, they may also indicate that the instruction was ineffective or that the test was too difficult. Now that you have access to multiple forms of data from standardized and teacher-made tests, you need to organize and summarize the data so that they provide useful information. Data can be organized in tables, such as a frequency distribution, and presented visually in graphs and charts, such as a histogram, a frequency polygon, or a scatterplot. Statistical methods are used to summarize and describe data. One way to summarize a set of scores is to look at the measures of central tendency (mean, median, and mode). Another is to describe how much the scores differ from one another by examining measures of variability (range and standard deviation).
Frequency Distribution
In statistics, the term “frequency” refers to the number of times each score or event occurs. We are usually interested in frequency as it relates to the number of students obtaining each score on a test. One way to record frequencies is in a frequency distribution table, where each score is listed in a column on the left and the number of times it occurred is listed in a column on the right. While a frequency table helps to organize data, it does not provide a great deal of descriptive information about the scores. Still, frequency distributions provide the initial organization that is the starting point for many other statistical methods.
Example: Mr. Walker wants to examine the general performance trends among his eighth-grade language arts students in order to evaluate student learning and his own instruction. He gives a midterm language arts exam and obtains the following scores for his students:
To evaluate those scores, he creates a frequency distribution table to organize the test data. The obtained scores are listed in order from highest to lowest.
[Table: Frequency Distribution]
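The tallying step behind a frequency distribution table can be sketched in Python. This is a minimal illustration, not Mr. Walker's actual data: the `scores` list is hypothetical, and `frequency_table` is a helper name introduced here for illustration. Like the table described above, it lists scores from highest to lowest.

```python
from collections import Counter


def frequency_table(scores):
    """Return (score, frequency) pairs ordered from highest to lowest score."""
    counts = Counter(scores)  # tally how many times each score occurs
    return sorted(counts.items(), key=lambda pair: pair[0], reverse=True)


# Hypothetical midterm scores (made up for illustration only).
scores = [100, 95, 95, 88, 88, 88, 76, 70, 70, 52, 22]

print("Score  Frequency")
for score, freq in frequency_table(scores):
    print(f"{score:>5}  {freq:>9}")
```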
Graphic Displays of Distributions
The data from a frequency table can be displayed graphically. A graph provides a visual display of the distribution, which gives us another view of the summarized data. A scatterplot, for example, graphically represents the relationship between two different sets of test scores. As we learned earlier, by visually examining how the scores are arranged in such a graph, we can describe in general terms the direction and strength of the relationship between them. Two other types of graphs are histograms and frequency polygons. A histogram is a bar graph of scores from a frequency table. The horizontal x-axis represents the scores on the test, and the vertical y-axis represents the frequencies, which are plotted as bars.
[Figure: Histogram of Midterm Language Arts Exam]
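A histogram is normally drawn with graphing software, but the idea of one bar per score, with the bar length equal to the frequency, can be sketched in plain Python. In this minimal sketch the bars are drawn sideways (scores run down the page rather than along the x-axis); the `text_histogram` helper and the scores are hypothetical.

```python
from collections import Counter


def text_histogram(scores, symbol="#"):
    """Return one text 'bar' per score; the bar length equals the frequency."""
    counts = Counter(scores)
    lines = []
    for score in sorted(counts):  # scores from lowest to highest
        lines.append(f"{score:>4} | {symbol * counts[score]}")
    return lines


# Hypothetical scores for illustration only.
scores = [70, 75, 75, 80, 80, 80, 85, 85, 90]
for line in text_histogram(scores):
    print(line)
```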
A frequency polygon is a line graph of a set of scores from a frequency table. The horizontal x-axis represents the scores on the scale, and the vertical y-axis represents the frequencies.
[Figure: Frequency Polygon of Midterm Language Arts Exam]
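The points of a frequency polygon can be computed from the same tallies. This sketch assumes the scores fall on a regular grid (here every 5 points) and follows a common convention of adding a zero-frequency point just below the lowest score and just above the highest, so the line begins and ends on the x-axis. The `polygon_points` helper and the scores are hypothetical.

```python
from collections import Counter


def polygon_points(scores, step=1):
    """Return (score, frequency) points for a frequency polygon.

    Assumes scores lie on a grid spaced `step` apart. Grid values with no
    scores get frequency 0, and one extra zero point is added at each end.
    """
    counts = Counter(scores)  # a Counter returns 0 for scores never seen
    low, high = min(scores), max(scores)
    xs = range(low - step, high + 2 * step, step)
    return [(x, counts[x]) for x in xs]


# Hypothetical scores; connecting these points with lines gives the polygon.
scores = [70, 75, 75, 80, 80, 80, 85, 85, 90]
print(polygon_points(scores, step=5))
```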
A frequency polygon can also be used to compare two or more sets of data by drawing each set of scores as a line with a different color or pattern. For example, you might want to look at your students’ scores by gender, or compare students’ performance on two tests (see Figure 9.4).
[Figure: Frequency Polygon of Midterm by Gender]
One way to summarize your data is to look at the measures of central tendency: mean, median, and mode.
Mean (or arithmetic average)
The mean is the average performance level of a group of students. It is obtained by adding up all the scores and dividing the sum by the total number of scores. The mean can be distorted if some scores are extremely different from the majority of scores for the group (outliers). In that case, the median is often a more descriptive measure of central tendency.
Median
The median is the point in the distribution that splits the scores into two equal groups; it is also known as the midpoint of the distribution, or the 50th percentile. To calculate the median, arrange the raw scores in rank order. If the number of scores is odd, the median is the middle score, which divides the scores into two equal halves. If the number of scores is even, the median is the average of the two middle scores.
Mode
The mode is the most frequently occurring score in a distribution. No mathematical calculation is needed for the mode: once the data are organized in a frequency distribution, the mode can be read directly from it. A distribution may have more than one mode if two or more scores share the highest frequency. A set of scores with two modes is called bimodal; one with more than two is called multimodal.
The following is a step-by-step demonstration of calculating the measures of central tendency. Mr. Walker creates a table to display the scores obtained by his students:
[Table: Calculating Mean, Median, and Mode]
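The three definitions above translate directly into code. This is a minimal sketch with hypothetical helper names (`mean`, `median`, `modes`) and made-up scores, not Mr. Walker's data. Returning a list from `modes` lets the same function report bimodal and multimodal sets.

```python
from collections import Counter


def mean(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


def median(scores):
    """Middle score in rank order; average of the two middle scores if n is even."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def modes(scores):
    """All scores sharing the highest frequency (two or more = bi/multimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, freq in counts.items() if freq == top)


scores = [70, 75, 75, 80, 80, 80, 85, 85, 90]  # hypothetical, n = 9 (odd)
print(mean(scores))    # 80.0
print(median(scores))  # 80
print(modes(scores))   # [80]
```

For comparison, Python's standard `statistics` module offers `mean`, `median`, and `multimode`, which behave similarly for lists of numbers like these.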
Representation of the Mean, Median, and Mode on a Curve
Scores on standardized tests are usually normally distributed, which produces a symmetrical distribution. Notice that in this distribution the mean, median, and mode are equal: the mean is also the value that divides the scores into two equal groups and the score that occurs most frequently. The shape of the distribution of your test scores can provide useful clues about your test and your students’ performance. When students’ scores on a classroom test are graphed, they will often be positively or negatively skewed. In a positively skewed distribution, the tail points to the right, and the most frequent score (the mode) and the median are below the mean. If your test is very difficult, there may be many low scores and few high ones, and the distribution of scores would have a positively skewed shape similar to the one depicted below. In a negatively skewed distribution, the tail points to the left: there are many high scores and relatively few low scores, and the mode and the median are above the mean. Notice that the mean is pulled toward the tail by the skewing.
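Because the mean is pulled toward the tail while the median is not, comparing the two gives a quick, informal check of skew direction. This is a rule of thumb that assumes a single-peaked distribution, not a formal skewness statistic; the `skew_direction` helper and all three score lists are hypothetical.

```python
from statistics import mean, median


def skew_direction(scores):
    """Informal check: compare the mean with the median.

    mean > median -> positively skewed (tail points right)
    mean < median -> negatively skewed (tail points left)
    """
    m, md = mean(scores), median(scores)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "symmetric"


hard_test = [40, 42, 45, 45, 48, 50, 55, 90, 95]     # many low scores
easy_test = [30, 85, 88, 90, 92, 92, 95, 98, 100]    # many high scores
balanced_test = [70, 75, 80, 85, 90]
for name, s in [("hard", hard_test), ("easy", easy_test), ("balanced", balanced_test)]:
    print(name, round(mean(s), 2), median(s), skew_direction(s))
```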
Indicators of Variability
Variability is the dispersion of the scores within a distribution; in other words, how varied are the scores? On a given test, a group of students with similar levels of performance on a specific skill will tend to have scores close to the mean, while a group with varying levels of performance will have scores that are widely spread and farther from the mean. Two common measures of variability are the range and the standard deviation.
Range
The range, R, is the difference between the highest and the lowest scores in a distribution. The range is easy to compute and interpret, but it reflects only the two extreme scores in a set. Using the scores from Mr. Walker’s class (above), we would calculate the range as:
Range (R) = the highest score − the lowest score in the distribution
R = 100 − 22 = 78, so the range is 78.
Standard Deviation
A more useful statistic than the range is one that shows how widely the scores are dispersed around the mean. The most common measure of variability is the standard deviation (SD). The standard deviation is a numeric index that describes the typical distance of the scores in the distribution from the mean. The formula for the standard deviation is:
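One common form of the formula, assuming the scores are treated as the whole group of interest (a population), where X is each score, X̄ is the mean, and N is the number of scores, is:

```latex
SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}
```

When the scores are treated as a sample from a larger group, N − 1 is often used in the denominator instead of N.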
The higher the standard deviation, the wider the distribution of the scores around the mean. This indicates a more heterogeneous, or dissimilar, spread of raw scores on a scale. A lower standard deviation indicates a narrower, more homogeneous (similar) distribution of raw scores around the mean.
Example: Table 5.1.6
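The range and standard deviation can be sketched in Python as follows. This minimal sketch assumes the population form of the standard deviation (dividing by N); `score_range`, `population_sd`, and both classes of scores are hypothetical. The two classes share a mean of 80, so the difference in their standard deviations reflects spread alone.

```python
import math


def score_range(scores):
    """Range R = highest score - lowest score."""
    return max(scores) - min(scores)


def population_sd(scores):
    """Standard deviation treating the scores as the whole group (divide by N)."""
    n = len(scores)
    m = sum(scores) / n
    squared_deviations = [(x - m) ** 2 for x in scores]
    return math.sqrt(sum(squared_deviations) / n)


# Two hypothetical classes with the same mean (80) but different spreads.
similar = [76, 78, 80, 82, 84]
varied = [60, 70, 80, 90, 100]
print(score_range(similar), round(population_sd(similar), 2))  # 8 2.83
print(score_range(varied), round(population_sd(varied), 2))    # 40 14.14
```

For comparison, Python's `statistics.pstdev` computes the same population value, while `statistics.stdev` divides by N − 1 (the sample form) and so gives a slightly larger result.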
Development of the training was funded through the Florida Department of Education. Copyright © 2005 NEFEC except where noted. All rights reserved. Unauthorized use prohibited.