Statistics-1 Week 3 Summary

Welcome to this Statistics-1 Week 3 Summary post. It’s not an explanatory article, so please keep that in mind. Once you are done with your lessons, though, this post should come in very handy. The best notes are the ones you make yourself, but if you are short of time, you can use this. Or you can use it as a template to prepare your own summary, which would be the best thing. So go on, and if you find any errors, do let me know in the comments below.

Before you move on to Week 3, though, you might want to brush up on Weeks 1 and 2 first by heading over to these posts: Week 1 Stats 1 Summary and Week 2 Stats 1 Summary.

Happy Learning!

Summary

  • To summarize numerical data, first group the observations into classes (also called categories or bins), and then treat the classes as the distinct values of qualitative data.
  • Organize continuous data into several classes, preferably between 5 and 20.
  • Each observation should belong to some class and no observation should belong to more than one class.
  • It is common, although not essential, to choose class intervals of equal length.
  • Each class interval has a lower class limit and an upper class limit.
  • The class mark is the average of the two class limits.
  • The class width is the difference between the two class limits.
  • By convention, a class interval contains its left endpoint but not its right endpoint.
  • Histograms are used to summarize grouped data from a frequency table.
  • In a histogram, the horizontal axis shows the classes, and the vertical axis shows the frequencies or relative frequencies.
  • Since the class intervals are continuous (each class starts where the previous one ends), there is no gap between the bars in a histogram, which is what differentiates it from a bar chart.
  • In Google Sheets, the width of the class interval is called the bucket size.
  • In a stem-and-leaf diagram (or stem plot), each observation is separated into two parts: a stem, consisting of all but the rightmost digit, and a leaf, the rightmost digit.
  • E.g., the stem plot for 15, 16, 17 would be 1 | 5 6 7, where 1 is the stem and 5, 6, 7 are the leaves.
  • The leaves in a stem plot must be arranged in ascending order.
  • Descriptive measures are quantities whose values are determined by the data. These are compact measures which describe the data set.
  • The most commonly used descriptive measures are of two types: 1) measures of central tendency and 2) measures of dispersion.
  • Measures of central tendency are those that indicate the most typical value or center of a data set. These include the mean, median, and mode.
  • Measures of dispersion (also called variation or spread) are those which indicate the variability or spread of a data set. These include the range, variance, standard deviation, and interquartile range.
  • The most commonly used measure of central tendency is the mean, or average, which is the sum of the observations divided by the number of observations.
  • For discrete observations:
Sample Mean and Population mean of discrete observations
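Written out, the standard definitions look like this. The notation is the usual textbook one and is assumed here: x_1, ..., x_n are the observations in a sample of size n, and N is the number of observations in the population.

```latex
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\quad\text{(sample mean)}
\qquad\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\quad\text{(population mean)}
\]
```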
  • The mean of a data set is extremely sensitive to outliers.
  • Outliers are observations that are quite different from the typical values in the data set.
  • The mean calculated for continuous data from a grouped frequency table, using the class midpoints, is an approximation and not the exact value of the mean.
  • If a constant C is added to every observation, New Avg = Old Avg + C
  • If every observation is multiplied by a constant C, New Avg = Old Avg × C
  • The units of the average of a data set are the same as the units of its observations.
  • The range of a data set is the difference between its largest and smallest values. It considers only the first and last observations of the ordered data, that is, the minimum and the maximum.
  • Range is extremely sensitive to outliers.
  • Variance considers the deviations of the data values from a central value, the mean. It takes all the observations into account.
Formulae for Population Variance and Sample Variance
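Written out in the usual textbook notation (assumed here), with x̄ the mean of a sample of n observations and μ the mean of a population of N observations:

```latex
\[
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\quad\text{(population variance)}
\]
\[
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\quad\text{(sample variance)}
\]
```

Note that the sample variance divides by n − 1, not n.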
  • μ = population mean, x̄ = sample mean, σ² = population variance, s² = sample variance; σ and s are the corresponding standard deviations.
  • The Google Sheets function for sample variance is VAR.S, and the function for population variance is VAR.P.
  • If a constant C is added to every observation, variance remains unchanged.
  • If every observation is multiplied by a constant C, New Variance = Old Variance × C²
  • The square root of the variance is called the standard deviation.
  • The units of variance are the square of the units of the original variable.
  • The units of standard deviation are the same as the units of the original data.
  • The Google Sheets function for standard deviation is STDEV, which computes the sample standard deviation.
  • Adding a constant does not change the variability or the standard deviation of a data set. If a constant C is added to every observation, the standard deviation remains unchanged.
  • Multiplying by a constant scales the variability and the standard deviation.
  • If every observation is multiplied by a constant C, New Standard Deviation = Old Standard Deviation × |C| (which is simply × C when C is positive).
  • The sample 100p-th percentile is the data value having the property that at least 100p percent of the data are less than or equal to it and at least 100(1 − p) percent of the data are greater than or equal to it.
  • If two data values satisfy this condition, then the sample 100p-th percentile is the arithmetic average of these values.
  • For example, the 99th percentile (p = 0.99) has at least 100 × 0.99 = 99% of the data less than or equal to it and at least 1% of the data greater than or equal to it.
  • The median is the 50th percentile.
  • For the median of n ordered observations: if n/2 is not an integer, the median is the observation at the next integer above n/2.
  • In an ordered set of n observations, the 100p-th percentile is the observation at the position given by the next integer above np if np is not an integer, OR the average of the observations at positions np and np + 1 if np is an integer.
  • The algorithm used by Google Sheets to calculate the 100p-th percentile takes the ordered data, computes the rank r = p(n − 1) + 1, and splits r into its integer part i and its fractional part; then:
Formula for Percentile
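Here is a minimal sketch of that interpolation rule, as I understand it: percentile = X_i + (fractional part of r) × (X_{i+1} − X_i). The function name `sheets_style_percentile` and the marks are made up for illustration, and p is taken as a fraction between 0 and 1. It is not Google's actual code, just the rule above written out.

```python
import math

def sheets_style_percentile(data, p):
    """Interpolated percentile; p is a fraction between 0 and 1."""
    xs = sorted(data)                 # ordered data values
    n = len(xs)
    r = p * (n - 1) + 1               # 1-based rank, may be fractional
    i = math.floor(r)                 # integer part of the rank
    frac = r - i                      # fractional part of the rank
    if i >= n:                        # p == 1: the rank is the last observation
        return xs[-1]
    # X_i + frac * (X_{i+1} - X_i); 1-based ranks mapped to 0-based indexes
    return xs[i - 1] + frac * (xs[i] - xs[i - 1])

marks = [35, 40, 42, 48, 50, 55, 61, 68, 70, 90]   # made-up data
print(sheets_style_percentile(marks, 0.25))        # rank 3.25
print(sheets_style_percentile(marks, 0.50))        # rank 5.5
```

For the 25th percentile of these ten marks, r = 0.25 × 9 + 1 = 3.25, so the result lies a quarter of the way from the 3rd ordered value to the 4th.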
  • Xi = observation at rank i (the integer part of r); Xi+1 = observation at rank i + 1
  • The sample 25th percentile = Q1 = first quartile = lower quartile: 25% of the data values are less than this.
  • The sample 50th percentile = Q2 = second quartile = median: 25% of the data values lie between the first quartile and this, and another 25% lie between this and the upper quartile.
  • The sample 75th percentile = Q3 = third quartile = upper quartile: 25% of the data values are larger than this.
  • Quartiles break up a data set into four parts.
  • The interquartile range (IQR) is the difference between the third and first quartiles; that is, IQR = Q3 − Q1
  • If the first m terms and the last n terms of a list overlap in exactly one element, then that single common element = [sum of first m terms + sum of last n terms] − sum of all terms.
  • The Five Number Summary (FNS) is a particularly good way of summarizing a data set and consists of five values: the minimum, the lower quartile, the median, the upper quartile, and the maximum.
  • The FNS is represented graphically using a box plot, also known as a box-and-whisker plot or candlestick chart.
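
To make the points about classes concrete, here is a small sketch that builds a frequency table with left-closed classes and then compares the grouped (midpoint) mean with the exact mean. The helper name `frequency_table`, the class limits, and the marks are all made up for illustration.

```python
def frequency_table(data, start, width, num_classes):
    """Count observations in classes [lower, upper): left end included."""
    table = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        table.append({
            "class": f"{lower}-{upper}",
            "mark": (lower + upper) / 2,     # class mark = average of the limits
            "frequency": count,
            "relative": count / len(data),   # relative frequency
        })
    return table

marks = [35, 40, 42, 48, 50, 55, 61, 68, 70, 90]   # made-up data
table = frequency_table(marks, start=30, width=10, num_classes=7)
for row in table:
    print(row)

# Mean from the grouped table (class marks weighted by frequency) vs exact mean
grouped_mean = sum(r["mark"] * r["frequency"] for r in table) / len(marks)
exact_mean = sum(marks) / len(marks)
print(grouped_mean, exact_mean)
```

Note that 40 lands in the 40-50 class and not in 30-40, because a class contains its left end only. The grouped mean is close to the exact mean but not equal to it, since it uses the class marks instead of the actual values.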
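
The rules about adding or multiplying by a constant are easy to check numerically. The sketch below uses Python's standard `statistics` module, where `variance` and `stdev` are the sample versions (like VAR.S and STDEV) and `pvariance` is the population version (like VAR.P). The data and the constant C are made up.

```python
import statistics as st

data = [2, 4, 4, 5, 7, 8]       # made-up sample
C = 3                           # made-up constant

shifted = [x + C for x in data]  # add C to every observation
scaled = [x * C for x in data]   # multiply every observation by C

# Mean: shifts by C, scales by C
print(st.mean(data), st.mean(shifted), st.mean(scaled))

# Sample variance: unchanged by shifting, multiplied by C squared when scaling
print(st.variance(data), st.variance(shifted), st.variance(scaled))

# Sample standard deviation: unchanged by shifting, multiplied by |C| when scaling
print(st.stdev(data), st.stdev(shifted), st.stdev(scaled))
```

Try a negative C as well: the variance is still multiplied by C², and the standard deviation by the absolute value of C.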
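
Finally, a sketch of the five number summary using the np rule from the list above (next integer above np, or the average of positions np and np + 1 when np is an integer). The function names and the marks are made up, and p is a fraction strictly between 0 and 1. Google Sheets interpolates instead, so its quartiles can differ slightly from these.

```python
import math

def course_percentile(data, p):
    """100p-th percentile by the np rule; p is a fraction with 0 < p < 1."""
    xs = sorted(data)
    n = len(xs)
    pos = n * p
    if abs(pos - round(pos)) < 1e-9:        # np is an integer
        k = round(pos)
        return (xs[k - 1] + xs[k]) / 2      # average of positions np and np + 1
    return xs[math.ceil(pos) - 1]           # position = next integer above np

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1 = course_percentile(data, 0.25)
    median = course_percentile(data, 0.50)
    q3 = course_percentile(data, 0.75)
    return min(data), q1, median, q3, max(data)

marks = [35, 40, 42, 48, 50, 55, 61, 68, 70, 90]   # made-up data
minimum, q1, median, q3, maximum = five_number_summary(marks)
iqr = q3 - q1                                       # IQR = Q3 - Q1
print(minimum, q1, median, q3, maximum, iqr)
```

With n = 10, np is 2.5 for Q1 (so the 3rd ordered value), 5 for the median (so the average of the 5th and 6th values), and 7.5 for Q3 (so the 8th ordered value).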

See ya!

Peace!

✌🏻
