Welcome to this Statistics-1 Week 2 summary post. Keep in mind that it is not an explanatory article, but once you are done with your lessons it should come in very handy. The best notes are the ones you make yourself, but if you are short of time, you can use this one. Better still, use it as a template to prepare your own summary. So go on, and if you find any errors, do let me know in the comments below.
Before you move on to week 2, though, you might want to brush up on week 1 first by heading over to this post: Week 1 Stats 1 Summary.
Happy Learning!
Week 2
Statistics 1
Summary
- A frequency distribution of qualitative data is a listing of the distinct values and their frequencies.
- Frequency means count.
- Each row of a frequency table lists a category along with the number of cases in that category.
- Tally marks can be used to build a frequency table.
- The relative frequency of a category is the ratio of that category's frequency to the total frequency of the categorical variable.
- The relative frequencies of all categories always sum to 1.
- Relative frequency is a good standard for comparison: two datasets of very different sizes can have the same relative frequencies, which suggests their distributions are not so different.
- Two datasets with the same frequency distribution always have the same relative frequencies; however, two datasets with the same relative frequencies may or may not have the same frequency distribution.
- Bar charts and pie charts display the frequency table of a categorical variable.
- A pie chart is a circle divided into pieces or wedges proportional to the relative frequencies of the qualitative data.
- The central angle of a slice in the pie chart = relative frequency × 360°
- A pie chart displays the share/proportions of a particular category.
- Pie charts are useful when the objective is to compare parts of a whole.
- Pie charts are an effective way to show that one category makes up more than half of the total.
- A bar chart displays the distinct values of the qualitative/categorical data, with the categories on the horizontal axis and the counts, relative frequencies, or percentages of those values on the vertical axis.
- The height of each vertical bar is directly proportional to the frequency or relative frequency of the value it represents.
- Bars should be separate (not touching each other) and of equal width.
- A bar chart can be vertical as well as horizontal.
- When the categories in a bar chart are sorted by frequency, from highest to lowest, the bar chart is sometimes called a Pareto chart.
- Pareto charts are popular in quality control to identify problems in a business process.
- If the categorical variable is ordinal, then the bar chart must preserve the ordering. Pareto charts should be avoided for variables on an ordinal scale, since sorting by frequency breaks that ordering.
- Bar charts are used to compare things between separate groups.
- All charts must be properly labelled, with a legend where needed.
- If a substantial share of the data falls in the smaller categories, it is a good idea to group them together (for example, as "Other") instead of ignoring them.
- Display of data must obey a fundamental rule called the area principle which states that the area occupied by a part of the graph should correspond to the amount of data it represents.
- Violation of area principle leads to misleading communication of data.
- An infographic is a decorated chart meant to attract attention; it often has no baseline and may not convey the data accurately.
- Omitting the baseline or starting an axis at a value other than zero is one of the most common ways to manipulate graphs; the result is known as a truncated graph. This misleading tactic makes one group look better than another.
- Truncated graphs and infographics violate the integrity of data.
- If a graph has to be truncated, introduce a Y-axis break and state the starting point.
- Expanding or compressing the scale of a graph can make changes in the data seem more or less significant than they are; this is known as manipulation of the Y-axis.
- Round off with care, as rounding introduces errors: the percentages may add up to a value slightly different from 100%, and such round-off errors violate the area principle.
- The mode of a categorical variable, whether nominal or ordinal, is the most common category, i.e., the category with the highest frequency.
- The mode is represented by the longest bar in a bar chart, the widest slice in a pie chart, and the first bar in a Pareto chart.
- If two or more categories tie for the highest frequency, the data are said to be bimodal or multimodal accordingly.
- If no value occurs more than once, then the data set has no mode.
- If a constant C is added to every observation, New mode = Old mode + C.
- If a constant C is multiplied with every observation, New mode = Old mode X C.
- The median of an ordinal variable is the category of the middle observation of the sorted values. It separates the bottom 50% of the data from the top 50%.
- If the data has an odd number of observations n, then the [(n+1)/2]th observation of the sorted data is the median.
- If the data has an even number of observations n, then the median is the mean of the (n/2)th and [(n/2)+1]th observations of the sorted data.
- Median is that observation which divides a data set into exactly 2 halves.
- Median is not defined for nominal variables because they have no order.
- The sample mean is sensitive to outliers, whereas the sample median is largely unaffected by them.
- Median of a data set may or may not be a member of the given data set.
- If a constant C is added to every observation, New Median = Old Median + C
- If a constant C is multiplied with every observation, New Median = Old Median X C
- Scaling each observation down to x% of its value means multiplying each observation by x/100. Reducing each observation by x% means multiplying it by (1 − x/100).
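If you want to check these rules by hand, here is a minimal Python sketch, not part of the course material. The function names and the sample data (`sizes`, `scores`) are made up for illustration. It builds a frequency table, converts it to relative frequencies and pie-slice angles, sorts the categories Pareto-style, and finds the mode and the median using the position rules listed above.

```python
import math
from collections import Counter


def frequency_table(data):
    """Count how often each distinct value occurs."""
    return dict(Counter(data))


def relative_frequencies(freq):
    """Divide each count by the total count."""
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


def pie_angles(rel_freq):
    """Central angle of each pie slice, in degrees (relative frequency x 360)."""
    return {category: rf * 360 for category, rf in rel_freq.items()}


def pareto_order(freq):
    """Categories sorted from highest to lowest frequency."""
    return sorted(freq, key=freq.get, reverse=True)


def modes(freq):
    """Categories tied for the highest frequency; empty list if nothing repeats."""
    highest = max(freq.values())
    if highest == 1:
        return []
    return [category for category, count in freq.items() if count == highest]


def median(values):
    """Median of numeric data, using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # the [(n+1)/2]th observation (1-based)
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2  # mean of the two middle ones


# Made-up sample data, for illustration only.
sizes = ["M", "S", "L", "M", "XL", "M", "L", "S", "M", "L"]
scores = [3, 7, 1, 9, 5]

freq = frequency_table(sizes)
rel = relative_frequencies(freq)

print(freq)                                # counts per category
print(math.isclose(sum(rel.values()), 1))  # relative frequencies sum to 1
print(pie_angles(rel))                     # slice angles in degrees
print(pareto_order(freq))                  # the mode comes first
print(modes(freq))
print(median(scores))                      # odd n: the middle observation
print(median([x + 10 for x in scores]))    # adding C shifts the median by C
print(median([x * 2 for x in scores]))     # multiplying by C scales the median by C
print(median([3, 7, 1, 9]))                # even n: the median need not be in the data
```

Note that `median` here works on numbers. For an ordinal variable with an even number of observations, the two middle categories can differ, and taking their mean is not meaningful, so this sketch leaves that case out.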
For other summaries, check here: –
See ya!
Peace!
✌🏻