Descriptive statistics are simple summaries of your data and are often considered the cornerstone of statistical analysis. Before running most other types of statistical analysis (especially inferential statistics), researchers need to examine the descriptive statistics for their data.
What are descriptive statistics used for?
Descriptive statistics are a set of methods used to describe and summarize the key features or characteristics of a dataset. Simply put, descriptive statistics help us consolidate large amounts of data into a concise and easy-to-use form. For example, if we have to examine differences in blood pressure for a sample of 1,000 patients pre- and post-intervention, it would be very tedious to check each patient’s data individually. Instead, we can use descriptive statistics to summarize the blood pressure levels of the entire sample at each time point. We then need to compare only two numbers, which makes our calculations simpler.
Another important use of descriptive statistics is to understand the distribution of your data – whether or not it is normally distributed. This is critical in determining the types of tests you will use for subsequent inferential analysis. The choice between parametric and non-parametric tests depends on the type of data and its distribution.
Finally, descriptive statistics can be used to conduct simple analyses and draw preliminary conclusions, for example, by measuring change pre- and post-intervention.
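To make the blood pressure example concrete, here is a minimal Python sketch that summarizes each time point and the change between them. The readings, the variable names, and the `summarize` helper are all hypothetical and chosen only for illustration; a real analysis would use your own data and, most likely, a dedicated statistics package.

```python
from statistics import mean, stdev

# Hypothetical systolic blood pressure readings (mmHg) for the same
# eight patients before and after an intervention; not real data.
pre = [150, 142, 160, 155, 148, 152, 158, 145]
post = [138, 135, 150, 149, 140, 141, 147, 136]


def summarize(values):
    """Return the mean and the sample standard deviation of the values."""
    return mean(values), stdev(values)


pre_mean, pre_sd = summarize(pre)
post_mean, post_sd = summarize(post)

# Mean of the paired differences (post minus pre) across patients.
mean_change = mean(after - before for before, after in zip(pre, post))

print(f"Pre:    mean {pre_mean:.1f} mmHg (SD {pre_sd:.1f})")
print(f"Post:   mean {post_mean:.1f} mmHg (SD {post_sd:.1f})")
print(f"Change: mean {mean_change:.1f} mmHg")
```

Instead of reading sixteen individual values, the reader compares two means. Whether the change is statistically meaningful is a question for inferential tests, not for this summary.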
Types of descriptive statistics
Let’s look at the different types of descriptive statistics and how to use them.
Measures of frequency: count, percentage, frequency
These are used when you want to show how often something happens or a response is given.
Example: Major adverse cardiovascular events were observed in 28/683 patients (4.10%).
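As a rough illustration, counts and percentages can be computed with Python’s standard library. The outcome labels and the `frequency_table` helper below are hypothetical; only the 28/683 figure comes from the example above.

```python
from collections import Counter

# Hypothetical adverse-event labels for 20 patients; not real data.
outcomes = ["none"] * 14 + ["mild"] * 4 + ["severe"] * 2


def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


for category, count, pct in frequency_table(outcomes):
    print(f"{category}: {count}/{len(outcomes)} ({pct:.1f}%)")

# Percentage for the example in the text: 28 events among 683 patients.
mace_pct = 100 * 28 / 683
print(f"Major adverse cardiovascular events: 28/683 ({mace_pct:.2f}%)")
```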
Measures of central tendency: mean, median, mode
These are used when you want to show what is most common or typical of a set of data.
Example: Mean pain intensity scores were 2.6 for the intervention group and 4.9 for the control group.
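The three measures can be computed with Python’s built-in `statistics` module. The sketch below uses hypothetical pain scores (not the study values quoted above) and also shows how a single extreme value shifts the mean while leaving the median unchanged.

```python
from statistics import mean, median, mode

# Hypothetical pain intensity scores on a 0-10 scale; not real data.
scores = [2, 3, 3, 4, 2, 3, 5, 3, 2, 3]

print(f"mean={mean(scores):.2f}  median={median(scores)}  mode={mode(scores)}")

# One extreme score pulls the mean upward, while the median stays put.
with_outlier = scores + [10]
print(f"mean={mean(with_outlier):.2f}  median={median(with_outlier)}")
```

This sensitivity of the mean to extreme values is one reason the median is often preferred for skewed data.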
Measures of variation: range, interquartile range, standard deviation
These are used to indicate how spread out your data are. Usually, they are reported alongside the measure of central tendency you’re using (mean with standard deviation, median with interquartile range).
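A minimal sketch of these measures, again using Python’s `statistics` module, is shown below. The length-of-stay values are hypothetical. Note that quartiles can be calculated by several conventions; this sketch relies on the default `"exclusive"` method of `statistics.quantiles`, so other software may give slightly different interquartile ranges for the same data.

```python
from statistics import mean, median, quantiles, stdev

# Hypothetical lengths of hospital stay in days; not real data.
stay_days = [2, 3, 3, 4, 5, 5, 6, 7, 9, 21]

# Range: the distance between the largest and smallest values.
value_range = max(stay_days) - min(stay_days)

# Quartiles using the default "exclusive" method (Python 3.8+).
q1, _, q3 = quantiles(stay_days, n=4)
iqr = q3 - q1

# Sample standard deviation (n - 1 in the denominator).
sd = stdev(stay_days)

print(f"Mean {mean(stay_days):.1f} days (SD {sd:.1f})")
print(f"Median {median(stay_days)} days (IQR {q1} to {q3}; range {value_range})")
```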
Choosing and reporting descriptive statistics
While conducting your analyses or writing your research paper, here are five key considerations to keep in mind whenever you choose or report descriptive statistics.
- Check the type of data you are summarizing
Before summarizing the data using any measure of central tendency, it’s important to first inspect the data. For example, the mean is suitable when the data appear normally distributed, without any extreme values or outliers. When the data are skewed, the values are ordinal, or there are outliers, the median is preferred. For nominal categorical data, the mode or frequencies are more appropriate. Since choosing the right descriptive statistics is so important for subsequent analyses, journals like JAMA specify this in their instructions to authors.
- Present measures of central tendency along with measures of variation
It’s not a good idea to report only the means or medians of the data. Always report them along with their respective measure of variation (standard deviation for the mean; range or interquartile range for the median).
Also note that many style guidelines and journals, like the Journal of Rehabilitation Medicine, advise against using the ± symbol while reporting standard deviations because standard deviation by nature implies ±.
- Report standard deviation, not standard error of the mean
It’s advisable to report standard deviations along with means, because the standard deviation is an actual measure of variation or dispersion in the data. The standard error of the mean tells readers how closely your sample mean is likely to estimate the true mean of the population from which your sample was drawn. Therefore, it measures the precision of your mean but doesn’t give readers an idea about the spread or variability of your data. The SAMPL (Statistical Analyses and Methods in the Published Literature) guidelines explicitly discourage using the standard error of the mean to describe the variability of a dataset.
- Give the baseline for any counts
When reporting count data, it’s a good idea to specify the denominator each time. For instance, you may start your study with a sample of 500 patients, but some might be lost to follow-up or provide invalid data. Therefore, rather than saying “At follow-up, 35 patients reported no improvement”, you can say “At follow-up, 35/483 patients reported no improvement” to give readers a more accurate picture of your findings.
- Avoid using percentages when the denominator is very small
When the denominator is small, using percentages could give readers a distorted impression of your findings. For example, the sentence “The novel coronavirus variant was observed in 50% of the new cases” might create panic if the reader doesn’t know or overlooks the fact that the number of new cases is only 6.
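Two of the considerations above lend themselves to a short Python sketch: the difference between the standard deviation and the standard error of the mean, and the reporting of counts with their denominators. The data, the `sem` and `report_count` helpers, and the cut-off of 20 for showing a percentage are all assumptions made for illustration; no guideline cited here prescribes that particular threshold.

```python
from math import sqrt
from statistics import stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Hypothetical measurements; the larger sample repeats the same values,
# so its spread (SD) is similar while its SEM is much smaller.
small_sample = [4, 6, 5, 7, 3, 5, 6, 4]
large_sample = small_sample * 10

for label, sample in (("n=8", small_sample), ("n=80", large_sample)):
    print(f"{label}: SD {stdev(sample):.2f}, SEM {sem(sample):.2f}")


def report_count(events, total, min_total_for_pct=20):
    """Format a count with its denominator.

    A percentage is added only when the denominator reaches
    min_total_for_pct, an arbitrary cut-off assumed for this sketch.
    """
    if total <= 0:
        raise ValueError("total must be a positive number")
    if total < min_total_for_pct:
        return f"{events}/{total}"
    return f"{events}/{total} ({100 * events / total:.1f}%)"


print(report_count(35, 483))  # large denominator: percentage included
print(report_count(3, 6))     # small denominator: count only
```

The SEM shrinks as the sample grows even though the spread of the data barely changes, which is why it is a poor description of variability.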