What is a Scatter Plot?
A scatter plot is a chart type that is normally used to observe and visually display the relationship between variables. The values of the variables are represented by dots. The position of each dot along the horizontal and vertical axes indicates the values of the corresponding data point; in other words, scatter plots use Cartesian coordinates to display the values of the variables in a data set. Scatter plots are also known as scattergrams, scatter graphs, or scatter charts.
- A scatter plot is a chart type that is normally used to observe and visually display the relationship between variables. It is also known as a scattergram, scatter graph, or scatter chart.
- The dots on a scatter plot represent the values of individual data points and also allow patterns to be identified when the data is viewed as a whole.
- The most common use of the scatter plot is to display the relationship between two variables and observe the nature of that relationship. The relationship observed can be positive or negative, linear or non-linear, and strong or weak.
Scatter Plot Applications and Uses
1. Demonstration of the relationship between two variables
The most common use of the scatter plot is to display the relationship between two variables and observe the nature of the relationship. The relationship observed can be positive or negative, linear or non-linear, and strong or weak. The dots on a scatter plot represent the values of individual data points and also allow patterns to be identified when the data is viewed as a whole.
2. Identification of correlational relationships
Scatter plots also make it possible to identify correlational relationships. They tend to have the independent variable on the horizontal axis and the dependent variable on the vertical axis. This arrangement gives the observer an idea of what the vertical value is likely to be when the horizontal value is known.
3. Identification of data patterns
Data pattern identification is also possible with scatter plots. Data points can be grouped together based on how close their values are, which also makes outlier points and gaps in the data easy to identify.
Because scatter plots aid in the identification of correlations between variables, the nature of a correlation can also be estimated at a specified confidence level.
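The idea of spotting points that sit far from the main group can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_outliers`, the cutoff factor `k`, and the sample values are all hypothetical, and the rule used here (distance from the centroid compared with the median distance) is one simple heuristic among many.

```python
from math import hypot
from statistics import mean, median


def find_outliers(points, k=3.0):
    """Return points lying unusually far from the centre of the data.

    Assumption: a point counts as an outlier when its distance from the
    centroid is more than k times the median distance. Both the rule
    and the default k=3.0 are illustrative choices, not a standard.
    """
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    distances = [hypot(x - cx, y - cy) for x, y in points]
    typical = median(distances)
    return [p for p, d in zip(points, distances) if d > k * typical]


# Made-up sample: five points close together and one far away.
sample = [(1.0, 1.2), (1.5, 1.8), (2.0, 2.1), (2.5, 2.4), (3.0, 3.3), (20.0, 19.0)]
print(find_outliers(sample))  # [(20.0, 19.0)]
```

On a chart, the flagged point is the one that would appear visibly separated from the cluster.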
- Positive correlation depicts a rise, and it is seen on the chart as data points that slope upward from the lower-left corner towards the upper-right.
- Negative correlation depicts a fall, and it is seen on the chart as data points that slope downward from the upper-left corner towards the lower-right.
- Data that is neither positively nor negatively correlated is considered uncorrelated (null).
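The three cases above can be checked numerically with the Pearson correlation coefficient, which ranges from -1 to +1. The sketch below is a minimal illustration: the function names and the `threshold` used to call a result "uncorrelated" are assumptions made for this example, and the data values are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


def describe_correlation(r, threshold=0.1):
    """Label r as positive, negative, or uncorrelated.

    Assumption: |r| <= threshold is treated as uncorrelated. The value
    0.1 is an arbitrary illustrative cutoff.
    """
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "uncorrelated"


xs = [1, 2, 3, 4, 5]
print(describe_correlation(pearson_r(xs, [2, 4, 6, 8, 10])))  # positive
print(describe_correlation(pearson_r(xs, [10, 8, 6, 4, 2])))  # negative
print(describe_correlation(pearson_r(xs, [2, 1, 3, 1, 2])))   # uncorrelated
```

The coefficient only measures linear association, so a strong non-linear pattern can still produce a value near zero.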
Also, through the use of a “Line of Best Fit,” or trendline, scatter plots help identify trends. Following the best-fit framework, an equation can be derived that describes the relationship between the variables. Linear regression is part of the best-fit framework and is used for linear correlations.
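For a linear relationship, the trendline equation y = slope * x + intercept can be derived with ordinary least squares. The following is a minimal sketch of that calculation; the function names `fit_line` and `predict` and the sample data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Estimate the vertical value for a given horizontal value."""
    return slope * x + intercept


# Made-up data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)              # 2.0 1.0
print(predict(5, slope, intercept))  # 11.0
```

With real data the points will not fall exactly on the line, and the fitted equation gives only an estimate of the vertical value for a given horizontal value.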
Creating a Scatter Plot Diagram
[Figure: “Raw Data” table (Series 1 and Series 2) and the resulting scatter plot diagram]
To create a scatter plot diagram like the one in the figure, the following steps can be taken in Excel:
- First, record all the data in Excel, as in the “Raw Data” table.
- Second, select the data range, i.e., Series 1 and Series 2 in our example.
- Next, on the “Insert” tab of the Excel ribbon, click the scatter plot symbol.
- Then choose the Scatter option.
The chart will be generated, and its heading and visual presentation can be adjusted as preferred.
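Outside Excel, the way a chart places each pair of values can be sketched in plain Python by mapping every (x, y) pair onto a small character grid. This is a rough illustration, not a substitute for a charting tool: the function name `text_scatter`, the grid size, and the Series 1 and Series 2 values below are all made up for the example and are not the data from the original figure.

```python
def text_scatter(points, width=20, height=10):
    """Render (x, y) pairs as '*' marks on a width x height text grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Avoid dividing by zero when all values on an axis are equal.
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip it to put larger y values on top.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical values standing in for Series 1 (x) and Series 2 (y).
series_1 = [1, 2, 3, 4, 5]
series_2 = [2, 4, 5, 4, 6]
print(text_scatter(list(zip(series_1, series_2))))
```

Each pair becomes one mark, with Series 1 setting the horizontal position and Series 2 the vertical position, which is the same mapping the Excel chart applies.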
Challenges with Using Scatter Plots
Two common issues have been identified with the use of scatter plots: overplotting and the interpretation of correlation as causation.
Overplotting occurs when there are so many data points that they overlap one another on the chart. This can make relationships between the variables difficult to identify.
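One rough way to gauge overplotting before drawing a chart is to divide the plotting area into cells and count how many points land in each. The sketch below assumes a square cell of side `cell_size`; that parameter, the function names, and the sample points are hypothetical choices for illustration.

```python
from collections import Counter


def cell_counts(points, cell_size=1.0):
    """Count how many points fall into each grid cell of side cell_size."""
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


def overplotted_share(points, cell_size=1.0):
    """Fraction of points that share a cell with at least one other point."""
    counts = cell_counts(points, cell_size)
    crowded = sum(c for c in counts.values() if c > 1)
    return crowded / len(points)


# Made-up sample: three points crowd one cell, one point sits alone.
sample_points = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.5, 5.5)]
print(cell_counts(sample_points))        # Counter({(0, 0): 3, (5, 5): 1})
print(overplotted_share(sample_points))  # 0.75
```

A high share suggests that many dots would be drawn on top of one another at that resolution.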
Concerning correlation, it is important to remember that correlation does not mean that the changes observed in one variable are responsible for the changes observed in another variable. Correlation should not be interpreted as causation. Causation implies that an event occurring will have an impact on an outcome.