What is Data-Mining Bias?
Data-mining bias is the importance a trader mistakenly assigns to a market occurrence that was actually the result of chance or unforeseen events. Many analysts consider it an “insidious threat” because it can creep in unnoticed during the research that leads traders and investors to make the plays they make in the market.
If data-mining bias isn’t identified and kept in check, at best, it leads to skewed results and a few unwise choices. At its worst, however, it can lead a trader or market analyst to develop and follow an entirely flawed trading strategy, which can spell financial disaster.
What is Data Mining?
Data mining is the time-honored process of researching and analyzing substantial amounts of data or information. For traders and market analysts, it means tracking movements in the market, identifying patterns, and spotting potential turns or changes in market direction that can be acted upon. It is one of the most important processes that traders and analysts employ in order to make the most advantageous trades.
Data-mining bias creeps in slowly when anomalies or happenings in the market are given more weight or importance than they deserve. A trader may act on such a bias and get a negative result – either through a lack of desired profit or, worse, through the loss of his or her initial investment.
The greatest threat arises when one or more traders build an entire trading strategy and plan on misunderstood market occurrences, which often leads to substantial losses of time and money.
How Data-Mining Bias Develops
There are two primary culprits that lead to data-mining bias – two aspects that occur during a trader’s data-mining process.
The first aspect is the propensity for randomness within a dataset. When a trader looks at market data, the dataset will inherently contain some randomness: outliers or movements that don’t necessarily fall in line with other market movements or happenings.
Traders sometimes fall into the trap of examining a single outlier and, because it seems out of place, make the determination that it deserves more weight than the other data in the set. Acting on such an observation may prove profitable, at least initially.
This is where the second culprit comes in: traders become biased by the fact that, at some point, they acted on an outlier and it proved fruitful. Unfortunately, that may lead them to conclude that all outliers must hold a high degree of importance.
The issue is also known as sequential comparison or sequential selection: choosing the same or a similar outlier over and over again, assuming that it holds the same type of significance as the first one. The reality is that the more outliers the trader selects or acts on, the lower the probability that any one of them is actually significant.
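The arithmetic behind that last point can be sketched in a few lines. This is a simplified illustration, not a method described in the article: it assumes every pattern examined is pure chance, that the patterns are independent of one another, and that each has a 5% chance of looking significant by accident. The function name and the 5% threshold are hypothetical choices made for the example.

```python
def chance_of_false_discovery(n_patterns, alpha=0.05):
    """Probability that at least one of n_patterns looks significant by luck.

    Assumes (for illustration only) that no pattern has a real edge, that the
    patterns are independent, and that each has probability `alpha` of
    appearing significant purely by chance.
    """
    if n_patterns < 0 or not 0.0 <= alpha <= 1.0:
        raise ValueError("n_patterns must be >= 0 and alpha must be in [0, 1]")
    # P(at least one false hit) = 1 - P(no false hits in any of the n checks)
    return 1.0 - (1.0 - alpha) ** n_patterns


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, round(chance_of_false_discovery(n), 3))
```

Under these assumptions, a single pattern has a 5% chance of being a false discovery, while a trader who examines 20 such patterns has roughly a 64% chance of finding at least one that looks meaningful but is not.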
Thanks to modern technology, traders and analysts can use a variety of tools and programs, which means the volume of information and datasets they can access is massive.
Possessing a lot of information can be good. However, the more data there is to mine, the greater the chance that data-mining bias will occur. It’s important for traders and analysts to be aware of the potential for bias and to keep their strategies in check before making any significant plays.
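One way to see why more mining invites more bias is a small simulation. The sketch below is illustrative only and makes several assumptions not found in the article: each "rule" is hypothetical and earns nothing but zero-mean Gaussian noise, the daily volatility of 1% is arbitrary, and the function and parameter names are invented for the example. It picks the rule with the best average return on one sample, then re-tests that same rule on fresh data.

```python
import random
import statistics


def mine_random_rules(n_rules=200, n_days=250, seed=0):
    """Select the best of many no-edge 'rules', then re-test it on new data.

    Every rule's daily return is zero-mean Gaussian noise (an assumption made
    for illustration), so any apparent edge in-sample is luck.
    """
    rng = random.Random(seed)

    def average_noise_return():
        # Average daily return of a rule with no real edge over n_days.
        return statistics.fmean([rng.gauss(0.0, 0.01) for _ in range(n_days)])

    # In-sample results: one average return per hypothetical rule.
    in_sample = [average_noise_return() for _ in range(n_rules)]
    best_rule = max(range(n_rules), key=in_sample.__getitem__)

    # Out-of-sample: the chosen rule is still just noise on fresh data.
    out_of_sample = average_noise_return()

    return best_rule, in_sample[best_rule], out_of_sample, statistics.fmean(in_sample)


if __name__ == "__main__":
    rule, best_in, out, avg_in = mine_random_rules()
    print(f"best rule #{rule}: in-sample {best_in:.5f}, out-of-sample {out:.5f}")
    print(f"average in-sample return across all rules: {avg_in:.5f}")
```

Because the winner is chosen after the fact, its in-sample return is by construction at least as high as the average across all rules, even though none of the rules has any real edge. Raising n_rules tends to push the winner's in-sample figure higher still, while its out-of-sample result remains an ordinary draw of noise.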