For decades, people have been throwing darts at a dartboard to see whether random picks outperform those of experts. The darts win and lose at about the same rate as the average expert.

Some people clearly make money in financial markets, but the old saying among brokers, 'we make money selling stocks, not buying them', still applies.

Yet someone must know what they are doing. And backtesting is a good way to seem empirical.

For example, your financial advisor calls you up to suggest a new investment scheme. Drawing on 20 years of data, he has set his computer to work on this question: If you had invested according to this scheme in the past, which portfolio would have been the best? His model assembles thousands of such simulated portfolios and calculates for each one an industry-standard measure of return on risk. Out of this gargantuan calculation, your advisor has chosen the optimal portfolio. After briefly reminding you of the oft-repeated slogan that "past performance is not an indicator of future results", the advisor enthusiastically recommends the portfolio, noting that it is based on sound mathematical methods. Should you invest?

If that isn't a metaphor for any numerical model-driven projection, it's hard to know what is.

The answer is probably not. Backtesting of this kind, which examines a huge number of sample past portfolios, might seem like a good way to zero in on the best future portfolio. But if the number of portfolios in the backtest is so large as to be out of balance with the number of years of data in the backtest, the portfolios that look best are actually just those that target extremes in the dataset. When an investment strategy "overfits" a backtest in this way, the strategy is not capitalizing on any general financial structure but is simply highlighting vagaries in the data.
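The effect is easy to reproduce with simulated data. The sketch below is only an illustration, not the authors' calculation. It assumes every portfolio is pure noise (monthly returns drawn from a normal distribution with a true mean of zero and a made-up 4% monthly volatility), and the helper names `annualized_sharpe` and `best_backtest_sharpe` are hypothetical. It prints the best Sharpe ratio found as the number of portfolios searched grows, using the 20 years of data from the advisor example.

```python
import random
import statistics


def annualized_sharpe(returns, periods_per_year=12):
    """Mean return divided by its standard deviation, scaled to a yearly figure."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return (mean / stdev) * periods_per_year ** 0.5


def best_backtest_sharpe(n_portfolios, n_years, rng):
    """Best in-sample Sharpe ratio among portfolios that have no edge at all."""
    n_months = 12 * n_years
    best = float("-inf")
    for _ in range(n_portfolios):
        # Assumption: pure noise, i.e. monthly returns with a true mean of zero.
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        best = max(best, annualized_sharpe(returns))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (1, 10, 100, 1000):
    print(n, round(best_backtest_sharpe(n, n_years=20, rng=rng), 2))
```

None of these portfolios has any edge, so whatever Sharpe ratio the winner shows is luck, and the winner typically looks better the more portfolios are searched.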

"Recent computational advances allow investment managers to

methodically search through thousands or even millions of potential

options for a profitable investment strategy," the authors write. "In

many instances, that search involves a pseudo-mathematical argument

which is spuriously validated through a backtest."

Unfortunately, the overfitting of backtests is commonplace not only in the offerings of financial advisors but also in research papers in mathematical finance. One way to lessen the problems of backtest overfitting is to test how well the investment strategy performs on data outside of the original dataset on which the strategy is based; this is called "out-of-sample" testing. However, few investment companies and researchers do out-of-sample testing.
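A minimal sketch of out-of-sample testing, again on simulated data, might look like the following. The assumptions are loud ones: all strategies are pure noise with a made-up 4% monthly volatility, the data is simply split in half, and the function names `sharpe` and `in_and_out_of_sample` are hypothetical. The winner is chosen on the first half only and then scored on the second half, which it has never seen.

```python
import random
import statistics


def sharpe(returns, periods_per_year=12):
    """Annualized Sharpe ratio of a list of periodic returns."""
    ratio = statistics.fmean(returns) / statistics.stdev(returns)
    return ratio * periods_per_year ** 0.5


def in_and_out_of_sample(n_strategies, n_months, rng):
    """Pick the best strategy on the first half of the data, score it on the second half."""
    split = n_months // 2
    # Assumption: every strategy is noise with a true mean return of zero.
    strategies = [
        [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        for _ in range(n_strategies)
    ]
    winner = max(strategies, key=lambda r: sharpe(r[:split]))
    return sharpe(winner[:split]), sharpe(winner[split:])


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [in_and_out_of_sample(100, 240, rng) for _ in range(20)]
avg_in = statistics.fmean([t[0] for t in trials])
avg_out = statistics.fmean([t[1] for t in trials])
print(f"average in-sample Sharpe of the winner:     {avg_in:.2f}")
print(f"average out-of-sample Sharpe of the winner: {avg_out:.2f}")
```

Because the selection step never touches the second half, the out-of-sample figure is an honest estimate; for strategies with no real edge it should hover around zero however good the in-sample number looked.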

The design of an investment strategy usually starts with identifying a pattern that one believes will help to predict the future value of a financial variable. The next step is to construct a mathematical model of how that variable could change over time. The number of ways of configuring the model is enormous, and the aim is to identify the model configuration that maximizes the performance of the investment strategy. To do this, practitioners often backtest the model using historical data on the financial variable in question.

They also rely on measures such as the "Sharpe ratio", which evaluates the performance of a strategy as its average excess return per unit of volatility, estimated from a sample of past returns. But if a large number of backtests are performed, one can end up zeroing in on a model configuration that has a misleadingly good Sharpe ratio. As an example, the authors note that, for a model based on 5 years of data, one can be misled by looking at even as few as 45 sample configurations. Within a set of that size, the best configuration can be expected to stand out with a good Sharpe ratio for the 5-year dataset even if none of the configurations has any real predictive power, and it will tend to have a dismal Sharpe ratio for out-of-sample data.
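A back-of-the-envelope check of the 45-configuration figure is possible with a standard extreme-value approximation for the expected maximum of N independent standard normal draws. The sketch below is one plausible reconstruction, not necessarily the authors' own derivation. It assumes the configurations are independent, that none has any skill, and that the annualized Sharpe ratio estimated from y years of data is roughly normal with standard deviation 1/sqrt(y). The function names are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_standard_normal(n_trials):
    """Approximate expected maximum of n_trials independent N(0, 1) draws."""
    if n_trials < 2:
        raise ValueError("the approximation needs at least 2 trials")
    inv = NormalDist().inv_cdf
    return ((1 - EULER_GAMMA) * inv(1 - 1 / n_trials)
            + EULER_GAMMA * inv(1 - 1 / (n_trials * e)))


def expected_best_sharpe(n_trials, n_years):
    """Expected best annualized Sharpe ratio among zero-skill configurations.

    Assumption: each estimated Sharpe ratio is ~ N(0, 1 / n_years).
    """
    return expected_max_standard_normal(n_trials) / sqrt(n_years)


print(round(expected_best_sharpe(45, 5), 2))
print(round(expected_best_sharpe(1000, 5), 2))
```

Under these assumptions, 45 configurations tried on 5 years of data give an expected best Sharpe ratio of roughly 1, a figure many investors would find attractive, even though every configuration is worthless by construction. Trying more configurations on the same data pushes the number higher still.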

The authors note that, when a backtest does not report the number of configurations that were computed in order to identify the selected configuration, it is impossible to assess the risk of overfitting the backtest. And yet, the number of model configurations used in a backtest is very often not revealed, neither in academic papers on finance nor by companies selling financial products.

"[W]e suspect

that a large proportion of backtests published in academic journals

may be misleading," the authors write. "The situation is not likely

to be better among practitioners. In our experience, overfitting is

pathological within the financial industry." Later in the article

they state: "We strongly suspect that such backtest overfitting is a

large part of the reason why so many algorithmic or systematic hedge

funds do not live up to the elevated expectations generated by their

managers."

Many fund managers probably engage in backtest overfitting without understanding what they are doing, and their lack of knowledge leads them to overstate the promise of their offerings. Whether this is fraudulent is not so clear. What is clear is that mathematical scientists can do much to expose these problematic practices, and this is why the authors wrote their article.

"[M]athematicians in the

twenty-first century have remained disappointingly silent with regard

to those in the investment community who, knowingly or not, misuse

mathematical techniques such as probability theory, statistics, and

stochastic calculus," they write. "Our silence is consent, making us

accomplices in these abuses."