A trend factor
Any economic gains from using information over investment horizons?
In this tutorial, we will implement the paper "A trend factor: Any economic gains from using information over investment horizons?", by Yufeng Han, Guofu Zhou and Yingzi Zhu. This paper was published in the Journal of Financial Economics (2016).
Unlike previous studies that examine short-term reversals (daily/monthly), momentum (6-12 months), and long-term reversals (3-5 years) separately, the authors construct a single factor that incorporates all three price trends using moving averages over different time horizons.
In the paper, the authors report that the trend factor earns an average return of 1.63% per month, significantly higher than short-term reversal (0.79%), momentum (0.79%), and long-term reversal (0.34%). It more than doubles the Sharpe ratios of existing factors.
During the 2007-2009 financial crisis, the trend factor earned +0.75% per month, while:
The market lost 2.03% per month.
The momentum factor lost 3.88% per month.
The short-term reversal factor lost 0.82% per month.
The long-term reversal factor barely gained 0.03%.
For a more comprehensive summary of the paper and my results, please check my article on quantitativo.com. Here, we will focus on the code implementation.
The study uses daily stock prices from January 2, 1926, to December 31, 2014, obtained from the Center for Research in Security Prices (CRSP).
In our replication, we will use daily stock prices from January 1, 1990 to January 1, 2025, obtained from Norgate Data. Norgate provides high-quality, survivorship-bias-free daily data for the US stock market at an affordable price. For more information on how to acquire a Norgate Data subscription, please check the Norgate Data website.
The paper explains that, to compute the trend factor, monthly moving average signals are calculated at the end of each month. So, first, let's create a fullcalendar variable that holds the last trading day of each month:
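A minimal sketch of this step, assuming the trading days are already available as a pandas DatetimeIndex. The helper name last_trading_days and the business-day calendar used as input are illustrative assumptions, not the article's original code:

```python
import pandas as pd

def last_trading_days(trading_days: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the last available trading day of each calendar month."""
    days = pd.Series(trading_days, index=trading_days).sort_index()
    # Group by (year, month) and keep the latest date in each group.
    month_ends = days.groupby([days.index.year, days.index.month]).max()
    return pd.DatetimeIndex(month_ends.values)

# Hypothetical stand-in for a real trading calendar: plain business days,
# which ignore exchange holidays. In practice, use the dates of a price series.
trading_days = pd.bdate_range("2024-01-01", "2024-03-31")
fullcalendar = last_trading_days(trading_days)
```

Deriving the calendar from actual trading dates, rather than from calendar month-ends, avoids selecting weekends or holidays on which no prices exist.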
Now, let's see what the paper says about the stock universe:
The dataset includes all domestic common stocks listed on NYSE, AMEX and NASDAQ;
The dataset excludes closed-end funds, REITs, unit trusts, ADRs and foreign stocks;
Price Filter: Stocks with prices below $5 at the end of each month are excluded;
Size Filter: Stocks in the smallest decile (based on NYSE breakpoints) are excluded.
These filters are applied to reduce noise and ensure liquidity, following the methodology used in Jegadeesh & Titman (1993) for constructing momentum strategies.
We will implement something close: we will only consider current and past Russell 3000 constituents. That should address the first, second and last bullets above. Considering stocks only when they were part of the index also ensures we are not adding survivorship bias. Finally, we will exclude stocks whenever their unadjusted closing price was below $5. Here's how that translates into code for a given symbol:
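The two filters described above can be sketched as a boolean mask over a per-symbol table. The column names unadjusted_close and in_index, and the helper apply_universe_filter, are hypothetical; the data vendor's actual field names may differ:

```python
import pandas as pd

def apply_universe_filter(df: pd.DataFrame, min_price: float = 5.0) -> pd.DataFrame:
    """Keep rows where the stock was an index member and closed at or above min_price."""
    in_index = df["in_index"] == 1                    # index member on that date
    price_ok = df["unadjusted_close"] >= min_price    # prices below $5 are excluded
    return df[in_index & price_ok]

# Hypothetical month-end rows for a single symbol.
sample = pd.DataFrame(
    {
        "unadjusted_close": [4.50, 5.00, 12.30, 7.10],
        "in_index": [1, 1, 0, 1],
    },
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28", "2024-04-30"]),
)
filtered = apply_universe_filter(sample)
```

Here the January row is dropped by the price filter and the March row by the index-membership filter.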
Normalizing the moving averages by the closing price ensures stationarity and prevents biases from high-priced stocks.
The paper considers MAs of lag lengths 3-, 5-, 10-, 20-, 50-, 100-, 200-, 400-, 600-, 800- and 1,000-days. Let's see how this translates into code:
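A minimal sketch of the normalized moving averages for these lag lengths, assuming a daily DataFrame with a Close column. The MA column names, the helper add_normalized_mas and the synthetic prices are illustrative assumptions:

```python
import numpy as np
import pandas as pd

LAGS = [3, 5, 10, 20, 50, 100, 200, 400, 600, 800, 1000]

def add_normalized_mas(df: pd.DataFrame, lags=LAGS) -> pd.DataFrame:
    """Add one column per lag: the L-day moving average divided by the same day's close."""
    out = df.copy()
    for lag in lags:
        ma = out["Close"].rolling(lag).mean()   # NaN until `lag` prices are available
        out[f"MA{lag}"] = ma / out["Close"]     # normalize by the current close
    return out

# Synthetic random-walk prices, used only to exercise the function.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2015-01-01", periods=1200)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)
prices = pd.DataFrame({"Close": close})
signals = add_normalized_mas(prices)
```

The 1,000-day signal needs about four years of history, so the first usable rows for a symbol start well after its first price.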
We are getting to the final steps of the data-gathering stage. Now, we must add the target variable that will be used to predict the monthly expected stock returns cross-sectionally.
In other words, we must compute next month's return for every date:
1. First, we gather the prices from a given symbol;
2. Next, we compute the normalized MAs for all lags;
3. Then, we select only the last day of every month and compute the next month's return;
4. Finally, we apply the size/price filters.
We already did (1), (2) and (4). Now, let's see how to do (3):
It's important to observe the correct use of the .shift(x) operator. When we run (df['Close'] / df['Close'].shift(1) - 1), we are computing the current month's return. After that, when we apply .shift(-1), this results in the next month's return, which is exactly what we need.
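The two shifts can be checked on a tiny month-end table. The prices and the column name next_ret are made up for illustration:

```python
import pandas as pd

# Hypothetical month-end closes for one symbol.
monthly = pd.DataFrame(
    {"Close": [100.0, 110.0, 99.0, 108.9]},
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28", "2024-04-30"]),
)

# Return earned during the month that ends on each row's date.
current_ret = monthly["Close"] / monthly["Close"].shift(1) - 1

# Pull the following month's return back onto the current row.
monthly["next_ret"] = current_ret.shift(-1)
```

The January row now holds February's return of about 10%, and the last row is NaN because its next month has not been observed yet.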
Now, let's put everything together into a method that retrieves data for a given symbol:
Let's see the data from the past 12 months of AAPL by running get_data('AAPL').tail(12):
We can now gather data for all stocks in the Russell 3000 universe:
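One way to stack the per-symbol tables into a single table, sketched under the assumption that each symbol's monthly data is already a DataFrame indexed by date. The helper combine_symbols, the tickers and the values are hypothetical:

```python
import pandas as pd

def combine_symbols(frames: dict) -> pd.DataFrame:
    """Stack per-symbol monthly frames into one table indexed by (date, symbol)."""
    data = pd.concat(frames, names=["symbol", "date"])
    # Put the date first so that each monthly cross-section is contiguous.
    return data.swaplevel("symbol", "date").sort_index()

dates = pd.DatetimeIndex(["2024-01-31", "2024-02-29"], name="date")
frames = {
    "AAA": pd.DataFrame({"MA3": [0.99, 1.01], "next_ret": [0.02, -0.01]}, index=dates),
    "BBB": pd.DataFrame({"MA3": [1.02, 0.98], "next_ret": [0.00, 0.03]}, index=dates),
}
data = combine_symbols(frames)

# One month's cross-section: all symbols available on that date.
january = data.loc[pd.Timestamp("2024-01-31")]
```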
The data DataFrame is a table with approximately 800k rows and 12 columns that looks like this:
Great! We are ready to move to the next step: compute the trend factors. It's important to highlight how the data is organized:
In the first level of our index, we have the last day of each month;
In the second level of our index, we have all stocks in our universe for that particular date;
In the columns, we have the MAs computed with the prices up until that specific date for that specific stock, and next month's return for that particular stock.
The expected coefficient of each trend signal is the average of the estimated loadings on that signal over the past 12 months.
Note that we do not include an intercept in the forecast because it is the same for all stocks in the same cross-section regression, and thus it plays no role in ranking the stocks.
Now, we are ready to construct the trend factor. We loop through all dates, performing the dot product between the estimated expected coefficients of the trend signals and the trend signals matrices:
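A minimal sketch of the dot-product step, assuming a (date, symbol) MultiIndex table of signals and a table of expected coefficients indexed by date. The names forecast_returns, exp_coefs and exp_ret are illustrative assumptions, and the intercept is left out as discussed:

```python
import numpy as np
import pandas as pd

def forecast_returns(data: pd.DataFrame, exp_coefs: pd.DataFrame, signal_cols) -> pd.Series:
    """Forecast each stock's return as signals (row) dot expected coefficients (date)."""
    parts = []
    for date, cross in data.groupby(level="date"):
        if date not in exp_coefs.index:
            continue
        beta = exp_coefs.loc[date, signal_cols]
        if beta.isna().any():
            continue  # not enough history yet to form expected coefficients
        parts.append(cross[signal_cols].dot(beta))
    return pd.concat(parts).rename("exp_ret")

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB", "CCC"]], names=["date", "symbol"])
data = pd.DataFrame(
    {"MA3": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8], "MA5": [1.0, 1.0, 1.0, 1.0, 1.0, 1.5]},
    index=index,
)
exp_coefs = pd.DataFrame({"MA3": [np.nan, 0.5], "MA5": [np.nan, -0.2]}, index=dates)
exp_ret = forecast_returns(data, exp_coefs, ["MA3", "MA5"])
```

Only the February cross-section receives forecasts here, since the January coefficients are missing.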
And that's all there is to it. We get the following table, with approximately 800k rows and 4 columns:
In our last step, we sort all stocks into five portfolios by their expected returns. The portfolios are equal-weighted and rebalanced every month. The return difference between the quintile portfolio of the highest expected returns and the quintile portfolio of the lowest is defined as the return on the trend factor. Intuitively, the trend factor buys stocks that are forecasted to yield the highest expected returns (Buy High) and shorts stocks that are forecasted to yield the lowest expected returns (Sell Low).
Adding the quantiles is straightforward:
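One possible way to assign quintiles within each month, assuming a forecast column named exp_ret on a (date, symbol) index. Ranking before calling pd.qcut is a choice made here to break ties deterministically; it is not necessarily what the original code did:

```python
import numpy as np
import pandas as pd

def add_quintiles(df: pd.DataFrame, col: str = "exp_ret", n: int = 5) -> pd.DataFrame:
    """Label each stock 1 (lowest forecast) to n (highest forecast) within its date."""
    out = df.copy()
    out["quantile"] = out.groupby(level="date")[col].transform(
        lambda s: pd.qcut(s.rank(method="first"), n, labels=False) + 1
    )
    return out

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
symbols = [f"S{i}" for i in range(10)]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])
forecasts = np.arange(1, 11) / 100  # 0.01 ... 0.10
df = pd.DataFrame({"exp_ret": np.concatenate([forecasts, forecasts[::-1]])}, index=index)
ranked = add_quintiles(df)
```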
The code above produces the following table:
Now, we group by date:
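A sketch of the grouping step for equal-weighted portfolios, assuming columns named next_ret and quantile on a (date, symbol) index. These names and the toy values are assumptions:

```python
import pandas as pd

def quantile_returns(df: pd.DataFrame, ret_col: str = "next_ret", q_col: str = "quantile"):
    """Equal-weighted return and stock count per (date, quantile)."""
    grouped = df.groupby(["date", q_col])[ret_col]
    rets = grouped.mean().unstack()     # rows: dates, columns: quantiles
    counts = grouped.count().unstack()  # number of stocks in each portfolio
    return rets, counts

dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
index = pd.MultiIndex.from_product([dates, ["AAA", "BBB", "CCC", "DDD"]], names=["date", "symbol"])
df = pd.DataFrame(
    {
        "next_ret": [0.01, 0.03, 0.05, 0.07, -0.02, 0.00, 0.02, 0.04],
        "quantile": [1, 1, 2, 2, 1, 1, 2, 2],
    },
    index=index,
)
rets, counts = quantile_returns(df)
```

Taking the plain mean of next_ret within each group is what makes the portfolios equal-weighted.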
The rets DataFrame has the monthly returns and the number of stocks from each quantile, as we can see below:
Finally, we can plot the return of the trend factor:
If we reduce the exposure on the shorts to 0.5 instead of 1, we can get a better equity curve:
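The long-short return with an adjustable short leg can be sketched as follows. Interpreting the reduced exposure as a 0.5 weight on the lowest-quintile return is an assumption, as are the function name and the toy returns:

```python
import numpy as np
import pandas as pd

def trend_factor_returns(rets: pd.DataFrame, long_q=5, short_q=1, short_exposure=1.0) -> pd.Series:
    """Top-quintile return minus short_exposure times the bottom-quintile return."""
    return rets[long_q] - short_exposure * rets[short_q]

# Hypothetical monthly returns of the lowest (1) and highest (5) quintile portfolios.
rets = pd.DataFrame(
    {1: [0.01, -0.02, 0.00], 5: [0.03, 0.01, 0.02]},
    index=pd.period_range("2024-01", periods=3, freq="M"),
)
full = trend_factor_returns(rets)                      # fully hedged long-short
half = trend_factor_returns(rets, short_exposure=0.5)  # half-sized short leg
equity = (1 + full).cumprod()                          # growth of 1 unit invested
```

With a smaller short leg the strategy is no longer dollar-neutral, so part of any improvement in the equity curve reflects added market exposure.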
Finally, we can see monthly and annual returns with the following code:
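A minimal sketch of a year-by-month table and compounded annual returns, assuming the factor's monthly returns are stored in a Series indexed by month. The series below is made up for illustration:

```python
import pandas as pd

def annual_returns(monthly: pd.Series) -> pd.Series:
    """Compound monthly returns within each calendar year."""
    return (1 + monthly).groupby(monthly.index.year).prod() - 1

def monthly_table(monthly: pd.Series) -> pd.DataFrame:
    """Rows are years, columns are calendar months 1 to 12."""
    return monthly.groupby([monthly.index.year, monthly.index.month]).sum().unstack()

# Hypothetical monthly returns: 1% per month in 2023, 2% in the first two months of 2024.
monthly = pd.Series(
    [0.01] * 12 + [0.02] * 2,
    index=pd.period_range("2023-01", periods=14, freq="M"),
)
annual = annual_returns(monthly)
table = monthly_table(monthly)
```

Months without data, such as March 2024 in this example, show up as NaN in the table.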
This paper introduces a trend factor that synthesizes short-, intermediate-, and long-term price trends using moving averages, significantly outperforming traditional factors like momentum, short-term reversal, and long-term reversal. The trend factor provides higher returns, better risk-adjusted performance, and reduced crash risk, making it a valuable addition to both asset pricing models and portfolio construction strategies.
Implementing this approach in Python is an excellent exercise in quantitative finance and systematic trading. It allows practitioners to explore data handling, time-series analysis, and cross-sectional regressions using libraries such as Pandas, NumPy, and Statsmodels. Coding this methodology in Python is a practical way to deepen one’s understanding of factor-based investing and trend-following strategies.
Now, let's compute the moving averages (trend signals). Moving averages are computed at the end of each month using stock prices over different lag lengths. The moving average (MA) for stock $j$ with lag $L$ at month $t$ is defined as:

$$A_{jt,L} = \frac{P_{jt}^{d-L+1} + P_{jt}^{d-L+2} + \cdots + P_{jt}^{d-1} + P_{jt}^{d}}{L}$$

where $P_{jt}^{d}$ is the closing price for stock $j$ on the last trading day $d$ of month $t$, and $L$ is the lag length. Then, we normalize the moving average prices by the closing price on the last trading day of the month:

$$\tilde{A}_{jt,L} = \frac{A_{jt,L}}{P_{jt}^{d}}$$
The last few lines organize the columns and the index. The index, in particular, is organized as a MultiIndex (multi-level index), a pandas feature well suited to working with higher-dimensional data. For more information about MultiIndex / advanced indexing, please check the pandas user guide.
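To predict the monthly expected stock returns cross-sectionally, we use a two-step procedure. In the first step, we run in each month $t$ a cross-section regression of stock returns on observed normalized MA signals to obtain the time-series of the coefficients on the signals:

$$r_{j,t} = \beta_{0,t} + \sum_{i} \beta_{i,t}\,\tilde{A}_{jt-1,L_i} + \epsilon_{j,t}, \qquad j = 1, \ldots, n$$

where:
$r_{j,t}$: return on stock $j$ in month $t$
$\tilde{A}_{jt-1,L_i}$: trend signal at the end of month $t-1$ on stock $j$ with lag $L_i$
$\beta_{i,t}$: coefficient of the trend signal with lag $L_i$ in month $t$
$\beta_{0,t}$: intercept in month $t$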
To run the regressions, we will use a Python linear-regression library. We loop through all dates, running one cross-sectional regression per date:
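A minimal sketch of this loop. It swaps in NumPy's least-squares solver in place of a dedicated regression library, and assumes a (date, symbol) MultiIndex table whose signal columns and next_ret column are named as below; all names and the synthetic data are assumptions:

```python
import numpy as np
import pandas as pd

def cross_sectional_coefs(data: pd.DataFrame, signal_cols, target: str = "next_ret") -> pd.DataFrame:
    """One OLS regression per date: next-month return on that date's trend signals."""
    rows = {}
    for date, cross in data.groupby(level="date"):
        cross = cross.dropna(subset=list(signal_cols) + [target])
        if len(cross) <= len(signal_cols) + 1:
            continue  # too few stocks to identify the coefficients
        X = np.column_stack([np.ones(len(cross)), cross[signal_cols].to_numpy()])
        y = cross[target].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows[date] = pd.Series(beta, index=["const"] + list(signal_cols))
    coefs = pd.DataFrame(rows).T.sort_index()
    # A row estimated with next-month returns is only known one month later, so lag it.
    return coefs.shift(1)

# Synthetic data with known coefficients, used only to exercise the function.
rng = np.random.default_rng(0)
dates = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28"])
symbols = [f"S{i:02d}" for i in range(20)]
index = pd.MultiIndex.from_product([dates, symbols], names=["date", "symbol"])
data = pd.DataFrame(
    {"MA3": rng.normal(1, 0.02, len(index)), "MA5": rng.normal(1, 0.03, len(index))},
    index=index,
)
data["next_ret"] = 0.01 + 0.5 * data["MA3"] - 0.2 * data["MA5"]
coefs = cross_sectional_coefs(data, ["MA3", "MA5"])
```

Because the synthetic returns are an exact linear function of the signals, every row after the first recovers 0.01, 0.5 and -0.2, while the first row is empty because of the one-month lag.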
The code above is straightforward. It produces the coefs DataFrame, a table with close to 400 rows and 12 columns with all coefficients:
The paper has the following important sentence: "It should be noted that only information in month $t$ or prior is used above to regress returns in month $t+1$." This is what the .shift(1) operator in the last line of the last code block is for. Omitting it would introduce lookahead bias and produce results that are too good to be true.
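We estimate the expected return for month $t+1$ from

$$E_t[r_{j,t+1}] = \sum_{i} E_t[\beta_{i,t+1}]\,\tilde{A}_{jt,L_i}$$

where $E_t[r_{j,t+1}]$ is our forecasted expected return on stock $j$ for month $t+1$, and $E_t[\beta_{i,t+1}]$ is the estimated expected coefficient of the trend signal with lag $L_i$, given by

$$E_t[\beta_{i,t+1}] = \frac{1}{12} \sum_{m=1}^{12} \beta_{i,t+1-m}$$

First, let's compute the matrix of expected coefficients $E_t[\beta_{i,t+1}]$: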
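A minimal sketch of this matrix as a 12-month rolling mean. It assumes the coefs table has one row per month and has already been lagged by one month, so that each row only contains information available on its date; the function name and toy values are assumptions:

```python
import numpy as np
import pandas as pd

def expected_coefs(coefs: pd.DataFrame, window: int = 12) -> pd.DataFrame:
    """Average each coefficient over the trailing `window` months."""
    return coefs.rolling(window).mean()

# Hypothetical coefficient history: MA3 rises by 1 each month, MA5 is constant.
months = pd.period_range("2022-01", periods=14, freq="M", name="date")
coefs = pd.DataFrame({"MA3": np.arange(14, dtype=float), "MA5": 1.0}, index=months)
exp_coefs = expected_coefs(coefs)
```

The first 11 rows are NaN because a full 12-month window is not yet available, so forecasts can only start once a year of coefficients has accumulated.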