
A trend factor

Any economic gains from using information over investment horizons?

Last updated 4 months ago

In this tutorial, we will implement the paper "A Trend Factor: Any Economic Gains from Using Information Over Investment Horizons?", by Yufeng Han, Guofu Zhou and Yingzi Zhu. This paper was published in the Journal of Financial Economics (2016).

Why implement this paper?

Unlike previous studies that examine short-term reversals (daily/monthly), momentum (6-12 months), and long-term reversals (3-5 years) separately, the authors construct a single factor that incorporates all three price trends using moving averages over different time horizons.

In the paper, the authors report that the trend factor earns an average return of 1.63% per month, significantly higher than short-term reversal (0.79%), momentum (0.79%), and long-term reversal (0.34%). It more than doubles the Sharpe ratios of existing factors.

During the 2007-2009 financial crisis, the trend factor earned +0.75% per month, while:

  • The market lost 2.03% per month.

  • The momentum factor lost 3.88% per month.

  • The short-term reversal factor lost 0.82% per month.

  • The long-term reversal factor barely gained 0.03%.

For a more comprehensive summary of the paper and my results, please check my article on quantitativo.com. Here, we will focus on the code implementation.

Data

The study uses daily stock prices from January 2, 1926, to December 31, 2014, obtained from the Center for Research in Security Prices (CRSP).

In our replication, we will use daily stock prices from January 1, 1990 to January 1, 2025, obtained from Norgate data. Norgate provides high-quality, survivorship-bias-free daily data for the US stock market that is very affordable. For more information on how to acquire a Norgate data subscription, please check the Norgate website.

The paper explains that, to compute the trend factor, monthly moving average signals are calculated at the end of each month. So, first, let's create a fullcalendar variable, a Pandas DatetimeIndex that holds the last trading day of each month:

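import norgatedata
import numpy as np
import pandas as pd
import statsmodels.api as sm
from tqdm import tqdm

# Use the $SPX index series only as a source of trading days
df = norgatedata.price_timeseries(
    '$SPX', 
    timeseriesformat='pandas-dataframe'
)[['Close']]
df['Date'] = df.index
df = df.groupby([df.index.year, df.index.month]).last().set_index('Date')
fullcalendar = df.index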
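To see what the groupby/last pattern does without a Norgate subscription, here is a minimal sketch on synthetic data. The business-day index and the `toy_calendar` name are assumptions made for illustration only; real trading calendars also skip exchange holidays, which `pd.bdate_range` does not know about.

```python
import pandas as pd

# Synthetic stand-in for the '$SPX' series: weekdays only, so a month that
# ends on a weekend has no row for its calendar month-end.
days = pd.bdate_range("2024-01-01", "2024-03-31")
prices = pd.DataFrame({"Close": range(len(days))}, index=days)

# Same pattern as the tutorial: keep the last available row of each month.
prices["Date"] = prices.index
month_ends = (
    prices.groupby([prices.index.year, prices.index.month])
    .last()
    .set_index("Date")
)
toy_calendar = month_ends.index
print(list(toy_calendar.strftime("%Y-%m-%d")))
```

March 31, 2024 falls on a Sunday, so the last row kept for March is Friday, March 29. That is the point of deriving the calendar from actual price rows instead of calendar month-ends.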

Now, let's see what the paper says about the stock universe:

  • The dataset includes all domestic common stocks listed on NYSE, AMEX and NASDAQ;

  • The dataset excludes closed-end funds, REITs, unit trusts, ADRs and foreign stocks;

  • Price Filter: Stocks with prices below $5 at the end of each month are excluded;

  • Size Filter: Stocks in the smallest decile (based on NYSE breakpoints) are excluded.

These filters are applied to reduce noise and ensure liquidity, following the methodology used in Jegadeesh & Titman (1993) for constructing momentum strategies.

We will implement something close: we will only consider Russell 3000 current & past constituents. That should address the first, second and last bullets above. Considering stocks only when they were part of the index also ensures we are not adding survivorship bias. Finally, we will exclude stocks whenever their unadjusted closing price was below $5. Here's how that translates into code for a given symbol:

# Retrieve the pricing data
adjustment = norgatedata.StockPriceAdjustmentType.CAPITALSPECIAL
df = norgatedata.price_timeseries(
    symbol,
    timeseriesformat='pandas-dataframe',
    stock_price_adjustment_setting=adjustment
)[['Close', 'Unadjusted Close']]

# TODO: add the moving averages (trend signals)

# Get only the last trading day of the month
df['Date'] = df.index
df = df.groupby([df.index.year, df.index.month]).last().set_index('Date')

# TODO: add the target variable

# Instead of using the size filter (we don't have market cap data), we will 
# approximate it by considering only stocks that belong to Russell 3000
idx = norgatedata.index_constituent_timeseries(
    symbol, 
    'Russell 3000', 
    timeseriesformat='pandas-dataframe'
)['Index Constituent']
calendar = idx[idx == 1].index
df = df.loc[calendar.intersection(df.index).intersection(fullcalendar)]

# Apply the price filter
df = df[df['Unadjusted Close'] > 5]
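The two filters can be checked on a toy frame. This is only a sketch: the dates, prices, `member` flags and `toy_fullcalendar` below are made-up stand-ins for the Norgate outputs and for fullcalendar.

```python
import pandas as pd

# Hypothetical month-end rows for one symbol (values are made up).
dates = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28", "2024-04-30"])
toy = pd.DataFrame({"Unadjusted Close": [4.50, 6.00, 7.25, 9.10]}, index=dates)

# Hypothetical membership flag: 1 while the stock is in the index, else 0.
member = pd.Series([1, 1, 0, 1], index=dates)
toy_fullcalendar = pd.DatetimeIndex(dates)

# Keep only month-end dates on which the stock was an index constituent.
in_index = member[member == 1].index
kept = toy.loc[in_index.intersection(toy.index).intersection(toy_fullcalendar)]

# Price filter, written the same way as in the tutorial code.
kept = kept[kept["Unadjusted Close"] > 5]
print(kept)
```

January is dropped by the price filter and March by the membership filter, so only the February and April rows survive.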

Now, let's compute the moving averages (trend signals). Moving averages are computed at the end of each month using stock prices over different lag lengths. The moving average (MA) for stock $j$ with lag $L$ at month $t$ is defined as:

$$A_{j,t,L} = \frac{P_{j,d-L+1} + P_{j,d-L+2} + \dots + P_{j,d}}{L}$$

where $P_{j,d}$ is the closing price for stock $j$ on the last trading day $d$ of month $t$, and $L$ is the lag length. Then, we normalize the moving average prices by the closing price on the last trading day of the month:

$$\tilde{A}_{j,t,L} = \frac{A_{j,t,L}}{P_{j,d}}$$

This ensures stationarity and prevents biases from high-priced stocks.

The paper considers MAs of lag lengths 3-, 5-, 10-, 20-, 50-, 100-, 200-, 400-, 600-, 800- and 1,000-days. Let's see how this translates into code:

# MA prices normalized
for lag in [3, 5, 10, 20, 50, 100, 200, 400, 600, 800, 1000]:
    df[f'ma_{lag}'] = df['Close'].rolling(window=lag).mean() / df['Close']
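A quick worked example helps with reading the signal. The prices below are made up, and the sketch uses a single 3-day lag.

```python
import pandas as pd

# Made-up closing prices for five consecutive trading days (steady uptrend).
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# 3-day moving average divided by the same day's close.
ma_3 = close.rolling(window=3).mean() / close
print(ma_3)
```

On the last day the moving average is (12 + 13 + 14) / 3 = 13, so the normalized signal is 13 / 14, which is below 1 because the price sits above its own average. The first two values are NaN because a 3-day window is not yet full; in the same way, the 1,000-day signal is NaN until a stock has 1,000 daily closes.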

We are getting to the final steps of the data-gathering stage. Now, we must add the target variable that will be used to predict the monthly expected stock returns cross-sectionally.

In other words, we must compute the next month return for every date:

  1. First, we gather the prices from a given symbol;

  2. Next, we compute the normalized MAs for all lags;

  3. Then, we select only the last day of every month and compute the next month's return;

  4. Finally, we apply the size/price filters.

We already did (1), (2) and (4). Now, let's see how to do (3):

# Get only the last trading day of the month
df['Date'] = df.index
df = df.groupby([df.index.year, df.index.month]).last().set_index('Date')
# Add the target variable: the next month's return
df['next_month_return'] = (df['Close'] / df['Close'].shift(1) - 1).shift(-1)

It's important to observe the correct use of the .shift(x) operator.

.shift(1) gets the previous value, while .shift(-1) gets the next value. Messing with these operators is a common source of error in many quant codebases found online.

So, when we run (df['Close'] / df['Close'].shift(1) - 1) , we are computing the current month's return. After that, when we apply .shift(-1) , this results in the next month's return, which is exactly what we need.
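The effect of the two shifts is easy to verify on three made-up month-end prices (a sketch, with hypothetical values):

```python
import pandas as pd

# Made-up month-end closes.
close = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28"]),
)

this_month = close / close.shift(1) - 1  # return realized over the current month
next_month = this_month.shift(-1)        # return realized over the following month
print(pd.DataFrame({"this_month": this_month, "next_month": next_month}))
```

The January row ends up with February's +10% return as its target, the February row with March's -10%, and the last row has no target yet, which is why it is dropped later.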

Now, let's put everything together into a method that retrieves data for a given symbol:

def get_data(symbol, adjustment=norgatedata.StockPriceAdjustmentType.CAPITALSPECIAL):
    df = norgatedata.price_timeseries(
        symbol,
        timeseriesformat='pandas-dataframe',
        stock_price_adjustment_setting=adjustment
    )[['Close', 'Unadjusted Close']]
    if len(df) == 0:
        return None

    # MA prices normalized
    for lag in [3, 5, 10, 20, 50, 100, 200, 400, 600, 800, 1000]:
        df[f'ma_{lag}'] = df['Close'].rolling(window=lag).mean() / df['Close']
    
    # Get only the last trading day of the month
    df['Date'] = df.index
    df = df.groupby([df.index.year, df.index.month]).last().set_index('Date')
    # Add the target variable: the next month's return
    df['next_month_return'] = (df['Close'] / df['Close'].shift(1) - 1).shift(-1)
    # Remove the NAs and treat last month
    if df.index[-1] == fullcalendar[-1]:  # Complete month
        df = df.dropna()
    else:
        df = df.dropna().iloc[:-1, :]  # Incomplete month: disregard last
    
    # Instead of using the size filter (we don't have market cap data), we will
    # approximate it by considering only stocks that belong to Russell 3000
    idx = norgatedata.index_constituent_timeseries(
        symbol, 
        'Russell 3000', 
        timeseriesformat='pandas-dataframe'
    )['Index Constituent']
    calendar = idx[idx == 1].index
    df = df.loc[calendar.intersection(df.index).intersection(fullcalendar)]
    
    # Apply the price filter
    df = df[df['Unadjusted Close'] > 5]
    if len(df) == 0:
        return None

    # Finally, organize the columns and the index in multi-levels
    df = df[[c for c in df.columns if c not in ['Close', 'Unadjusted Close']]]
    df.index = pd.MultiIndex.from_tuples([(d, symbol) for d in df.index])
    return df

We can inspect the data from the past 12 months of AAPL by running get_data('AAPL').tail(12).

We can now gather data for all stocks in the Russell 3000 universe:

symbols = norgatedata.watchlist_symbols('Russell 3000 Current & Past')
data = []
for symbol in tqdm(symbols):
    df = get_data(symbol)
    if df is not None:
        data.append(df)

data = pd.concat(data, axis=0).sort_index()

The data DataFrame is a table with approximately 800k rows and 12 columns: the 11 normalized MAs and next_month_return.

Great! We are ready to move to the next step: compute the trend factors. It's important to highlight how the data is organized:

  • In the first level of our index, we have the last day of each month;

  • In the second level of our index, we have all stocks in our universe for that particular date;

  • In the columns, we have the MAs computed with the prices up until that specific date for that specific stock, and the next month return for that particular stock.
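This layout is what makes the per-date loops in the next steps short. As a minimal sketch with two hypothetical symbols and a single signal column:

```python
import pandas as pd

# Two hypothetical per-symbol frames, indexed by (date, symbol).
dates = pd.to_datetime(["2024-01-31", "2024-02-29"])
frames = []
for symbol, ret in [("AAA", [0.01, 0.02]), ("BBB", [-0.03, 0.04])]:
    f = pd.DataFrame({"ma_3": [0.99, 1.01], "next_month_return": ret}, index=dates)
    f.index = pd.MultiIndex.from_tuples([(d, symbol) for d in f.index])
    frames.append(f)

toy_data = pd.concat(frames, axis=0).sort_index()

# Selecting on the first level returns one cross-section:
# every symbol available on that date, indexed by symbol.
cross_section = toy_data.loc[pd.Timestamp("2024-01-31")]
print(cross_section)
```

Selecting a date drops the first index level, so each cross-section is an ordinary DataFrame with one row per stock.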

Step 1: Cross-sectional regressions

To predict the monthly expected stock returns cross-sectionally, we use a two-step procedure. In the first step, we run in each month $t$ a cross-section regression of stock returns on observed normalized MA signals to obtain the time-series of the coefficients on the signals:

$$r_{j,t} = \beta_{0,t} + \sum_{i} \beta_{i,t} \tilde{A}_{j,t-1,L_i} + \varepsilon_{j,t}$$

where:

  • $r_{j,t}$ = return on stock $j$ in month $t$

  • $\tilde{A}_{j,t-1,L_i}$ = trend signal at the end of month $t-1$ on stock $j$ with lag $L_i$

  • $\beta_{i,t}$ = coefficient of the trend signal with lag $L_i$ in month $t$

  • $\beta_{0,t}$ = intercept in month $t$

coefs = []
for date in tqdm(data.index.get_level_values(0).unique()):
    df = data.loc[date]
    X = sm.add_constant(df.iloc[:, :-1])
    y = df.iloc[:, -1]
    # Fit the model
    model = sm.OLS(y, X).fit()
    c = model.params
    c.name = date
    coefs.append(c)

# The shift(1) is extremely important. It ensures we avoid lookahead bias
coefs = pd.concat(coefs, axis=1).T.shift(1).dropna()

The code above is straightforward. It produces the coefs DataFrame, a table with close to 400 rows and 12 columns holding the intercept and all $\beta_{i,t}$ coefficients.

The paper has the following important sentence: "It should be noted that only information in month $t$ or prior is used above to regress returns in month $t$." This is what the .shift(1) operator in the last line of the last code block is for. Omitting it would result in lookahead bias and results too good to be true.
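To make the timing concrete, here is a small sketch on synthetic data. It swaps Statsmodels for `numpy.linalg.lstsq` so it runs with NumPy and Pandas only, uses a single signal, and generates noise-free returns from assumed betas (`true_betas` is made up) so that the fit recovers them exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.to_datetime(["2024-01-31", "2024-02-29", "2024-03-28"])
true_betas = {dates[0]: 0.5, dates[1]: -0.2, dates[2]: 0.1}  # assumed values

rows = []
for date in dates:
    signal = rng.normal(1.0, 0.05, size=200)      # one normalized MA per stock
    next_ret = 0.01 + true_betas[date] * signal   # noise-free, so OLS is exact
    X = np.column_stack([np.ones_like(signal), signal])
    intercept, beta = np.linalg.lstsq(X, next_ret, rcond=None)[0]
    rows.append(pd.Series({"const": intercept, "ma_3": beta}, name=date))

raw = pd.concat(rows, axis=1).T     # row t uses the return of month t+1
lagged = raw.shift(1).dropna()      # row t now only uses returns up to month t
print(lagged)
```

In `raw`, the January row depends on February's returns, which are unknown at the end of January. After the shift, that coefficient sits on the February row, where it is observable.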

Step 2: Expected returns

We estimate the expected return for month $t+1$ from

$$E_t[r_{j,t+1}] = \sum_i E_t[\beta_{i,t+1}] \tilde{A}_{j,t,L_i}$$

where $E_t[r_{j,t+1}]$ is our forecasted expected return on stock $j$ for month $t+1$, and $E_t[\beta_{i,t+1}]$ is the estimated expected coefficient of the trend signal with lag $L_i$, given by

$$E_t[\beta_{i,t+1}] = \frac{1}{12}\sum_{m=1}^{12} \beta_{i,t+1-m}$$

which is the average of the estimated loadings on the trend signals over the past 12 months.

First, let's compute the matrix $E_t[\beta_{i,t+1}]$:

exp_coefs = coefs.rolling(window=12).mean().dropna()
exp_coefs = exp_coefs[[f for f in exp_coefs.columns if f.startswith('ma_')]]

Note that we do not include an intercept above because it is the same for all stocks in the same cross-section regression, and thus it plays no role in ranking the stocks.
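The 12-month average can be sanity-checked with made-up coefficients. In this sketch, `toy_coefs` is a hypothetical stand-in for coefs with one signal and a plain integer index.

```python
import pandas as pd

toy_coefs = pd.DataFrame({
    "const": [0.01] * 14,
    "ma_3": [float(m) for m in range(1, 15)],  # made-up betas: 1, 2, ..., 14
})

# Average of the current and previous 11 rows, then drop the intercept.
toy_exp = toy_coefs.rolling(window=12).mean().dropna()
toy_exp = toy_exp[[c for c in toy_exp.columns if c.startswith("ma_")]]
print(toy_exp)
```

The first 11 rows are dropped because their window is incomplete. The first surviving value is the mean of 1 through 12, and each later row slides the window forward by one month.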

Trend factor

Now, we are ready to construct the trend factor. We loop through all dates, performing the dot product between the estimated expected coefficients of the trend signals and the trend signals matrices:

factors = []
for date in tqdm(exp_coefs.index):
    # Select the signal columns by name; exp_coefs holds only the 'ma_' columns
    df = data.loc[date, exp_coefs.columns]
    c = exp_coefs.loc[date]
    tf = df @ c
    tf.name = 'expected_return'
    tf.index.name = 'symbol'
    tf = pd.DataFrame(tf)
    tf['date'] = date
    tf['next_month_return'] = data.loc[date, 'next_month_return']
    tf = tf[['date', 'expected_return', 'next_month_return']].reset_index()
    factors.append(tf)

factors = pd.concat(factors)
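For a single date, the dot product reduces to a weighted sum of each stock's signals. A minimal sketch with two hypothetical stocks, two signals and made-up coefficients:

```python
import pandas as pd

# Made-up normalized MA signals for two stocks on one date.
signals = pd.DataFrame(
    {"ma_3": [0.98, 1.02], "ma_20": [0.95, 1.05]},
    index=pd.Index(["AAA", "BBB"], name="symbol"),
)
# Made-up expected coefficients for that date (no intercept).
exp_beta = pd.Series({"ma_3": 0.10, "ma_20": -0.20})

# Columns of `signals` are aligned with the index of `exp_beta`.
expected_return = signals @ exp_beta
print(expected_return)
```

For AAA the result is 0.10 × 0.98 − 0.20 × 0.95 = −0.092, and for BBB it is −0.108. Because the intercept is left out, only the ordering matters here: AAA ranks above BBB.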

And that's all there is to it. We get a table with approximately 800k rows and 4 columns: symbol, date, expected_return and next_month_return.

In our last step, we sort all stocks into five portfolios by their expected returns. The portfolios are equal-weighted and rebalanced every month. The return difference between the quintile portfolio of the highest expected returns and the quintile portfolio of the lowest is defined as the return on the trend factor. Intuitively, the trend factor buys stocks that are forecasted to yield the highest expected returns (Buy High) and shorts stocks that are forecasted to yield the lowest expected returns (Sell Low).
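Before applying this to the full table, the sort and the long-short spread can be sketched on ten hypothetical stocks for one date. All numbers are made up, and for readability the buckets here are labeled 1 (lowest) to 5 (highest), which is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Ten hypothetical stocks on one date; forecasts and realized returns are made up.
toy = pd.DataFrame({
    "date": pd.Timestamp("2024-01-31"),
    "expected_return": np.arange(10) / 100,  # 0.00, 0.01, ..., 0.09
    "next_month_return": [-0.02, -0.01, 0.00, 0.01, 0.00,
                          0.01, 0.02, 0.01, 0.03, 0.05],
})

# Percentile rank within the date, then five equal-width buckets.
toy["rank"] = toy.groupby("date")["expected_return"].rank(pct=True)
toy["bucket"] = pd.cut(toy["rank"], 5, labels=[1, 2, 3, 4, 5])

# Equal-weighted return per bucket, and the top-minus-bottom spread.
mean_by_bucket = toy.groupby("bucket", observed=False)["next_month_return"].mean()
spread = mean_by_bucket.loc[5] - mean_by_bucket.loc[1]
print(mean_by_bucket)
print(spread)
```

Each bucket holds two stocks. The top bucket averages 4% and the bottom one −1.5%, so the toy factor return for this month is 5.5%.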

Adding the quantiles is straightforward:

num_quantiles = 5
# Percentile rank of each stock's expected return within its date
factors['rank'] = factors.groupby('date')['expected_return'].rank(pct=True)
# Cut the ranks into equal-width bins. Labels are reversed, so Q1 holds the
# highest expected returns and Q5 the lowest
labels = np.arange(1, num_quantiles + 1)[::-1]
factors['quantile'] = pd.cut(factors['rank'], num_quantiles, labels=labels)
factors['quantile'] = factors['quantile'].apply(lambda x: 'Q' + str(x))

The code above adds two columns to the factors table: rank and quantile.

Now, we group by date and quantile:

rets = factors[['date', 'next_month_return', 'quantile']]\
    .groupby(['date', 'quantile'], observed=False)
rets = rets.agg(['count', 'mean']).droplevel(0, axis=1)
rets = rets.reset_index().set_index(['date', 'quantile']).unstack(['quantile'])
rets = rets.rename(columns={'mean': 'return'}, level=0)
max_q = rets.columns.get_level_values(1).categories[-1]
min_q = rets.columns.get_level_values(1).categories[0]

The rets DataFrame has the monthly returns and the number of stocks from each quantile.

Finally, we can plot the return of the trend factor:

# Long the top quintile and short the bottom one, with an exposure of 1 on each leg
monthly_returns = rets[('return', max_q)] * 1 - rets[('return', min_q)] * 1
cumulative_returns = (1 + monthly_returns).cumprod()
cumulative_returns.plot(logy=True)

If we reduce the exposure on the shorts to 0.5 instead of 1, we can get a better equity curve.
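Changing the exposure only rescales the short leg. A minimal sketch with made-up quintile returns, where `long_short` is a hypothetical helper and not part of the original code:

```python
import pandas as pd

# Made-up monthly returns for the top and bottom quintile portfolios.
top = pd.Series([0.03, -0.01, 0.02])
bottom = pd.Series([0.01, 0.02, -0.04])

def long_short(top, bottom, long_exposure=1.0, short_exposure=1.0):
    """Monthly return of a long-short portfolio with the given exposures."""
    return top * long_exposure - bottom * short_exposure

full = long_short(top, bottom)                      # 1 long, 1 short
half = long_short(top, bottom, short_exposure=0.5)  # 1 long, 0.5 short
print(pd.DataFrame({"full": full, "half": half}))
```

Note that with a short exposure of 0.5 the portfolio is net long by 0.5, so it is no longer dollar-neutral and part of its return comes from plain market exposure.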

Finally, we can see monthly and annual returns with the following code:

monthly_returns_df = monthly_returns.to_frame(name='monthly_return')
monthly_returns_df['year'] = monthly_returns_df.index.year
monthly_returns_df['month'] = monthly_returns_df.index.month
# Pivot the DataFrame
pivot_table = monthly_returns_df.pivot(
    index='year', 
    columns='month', 
    values='monthly_return'
)
annual_returns = cumulative_returns.resample('YE').last().pct_change().dropna()
annual_returns.index = annual_returns.index.year
pivot_table['Annual Return'] = annual_returns
first = cumulative_returns.resample('YE').last().iloc[0] - 1
pivot_table = pivot_table.reindex(columns=list(range(1, 13)) + ['Annual Return'])
pivot_table = pivot_table.sort_index(ascending=False)
pivot_table.iloc[-1, -1] = first

Conclusion

This paper introduces a trend factor that synthesizes short-, intermediate-, and long-term price trends using moving averages, significantly outperforming traditional factors like momentum, short-term reversal, and long-term reversal. The trend factor provides higher returns, better risk-adjusted performance, and reduced crash risk, making it a valuable addition to both asset pricing models and portfolio construction strategies.

Implementing this approach in Python is an excellent exercise in quantitative finance and systematic trading. It allows practitioners to explore data handling, time-series analysis, and cross-sectional regressions using libraries such as Pandas, NumPy, and Statsmodels. Coding this methodology in Python is a practical way to deepen one’s understanding of factor-based investing and trend-following strategies.

The last few lines of get_data organize the columns and the index. The index, in particular, is a MultiIndex (multi-level index), a Pandas feature that makes it easier to work with higher-dimensional data. For more information about MultiIndex / advanced indexing, please check the Pandas documentation.

To do the cross-sectional regressions in Step 1, we use Python's Statsmodels package: we loop through all dates and fit one regression per date.
