Feature selection and engineering

In this section, students need to decide which features are helpful in predicting the target variable – for example, serial correlation, momentum, technical analysis indicators (such as RSI), and signals from trend-following strategies (such as the moving average crossover).
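
As an illustration of the feature types listed above, the following is a minimal sketch that assumes a pandas Series of closing prices. The function name `make_features`, the window lengths, and the simulated price path are all hypothetical choices made for the example, and the RSI here uses simple rolling means rather than Wilder's smoothed averages.

```python
import numpy as np
import pandas as pd


def make_features(close: pd.Series, fast: int = 10, slow: int = 30,
                  rsi_window: int = 14, mom_window: int = 5) -> pd.DataFrame:
    """Build a small feature table from a series of closing prices."""
    ret = close.pct_change()

    # Serial correlation: the previous bar's return used as a predictor.
    lag_ret = ret.shift(1)

    # Momentum: cumulative return over the last `mom_window` bars.
    momentum = close.pct_change(mom_window)

    # RSI from simple rolling means of gains and losses (an assumption;
    # Wilder's original definition uses a smoothed average instead).
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(rsi_window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(rsi_window).mean()
    rsi = 100 - 100 / (1 + avg_gain / avg_loss)

    # Moving average crossover: +1 when the fast SMA is above the slow SMA.
    fast_sma = close.rolling(fast).mean()
    slow_sma = close.rolling(slow).mean()
    ma_cross = np.sign(fast_sma - slow_sma)

    features = pd.DataFrame({
        "lag_ret": lag_ret,
        "momentum": momentum,
        "rsi": rsi,
        "ma_cross": ma_cross,
    })
    # Drop the warm-up rows where the longest window is not yet filled.
    return features.dropna()


# Simulated prices, used only to make the sketch runnable.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))))
features = make_features(close)
print(features.tail())
```

Every feature in row t uses prices up to and including t only, so a target built from the return at t+1 would not leak into the features.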

  1. Select at least four explanatory variables and perform the necessary transformations so that they are useful in the modelling phase. You are encouraged to use more than four variables. Investigate feature engineering techniques such as PCA and encoding target variables using one-hot encoding.

  2. Write a short paragraph about each technique investigated and show an implementation of it in a Jupyter Notebook. Make sure to include references that indicate where the ideas were sourced.

  3. At this stage, groups should take the opportunity to familiarize themselves with the cross-validation techniques for forecasting financial time series – for example, traditional k-fold cross-validation versus walk-forward analysis, and Purged K-Fold CV. Write a short paragraph explaining each technique researched. Research at least three (they don’t have to be the three mentioned here).
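
To make the difference between the three cross-validation schemes named in the last item concrete, here is a rough sketch of how their index splits could be generated with NumPy. The function names are hypothetical, and the purged variant is a simplification: it assumes every label is built from a fixed number of future bars (`horizon`) rather than from per-observation label end times.

```python
import numpy as np


def kfold_splits(n: int, k: int):
    """Plain k-fold on contiguous blocks: training data may lie after the test block."""
    idx = np.arange(n)
    for test in np.array_split(idx, k):
        yield np.setdiff1d(idx, test), test


def walk_forward_splits(n: int, k: int):
    """Expanding-window walk forward: train only on data before the test block."""
    blocks = np.array_split(np.arange(n), k + 1)
    for i in range(1, k + 1):
        yield np.concatenate(blocks[:i]), blocks[i]


def purged_kfold_splits(n: int, k: int, horizon: int, embargo: int = 0):
    """K-fold that drops training rows whose label window overlaps the test block.

    Assumes the label for row t is built from rows t .. t + horizon.
    """
    idx = np.arange(n)
    for test in np.array_split(idx, k):
        start, stop = test[0], test[-1]
        # Keep rows whose label ends before the test block starts, and rows
        # that begin after the last test label (plus the embargo) has ended.
        keep = (idx + horizon < start) | (idx > stop + horizon + embargo)
        yield idx[keep], test


n_obs = 20
plain = list(kfold_splits(n_obs, 4))
wf = list(walk_forward_splits(n_obs, 4))
purged = list(purged_kfold_splits(n_obs, 4, horizon=2))

# Second fold of each scheme, test block first.
print(plain[1][1], plain[1][0])
print(wf[1][1], wf[1][0])
print(purged[1][1], purged[1][0])
```

In the plain k-fold split the training set surrounds the test block on both sides, the walk-forward split never trains on later data, and the purged split leaves a gap on each side of the test block.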

Helpful resources

The following techniques are covered in Dr López de Prado’s book (an implementation of the first and second techniques can be found on GitHub, and a relevant blog post is also available):

  1. The triple-barrier method (Labeling)
  2. Meta-labeling
  3. Fractionally Differentiated Features
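
For orientation, the first technique in the list above can be sketched in a few lines. This is a simplified illustration and not the implementation from the book or from the GitHub repository mentioned: it assumes fixed fractional barriers (`up`, `down`) instead of volatility-scaled ones, and it assigns 0 whenever the vertical barrier (`max_hold` bars) is reached first. The function name and the price list are hypothetical. The final step shows how such labels could be one-hot encoded with `pandas.get_dummies`.

```python
import numpy as np
import pandas as pd


def triple_barrier_labels(close: pd.Series, up: float, down: float,
                          max_hold: int) -> pd.Series:
    """Label each bar +1, -1 or 0 by whichever barrier is touched first.

    up / down are fractional returns (0.02 means 2%); max_hold is the
    vertical barrier in bars. Bars without a full look-ahead window
    are left unlabeled.
    """
    prices = close.to_numpy(dtype=float)
    labels = {}
    for i in range(len(prices) - max_hold):
        # Return path over the next `max_hold` bars relative to bar i.
        path = prices[i + 1 : i + 1 + max_hold] / prices[i] - 1.0
        hit_up = np.flatnonzero(path >= up)
        hit_down = np.flatnonzero(path <= -down)
        first_up = hit_up[0] if hit_up.size else max_hold
        first_down = hit_down[0] if hit_down.size else max_hold
        if first_up < first_down:
            labels[close.index[i]] = 1    # profit-taking barrier first
        elif first_down < first_up:
            labels[close.index[i]] = -1   # stop-loss barrier first
        else:
            labels[close.index[i]] = 0    # neither touched: vertical barrier
    return pd.Series(labels, name="label", dtype=int)


# Made-up prices, used only to make the sketch runnable.
close = pd.Series([100, 101, 103, 99, 98, 97, 100, 100, 100.5, 101])
labels = triple_barrier_labels(close, up=0.02, down=0.02, max_hold=3)

# One-hot encoding of the three-class target: one indicator column per class.
one_hot = pd.get_dummies(labels)
print(labels.tolist())
print(one_hot.columns.tolist())
```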

The following papers provide insights into using technical analysis for features:

1 Kim, K.J. (2003). ‘Financial Time Series Forecasting Using Support Vector Machines’. Neurocomputing, 55(1-2), pp.307-319.

2 Patel, J., Shah, S., Thakkar, P. and Kotecha, K. (2015). ‘Predicting Stock Market Index Using Fusion of Machine Learning Techniques’. Expert Systems with Applications, 42(4), pp.2162-2172.

3 Patel, J., Shah, S., Thakkar, P. and Kotecha, K. (2015). ‘Predicting Stock And Stock Price Index Movement Using Trend Deterministic Data Preparation and Machine Learning Techniques’. Expert Systems with Applications, 42(1), pp.259-268.

4 Kara, Y., Boyacioglu, M.A. and Baykan, Ö.K. (2011). ‘Predicting Direction of Stock Price Index Movement Using Artificial Neural Networks And Support Vector Machines: The Sample of The Istanbul Stock Exchange’. Expert Systems with Applications, 38(5),

PCA as a technique was covered in Module 2.
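
As a brief reminder of how PCA applies to a feature table, the sketch below standardizes the columns and projects them onto the leading principal components using NumPy's SVD rather than a library PCA class. The function name `pca_scores` and the simulated data (four features driven by two latent factors) are assumptions made for the example.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int):
    """Project the standardized columns of X onto the top principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


# Simulated features: two pairs of nearly collinear columns.
rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([
    f1,
    f1 + 0.05 * rng.normal(size=200),
    f2,
    f2 + 0.05 * rng.normal(size=200),
])

scores, explained = pca_scores(X, n_components=2)
print(scores.shape, explained.round(3))
```

With time series data, the mean, standard deviation, and principal directions would normally be estimated on the training window only and then applied to the test window, so that no information from the test period enters the features.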

There are also many blogs that provide some insights:

1 Quantopian

2 QuantStart

3 QuantInsti

4 Robot Wealth