Data-Driven Methods in Finance

Fall 2023, IEOR 4576

Course Summary:

The Data-Driven Methods in Finance course at Columbia University provides a hands-on, project-based learning experience focusing on systematic quantitative investment. The course takes students through the entire data science workflow—from conceptualization to performance evaluation. Core topics include statistics, forecasting, machine learning, data scraping, and MLOps.

Students will compete in a real-time financial forecasting competition as part of the course. Participants will make investment decisions and present their analyses using open-source financial and alternative data.

Prerequisites for the course include a background in Python, linear algebra, statistics, and probability. Familiarity with operations research topics and optimization packages is also recommended. Attendance with a laptop is mandatory, as the course is computer-intensive and incorporates both lectures and lab work.

This course is best suited to students aspiring to careers as quants or data scientists in the financial sector.

For supplementary materials and summaries of class discussions, you can visit our GitHub page, which will also feature essential Python functions corresponding to each topic covered.

Course Schedule:

1) Mon, September 11, 2023: First class

2) Mon, September 18, 2023:

3) Mon, September 25, 2023:

4) Mon, October 2, 2023:

5) Mon, October 9, 2023:

6) Mon, October 16, 2023:

7) Mon, October 23, 2023:

8) Mon, October 30, 2023:

9) Mon, November 6, 2023: Academic holiday - no class

10) Mon, November 13, 2023: 

11) Mon, November 20, 2023: 

12) Mon, November 27, 2023: 

13) Mon, December 4, 2023: Final presentations

14) Mon, December 11, 2023: Final exam

Past slides: 


Past exams: Spring_2023, Fall_2023

Final Projects:

Competition Standing:

Logistics: 

Course Title: Data-Driven Methods in Finance (IEOR 4576)

Department: Industrial Engineering and Operations Research (IEOR)

Term: Fall 2024 (previous offerings: Fall 2023, Spring 2023, Fall 2022)

Instructor: Dr. Naftali Cohen

Email: naftali.cohen(at)columbia.edu

Level: Graduate, 3 credits

Time: TBD

Dates: TBD

Location: TBD

Slack: TBD

Courseworks: TBD

Google Colab: TBD

Office hours: TBD

More Details:

Data and Methods: We will source data from Yahoo Finance, FactSet, OpenBB, WRDS, and others. During class, we will only discuss and use known financial strategies, textbook algorithms, open-source software, and freely available datasets.

Course Prerequisites: This course is computer-intensive and assumes a working knowledge of Python. Students should know basic Python packages such as NumPy, Pandas, and Matplotlib. A solid foundation in undergraduate-level Calculus, Linear Algebra, Statistics, and Data Science is required. Familiarity with Operations Research topics, web scraping (e.g., BeautifulSoup), optimization packages (e.g., Gurobi or CVXPY), and the Google Colab environment is recommended. A financial background is optional.

Topics: During class, we will focus on the main data science workflow of generating ideas, sourcing information, extracting features, combining signals, optimizing decisions, and evaluating performance. We will discuss sample statistics, forecasting, machine learning methods, data scraping, MLOps, and more.
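As a rough illustration of the workflow described above, the following minimal sketch walks through sourcing, feature extraction, decision, and evaluation on synthetic data. It is not course material: the function names, the 20-day momentum rule, and the simulated return parameters are all assumptions chosen for illustration, and it uses only the Python standard library so that it runs anywhere.

```python
import math
import random
import statistics


def simulate_returns(n_days, seed=0):
    """Source data: synthetic daily returns stand in for a real data feed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0003, 0.01) for _ in range(n_days)]


def momentum_feature(returns, lookback):
    """Extract a feature: trailing mean return, using only past observations."""
    return [
        statistics.fmean(returns[t - lookback:t])
        for t in range(lookback, len(returns))
    ]


def positions_from_signal(signal):
    """Decide: go long (1.0) when the signal is positive, otherwise stay flat (0.0)."""
    return [1.0 if s > 0 else 0.0 for s in signal]


def annualized_sharpe(strategy_returns, periods_per_year=252):
    """Evaluate: mean over standard deviation, scaled to a yearly horizon."""
    sd = statistics.stdev(strategy_returns)
    if sd == 0:
        return 0.0
    return statistics.fmean(strategy_returns) / sd * math.sqrt(periods_per_year)


lookback = 20  # hypothetical window length
returns = simulate_returns(500)
signal = momentum_feature(returns, lookback)
positions = positions_from_signal(signal)
# The position computed from data through day t-1 earns the return of day t.
strategy_returns = [p * r for p, r in zip(positions, returns[lookback:])]
print(f"Annualized Sharpe ratio: {annualized_sharpe(strategy_returns):.2f}")
```

Because the feature at day t only uses returns before day t, the sketch avoids look-ahead bias, one of the simplest ways a backtest can overstate performance.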

Class Meetings: Attending each class and bringing a laptop are required. This course employs lectures and computer labs, and we will devote significant time to practicing the techniques presented during class.

Project: Students will participate in an in-class real-time financial forecasting competition (similar to the M6 competition). They will be asked to augment open-source financial data with external open-source datasets. Students will be asked to present their findings to the class by the end of the course. The forecasting competition will focus on (a) the ability to estimate future returns and uncertainty, (b) the ability to combine estimates into an investment decision, (c) the importance of a consistent investment strategy, (d) the importance of alternative datasets and proper use of data, and (e) the importance of teamwork, transparency, and learning from mistakes. The winning students are guaranteed an A+ and a special prize.
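Points (a) and (b) above can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the competition's scoring method or a recommended strategy: the asset names and return parameters are made up, uncertainty is summarized by the standard error of the sample mean, and the weighting rule (positive mean divided by variance, normalized) matches the mean-variance solution only when assets are uncorrelated.

```python
import math
import random
import statistics


def estimate(returns):
    """Return (mean, standard error of the mean, variance) for one asset."""
    mean = statistics.fmean(returns)
    var = statistics.variance(returns)
    stderr = math.sqrt(var / len(returns))
    return mean, stderr, var


def long_only_weights(estimates):
    """Weight each asset by max(mean, 0) / variance, normalized to sum to 1.

    Falls back to equal weights when no asset has a positive estimated mean.
    """
    raw = {name: max(mean, 0.0) / var for name, (mean, _, var) in estimates.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 1.0 / len(raw) for name in raw}
    return {name: w / total for name, w in raw.items()}


rng = random.Random(42)
# Hypothetical assets with made-up daily drift and volatility.
history = {
    "ASSET_A": [rng.gauss(0.0005, 0.010) for _ in range(250)],
    "ASSET_B": [rng.gauss(0.0002, 0.020) for _ in range(250)],
    "ASSET_C": [rng.gauss(0.0004, 0.015) for _ in range(250)],
}
estimates = {name: estimate(r) for name, r in history.items()}
weights = long_only_weights(estimates)
for name, (mean, stderr, _) in estimates.items():
    print(f"{name}: mean={mean:.5f} +/- {stderr:.5f}, weight={weights[name]:.2f}")
```

With about a year of daily data, the standard error is typically of the same order as the estimated mean itself, which is one reason a consistent strategy matters more than any single forecast.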

Required Textbooks: 

Quantitative Equity Portfolio Management

Modern Portfolio Theory

Recommended Reading:

Historical Overview: Fortune's Formula

Active vs. Passive Management: The Arithmetic of Active Management

Decision Making: Rational Decision-Making under Uncertainty

Biases: Seven Sins of Quantitative Investing

General Practices: Best Practices in Research for Quantitative Equity Strategies

Diversification: The Myth of Diversification and The Myth of Diversification Reconsidered

Efficient Frontier With Multiple Assets: The Golden Rule of Investing

Probability and Forecasting: Taleb-Silver Feud

Grading:

Disclaimer:

This course does not provide investment advice or pre-made trading algorithms and does not reflect the views of my affiliated entities and agencies. The primary goal of the course is to highlight the challenges that Data Science and Machine Learning methods face when working with financial data. These challenges include short data histories, non-stationarity, regime changes, and low signal content, all of which make it difficult to achieve robust results. The topics we discuss serve as a guide to using these methods to inform an investment decision through a systematic and scientific workflow.