Data-Driven Methods in Finance
Fall 2023, IEOR 4576
Course Summary:
The Data-Driven Methods in Finance course at Columbia University provides a hands-on, project-based learning experience focusing on systematic quantitative investment. The course takes students through the entire data science workflow—from conceptualization to performance evaluation. Core topics include statistics, forecasting, machine learning, data scraping, and MLOps.
Students will compete in a real-time financial forecasting competition as part of the course. Participants will make investment decisions and present their analyses using open-source financial and alternative data.
Prerequisites for the course include a background in Python, linear algebra, statistics, and probability. Familiarity with operations research topics and optimization packages is also recommended. Attendance with a laptop is mandatory, as the course is computer-intensive and incorporates both lectures and lab work.
Ideally, this course suits students aspiring to careers as quants or data scientists in the financial sector.
For supplementary materials and summaries of class discussions, you can visit our GitHub page, which will also feature essential Python functions corresponding to each topic covered.
Course Schedule:
1) Mon, September 11, 2023: First class
Lecture Topics: Course Overview
Slides: Intro
Homework: No homework today.
Portfolio Submission: Please submit your weekly portfolio (test submission) by 6 pm ET on Sunday to the "dueSep17" folder.
2) Mon, September 18, 2023:
Lecture Topics: Basic topics: CAPM, Benchmark, Alpha, Beta, Market Efficiency, Chapters 1-2 in Chincarini and Kim, 2022
Slides: Basics I
Homework + Answers: Univariate Regression Test
Portfolio Submission: Please submit your weekly portfolio (submission 1) by 6 pm ET on Sunday to the "dueSep24" folder.
3) Mon, September 25, 2023:
Lecture Topics: Factor modeling, Expected Return, Risk, Cross-sectional analysis, Chapters 2-3 in Chincarini and Kim, 2022; Chapter 1 in Joshi and Paterson, 2013
Slides: Basics II
Homework + Answers: Handout2
Portfolio Submission: Please submit your weekly portfolio (submission 2) by 6 pm ET on Sunday to the "dueOct01" folder.
4) Mon, October 2, 2023:
Lecture Topics: Factors and Factor Choice, Chapters 3-4 in Chincarini and Kim, 2022; Chapter 2 in Joshi and Paterson, 2013
Slides: Factors and Datasets
Homework + Answers: Handout3
Portfolio Submission: Please submit your weekly portfolio (submission 3) by 6 pm ET on Sunday to the "dueOct08" folder.
5) Mon, October 9, 2023:
Lecture Topics: Stock Screening and Ranking, Chapter 5 in Chincarini and Kim, 2022; Chapters 3-4 in Joshi and Paterson, 2013
Slides: Stock Screening & Ranking
Homework + Answers: Aggregate_Z_Score
Portfolio Submission: Please submit your weekly portfolio (submission 4) by 6 pm ET on Sunday to the "dueOct15" folder.
6) Mon, October 16, 2023:
Lecture Topics: Fundamental and Economic Factor Models, Chapters 6-7 in Chincarini and Kim, 2022; Chapter 5 in Joshi and Paterson, 2013
Slides: Fundamental Factor Models
Homework + Answers: Fundamental_Factor_Model
Portfolio Submission: Please submit your weekly portfolio (submission 5) by 6 pm ET on Sunday to the "dueOct22" folder.
7) Mon, October 23, 2023:
Lecture Topics: Forecasting Factor Premiums and Exposures, Chapters 7-9 in Chincarini and Kim, 2022; Backtesting
Slides: Economic Factor Models
Homework + Answers: Backtest an "inertia" feature following the backtesting notebook
Portfolio Submission: Please submit your weekly portfolio (submission 6) by 6 pm ET on Sunday to the "dueOct29" folder.
8) Mon, October 30, 2023:
Lecture Topics: Portfolio Weights, Chapter 9 in Chincarini and Kim, 2022; Chapter 7 in Joshi and Paterson, 2013
Slides: Portfolio Weights
Homework + Answers: Mean_Variance_Efficient_Portfolios
Portfolio Submission: Please submit your weekly portfolio (submission 7) by 6 pm ET on Sunday to the "dueNov05" folder.
9) Mon, November 06, 2023: Academic holiday - no class
10) Mon, November 13, 2023:
Lecture Topics: Rebalancing, Transactions Costs, and Tax Management, Chapters 10-11 in Chincarini and Kim, 2022; Utility Foundation, Chapter 4 in Francis and Kim, 2013
Homework + Answers:
Portfolio Submission: Please submit your weekly portfolio (submission 8) by 6 pm ET on Sunday to the "dueNov19" folder.
11) Mon, November 20, 2023:
Lecture Topics: Leverage, Market Neutral, and Bayesian, Chapters 12-14 in Chincarini and Kim, 2022; Non-mean-variance investment decisions, Chapters 9-10 in Francis and Kim, 2013
Homework + Answers:
Portfolio Submission: Please submit your weekly portfolio (submission 9) by 6 pm ET on Sunday to the "dueNov26" folder.
12) Mon, November 27, 2023:
Slides: Performance Assessment, Backtesting, and Performance Attribution
Homework: No homework today.
Portfolio Submission: Please submit your weekly portfolio (submission 10) by 6 pm ET on Sunday to the "dueDec03" folder.
13) Mon, December 04, 2023: Final presentations
14) Mon, December 11, 2023: Final exam
Past slides:
Past exams: Spring_2023, Fall_2023
Final Projects:
Competition Standing:
Logistics:
Course Title: Data-Driven Methods in Finance (IEOR 4576)
Department: Industrial Engineering and Operations Research (IEOR)
Term: Fall 2024 (previous offerings: Fall 2023, Spring 2023, Fall 2022)
Instructor: Dr. Naftali Cohen
Email: naftali.cohen(at)columbia.edu
Level: Graduate, 3 credits
Time: TBD
Dates: TBD
Location: TBD
Slack: TBD
Courseworks: TBD
Google Colab: TBD
Office hours: TBD
More Details:
Data and Methods: We will source data from Yahoo Finance, FactSet, OpenBB, WRDS, and others. During class, we will only discuss and use known financial strategies, textbook algorithms, open-source software, and freely available datasets.
Course Prerequisites: This course is computer-intensive and assumes a working knowledge of Python. Students should know basic Python packages such as NumPy, pandas, and Matplotlib. A solid foundation in undergraduate-level Calculus, Linear Algebra, Statistics, and Data Science is required. Familiarity with Operations Research topics, web scraping (e.g., Beautiful Soup), optimization packages (e.g., Gurobi or CVXPY), and the Google Colab environment is recommended. A financial background is optional.
Topics: During class, we will focus on the main data science workflow of generating ideas, sourcing information, extracting features, combining signals, optimizing decisions, and evaluating performance. We will discuss sample statistics, forecasting, machine learning methods, data scraping, MLOps, and more.
Class Meetings: Attending each class and bringing a laptop to class is necessary. This course employs lectures and computer labs, and we will devote significant time to practicing the techniques presented during class.
Project: Students will participate in an in-class real-time financial forecasting competition (similar to the M6 competition). They will be asked to augment open-source financial data with external open-source datasets. Students will be asked to present their findings to the class by the end of the course. The forecasting competition will focus on (a) the ability to estimate future returns and uncertainty, (b) the ability to combine estimates into an investment decision, (c) the importance of a consistent investment strategy, (d) the importance of alternative datasets and proper use of data, and (e) the importance of teamwork, transparency, and learning from mistakes. The winning students are guaranteed an A+ and a special prize.
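As a rough illustration of points (a) and (b), the sketch below estimates expected returns and their covariance from a return history and combines them into portfolio weights. It is a textbook unconstrained mean-variance rule applied to simulated data, not the competition's scoring or submission format. The function name, the simulated return parameters, and the choice to scale absolute weights to sum to 1 are all assumptions made for this example.

```python
import numpy as np

def mean_variance_weights(returns):
    """Turn a (T x N) matrix of past returns into N portfolio weights.

    Hypothetical helper for illustration only.
    """
    mu = returns.mean(axis=0)              # (a) estimate of expected returns
    sigma = np.cov(returns, rowvar=False)  # (a) estimate of uncertainty (N x N covariance)
    raw = np.linalg.solve(sigma, mu)       # (b) unconstrained mean-variance direction
    return raw / np.abs(raw).sum()         # assumed scaling: absolute weights sum to 1

# Simulated daily returns for three made-up assets (assumed parameters).
rng = np.random.default_rng(0)
returns = rng.normal(
    loc=[0.001, 0.0005, 0.0],
    scale=[0.01, 0.02, 0.015],
    size=(250, 3),
)

weights = mean_variance_weights(returns)
print(weights)
```

With only 250 noisy observations, the resulting weights are sensitive to the sample, which is one reason the competition stresses a consistent strategy over a single estimate.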
Required Textbooks:
Quantitative Equity Portfolio Management
Recommended Reading:
Historical Overview: Fortune's Formula
Active vs. Passive Management: The Arithmetic of Active Management
Decision Making: Rational Decision-Making under Uncertainty
Biases: Seven Sins of Quantitative Investing
General Practices: Best Practices in Research for Quantitative Equity Strategies
Diversification: The Myth of Diversification and The Myth of Diversification Reconsidered
Efficient Frontier With Multiple Assets: The Golden Rule of Investing
Probability and Forecasting: Taleb-Silver Feud
Grading:
HW (best 8 submissions) 40%
Final exam: 30%
Project presentation: 30%
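The weighting above can be sketched as a short calculation. This is only an illustration: it assumes every score is on a 0-100 scale, that the best 8 homework scores are averaged equally, and that a student with fewer than 8 submissions gets zeros for the missing ones. None of these details are stated here, and the function name and sample scores are made up.

```python
def course_grade(hw_scores, final_exam, presentation, n_best=8):
    """Weighted course grade on an assumed 0-100 scale (hypothetical helper)."""
    best = sorted(hw_scores, reverse=True)[:n_best]  # keep the best n_best homework scores
    hw_avg = sum(best) / n_best                      # assumption: missing submissions count as zero
    return 0.40 * hw_avg + 0.30 * final_exam + 0.30 * presentation

# Ten made-up homework scores; the two lowest (70 and 60) are dropped.
grade = course_grade(
    hw_scores=[90, 80, 100, 70, 95, 85, 60, 100, 75, 88],
    final_exam=80,
    presentation=90,
)
print(round(grade, 2))  # 86.65
```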
Disclaimer:
This course does not provide investment advice or pre-made trading algorithms and does not reflect the views of my affiliated entities and agencies. The primary goal of the course is to highlight the challenges that Data Science and Machine Learning methods face when working with financial data. These challenges include short data histories, non-stationarity, regime changes, and low signal content, all of which make it difficult to achieve robust results. The topics we discuss serve as a guide to using these methods to inform an investment decision through a systematic and scientific workflow.