
Asaf Manela (Washington University in St. Louis): Chronologically Consistent Large Language Models

Event Details:

Thursday, April 24, 2025
9:00am - 10:00am PDT

The Stanford AFTLab invites you to the AI & Big Data in Finance Research Forum (ABFR) webinar:

The webinar takes place on Thursday, April 24, 2025, 9-10am Pacific Time (12-1pm Eastern Time).

Presenter: Asaf Manela (Washington University in St. Louis)

Discussant: Suproteem Sarkar (Harvard University) 

Zoom webinar link: https://stanford.zoom.us/j/95194538101?pwd=ybV8uqQdptW3X7ANovTGg5Z7Gnzo2X.1

Webinar ID: 951 9453 8101

Passcode: 879344

For more information, please visit our website: https://www.abfr-forum.org

To stay up to date please join our mailing list: https://groups.google.com/u/0/g/abfr-forum

Title: Chronologically Consistent Large Language Models 

Authors: Songrun He (Washington University in St. Louis), Linying Lv (Washington University in St. Louis), Asaf Manela (Washington University in St. Louis), Jimmy Wu (Washington University in St. Louis)

Abstract: Large language models are increasingly used in social sciences, but their training data can introduce lookahead bias and training leakage. A good chronologically consistent language model requires efficient use of training data to maintain accuracy despite time-restricted data. Here, we overcome this challenge by training a suite of chronologically consistent large language models, ChronoBERT and ChronoGPT, which incorporate only the text data that would have been available at each point in time. Despite this strict temporal constraint, our models achieve strong performance on natural language processing benchmarks, outperforming or matching widely used models (e.g., BERT), and remain competitive with larger open-weight models. Lookahead bias is model and application-specific because even if a chronologically consistent language model has poorer language comprehension, a regression or prediction model applied on top of the language model can compensate. In an asset pricing application predicting next-day stock returns from financial news, we find that ChronoBERT’s real-time outputs achieve a Sharpe ratio comparable to state-of-the-art models, indicating that lookahead bias is modest. Our results demonstrate a scalable, practical framework to mitigate training leakage, ensuring more credible backtests and predictions across finance and other social science domains.
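The central idea described in the abstract, training each model vintage only on text that was available at a given point in time, amounts to a strict date filter over a timestamped corpus. The sketch below illustrates that filtering step only; the corpus, field names, and `chronological_slice` function are illustrative assumptions, not the authors' actual training pipeline:

```python
from datetime import date

# Hypothetical mini-corpus: each document carries a publication date.
corpus = [
    {"date": date(1999, 5, 1), "text": "Fed raises rates."},
    {"date": date(2005, 3, 9), "text": "Housing prices climb."},
    {"date": date(2012, 7, 2), "text": "Eurozone debt worries persist."},
]

def chronological_slice(docs, cutoff):
    """Keep only documents published on or before the cutoff date,
    so a model trained on the slice cannot see future text."""
    return [d["text"] for d in docs if d["date"] <= cutoff]

# A model vintage with a 2005 year-end cutoff sees only the first two documents.
train_texts = chronological_slice(corpus, date(2005, 12, 31))
print(len(train_texts))  # 2
```

In a backtest, each prediction date would then be served by the latest model vintage whose cutoff precedes it, which is what keeps the downstream asset pricing application free of lookahead bias.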

Bio of speaker: Asaf Manela is a financial economist working on empirical asset pricing and financial intermediation, often using text as data. He is an Associate Professor of Finance at Washington University in St. Louis and at Reichman University. He serves as an Associate Editor of the Journal of Finance. His research has been published in leading finance journals such as the Journal of Finance, Journal of Financial Economics, and Review of Financial Studies. He was named one of the world’s best 40 business school professors under the age of 40 by Poets&Quants. He received his PhD in Finance and MBA from the University of Chicago. Before pursuing an academic career, he worked as a software engineer. He holds a BA in Economics and Computer Science from Boston University.

Bio of discussant: Suproteem Sarkar is a PhD candidate in economics at Harvard who studies finance and machine learning. He will join the University of Chicago Booth School of Business as an Assistant Professor of Finance and Applied AI in 2025. His research covers topics in asset valuation, language modeling, and machine learning in economics. He completed his SM in applied mathematics and AB in computer science at Harvard, and also spent time at Microsoft and Google.
