Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference
This paper develops the inferential theory for latent factor models estimated from large dimensional panel data with missing observations. We estimate the latent factor model by applying principal component analysis to an adjusted covariance matrix estimated from the partially observed panel data. We derive the asymptotic distributions of the estimated factors, loadings, and imputed values under a general approximate factor model. The key application is the estimation of counterfactual outcomes in causal inference from panel data. The unobserved control group is modeled as missing values, which are inferred from the latent factor model. The inferential theory for the imputed values allows us to test for individual treatment effects at any time. We apply our method to portfolio investment strategies and find that around 14% of their average returns are significantly reduced by the academic publication of these strategies.
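The estimation idea described above can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the "adjusted covariance matrix" can be approximated by averaging cross products over the periods in which both units are observed, that loadings are taken as scaled leading eigenvectors, and that factors are recovered by regressing each period's observed values on the loadings. The function name `impute_with_factors` and its arguments are hypothetical, and the sketch covers only point estimates, not the asymptotic distributions or the treatment-effect tests.

```python
import numpy as np


def impute_with_factors(Y, W, k):
    """Illustrative PCA-based imputation for a partially observed panel.

    Y : (T, N) array of outcomes; entries where W is False are ignored
        (they may be NaN).
    W : (T, N) boolean mask, True where Y is observed.
    k : number of latent factors (assumed known here).

    Returns (factors, loadings, common) with shapes (T, k), (N, k), (T, N).
    """
    W = np.asarray(W, dtype=bool)
    T, N = Y.shape

    # Zero out missing entries so they drop out of the cross products.
    Y0 = np.where(W, Y, 0.0)
    Wf = W.astype(float)

    # counts[i, j] = number of periods in which units i and j are both observed.
    counts = Wf.T @ Wf
    if np.any(counts == 0):
        raise ValueError("some pair of units is never observed together")

    # Assumption: second-moment matrix averaged over jointly observed
    # periods only (no demeaning), standing in for the adjusted covariance.
    cov = (Y0.T @ Y0) / counts

    # Loadings: sqrt(N) times the k leading eigenvectors of cov / N.
    eigvals, eigvecs = np.linalg.eigh(cov / N)
    top = np.argsort(eigvals)[::-1][:k]
    loadings = np.sqrt(N) * eigvecs[:, top]

    # Factors: per-period least squares of observed outcomes on loadings.
    factors = np.empty((T, k))
    for t in range(T):
        obs = W[t]
        factors[t] = np.linalg.lstsq(loadings[obs], Y[t, obs], rcond=None)[0]

    # Estimated common component; its entries at missing positions serve
    # as the imputed (counterfactual) values.
    common = factors @ loadings.T
    return factors, loadings, common
```

In a causal-inference setting, `W` would be False for treated unit-period pairs, and the difference between an observed treated outcome and the corresponding entry of `common` would serve as a point estimate of the individual treatment effect. Testing that effect requires the paper's distribution theory, which this sketch does not reproduce.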