Robust model averaging prediction of longitudinal response with ultrahigh-dimensional covariates
About the Speaker
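Jialiang Li (栗家量) is a Professor in the Department of Statistics and Applied Probability at the National University of Singapore and also holds an adjunct professorship at Duke-NUS Medical School. He received his bachelor's degree in statistics from the University of Science and Technology of China in 2001, and a master's degree in public health and a PhD in statistics from the University of Wisconsin-Madison in 2005 and 2006, respectively. His current research interests include instrumental variables, subgroup analysis, change-point models, structural equation models, precision medicine, diagnostic medicine, model averaging, nonparametric methods, and survival analysis. He has published more than 160 papers and is a Fellow of the ASA and the IMS and an Elected Member of the ISI.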
Abstract
Model averaging is an attractive ensemble technique for constructing fast and accurate predictions. Although it is widely used in cross-sectional data analysis, its application to longitudinal data has so far been limited. We consider model averaging for a longitudinal response when the number of covariates is ultrahigh. To this end, we propose a novel two-stage procedure in which variable screening is conducted first, followed by model averaging. In both stages, a robust rank-based estimation function is introduced to cope with potential outliers and heavy-tailed error distributions, while the longitudinal correlation is modelled by a modified Cholesky decomposition and incorporated into the estimation to achieve efficiency. Asymptotic properties of the proposed methods are rigorously established, including screening consistency and convergence of the model averaging predictor, with the uncertainties in both the screening step and the selected model set taken into account. Extensive simulation studies demonstrate that our method outperforms existing competitors, with significant improvements in screening and prediction performance. Finally, we apply the proposed framework to analyse a human microbiome dataset, showing that the procedure can deliver robust prediction from a very large number of metabolites.
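To make the screen-then-average idea concrete, here is a minimal Python sketch. It is not the speaker's method. It ignores the longitudinal correlation (no modified Cholesky step), screens covariates by marginal Spearman rank correlation, fits nested candidate models by ordinary least squares instead of a rank-based estimation function, and weights them by hold-out mean absolute error. All function names, the simulated data, and the weighting rule are assumptions chosen purely for illustration.

```python
import numpy as np


def rank_transform(a):
    """Ranks 0..n-1 along axis 0 (assumes no ties, as with continuous data)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)


def screen_by_rank_correlation(X, y, keep):
    """Stage 1 (illustrative): keep the covariates with the largest
    absolute Spearman correlation with the response."""
    Xr, yr = rank_transform(X), rank_transform(y)
    Xc, yc = Xr - Xr.mean(axis=0), yr - yr.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:keep]


def fit_candidates(X, y, selected, group_size):
    """Stage 2a (illustrative): nested candidate models built on growing
    prefixes of the screened list, each fitted by least squares."""
    models = []
    for end in range(group_size, len(selected) + 1, group_size):
        cols = selected[:end]
        design = np.column_stack([np.ones(len(y)), X[:, cols]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        models.append((cols, coef))
    return models


def predict_candidates(models, X):
    """One column of predictions per candidate model."""
    ones = np.ones(X.shape[0])
    return np.column_stack(
        [np.column_stack([ones, X[:, cols]]) @ coef for cols, coef in models]
    )


def averaging_weights(models, X_val, y_val):
    """Stage 2b (illustrative): exponential weights from hold-out mean
    absolute error; the weights are non-negative and sum to one."""
    preds = predict_candidates(models, X_val)
    mae = np.mean(np.abs(preds - y_val[:, None]), axis=0)
    w = np.exp(-(mae - mae.min()))
    return w / w.sum()


def simulate(n=400, p=1000, seed=0):
    """Hypothetical data: 4 active covariates out of p, t(3) errors."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:4] = [3.0, -3.0, 3.0, -3.0]
    y = X @ beta + rng.standard_t(df=3, size=n)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]
    selected = screen_by_rank_correlation(X_tr, y_tr, keep=20)
    models = fit_candidates(X_tr, y_tr, selected, group_size=5)
    weights = averaging_weights(models, X_val, y_val)
    averaged = predict_candidates(models, X_val) @ weights
    print("screened covariates:", sorted(selected.tolist()))
    print("model weights:", np.round(weights, 3))
    print("hold-out MAE of averaged predictor:", np.mean(np.abs(averaged - y_val)))
```

The rank transform in the screening step is what limits the influence of outlying responses. A method in the spirit of the abstract would also carry robustness and the within-subject correlation into the fitting and weighting steps, which this sketch leaves out.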