Title: Variable Selection and Prediction in High Dimensional Models
Authors: Barut, Ahmet Emre
Date: 2013-09-16
Publisher: Princeton University
Source:
Type: Academic dissertations (Ph.D.)
Subject: Classification
Fisher Discriminant
Generalized Linear Models
High Dimensional Models
Penalized Estimators
Statistics
Mathematics
Biostatistics
Description: The aim of this thesis is to develop methods for variable selection and statistical prediction in high dimensional statistical problems. In addition to proposing new procedures, the thesis studies their theoretical properties and establishes bounds on the statistical error of the resulting estimators. The main body of the thesis is divided into three parts. Chapter 1 discusses a variable screening method for generalized linear models, with an emphasis on reducing the number of variables reliably and quickly. Chapter 2 considers the high dimensional linear regression problem when the noise is heavy-tailed; to perform robust variable selection, it introduces a new method called the adaptive robust Lasso. Finally, Chapter 3 addresses high dimensional classification problems, proposing a robust approach and establishing its theoretical properties. Taken together, the methods proposed in this thesis aim to address many of the issues arising in high dimensional statistics, from screening to variable selection.

In Chapter 1, we study the variable screening problem for generalized linear models. In many applications, researchers have prior knowledge that a certain set of variables is related to the response. In such a situation, a natural measure of the relative importance of the other predictors is the conditional contribution of each individual predictor in the presence of the known set of variables. This leads to conditional sure independence screening (CSIS), which we propose and study in the context of generalized linear models. For ultrahigh-dimensional statistical problems, we give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the conditions under which CSIS yields model selection consistency.
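To make the conditional screening idea concrete, here is a minimal Python sketch, assuming a logistic-regression GLM, NumPy, and a plain Newton-Raphson fitter. For each predictor outside the known set, it fits the GLM on the known set plus that single predictor and uses the absolute value of that predictor's fitted coefficient as its conditional utility. Keeping a fixed number d of top-ranked predictors, the helper names `fit_logistic` and `csis_logistic`, and the simulated data are hypothetical choices for illustration, not the exact procedure of the thesis.

```python
import numpy as np


def fit_logistic(Z, y, n_iter=100, ridge=1e-8):
    """Newton-Raphson fit of a logistic regression; Z already contains an intercept column.

    The tiny ridge term only guards against a singular Hessian.
    """
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        w = prob * (1.0 - prob)
        grad = Z.T @ (y - prob) - ridge * beta
        hess = Z.T @ (Z * w[:, None]) + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def csis_logistic(X, y, cond_set, d):
    """Hypothetical conditional screening sketch: rank each predictor outside cond_set
    by |coefficient| in a logistic fit on [intercept, X[:, cond_set], X[:, j]]."""
    n, p = X.shape
    cond_set = list(cond_set)
    base = np.column_stack([np.ones(n), X[:, cond_set]])
    utility = {}
    for j in range(p):
        if j in cond_set:
            continue                              # known variables are retained, not screened
        beta = fit_logistic(np.column_stack([base, X[:, j]]), y)
        utility[j] = abs(beta[-1])                # conditional marginal coefficient of X_j
    ranked = sorted(utility, key=utility.get, reverse=True)
    return ranked[:d], utility                    # assumption: keep a fixed number d


# Usage: variable 0 is known to matter; variables 1 and 2 are the hidden signals
rng = np.random.default_rng(1)
n, p = 400, 50
X = rng.standard_normal((n, p))
lin = 1.0 * X[:, 0] + 1.5 * X[:, 1] - 1.5 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin))).astype(float)
selected, util = csis_logistic(X, y, cond_set=[0], d=2)
print("selected:", sorted(selected))
```

Conditioning on the known variable removes its contribution from every candidate's fit, so the remaining signals are judged by what they add beyond it rather than by their raw marginal association with the response.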
In Chapter 2, we consider the heavy-tailed high dimensional linear regression problem. In the ultrahigh-dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of a quantile-regression-based method called WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that an adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called the adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed from the L_1-penalized quantile regression estimate obtained in the first step.

In Chapter 3, we analyze the issue of measurement errors in high dimensional linear classification problems. For such settings, we propose a new estimator, called the robust sparse linear discriminant, which simultaneously recovers the sparse signal and adapts to the unknown noise level. In contrast to existing methods, we show that this new method has low risk even in the presence of measurement errors. Moreover, we propose a new algorithm that computes the solution path over a continuum of regularization parameter values.
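The two-step AR-Lasso procedure described for Chapter 2 can be sketched in Python under several assumptions: NumPy only, an ADMM solver for the weighted L_1-penalized quantile regression, and second-step weights taken from the derivative of the SCAD penalty evaluated at the first-step estimate. The SCAD-based weight rule, the ADMM solver, the tuning value lam=0.15, and the helper names `quantile_lasso_admm`, `scad_weights`, and `ar_lasso` are hypothetical illustrative choices, not necessarily those used in the thesis.

```python
import numpy as np


def quantile_lasso_admm(X, y, lam, weights, tau=0.5, rho=1.0, n_iter=10000, tol=1e-8):
    """Weighted L1-penalized quantile regression solved by ADMM (illustrative solver).

    Minimizes (1/n) * sum_i check_tau(y_i - b0 - x_i'b) + lam * sum_j weights[j] * |b_j|
    with an unpenalized intercept b0. Returns (b0, b).
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    w = np.concatenate([[0.0], weights])       # intercept is not penalized
    thresh = n * lam * w / rho                 # penalty expressed on the summed-loss scale
    M = np.linalg.inv(A.T @ A + np.eye(p + 1))
    beta, z, eta2 = np.zeros(p + 1), np.zeros(p + 1), np.zeros(p + 1)
    u, eta1 = y.astype(float).copy(), np.zeros(n)
    for _ in range(n_iter):
        # beta-update: least squares coupling the residual (u) and penalty (z) splits
        beta = M @ (A.T @ (y - u - eta1) + z - eta2)
        # u-update: proximal operator of check_tau(.) / rho
        a = y - A @ beta - eta1
        u = np.where(a > tau / rho, a - tau / rho,
                     np.where(a < -(1.0 - tau) / rho, a + (1.0 - tau) / rho, 0.0))
        # z-update: weighted soft-thresholding, which produces exact zeros
        v = beta + eta2
        z_new = np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)
        r1, r2 = A @ beta + u - y, beta - z_new
        eta1 += r1                             # scaled dual updates
        eta2 += r2
        converged = max(np.abs(r1).max(), np.abs(r2).max(), np.abs(z_new - z).max()) < tol
        z = z_new
        if converged:
            break
    return z[0], z[1:]


def scad_weights(beta, lam, a=3.7):
    """Assumed weight rule: w_j = p'_lam(|beta_j|) / lam for the SCAD penalty."""
    t = np.abs(beta)
    return np.clip((a * lam - t) / ((a - 1.0) * lam), 0.0, 1.0)


def ar_lasso(X, y, lam, tau=0.5):
    """Two-step sketch: L1-penalized quantile fit, then a weighted L1-penalized quantile fit."""
    p = X.shape[1]
    _, b_step1 = quantile_lasso_admm(X, y, lam, np.ones(p), tau)   # step 1: plain L1 penalty
    w = scad_weights(b_step1, lam)                                   # weights built from step 1
    b0, b_step2 = quantile_lasso_admm(X, y, lam, w, tau)            # step 2: weighted L1 penalty
    return b0, b_step2, b_step1


# Usage on simulated data with heavy-tailed (Student t, 2 df) noise
rng = np.random.default_rng(0)
n, p = 300, 40
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_t(df=2, size=n)
b0, b_ar, b_l1 = ar_lasso(X, y, lam=0.15)
print("selected:", np.flatnonzero(b_ar))
print("step 1:", np.round(b_l1[:3], 2), "step 2:", np.round(b_ar[:3], 2))
```

In this sketch, coefficients whose first-step estimate is large receive zero weight in the second step and are left unpenalized, which removes the shrinkage bias of the plain L_1 fit, while coefficients estimated as zero keep full weight and remain penalized.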
Language: English

Similar articles:

Engineering solutions for a carbon-constrained world by Celia, M. A.; Nordbotten, J. M.
Impact of capillary forces on large-scale migration of CO2 by Nordbotten, Jan M.; Dahle, Helge K.
Impact of geological heterogeneity on early-stage CO2 plume migration by Ashraf, Meisam; Lie, Knut-Andreas; Nilsen, Halvor M.; Nordbotten, Jan M.; Skorstad, Arne
A model-oriented benchmark problem for CO2 storage by Dahle, Helge K.; Eigestad, Geir T.; Nordbotten, Jan M.; Pruess, K.
CO2 trapping in sloping aquifers: High resolution numerical simulations by Elenius, Maria; Tchelepi, Hamdi; Johannsen, Klaus
Report from CO2 storage workshop by Dahle, Helge K.; Lien, Martha; Nordbotten, Jan M.; Lie, Knut-Andreas; Braathen, Alvar; Helmig, Rainer; Class, Holger; Celia, Michael A.
Summary of Princeton Workshop on Geological Storage of CO2 by Celia, Michael A.; Nordbotten, Jan M.; Bachu, Stefan; Kavetski, Dmitri; Gasda, Sarah