Speaker(s): Zhigang Yao (National University of Singapore, Department of Statistics and Data Science)
While classical statistics deals with observations that are real numbers or elements of a real vector space, many statistical problems of high interest in the sciences today involve more complex objects, taking values in spaces that are not naturally (Euclidean) vector spaces but still carry some geometric structure. I will discuss the problem of finding principal components for multivariate data sets that lie on a nonlinear Riemannian manifold embedded in a higher-dimensional space. The aim is to extend the geometric interpretation of principal component analysis (PCA) while capturing non-geodesic modes of variation in the data. I will introduce the concept of a principal submanifold: a manifold passing through the center of the data that, at each of its points, extends in the directions of highest variation, namely the span of the leading eigenvectors of a local tangent-space PCA. I will show that principal submanifolds recover the usual principal components in Euclidean space, and illustrate how to find, use, and interpret them, from which a principal boundary can further be defined for data sets on manifolds.
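To make the notion of local tangent-space PCA concrete, here is a minimal sketch for data on the unit sphere S^2. The function names (`log_map`, `tangent_pca`) and the use of a projected Euclidean mean as the base point are illustrative assumptions, not the construction from the talk: data are mapped into a tangent space via the Riemannian log map and ordinary PCA is applied there.

```python
import numpy as np

def log_map(p, q):
    """Riemannian log map on the unit sphere: maps q into the tangent space at p."""
    w = q - np.dot(p, q) * p                              # component orthogonal to p
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))   # geodesic distance
    return theta * w / nw

def tangent_pca(points, base):
    """PCA of the log-mapped data in the tangent space at `base`."""
    V = np.array([log_map(base, q) for q in points])
    C = V.T @ V / len(V)              # covariance (mean is ~0 near the intrinsic mean)
    eigval, eigvec = np.linalg.eigh(C)
    order = np.argsort(eigval)[::-1]  # sort eigenpairs by decreasing variance
    return eigval[order], eigvec[:, order]

rng = np.random.default_rng(0)
# Synthetic data scattered around the equator (a geodesic), with small vertical noise.
t = rng.uniform(-1.0, 1.0, 200)
pts = np.stack([np.cos(t), np.sin(t), 0.05 * rng.standard_normal(200)], axis=1)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)

base = pts.mean(axis=0)
base /= np.linalg.norm(base)          # crude surrogate for the intrinsic (Frechet) mean

eigval, eigvec = tangent_pca(pts, base)
# The top eigenvalue dominates: the main direction of variation runs along the equator.
print(eigval[0] > 10 * eigval[1])
```

The leading eigenvector at each point gives the direction in which a principal submanifold would be extended; repeating this locally, rather than once at a single base point, is what allows non-geodesic variation to be captured.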