Project leaders: Professors Xiuyuan Cheng and Hau-Tieng Wu
Project manager: Didong Li
Team members: Ajay Dheeraj, Inchan Hwang, Tyler Lian, and Joseph Saldutti
Accompanying the rise of large data in high dimensions (buzzword: ``big data") is an increasing necessity for effective dimension reduction methods to make that data more amenable to study. In general, the goal of dimension reduction is to reduce the dimension of the data while preserving or extracting its key features or other properties of interest. Spectral graph-based kernel techniques like diffusion maps rely on the convergence of the graph Laplacian matrix to the Laplace-Beltrami operator on a manifold to preserve local geometry information of the original data to the low dimension embedding. Extending previous research on variable bandwidth kernels, this project studied the converge of a new proposed kernel. The main desired advantage of this new kernel is to remove the computational dependence of the intrinsic dimension of the given data (which is difficult to derive reliably) by utilizing a k-nearest neighbors method.
The images above come from running our algorithm on the MNIST data set. They cluster each number into groups. In the images, the axes have no real meaning and each color represents a different number. This is an real-life example of how our algorithm can effectively cluster real data in reduced dimensions. The pictures are MNIST_2_numbers (separation of two different numbers) and MNIST_all (separation of all numbers in the MNIST data set). They are 3d plots.