|Abstract:||The images of an object may look very different under different illumination conditions or viewing directions. This thesis considers the problem of clustering images of various objects into disjoint subsets according to object identity.
The clustering problem has been studied for decades and currently it is one of the most active research fields in machine learning. Compared to other data types considered in the clustering research, images are practically very imporant, but they are very difficult to be handled due to lack of understandings on their intrinsic properties.
First, the hypergraph partitioning problem, i.e., the problem of clustering in domains where the orders of affinity relations are higher than pairwise, is addressed, and a new two-step algorithm for solving this problem is proposed. The proposed algorithm first builds a graph approximation of the hypergraph with a novel approximation scheme, then a spectral graph partitioning algorithm is applied to generate the final clustering result. In addition to the thorough experiments, a theoretical analysis relates the proposed algorithm to existing hypergraph partitioning algorithms and presents the reasons for its superior performance.
Given powerful general clustering algorithms, the key step for solving the image clustering problem is to design an effective similarity measure between images. In this thesis, according to the conditions under which images are taken, three different similarity measures are presented. When the relative position between objects and the camera is fixed but the illumination in the environment changes, the conic affinity and the gradient affinity gives excellent clustering result. For each image in the clustering dataset, the conic affinity assigns non-negative weights to other images in its neighborhood, which represents the likelihood of being from the same object. The weights represent the global geometric relations among images in the image space. In contrast, the gradient affinity works in a pairwise manner. It directly compares two images based on the probabilistic distribution of the image gradient vectors of an object under different illumination.
Dealing with viewing direction changes is harder since these images do not have a simple global structure in the image space. Instead of considering global relation between images, the local linear structure between nearby images is exploited to determine the closeness between images. By considering affine warps of the images, small 3D pose variation can be handled robustly, which enhances the clustering result over pose variaition.
When the lighting and pose variation occurs simultaneously, we do not have an algorithm that presents reasonably good clustering performance yet. A few candidate algorithms are tested extensively and the result are given in this thesis. Finding an effective similarity measure for both changes still remains as an open problem.