Award Date
5-1-2020
Degree Type
Thesis
Degree Name
Master of Science in Computer Science
Department
Computer Science
First Committee Member
Justin Zhan
Second Committee Member
Ju-Yeon Jo
Third Committee Member
Fatma Nasoz
Fourth Committee Member
Ge Kan
Number of Pages
78
Abstract
A feature fusion multi-modal neural network (MMN) is a network that combines different modalities at the feature level to perform a specific task. In this paper, we study the problem of training the fusion procedure for MMNs. A recent study found that training a multi-modal network that incorporates late fusion produces a network that has not learned the proper parameters for feature extraction. These late-fusion models perform very well during training but fall short of their single-modality counterparts at test time. We hypothesize that jointly trained MMNs have a weight space that is too large for effective training. To remedy this problem, we design a set of procedures that systematically narrow the search space so that the optimizer considers only weights that are known to generalize well. As part of this narrowing procedure, we enforce a constraint on the weights between the pre-fusion and fusion layers. Because of this constraint, modern training methods cannot optimize the network without violating it. We therefore create a simplex projection module that is applied after modern training frameworks and re-optimizes the network so that the weight constraint is enforced. This new framework, which we call the Projection Feature Mixture Model, outperforms its single-modality counterpart as well as the standard jointly trained MMN. We also provide a theoretical analysis showing the advantages of utilizing MMNs.
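As a rough illustration of the projection idea described in the abstract, the sketch below takes an ordinary gradient step on a set of fusion weights and then projects the result back onto the probability simplex (nonnegative weights that sum to one). The abstract does not state the exact constraint set or projection algorithm, so both are assumptions here: the code uses the well-known sort-based Euclidean projection, and the names `project_to_simplex` and `projected_step` are hypothetical, not taken from the thesis.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}.

    Assumption: the thesis's weight constraint is the probability simplex;
    this is the standard sort-based projection, not necessarily the
    author's own module.
    """
    u = sorted(v, reverse=True)  # entries in descending order
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        # Keep the threshold from the last index where the shifted entry
        # stays positive; the condition holds for a prefix of indices only.
        if u_j - t > 0:
            theta = t
    # Shift every entry by theta and clip negatives to zero.
    return [max(x - theta, 0.0) for x in v]


def projected_step(weights, grads, lr=0.1):
    """One plain gradient step followed by projection (hypothetical helper)."""
    stepped = [w - lr * g for w, g in zip(weights, grads)]
    return project_to_simplex(stepped)


if __name__ == "__main__":
    # Two modalities with equal initial fusion weights (illustrative values).
    fusion_weights = [0.5, 0.5]
    grads = [1.0, -1.0]
    print(projected_step(fusion_weights, grads, lr=0.1))
```

Running the projection after the optimizer update, rather than inside it, mirrors the abstract's description of a module applied on top of an existing training framework: the optimizer is left unchanged and only the constrained weights are corrected afterwards.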
Keywords
Classification; Computer Vision; Deep Learning; Multi-Modal; Neural Network; Projection
Disciplines
Computer Sciences
File Format
File Size
1.6 MB
Degree Grantor
University of Nevada, Las Vegas
Language
English
Repository Citation
Ng, Henry, "Towards Multi-Modal Data Classification" (2020). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3937.
http://dx.doi.org/10.34917/19412144
Rights
IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/