Award Date

5-1-2020

Degree Type

Thesis

Degree Name

Master of Science in Computer Science

Department

Computer Science

First Committee Member

Justin Zhan

Second Committee Member

Ju-Yeon Jo

Third Committee Member

Fatma Nasoz

Fourth Committee Member

Ge Kan

Number of Pages

Abstract

A feature-fusion multi-modal neural network (MMN) is a network that combines different modalities at the feature level to perform a specific task. In this paper, we study the problem of training the fusion procedure for MMNs. A recent study found that training a multi-modal network that incorporates late fusion produces a network that has not learned the proper parameters for feature extraction. These late-fusion models perform very well during training but fall short of their single-modality counterparts during testing. We hypothesize that jointly trained MMNs have a weight space that is too large for effective training. To remedy this problem, we design a set of procedures that systematically narrow the search space so that the optimizer considers only weights that are known to generalize well. As part of this narrowing procedure, we enforce a constraint on the weights between the pre-fusion and fusion layers. Because of this constraint, modern training methods cannot optimize the network without violating it. To address this, we create a simplex projection module that is applied after a modern training framework has updated the weights. The module re-optimizes the network so that the weight constraints are enforced. This new framework, which we call the Projection Feature Mixture Model, outperforms both its single-modality counterparts and a standard jointly trained MMN. We also provide a theoretical analysis that shows the advantages of using MMNs.
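The abstract does not spell out the constraint or the projection algorithm, so the following is only a minimal sketch of the general idea of projecting after an optimizer step. It assumes that the fusion weights must lie on the probability simplex (non-negative, summing to 1) and uses the standard sort-based Euclidean projection. The names `project_to_simplex` and `projected_step` are hypothetical and do not come from the thesis.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) == 1}.

    Assumption: the constraint is the probability simplex; the thesis
    may use a different constraint set or projection method.
    """
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        # The condition holds for a prefix of the sorted values;
        # the last k that satisfies it determines the shift theta.
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def projected_step(weights, grads, lr):
    """One plain gradient step followed by projection (hypothetical helper)."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return project_to_simplex(updated)


# Fusion weights for three modalities after an unconstrained update.
fusion_weights = projected_step([0.4, 0.4, 0.2], [0.5, -1.0, 0.3], lr=0.1)
```

In this sketch the optimizer is free to take any step, and the projection then maps the result back to the nearest point that satisfies the constraint, which is the usual pattern in projected gradient methods.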

Keywords

Classification; Computer Vision; Deep Learning; Multi-Modal; Neural Network; Projection

Disciplines

Computer Sciences

File Format

pdf

File Size

1.6 MB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Repository Citation

Ng, Henry, "Towards Multi-Modal Data Classification" (2020). UNLV Theses, Dissertations, Professional Papers, and Capstones. 3937.
http://dx.doi.org/10.34917/19412144

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Included in

Computer Sciences Commons
