Award Date

12-1-2021

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science

First Committee Member

Fatma Nasoz

Second Committee Member

Mingon Kang

Third Committee Member

Beiyu Lin

Fourth Committee Member

Kazem Taghva

Fifth Committee Member

Brendan Morris

Number of Pages

97

Abstract

State-of-the-art image captioning models can successfully produce a diverse set of accurate captions. Previous research has focused on improving caption diversity while maintaining a high level of fidelity. We shift the focus from accuracy and diversity to controllability. We use a modified version of the traditional encoder-decoder network that allows the model to produce a meaningful and structured latent space. We then explore the latent space using several latent cartographic methods: linear interpolation (lerp), spherical linear interpolation (slerp), analogy completion, attribute vector rotation, and interpolation graphs. Additionally, we discuss different categories of latent space and provide modifications for each of the cartographic methods. Finally, we show that it is possible to generate a set of diverse and accurate captions with desired real-space semantics by sampling from different areas of the latent space.
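
As a rough illustration of three of the methods named in the abstract, the sketch below gives the textbook forms of lerp, slerp, and analogy completion on plain Python lists. It is not taken from the thesis: the function names, the `eps` tolerance, and the toy 2-D vectors are assumptions made for this example, and the thesis's own implementation and its per-category modifications may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def lerp(a: Vector, b: Vector, t: float) -> List[float]:
    """Linear interpolation: straight-line path from a (t=0) to b (t=1)."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Vector, b: Vector, t: float, eps: float = 1e-8) -> List[float]:
    """Spherical linear interpolation along the arc between a and b.

    Falls back to lerp when either vector is near zero or the two
    vectors are nearly parallel or anti-parallel (assumed tolerance eps).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)
    # Clamp to [-1, 1] to guard acos against floating-point drift.
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return lerp(a, b, t)
    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


def analogy(a: Vector, b: Vector, c: Vector) -> List[float]:
    """Analogy completion 'a is to b as c is to ?', computed as c + (b - a)."""
    return [z + (y - x) for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    # Hypothetical 2-D latent codes, for illustration only.
    z0, z1 = [1.0, 0.0], [0.0, 1.0]
    print(lerp(z0, z1, 0.5))
    print(slerp(z0, z1, 0.5))
    print(analogy([1.0, 0.0], [1.0, 1.0], [3.0, 0.0]))
```

For unit-length endpoints, the slerp midpoint stays on the unit circle while the lerp midpoint falls inside it, which is one common reason slerp is preferred when latent codes concentrate near a hypersphere.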

Keywords

Computer Vision; Generative Networks; Image Captioning; Latent Space; Machine Learning; Natural Language Processing

Disciplines

Computer Sciences

File Format

pdf

File Size

16200 KB

Degree Grantor

University of Nevada, Las Vegas

Language

English

Rights

IN COPYRIGHT. For more information about this rights statement, please visit http://rightsstatements.org/vocab/InC/1.0/

Available for download on Thursday, December 15, 2022

