Please use this identifier to cite or link to this item: https://elibrary.khec.edu.np:8080/handle/123456789/439
Title: IMAGE CAPTIONING USING CNN AND DEEP STACKED LSTM
Authors: Aganja, Aakriti
Upadhyaya, Chiranjevi
Gansi, Oshin
Suwal, Sujata
Advisor: Er. Raisha Shrestha
Keywords: Image captioning, Deep neural network approach, Encoder-decoder approach, LSTM, RNN, CNN.
Issue Date: 2021
College Name: Khwopa College of Engineering
Level: Bachelor's Degree
Degree: B.E. Computer
Department Name: Department of Computer
Abstract: In image captioning, content describing an image is generated automatically, a task that combines computer vision and NLP (Natural Language Processing). Research in computer vision and natural language processing has skyrocketed in the past few years, producing remarkable contributions and achievements, which has in turn had a positive impact on image captioning. A number of models and methods have been developed to maximize captioning accuracy. In this work, a deep neural network using the encoder-decoder approach was implemented for image captioning. In this approach, VGG16 served as the encoder, extracting features from images in the form of dense vectors, and a bidirectional stacked LSTM served as the decoder, taking the extracted dense vectors as input for sentence generation. The generated captions mainly focus on describing the objects' gestures and actions, along with some information about the environment. Captions were generated in both English and Nepali. The BLEU score obtained for the generated captions was 0.576.
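The BLEU score reported above measures n-gram overlap between a generated caption and a reference caption. As a rough illustration of what that number means, here is a minimal single-reference, sentence-level BLEU sketch in plain Python; the thesis itself very likely used a library implementation (e.g. NLTK) with multiple references and smoothing, so this is an assumption-laden simplification, not the authors' evaluation code.

```python
import math
from collections import Counter

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Single reference, no smoothing -- an illustrative sketch only."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clipped (modified) precision: each n-gram counts at most
        # as often as it appears in the reference
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

A perfectly matching caption scores 1.0, while a caption sharing no 4-grams with the reference scores 0.0 under this unsmoothed variant; scores such as the reported 0.576 fall between those extremes.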
URI: https://elibrary.khec.edu.np/handle/123456789/439
Appears in Collections: TU Computer Report

Files in This Item:
File: image_captioning.pdf (Restricted Access, 6 MB, Adobe PDF)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.