Please use this identifier to cite or link to this item: https://elibrary.khec.edu.np:8080/handle/123456789/874
Full metadata record
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Er. Milan Chikanbanjar | - |
| dc.contributor.author | Rasik Maharjan; Rizen Bikram Prajapati; Shreejan Hakuduwal; Swostika Shrestha | - |
| dc.date.accessioned | 2025-02-14T07:21:24Z | - |
| dc.date.available | 2025-02-14T07:21:24Z | - |
| dc.date.issued | 2024 | - |
| dc.identifier.uri | https://elibrary.khec.edu.np:8080/handle/123456789/874 | - |
| dc.description.abstract | Image captioning is a multidisciplinary artificial intelligence (AI) research field that combines computer vision, natural language processing (NLP), and machine learning techniques. It aims to automatically generate textual descriptions for images, bridging the semantic gap between visual content and natural language. It has gained significant attention because of its potential applications in assistive technologies for visually impaired individuals, content-based image retrieval, and improving the accessibility of visual content on the web. Most previous work is based on the CNN-RNN approach, which produces inferior results compared to Transformer-based image captioning. In this paper, we propose a model for image captioning using a CNN and a Transformer architecture. The image features are extracted using the convolutional neural network architecture Inception V3. Instead of a traditional recurrent neural network (RNN) decoder, we use a Transformer decoder. The Transformer decoder leverages self-attention mechanisms for caption generation, enabling it to effectively recognize important objects, their attributes, and the relationships among objects in an image. The model is trained on the Flickr-8K dataset using the cross-entropy loss function. Our approach aims to generate syntactically and semantically correct sentences that accurately describe the image content. | - |
| dc.format.extent | 55 p | - |
| dc.subject | Computer Vision | - |
| dc.subject | Convolutional Neural Networks; Flickr8k; Inception | - |
| dc.title | Image Captioning using Transformer | - |
| dc.type | Report | - |
| local.college.name | Khwopa Engineering College | - |
| local.degree.department | Department of Computer Engineering | - |
| local.college.batch | 2076 | - |
| local.degree.name | BE Computer | - |
| local.degree.level | Bachelor's Degree | - |
| local.item.accessionnumber | D.1439 | - |
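
The abstract says the Transformer decoder relies on self-attention for caption generation. The following is a minimal sketch of that idea, not the report's implementation: single-head scaled dot-product self-attention with a causal mask, written in plain Python. The function names (`softmax`, `causal_self_attention`), the toy token vectors, and the use of identity query/key/value projections are all assumptions made for illustration; a trained decoder would learn separate projection matrices and would also condition on the Inception V3 image features, which is not shown here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    x: list of token vectors, one per caption position.
    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), purely to keep the sketch short.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d = len(x[0])
    n = len(x)
    outputs, weights = [], []
    for i, q in enumerate(x):
        # Position i may only attend to positions 0..i, so a caption
        # word never sees the words that come after it.
        scores = [
            sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Weighted sum of the visible value vectors.
        out = [sum(w[j] * x[j][k] for j in range(i + 1)) for k in range(d)]
        # Pad masked (future) positions with zero weight.
        weights.append(w + [0.0] * (n - i - 1))
        outputs.append(out)
    return outputs, weights


# Hypothetical 2-dimensional embeddings for a three-token caption.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(tokens)
```

Each row of `weights` sums to 1 and has zeros above the diagonal, which is what lets the decoder be trained on whole captions at once while still predicting one word at a time.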
Appears in Collections: PU Computer Report

Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| Image captioning using Transformer.pdf (Restricted Access) | | 13.12 MB | Adobe PDF |


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.