<?xml version='1.0' encoding='UTF-8'?><metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns="http://dublincore.org/documents/dcmi-terms/"><dcterms:title>Image-Guided Object Detection using OWL-ViT and Enhanced Query Embedding Extraction</dcterms:title><dcterms:identifier>https://doi.org/10.7910/DVN/PRHQMK</dcterms:identifier><dcterms:creator>Melih Serin</dcterms:creator><dcterms:publisher>Harvard Dataverse</dcterms:publisher><dcterms:issued>2024-04-14</dcterms:issued><dcterms:modified>2024-04-14T22:49:15Z</dcterms:modified><dcterms:description>Computer vision has been receiving increasing attention following the release of complex generative AI models by tech industry giants such as OpenAI and Google. This work focuses on a specific subfield: image-guided object detection. A detailed literature survey directed us towards Simple Open-Vocabulary Object Detection with Vision Transformers (OWL-ViT) [1], a multifunctional model that can also perform image-guided object detection as a secondary function. In this study, we conduct experiments using the OWL-ViT architecture as the base model and modify the relevant components to achieve better one-shot performance.
Code and models are available on GitHub.</dcterms:description><dcterms:subject>Engineering</dcterms:subject><dcterms:subject>Open-Vocabulary Object Detection with Vision Transformers (OWL-ViT)</dcterms:subject><dcterms:subject>Object Detection</dcterms:subject><dcterms:subject>Vision Transformers</dcterms:subject><dcterms:subject>End-to-End Training</dcterms:subject><dcterms:subject>Generalized Intersection over Union (gIoU) Loss</dcterms:subject><dcterms:isReferencedBy>10.5281/zenodo.10938342</dcterms:isReferencedBy><dcterms:date>2024-04-14</dcterms:date><dcterms:contributor>KUUJE</dcterms:contributor><dcterms:dateSubmitted>2024-04-14</dcterms:dateSubmitted><dcterms:license>CC0 1.0</dcterms:license></metadata>