Xebia_GPTVision - Documentation (O11)

Stable version 1.0.0 (Compatible with OutSystems 11)

Uploaded

on 17 December 2024

5.0

(3 ratings)

Documentation

1.0.0

About The Connector & Demo

The GPT Vision Connector is designed to leverage OpenAI's advanced capabilities for understanding visual content. It doesn’t just identify objects in an image – it goes much deeper to provide a comprehensive understanding of what’s happening in the picture. Accurately identifies the colors present in the image, understands the relationships between objects and can explain actions or interactions (e.g., "What are they doing?") and provides highly specific and detailed interpretations of the image. Alongside, we've crafted a functional demo application showcasing its use.

Pre-requisites

Here is the step-by-step documentation for getting the API key for "gpt-4o” model from Open AI.

Click on the below URL to proceed further

https://openai.com/

Create an OpenAI account‍

Verify your account‍

Log into your account‍

Navigate to the API section.

Generate a new API key.

Save your API key.

Configuring OutSystems Demo Application

Then add that key as Site property of our demo application to continue our services.

About the Demo Application

Step 1: Access the Demo Page

1.Select File: Use the "Select File" option and the upload icon to upload an image. The uploaded image will then be displayed in the image preview.

2.Dropdown Box: Uploaded files will appear in the dropdown,

3.Delete Icon: You can delete the selected file from the dropdown list by clicking the delete icon.

4.User Question: Type your question in the "User Question" box, and the answer will be displayed in the "Answer" section.

5.Clear Chat: To reset the "Answer" section, use the "Clear Chat" button

This setup allows you to upload, select, delete, and interact with documents seamlessly.

Step 2: Upload an Image File

Upload an image file using the "Select File" option. The chosen image will appear in the preview section below. Click the upload icon to complete the process, which may take a few seconds. Once uploaded, the image will be available in the "Select document" dropdown.

Step 3: Ask your Question

Enter your question in the "User Question" box on the left side of the page, where the prompt says, "How can we help you?" For example, type "How many people are there?" and click the send icon to submit your query.

Step 4: Generate the answer

After submitting your query, OpenAI GPT Vision will process it and display the response in the "Answer" section, which also shows the chat history. To clear all chats, click the "Clear Chat" button, or delete a specific response using the delete icon next to it.

Example Prompts and Responses:

1.Identifying Objects

Prompt: "How many people are there?"

Response: "There are three people in the image."

GPT Vision can accurately identify and count objects in the image.

2.Explaining Relationships

Prompt: "What are the doing?"

Response: "They are harvesting tea leaves in a tea plantation."

It not only recognizes objects but also understands their actions and relationships within the image.

3.Describing Colors

Prompt: "Can you list the colors you see in the image?"

Response: "The image contains the following colors: Green (tea plants and trees) - White (shirts) - Red (scarves) - Purple (head covering) - Blue (scarf) - Yellow/Orange (bag) - Brown (tree trunks) - Black (clothing elements) - Beige (parts of clothing)

GPT Vision provides a detailed breakdown of the colors in the image, highlighting even subtle variations.

Note: Please refer to the image for prompt output

Use Cases:

1.Image Analysis and Description

Generate detailed descriptions of images for accessibility (e.g., for visually impaired users).

Identify objects, people, or actions in photos.

Analyze complex visuals like charts, graphs, and infographics.

2. Retail and E-commerce

Identify products and categorize them based on visuals.

Analyze images to detect product colors, patterns, and styles.

Provide detailed item descriptions for online listings.

3. Education and Training

Explain diagrams, illustrations, and educational visuals.

Provide detailed feedback on visual assignments or creative works.

Support interactive learning with visual-based question-answering.