Image Analysis Agent

This guide shows how to build an Image Analysis Agent that receives images over the chat protocol and returns a detailed description of each one. The agent is compatible with ASI:1 Chat.

Live profile example: Image Analysis Agent Profile

Overview

In this example, you'll learn how to build a uAgent that can:

Accept images through the chat protocol
Analyze images using an OpenAI multimodal model (gpt-4.1-mini by default in this example)
Provide detailed descriptions and analysis
Handle various image formats and sizes

For a basic understanding of how to set up an ASI:One compatible agent, please refer to the ASI:One Compatible Agents guide first.

Message Flow

The communication between the User, the Chat Interface, and the Image Analysis Agent proceeds as follows:

1. User Query
- The user submits a query along with an image through ASI:1 Chat.
2. Image Upload & Query Forwarding
- 2.1: The Chat Interface uploads the image to Agent Storage.
- 2.2: The Chat Interface forwards the user's query, with a reference to the uploaded image, to the Image Analysis Agent as a ChatMessage.
3. Image Retrieval
- The Image Analysis Agent extracts a valid image URL from the ResourceContent and includes it in the model input.
4. Image Analysis
- 4.1: The agent passes the query and image to the Image Analysis Function.
- 4.2: The Image Analysis Function processes the image and returns a response.
5. Response & Acknowledgement
- 5.1: The agent sends the analysis result back to the Chat Interface as a ChatMessage.
- 5.2: The agent also sends a ChatAcknowledgement to confirm receipt of the message. In the agent.py listing in this guide, the acknowledgement is sent as soon as the message arrives, before the analysis runs.
6. User Receives Response
- The Chat Interface delivers the analysis result to the user.
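
The steps above can be traced with a small sketch. It is illustrative only: the Step dataclass, the message_flow function, and the storage:// reference format are hypothetical stand-ins, not part of uAgents or Agentverse, and the order simply follows the numbered steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hop in the documented message flow (hypothetical helper type)."""
    number: str
    sender: str
    receiver: str
    payload: str


def message_flow(query: str, image_name: str) -> list[Step]:
    """Return the documented steps as an ordered trace."""
    image_ref = f"storage://{image_name}"  # assumed reference format, for illustration
    return [
        Step("1", "User", "Chat Interface", f"query={query!r}, image={image_name}"),
        Step("2.1", "Chat Interface", "Agent Storage", f"upload {image_name}"),
        Step("2.2", "Chat Interface", "Agent", f"ChatMessage(text, resource={image_ref})"),
        Step("3", "Agent", "Agent", "extract image URL from ResourceContent"),
        Step("4.1", "Agent", "Image Analysis Function", "query + image URL"),
        Step("4.2", "Image Analysis Function", "Agent", "analysis text"),
        Step("5.1", "Agent", "Chat Interface", "ChatMessage(analysis text)"),
        Step("5.2", "Agent", "Chat Interface", "ChatAcknowledgement"),
        Step("6", "Chat Interface", "User", "analysis text"),
    ]


if __name__ == "__main__":
    for step in message_flow("What is in this picture?", "cat.png"):
        print(f"{step.number:>3}  {step.sender} -> {step.receiver}: {step.payload}")
```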

ASI Chat Protocol Flow

Implementation

In this example, we create an agent and its associated files on Agentverse. The agent communicates with the Chat Interface using the chat protocol. Refer to the Hosted Agents section for the detailed steps for creating an agent on Agentverse.

Create a new agent named "Image Analysis Agent" on Agentverse and create the following files:

agent.py            # Main agent file with integrated chat protocol and message handlers for ChatMessage and ChatAcknowledgement
image_analysis.py   # Image analysis function

To create a new file on Agentverse:

Click on the New File icon
Assign a name to the File
Directory Structure

1. Image Analysis Implementation

The image_analysis.py file implements the logic for passing both text and image inputs to the OpenAI Responses API. It supports either base64 image resources or direct image URLs and returns the generated analysis.

image_analysis.py
import os
from typing import Any
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

MAX_TOKENS = int(os.getenv("MAX_TOKENS", "1024"))
MODEL_ENGINE = os.getenv("IMAGE_MODEL_ENGINE", "gpt-4.1-mini")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if not OPENAI_API_KEY:
    raise ValueError(
        "OPENAI_API_KEY is required. Create one at https://platform.openai.com/api-keys"
    )

client = OpenAI(api_key=OPENAI_API_KEY)


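def get_image_analysis(content: list[dict[str, Any]]) -> str:
    """Send text and image items to the OpenAI Responses API and return the reply text.

    Each item is a dict whose "type" is "text", "resource" (base64 image),
    or "resource_url" (direct image URL). Errors are returned as strings.
    """
    processed_content: list[dict[str, Any]] = []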

    for item in content:
        item_type = item.get("type")
        if item_type == "text":
            text = item.get("text", "")
            if text:
                processed_content.append({"type": "input_text", "text": text})
        elif item_type == "resource":
            mime_type = item.get("mime_type", "")
            image_b64 = item.get("contents", "")
            if not mime_type.startswith("image/"):
                return f"Unsupported mime type: {mime_type}"
            if not image_b64:
                return "Image content is empty."
            processed_content.append(
                {
                    "type": "input_image",
                    "image_url": f"data:{mime_type};base64,{image_b64}",
                }
            )
        elif item_type == "resource_url":
            image_url = item.get("url", "")
            if not image_url:
                return "Image URL is empty."
            processed_content.append(
                {
                    "type": "input_image",
                    "image_url": image_url,
                }
            )

    if not processed_content:
        return "Please send a text prompt and an image attachment."

    try:
        response = client.responses.create(
            model=MODEL_ENGINE,
            input=[{"role": "user", "content": processed_content}],
            max_output_tokens=MAX_TOKENS,
        )
        if response.output_text:
            return response.output_text
        return "I could not generate an analysis for this image."
    except Exception as err:
        return f"An error occurred while analyzing the image: {err}"
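
As a usage sketch, the helpers below build the item dictionaries that get_image_analysis reads ("text", "resource", and "resource_url"). The helper names text_item, base64_image_item, url_image_item, and to_data_url are hypothetical and not part of the original files; the dictionary keys and the data URL format are taken from the listing above.

```python
import base64
from typing import Any


def text_item(text: str) -> dict[str, Any]:
    """Build a text item in the shape get_image_analysis expects."""
    return {"type": "text", "text": text}


def base64_image_item(image_bytes: bytes, mime_type: str) -> dict[str, Any]:
    """Build a base64 'resource' item from raw image bytes."""
    if not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported mime type: {mime_type}")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "resource", "mime_type": mime_type, "contents": encoded}


def url_image_item(url: str) -> dict[str, Any]:
    """Build a 'resource_url' item that points at a hosted image."""
    return {"type": "resource_url", "url": url}


def to_data_url(item: dict[str, Any]) -> str:
    """Render a 'resource' item as the data URL the listing sends to the model."""
    return f"data:{item['mime_type']};base64,{item['contents']}"


# Example content list: one prompt plus one image URL (example.com is a placeholder).
content = [
    text_item("Describe this image."),
    url_image_item("https://example.com/cat.png"),
]
# With OPENAI_API_KEY set, this list could be passed to get_image_analysis(content).
```

A URL item keeps the payload small. The base64 form is useful when the image bytes are already in memory and not hosted anywhere.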

2. Image Analysis Agent Setup

The agent.py file is the core of the application. It integrates the chat protocol and defines the message handlers for ChatMessage and ChatAcknowledgement. Think of it as the main control center that:

Registers the handlers for ChatMessage and ChatAcknowledgement
Extracts image URLs from attachments and returns the analysis over the chat protocol

Here's the complete implementation with integrated chat protocol:

agent.py
from datetime import datetime, timezone
from urllib.parse import urlparse
from uuid import uuid4

from uagents import Agent, Context, Protocol
from uagents_core.contrib.protocols.chat import (
    ChatAcknowledgement,
    ChatMessage,
    MetadataContent,
    ResourceContent,
    StartSessionContent,
    TextContent,
    chat_protocol_spec,
)

from image_analysis import get_image_analysis

agent = Agent()
chat_proto = Protocol(spec=chat_protocol_spec)


def create_text_chat(text: str) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.now(timezone.utc),
        msg_id=uuid4(),
        content=[TextContent(type="text", text=text)],
    )


def create_metadata_chat(metadata: dict[str, str]) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.now(timezone.utc),
        msg_id=uuid4(),
        content=[MetadataContent(type="metadata", metadata=metadata)],
    )


def extract_image_url(item: ResourceContent) -> str | None:
    resources = item.resource if isinstance(item.resource, list) else [item.resource]
    for resource in resources:
        uri = getattr(resource, "uri", None)
        if isinstance(uri, str):
            parsed = urlparse(uri)
            if parsed.scheme in {"http", "https"} and parsed.netloc:
                return uri

        metadata = getattr(resource, "metadata", None) or {}
        if isinstance(metadata, dict):
            for key in ("url", "uri", "source", "image_url"):
                candidate = metadata.get(key)
                if isinstance(candidate, str):
                    parsed = urlparse(candidate)
                    if parsed.scheme in {"http", "https"} and parsed.netloc:
                        return candidate
    return None


@chat_proto.on_message(ChatMessage)
async def handle_message(ctx: Context, sender: str, msg: ChatMessage):
    ctx.logger.info(f"Got a message from {sender}")

    # Acknowledge receipt right away, before any analysis runs.
    await ctx.send(
        sender,
        ChatAcknowledgement(
            acknowledged_msg_id=msg.msg_id,
            timestamp=datetime.now(timezone.utc),
        ),
    )

    prompt_content: list[dict[str, str]] = []

    for item in msg.content:
        if isinstance(item, StartSessionContent):
            ctx.logger.info(f"Got a start session message from {sender}")
            await ctx.send(sender, create_metadata_chat({"attachments": "true"}))
        elif isinstance(item, TextContent):
            ctx.logger.info(f"Got text content from {sender}: {item.text}")
            prompt_content.append({"type": "text", "text": item.text})
        elif isinstance(item, ResourceContent):
            ctx.logger.info(f"Got resource content from {sender}")
            image_url = extract_image_url(item)
            if not image_url:
                await ctx.send(
                    sender,
                    create_text_chat(
                        "Attachment URL not found. Please re-upload the image and try again."
                    ),
                )
                return
            ctx.logger.info(f"Using image URL={image_url}")
            prompt_content.append({"type": "resource_url", "url": image_url})

    if not prompt_content:
        await ctx.send(
            sender, create_text_chat("Please send a question and attach an image.")
        )
        return

    try:
        response = get_image_analysis(prompt_content)
        await ctx.send(sender, create_text_chat(response))
    except Exception as err:
        ctx.logger.error(f"Image analysis error: {err}")
        await ctx.send(
            sender,
            create_text_chat("Sorry, I couldn't analyze the image. Please try again later."),
        )


@chat_proto.on_message(ChatAcknowledgement)
async def handle_ack(ctx: Context, sender: str, msg: ChatAcknowledgement):
    ctx.logger.info(f"Got an acknowledgement from {sender} for {msg.acknowledged_msg_id}")


agent.include(chat_proto, publish_manifest=True)

if __name__ == "__main__":
    agent.run()
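
To see how the URL extraction in agent.py behaves without running an agent, the sketch below applies the same checks to stand-in objects. SimpleNamespace is used here in place of the real resource objects, and is_http_url and first_image_url are hypothetical names chosen for this illustration. The accepted schemes and metadata keys mirror the listing above.

```python
from types import SimpleNamespace
from typing import Optional
from urllib.parse import urlparse


def is_http_url(value: object) -> bool:
    """True only for strings that parse as http(s) URLs with a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in {"http", "https"} and bool(parsed.netloc)


def first_image_url(resource: object) -> Optional[str]:
    """Check the resource URI first, then fall back to known metadata keys."""
    uri = getattr(resource, "uri", None)
    if is_http_url(uri):
        return uri
    metadata = getattr(resource, "metadata", None) or {}
    if isinstance(metadata, dict):
        for key in ("url", "uri", "source", "image_url"):
            candidate = metadata.get(key)
            if is_http_url(candidate):
                return candidate
    return None


# Stand-in resources; the URLs are placeholders.
direct = SimpleNamespace(uri="https://example.com/cat.png", metadata={})
via_metadata = SimpleNamespace(
    uri="agent-storage://abc123",
    metadata={"url": "https://cdn.example.com/cat.png"},
)
unusable = SimpleNamespace(uri="not a url", metadata={"source": "ftp://example.com/cat.png"})

print(first_image_url(direct))
print(first_image_url(via_metadata))
print(first_image_url(unusable))
```

Only http and https URLs with a host are accepted, so a storage-only reference or an ftp link leads to the "Attachment URL not found" reply rather than a failed model call.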

Key Features:

Integrated Architecture: All chat protocol functionality is contained in the main agent file for simplicity.
Image Processing: Supports both text and image inputs and sends image URLs to OpenAI's multimodal Responses API.
Error Handling: Reports missing attachment URLs and image analysis failures back to the user as chat messages.
Session Handling: Responds to a session start with a metadata message that advertises attachment support.
Attachment URL Handling: Extracts valid image URLs from ResourceContent URIs and metadata fields.
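
One optional refinement, not part of the original example: get_image_analysis is a synchronous function, so calling it directly inside the async handler keeps the event loop busy until the model responds. The sketch below shows one way to move such a call to a worker thread with asyncio.to_thread (Python 3.9+). blocking_analysis is a hypothetical stand-in for get_image_analysis, and whether this is needed or permitted for a hosted agent is an assumption to verify.

```python
import asyncio
import time


def blocking_analysis(prompt: str) -> str:
    """Stand-in for get_image_analysis: a synchronous call that takes a while."""
    time.sleep(0.05)
    return f"analysis of {prompt}"


async def analyze_without_blocking(prompt: str) -> str:
    """Run the synchronous call in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(blocking_analysis, prompt)


async def main() -> list[str]:
    # Two requests overlap; neither waits for the other to finish first.
    return list(
        await asyncio.gather(
            analyze_without_blocking("image A"),
            analyze_without_blocking("image B"),
        )
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In the handler, this would amount to awaiting asyncio.to_thread(get_image_analysis, prompt_content) where the listing calls get_image_analysis(prompt_content) directly.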

Adding a README to your Agent

Go to the Overview section in the Editor.
Click on Edit and add a clear description for your Agent so that the ASI:1 LLM can find it easily. Please refer to the Importance of Good Readme section for more details.
Make sure the Agent has the right AgentChatProtocol.

Query your Agent

Start your Agent
Open ASI:1 Chat and test the agent directly.

Upload an image and type your query, for example: Please analyze this image.
The agent returns the image analysis result directly in the chat.

ASI:1 Image Analysis Result

Full Example Repository

For a complete working reference (including agent.py, image_analysis.py, setup steps, and dependencies), use this repository:

Image Analysis Agent - Full Example on GitHub
