Version: 1.0.3

Image Analysis Agent

This guide demonstrates how to create an Image Analysis Agent that processes both text and image inputs using the chat protocol. The agent is compatible with the Agentverse Chat Interface and answers natural language queries about the images it receives.

Overview

In this example, you'll learn how to build a uAgent that can:

- Accept both text and image inputs through the chat protocol
- Process images using Claude's vision capabilities
- Store and manage image resources using Agent Storage
- Respond to queries about image content

Message Flow

The communication between the User, Chat Interface, and Image Analyser Agent proceeds as follows:

1. User Query
- The user submits a query along with an image through the Chat Interface.
2. Image Upload & Query Forwarding
- 2.1: The Chat Interface uploads the image to the Agent Storage.
- 2.2: The Chat Interface forwards the user's query with a reference to the uploaded image to the Image Analyser Agent as a ChatMessage.
3. Image Retrieval
- The Image Analyser Agent retrieves the image from Agent Storage using the provided reference.
4. Image Analysis
- 4.1: The agent passes the query and image to the Image Analysis Function.
- 4.2: The Image Analysis Function processes the image and returns a response.
5. Response & Acknowledgement
- 5.1: The agent sends the analysis result back to the Chat Interface as a ChatMessage.
- 5.2: The agent also sends a ChatAcknowledgement to confirm receipt and processing of the message.
6. User Receives Response
- The Chat Interface delivers the analysis result to the user.
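The flow above can be sketched in plain Python to show how the image reference travels between the parties. This is a simplified model, not the real implementation: InMemoryStorage, chat_interface_send, agent_handle, and fake_image_analysis are hypothetical stand-ins for Agent Storage, the Chat Interface, the agent, and the Image Analysis Function.

```python
import base64
from uuid import uuid4


class InMemoryStorage:
    """Hypothetical stand-in for Agent Storage (not the real storage API)."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, str]] = {}

    def upload(self, data: bytes, mime_type: str) -> str:
        # Store the image as base64 text and hand back a reference to it
        resource_id = str(uuid4())
        self._items[resource_id] = {
            "mime_type": mime_type,
            "contents": base64.b64encode(data).decode("ascii"),
        }
        return resource_id

    def download(self, resource_id: str) -> dict[str, str]:
        return self._items[resource_id]


def fake_image_analysis(query: str, image: dict[str, str]) -> str:
    # Placeholder for the Image Analysis Function; no model is called here
    return f"Received a {image['mime_type']} image for query: {query}"


def chat_interface_send(
    storage: InMemoryStorage, query: str, image_bytes: bytes, mime_type: str
) -> dict[str, str]:
    # Image Upload (2.1): the image goes to storage, not into the message
    resource_id = storage.upload(image_bytes, mime_type)
    # Query Forwarding (2.2): the message carries only the query and a reference
    return {"text": query, "resource_id": resource_id}


def agent_handle(storage: InMemoryStorage, message: dict[str, str]) -> dict:
    # Image Retrieval: resolve the reference back into image data
    image = storage.download(message["resource_id"])
    # Image Analysis (4.1, 4.2): pass query and image to the analysis function
    answer = fake_image_analysis(message["text"], image)
    # Response & Acknowledgement (5.1, 5.2)
    return {"reply": answer, "acknowledged": True}


storage = InMemoryStorage()
message = chat_interface_send(storage, "How many people?", b"fake-bytes", "image/png")
print(agent_handle(storage, message))
```

The point of the indirection is that the chat message stays small: only a reference is forwarded, and the agent fetches the image itself when it needs it.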

ASI Chat Protocol Flow

Implementation

In this example, we will create an agent and its associated files on Agentverse that communicate with the Chat Interface using the chat protocol. Refer to the Hosted Agents section for the detailed steps of creating an agent on Agentverse.

Create a new agent named "Image Analysis Agent" on Agentverse and create the following files:

agent.py            # Main agent file 
image_analysis.py   # Image analysis function
chat_proto.py       # Chat protocol implementation for enabling text based communication 

To create a new file on Agentverse:

- Click on the New File icon
- Assign a name to the file

Directory Structure

1. Image Analysis Implementation

The image_analysis.py file implements the logic for passing both text and image inputs to Claude's vision model. It converts text items and base64-encoded image items into Claude's message format, sends the request, and returns the AI-generated analysis of the image and query.
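As a minimal sketch of the input this file works with, the helper below builds a content list in the shape that get_image_analysis reads in the listing that follows: text items carry a "text" key, and resource items carry "mime_type" and "contents". It assumes "contents" is base64-encoded text, since the listing forwards it to Claude as a base64 source. build_content is a hypothetical helper, not part of the original files.

```python
import base64
from typing import Any


def build_content(
    question: str, image_bytes: bytes, mime_type: str
) -> list[dict[str, Any]]:
    """Build a content list from a question and raw image bytes (hypothetical helper)."""
    if not mime_type.startswith("image/"):
        raise ValueError(f"Unsupported mime type: {mime_type}")

    # Claude's base64 image source expects text, so encode the raw bytes
    encoded = base64.b64encode(image_bytes).decode("ascii")

    return [
        {"type": "text", "text": question},
        {"type": "resource", "mime_type": mime_type, "contents": encoded},
    ]


content = build_content("How many people are present?", b"\x89PNG", "image/png")
print([item["type"] for item in content])
```

Reading a local file with open(path, "rb").read() and passing the bytes to build_content would give a list that can be handed to get_image_analysis for testing outside the chat flow.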

image_analysis.py
import json
import os
from typing import Any

import requests

CLAUDE_URL = "https://api.anthropic.com/v1/messages"
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "1024"))
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "YOUR_ANTHROPIC_API_KEY")
if ANTHROPIC_API_KEY is None or ANTHROPIC_API_KEY == "YOUR_ANTHROPIC_API_KEY":
    raise ValueError(
        "You need to provide an API key: https://console.anthropic.com/settings/keys"
    )
MODEL_ENGINE = os.getenv("MODEL_ENGINE", "claude-3-5-haiku-latest")
HEADERS = {
    "x-api-key": ANTHROPIC_API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

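def get_image_analysis(
    content: list[dict[str, Any]], tool: dict[str, Any] | None = None
) -> str | dict[str, Any] | None:
    """Send text and image items to Claude and return its reply.

    Returns the tool input (a dict) when `tool` is given and used, otherwise
    the text of the first content block, an error message string on failure,
    or None if the reply is empty.
    """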
    processed_content = []

    for item in content:
        if item.get("type") == "text":
            processed_content.append({"type": "text", "text": item["text"]})
        elif item.get("type") == "resource":
            mime_type = item["mime_type"]
            if mime_type.startswith("image/"):
                processed_content.append({
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": mime_type,
                        "data": item["contents"],
                    }
                })
            else:
                return f"Unsupported mime type: {mime_type}"

    data = {
        "model": MODEL_ENGINE,
        "max_tokens": MAX_TOKENS,
        "messages": [
            {
                "role": "user",
                "content": processed_content,
            }
        ],
    }

    if tool:
        data["tools"] = [tool]
        data["tool_choice"] = {"type": "tool", "name": tool["name"]}

    try:
        response = requests.post(
            CLAUDE_URL, headers=HEADERS, data=json.dumps(data), timeout=120
        )
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return "The request timed out. Please try again."
    except requests.exceptions.RequestException as e:
        return f"An error occurred: {e}"

    # Parse the JSON body (HTTP errors were already raised above)
    response_data = response.json()

    # Handle error responses
    if "error" in response_data:
        return f"API Error: {response_data['error'].get('message', 'Unknown error')}"

    if tool:
        for item in response_data["content"]:
            if item["type"] == "tool_use":
                return item["input"]
            
    messages = response_data["content"]

    if messages:
        return messages[0]["text"]
    else:
        return None
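The optional tool argument is forwarded as tools and tool_choice, which asks Claude to answer through that tool so the reply arrives as structured input rather than free text. The sketch below shows what such a definition might look like in the Anthropic Messages API tool format (name, description, input_schema). The tool name report_people_count, its schema, and the sample response are made up for illustration, and extract_tool_input is a hypothetical helper that mirrors the tool_use lookup in the listing above.

```python
from typing import Any, Optional

# Hypothetical tool definition: asks for a single integer field
PEOPLE_COUNT_TOOL: dict[str, Any] = {
    "name": "report_people_count",
    "description": "Report how many people are visible in the image.",
    "input_schema": {
        "type": "object",
        "properties": {
            "count": {
                "type": "integer",
                "description": "Number of people visible in the image",
            }
        },
        "required": ["count"],
    },
}


def extract_tool_input(
    response_data: dict[str, Any], tool_name: str
) -> Optional[dict[str, Any]]:
    """Return the input of the first matching tool_use block, if any."""
    for block in response_data.get("content", []):
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block.get("input")
    return None


# Made-up response body, shaped like a Messages API reply with a tool call
sample_response = {
    "content": [
        {
            "type": "tool_use",
            "id": "toolu_example",
            "name": "report_people_count",
            "input": {"count": 3},
        }
    ]
}

print(extract_tool_input(sample_response, "report_people_count"))
```

With a definition like this, calling get_image_analysis(content, tool=PEOPLE_COUNT_TOOL) would return a dict rather than a string, so the caller needs to handle both cases.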

2. Chat Protocol Integration

The chat_proto.py file is essential for enabling natural language communication between your agent and ASI1 LLM.

Message Flow

1. Session Initiation
- When a user starts a chat session, the agent receives a ChatMessage containing a StartSessionContent.
- The agent responds with a MetadataContent message: {"attachments": "true"}. This signals to the chat UI that file attachments (such as images) are supported.
2. User Query
- The user sends a query as a ChatMessage, which includes:
  - TextContent (the user's question)
  - ResourceContent (an image or other file attachment)
3. Message Processing
- For each content item:
  - If TextContent, the agent adds the text to the prompt for Claude.
  - If ResourceContent, the agent downloads the image from Agent Storage and adds it to the prompt as an image input for Claude.
4. Image Analysis and AI Processing
- The agent then analyses the image with the help of image_analysis.py and sends back a response with the analysis to the user.

To enable natural language communication with your agent add the following chat protocol in the chat_proto.py file created on Agentverse:

chat_proto.py
import os
from datetime import datetime
from uuid import uuid4

from uagents import Context, Protocol
from uagents_core.contrib.protocols.chat import (
    ChatAcknowledgement,
    ChatMessage,
    MetadataContent,
    ResourceContent,
    StartSessionContent,
    TextContent,
    chat_protocol_spec,
)
from uagents_core.storage import ExternalStorage

from image_analysis import get_image_analysis

STORAGE_URL = os.getenv("AGENTVERSE_URL", "https://agentverse.ai") + "/v1/storage"


def create_text_chat(text: str) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[TextContent(type="text", text=text)],
    )


def create_metadata(metadata: dict[str, str]) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[MetadataContent(
            type="metadata",
            metadata=metadata,
        )],
    )


chat_proto = Protocol(spec=chat_protocol_spec)


@chat_proto.on_message(ChatMessage)
async def handle_message(ctx: Context, sender: str, msg: ChatMessage):
    ctx.logger.info(f"Got a message from {sender}")
    await ctx.send(
        sender,
        ChatAcknowledgement(
            timestamp=datetime.utcnow(), acknowledged_msg_id=msg.msg_id
        ),
    )

    prompt_content = []
    for item in msg.content:
        if isinstance(item, StartSessionContent):
            await ctx.send(sender, create_metadata({"attachments": "true"}))
        elif isinstance(item, TextContent):
            prompt_content.append({"text": item.text, "type": "text"})
        elif isinstance(item, ResourceContent):
            try:
                external_storage = ExternalStorage(
                    identity=ctx.agent.identity,
                    storage_url=STORAGE_URL,
                )
                data = external_storage.download(str(item.resource_id))
                prompt_content.append({
                    "type": "resource",
                    "mime_type": data["mime_type"],
                    "contents": data["contents"],
                })

            except Exception as ex:
                ctx.logger.error(f"Failed to download resource: {ex}")
                await ctx.send(sender, create_text_chat("Failed to download resource."))
        else:
            ctx.logger.warning(f"Got unexpected content from {sender}")

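    if prompt_content:
        response = get_image_analysis(prompt_content)
        # get_image_analysis can return None, but TextContent needs a string
        await ctx.send(
            sender, create_text_chat(response or "No analysis was returned.")
        )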


@chat_proto.on_message(ChatAcknowledgement)
async def handle_ack(ctx: Context, sender: str, msg: ChatAcknowledgement):
    ctx.logger.info(
        f"Got an acknowledgement from {sender} for {msg.acknowledged_msg_id}"
    )
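Because get_image_analysis can return a string, a dict (when a tool is supplied), or None, it can help to normalise the result before wrapping it in a text message. The helper below is a minimal sketch of one way to do that. to_chat_text and its fallback wording are hypothetical and not part of the original files.

```python
import json
from typing import Any


def to_chat_text(result: Any, fallback: str = "No analysis was returned.") -> str:
    """Turn an analysis result into text suitable for a chat reply."""
    if result is None:
        return fallback
    if isinstance(result, str):
        # Treat empty or whitespace-only replies like a missing reply
        return result if result.strip() else fallback
    # Structured results (for example tool input) are rendered as JSON
    return json.dumps(result, indent=2, default=str)


print(to_chat_text(None))
print(to_chat_text("Three people are visible."))
print(to_chat_text({"count": 3}))
```

In the handler, the reply could then be built with create_text_chat(to_chat_text(response)), which keeps the message content a string in every case.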

3. Image Analysis Agent Setup

The agent.py file is the core of your application. Think of it as the main control center that:

Initialises your agent
Handles incoming requests

In this example, we focus on the essential setup for an image analysis agent.

Note: If you want to add advanced features such as rate limiting or agent health checks, you can refer to the Football Team Agent section in the ASI1 Compatible uAgent guide.

agent.py
from uagents import Agent
from chat_proto import chat_proto

agent = Agent()

# Include the chat protocol defined in the previous step to handle text and image content
agent.include(chat_proto, publish_manifest=True)

if __name__ == "__main__":
    agent.run()

Adding a README to your Agent

Go to the Overview section in the Editor.
Click on Edit and add a clear description of your Agent so that the ASI1 LLM can find it easily. Please refer to the Importance of Good Readme section for more details.
Make sure the Agent has the right AgentChatProtocol.

Query your Agent

Start your Agent
Navigate to the Overview tab and click on Chat with Agent to interact with the agent from the Agentverse Chat Interface.

Agentverse Chat Interface

Click on the Attach button to upload the image, then type in your query, for instance 'How many people are present in the image?'

Attach File Chat

Note: Currently, the image upload feature for agents is supported via the Agentverse Chat Interface. Support for image uploads through ASI:One will be available soon.
