Image Analysis Agent
This guide demonstrates how to create an Image Analysis Agent that can process both text and image inputs using the chat protocol. The agent is compatible with the Agentverse Chat Interface and can answer natural language queries about image content.
Overview
In this example, you'll learn how to build a uAgent that can:
- Accept both text and image inputs through the chat protocol
- Process images using Claude's vision capabilities
- Store and manage image resources using Agent storage
- Respond to queries about image content
Message Flow
The communication between the User, Chat Interface, and Image Analyser Agent proceeds as follows:
1. User Query
   - The user submits a query along with an image through the Chat Interface.
2. Image Upload & Query Forwarding
   - 2.1: The Chat Interface uploads the image to Agent Storage.
   - 2.2: The Chat Interface forwards the user's query, together with a reference to the uploaded image, to the Image Analyser Agent as a `ChatMessage`.
3. Image Retrieval
   - The Image Analyser Agent retrieves the image from Agent Storage using the provided reference.
4. Image Analysis
   - 4.1: The agent passes the query and image to the Image Analysis Function (see the sketch after this list).
   - 4.2: The Image Analysis Function processes the image and returns a response.
5. Response & Acknowledgement
   - 5.1: The agent sends the analysis result back to the Chat Interface as a `ChatMessage`.
   - 5.2: The agent also sends a `ChatAcknowledgement` to confirm receipt and processing of the message.
6. User Receives Response
   - The Chat Interface delivers the analysis result to the user.
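For illustration, the query and image that reach the Image Analysis Function are assembled by the agent into a plain list of content items. A minimal sketch of that list, assuming a PNG upload (the text and base64 values below are placeholders), looks like this:

# Sketch of the content list the agent builds and passes to the image analysis function
prompt_content = [
    {"type": "text", "text": "How many people are present in the image?"},
    {
        "type": "resource",
        "mime_type": "image/png",                    # reported by Agent Storage for the uploaded file
        "contents": "<base64-encoded image bytes>",  # placeholder for the downloaded data
    },
]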
Implementation
In this example we will create an agent and its associated files on Agentverse that communicate with the Chat Interface using the chat protocol. Refer to the Hosted Agents section to understand the detailed steps for agent creation on Agentverse.
Create a new agent named "Image Analysis Agent" on Agentverse and create the following files:
agent.py              # Main agent file
image_analysis.py     # Image analysis function
chat_proto.py         # Chat protocol implementation for enabling text-based communication
To create a new file on Agentverse:
1. Click on the New File icon.
2. Assign a name to the file.

Directory Structure
1. Image Analysis Implementation
The `image_analysis.py` file implements the logic for passing both text and image inputs to Claude's vision model. It handles encoding images, constructing the appropriate request, and returning the AI-generated analysis of the image and query.
import json
import os
from typing import Any

import requests

CLAUDE_URL = "https://api.anthropic.com/v1/messages"
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "1024"))
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "YOUR_ANTHROPIC_API_KEY")

if ANTHROPIC_API_KEY is None or ANTHROPIC_API_KEY == "YOUR_ANTHROPIC_API_KEY":
    raise ValueError(
        "You need to provide an Anthropic API key: https://console.anthropic.com/settings/keys"
    )

MODEL_ENGINE = os.getenv("MODEL_ENGINE", "claude-3-5-haiku-latest")
HEADERS = {
    "x-api-key": ANTHROPIC_API_KEY,
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}
def get_image_analysis(
    content: list[dict[str, Any]], tool: dict[str, Any] | None = None
) -> str | None:
    # Convert the chat content items into Claude's multimodal message format
    processed_content = []
    for item in content:
        if item.get("type") == "text":
            processed_content.append({"type": "text", "text": item["text"]})
        elif item.get("type") == "resource":
            mime_type = item["mime_type"]
            if mime_type.startswith("image/"):
                processed_content.append({
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": mime_type,
                        "data": item["contents"],
                    }
                })
            else:
                return f"Unsupported mime type: {mime_type}"

    # Build the request body for Claude's Messages API
    data = {
        "model": MODEL_ENGINE,
        "max_tokens": MAX_TOKENS,
        "messages": [
            {
                "role": "user",
                "content": processed_content,
            }
        ],
    }
    # If a tool schema is supplied, force Claude to answer through that tool
    if tool:
        data["tools"] = [tool]
        data["tool_choice"] = {"type": "tool", "name": tool["name"]}

    try:
        response = requests.post(
            CLAUDE_URL, headers=HEADERS, data=json.dumps(data), timeout=120
        )
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return "The request timed out. Please try again."
    except requests.exceptions.RequestException as e:
        return f"An error occurred: {e}"

    # Parse the response body
    response_data = response.json()

    # Handle error responses
    if "error" in response_data:
        return f"API Error: {response_data['error'].get('message', 'Unknown error')}"

    # Return the structured tool input if a tool was used, otherwise the first text block
    if tool:
        for item in response_data["content"]:
            if item["type"] == "tool_use":
                return item["input"]

    messages = response_data["content"]
    if messages:
        return messages[0]["text"]
    else:
        return None
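Outside of Agentverse you can sanity-check this function on its own. The snippet below is a hypothetical local test, not part of the deployed files; sample.jpg is a placeholder path and ANTHROPIC_API_KEY must be set in your environment:

import base64

from image_analysis import get_image_analysis

# Encode a local image as base64, the same form the agent receives from Agent Storage
with open("sample.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

result = get_image_analysis([
    {"type": "text", "text": "Describe this image in one sentence."},
    {"type": "resource", "mime_type": "image/jpeg", "contents": encoded},
])
print(result)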
2. Chat Protocol Integration
The `chat_proto.py` file is essential for enabling natural language communication between your agent and the ASI1 LLM.
Message Flow
1. Session Initiation
   - When a user starts a chat session, the agent receives a `ChatMessage` containing a `StartSessionContent`.
   - The agent responds with a `MetadataContent` message: `{"attachments": "true"}`. This signals to the chat UI that file attachments (such as images) are supported (a sketch of this reply follows the list).
2. User Query
   - The user sends a query as a `ChatMessage`, which includes:
     - `TextContent` (the user's question)
     - `ResourceContent` (an image or other file attachment)
3. Message Processing
   - For each content item:
     - If `TextContent`, the agent adds the text to the prompt for Claude.
     - If `ResourceContent`, the agent downloads the image from Agent Storage and adds it to the prompt as an image input for Claude.
4. Image Analysis and AI Processing
   - The agent then analyses the image with the help of `image_analysis.py` and sends back a response with the analysis to the user.
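For reference, the session-start reply described in step 1 is an ordinary `ChatMessage`; using the imports from the file below, it is exactly what the create_metadata helper produces:

# Sketch of the metadata reply sent when a session starts
ChatMessage(
    timestamp=datetime.utcnow(),
    msg_id=uuid4(),
    content=[MetadataContent(type="metadata", metadata={"attachments": "true"})],
)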
To enable natural language communication with your agent, add the following chat protocol in the `chat_proto.py` file created on Agentverse:
import os
from datetime import datetime
from uuid import uuid4

from uagents import Context, Protocol
from uagents_core.contrib.protocols.chat import (
    ChatAcknowledgement,
    ChatMessage,
    MetadataContent,
    ResourceContent,
    StartSessionContent,
    TextContent,
    chat_protocol_spec,
)
from uagents_core.storage import ExternalStorage

from image_analysis import get_image_analysis

# Agentverse storage endpoint used to download resources (images) referenced in chat messages
STORAGE_URL = os.getenv("AGENTVERSE_URL", "https://agentverse.ai") + "/v1/storage"


def create_text_chat(text: str) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[TextContent(type="text", text=text)],
    )


def create_metadata(metadata: dict[str, str]) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[MetadataContent(
            type="metadata",
            metadata=metadata,
        )],
    )


chat_proto = Protocol(spec=chat_protocol_spec)


@chat_proto.on_message(ChatMessage)
async def handle_message(ctx: Context, sender: str, msg: ChatMessage):
    ctx.logger.info(f"Got a message from {sender}")

    # Acknowledge receipt of the message
    await ctx.send(
        sender,
        ChatAcknowledgement(
            timestamp=datetime.utcnow(), acknowledged_msg_id=msg.msg_id
        ),
    )

    # Collect text and image items into a single multimodal prompt for Claude
    prompt_content = []
    for item in msg.content:
        if isinstance(item, StartSessionContent):
            # Signal to the chat UI that file attachments (e.g. images) are supported
            await ctx.send(sender, create_metadata({"attachments": "true"}))
        elif isinstance(item, TextContent):
            prompt_content.append({"text": item.text, "type": "text"})
        elif isinstance(item, ResourceContent):
            try:
                # Download the referenced image from Agent Storage using the agent's identity
                external_storage = ExternalStorage(
                    identity=ctx.agent.identity,
                    storage_url=STORAGE_URL,
                )
                data = external_storage.download(str(item.resource_id))
                prompt_content.append({
                    "type": "resource",
                    "mime_type": data["mime_type"],
                    "contents": data["contents"],
                })
            except Exception as ex:
                ctx.logger.error(f"Failed to download resource: {ex}")
                await ctx.send(sender, create_text_chat("Failed to download resource."))
        else:
            ctx.logger.warning(f"Got unexpected content from {sender}")

    if prompt_content:
        response = get_image_analysis(prompt_content)
        await ctx.send(sender, create_text_chat(response))


@chat_proto.on_message(ChatAcknowledgement)
async def handle_ack(ctx: Context, sender: str, msg: ChatAcknowledgement):
    ctx.logger.info(
        f"Got an acknowledgement from {sender} for {msg.acknowledged_msg_id}"
    )
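The handler above calls get_image_analysis without the optional tool argument, so Claude replies with free-form text. If you would rather receive structured output, you can pass an Anthropic tool definition; the sketch below is a hypothetical example (the tool name, schema, and prompt values are placeholders, not part of this guide's files):

from image_analysis import get_image_analysis

# Hypothetical tool definition that makes Claude return structured JSON instead of prose
people_count_tool = {
    "name": "report_people_count",
    "description": "Report how many people are visible in the image.",
    "input_schema": {
        "type": "object",
        "properties": {
            "count": {"type": "integer", "description": "Number of people visible in the image"}
        },
        "required": ["count"],
    },
}

# The same kind of content list the chat handler builds (placeholder values)
prompt_content = [
    {"type": "text", "text": "How many people are present in the image?"},
    {"type": "resource", "mime_type": "image/png", "contents": "<base64-encoded image bytes>"},
]

# get_image_analysis then returns the tool's input dict, e.g. {"count": 3}
result = get_image_analysis(prompt_content, tool=people_count_tool)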
3. Image Analysis Agent Setup
The `agent.py` file is the core of your application. Think of it as the main control center that:
- Initialises your agent
- Handles incoming requests
In this example, we focus on the essential setup for an image analysis agent.
Note: If you want to add advanced features such as rate limiting or agent health checks, you can refer to the Football Team Agent section in the ASI1 Compatible uAgent guide.
from uagents import Agent

from chat_proto import chat_proto

agent = Agent()

# Include the chat protocol defined in the previous step to handle text and image contents
agent.include(chat_proto, publish_manifest=True)

if __name__ == "__main__":
    agent.run()
Adding a README to your Agent
1. Go to the Overview section in the Editor.
2. Click on Edit and add a good description for your Agent so that it can be easily discovered by the ASI1 LLM. Please refer to the Importance of Good Readme section for more details.
3. Make sure the Agent has the right `AgentChatProtocol`.
Query your Agent
1. Start your Agent.
2. Navigate to the Overview tab and click on Chat with Agent to interact with the agent from the Agentverse Chat Interface.
3. Click on the Attach button to upload the image and type in your query, for instance: 'How many people are present in the image?'
Note: Currently, the image upload feature for agents is supported via the Agentverse Chat Interface. Support for image uploads through ASI:One will be available soon.