Image Generation Agent
This guide demonstrates how to create an Image Generation Agent that can generate images based on text descriptions using the chat protocol. The agent is compatible with the Agentverse Chat Interface and can process natural language requests to generate images.
Overview
In this example, you'll learn how to build a uAgent that can:
- Accept text descriptions through the chat protocol
- Generate images using DALL-E 3
- Store and manage generated images using Agent storage
- Send generated images back to the user.
Message Flow
The communication between the User, Chat Interface, and Image Generator Agent proceeds as follows:
- User Query
  - 1: The user submits a text description of the desired image through the Chat Interface.
- Query Processing
  - 2: The Chat Interface forwards the user's description to the Image Generator Agent as a ChatMessage.
- Image Generation
  - 3.1 and 3.2: The agent processes the text description using DALL-E 3.
  - 4.1 and 4.2: The generated image is uploaded to External Storage.
- Response & Resource Sharing
  - 5.1: The agent sends the generated image back to the Chat Interface as a ResourceContent message.
  - 5.2: The agent also sends a ChatAcknowledgement to confirm receipt and processing of the message.
- User Receives Image
  - 6: The Chat Interface displays the generated image to the user.
Implementation
In this example, we will create an agent and its associated files on our local machine. The agent communicates using the chat protocol and is connected to Agentverse via Mailbox. Refer to the Mailbox Agents section for the detailed steps for connecting a local agent to Agentverse.
Create a new directory named "image-generation" and create the following files:
mkdir image-generation   # Create a directory
cd image-generation      # Navigate to the directory
touch agent.py           # Main agent file
touch models.py          # Image generation models and functions
touch chat_proto.py      # Chat protocol implementation for enabling text-based communication
1. Image Generation Implementation
The models.py file implements the logic for generating images using DALL-E 3. It handles the API connection, image generation, and response processing.
import os

from uagents import Model
from openai import OpenAI, OpenAIError

# Make sure to set your OpenAI API key in the environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if OPENAI_API_KEY is None:
    raise ValueError("You need to provide an OpenAI API Key.")

client = OpenAI(api_key=OPENAI_API_KEY)


class ImageRequest(Model):
    image_description: str


class ImageResponse(Model):
    image_url: str


def generate_image(prompt: str) -> str:
    # Returns the image URL on success, or an error string if the API call fails
    try:
        response = client.images.generate(
            model="dall-e-3",
            prompt=prompt,
        )
    except OpenAIError as e:
        return f"An error occurred: {e}"
    return response.data[0].url
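Because generate_image() returns an error string instead of raising when the OpenAI call fails, a caller may want to confirm that the result is an actual URL before trying to download it. The following is a minimal sketch using only the standard library; the helper name is_http_url is hypothetical and not part of the original files, and the example URL is a placeholder.

```python
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """Return True only if value parses as an absolute http(s) URL."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Placeholder inputs: the first mimics a successful result,
# the second mimics the error string returned by generate_image().
print(is_http_url("https://example.com/generated.png"))
print(is_http_url("An error occurred: invalid API key"))
```

A check like this could run right after generate_image(prompt), so that an error string leads to a friendly reply to the user rather than a failed download attempt.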
2. Chat Protocol Integration
The chat_proto.py file is responsible for orchestrating the entire communication and image generation process when the agent receives a user's request. Here's how it works step by step:
i) Receiving the Message
@chat_proto.on_message(ChatMessage)
async def handle_message(ctx: Context, sender: str, msg: ChatMessage):
The agent listens for incoming ChatMessage instances.
ii) Acknowledging Receipt
await ctx.send(
    sender,
    ChatAcknowledgement(timestamp=datetime.utcnow(), acknowledged_msg_id=msg.msg_id),
)
Once a message is received, the agent immediately sends a ChatAcknowledgement back to the sender.
iii) Parsing Content
for item in msg.content:
    if isinstance(item, StartSessionContent):
        ...
    elif isinstance(item, TextContent):
        ...
The message content is iterated over.
- If it contains StartSessionContent, the agent logs it and waits for further input.
- If it contains TextContent, it's treated as the image prompt and passed for processing.
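The parsing step can be illustrated without the uAgents classes. The sketch below uses two hypothetical stand-in dataclasses (StartSession and Text, which only mimic StartSessionContent and TextContent) and a hypothetical helper first_prompt. It shows the idea of skipping session markers and reading the prompt from the matched item itself rather than from a fixed position in the list.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass
class StartSession:
    """Hypothetical stand-in for StartSessionContent."""
    type: str = "start-session"


@dataclass
class Text:
    """Hypothetical stand-in for TextContent."""
    text: str
    type: str = "text"


ContentItem = Union[StartSession, Text]


def first_prompt(content: Sequence[ContentItem]) -> Optional[str]:
    """Return the text of the first Text item, skipping session markers."""
    for item in content:
        if isinstance(item, StartSession):
            continue  # nothing to generate yet, keep scanning
        if isinstance(item, Text):
            return item.text
    return None  # no text content was found


print(first_prompt([StartSession(), Text("a red fox in the snow")]))
```

Reading item.text from the item that matched the isinstance check keeps the logic correct even when a session marker appears first in the content list.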
iv) Image Generation via DALL·E 3
prompt = item.text
image_url = generate_image(prompt)
- The prompt is extracted and passed to the generate_image() function from models.py to generate an image URL using DALL·E 3.
v) Downloading the Image
response = requests.get(image_url)
if response.status_code == 200:
    image_data = response.content
    content_type = response.headers.get("Content-Type", "")
- The generated image is downloaded using a direct HTTP request.
- If successful, the image binary and MIME type are extracted for storage.
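A Content-Type header can carry parameters (for example a charset) or can describe something other than an image, such as an HTML error page. As a hedged sketch, the hypothetical helper image_mime_type below normalizes the header and returns None when the value is not an image type, so the caller can treat that case as a failed download. It is not part of the original script.

```python
from typing import Optional


def image_mime_type(content_type_header: str) -> Optional[str]:
    """Strip parameters from a Content-Type header value.

    Returns the bare MIME type if it is an image type, otherwise None.
    """
    mime = content_type_header.split(";", 1)[0].strip().lower()
    return mime if mime.startswith("image/") else None


print(image_mime_type("image/png; charset=binary"))
print(image_mime_type("text/html"))
```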
vi) Uploading to Agent Storage
asset_id = external_storage.create_asset(
    name=str(ctx.session),
    content=image_data,
    mime_type=content_type
)
- The image is uploaded to the Agent's ExternalStorage system.
- A unique asset_id is returned to identify the uploaded image.
vii) Permission Management
external_storage.set_permissions(asset_id=asset_id, agent_address=sender)
- The agent sets viewing permissions so that only the user who requested the image can access it.
viii) Responding with the Image
asset_uri = f"agent-storage://{external_storage.storage_url}/{asset_id}"
await ctx.send(sender, create_resource_chat(asset_id, asset_uri))
- The agent constructs a ResourceContent message containing the image asset.
- This message is sent back to the user for viewing in the chat interface.
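The URI format in this step can be checked in isolation. The sketch below assumes, as the full script suggests by passing the value to UUID4(asset_id), that asset_id is a UUID string. The helper name build_asset_uri and the sample ID are hypothetical and used only for illustration.

```python
from uuid import UUID


def build_asset_uri(storage_url: str, asset_id: str) -> str:
    """Validate asset_id as a UUID and build the agent-storage URI."""
    UUID(asset_id)  # raises ValueError if asset_id is not a valid UUID string
    return f"agent-storage://{storage_url}/{asset_id}"


# Sample values for illustration only
print(
    build_asset_uri(
        "https://agentverse.ai/v1/storage",
        "123e4567-e89b-12d3-a456-426614174000",
    )
)
```

Validating the ID before building the message surfaces a malformed asset_id early, instead of failing while the ResourceContent is being constructed.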
Whole script
This agent leverages external storage to securely upload, store, and share generated images. An Agentverse API key is required for authentication and to enable interaction with the external storage. You can obtain your API key from Agentverse; for detailed instructions, please refer to the Agentverse API Key guide.
import os
import requests
from uuid import uuid4
from datetime import datetime
from pydantic.v1 import UUID4

from uagents import Context, Protocol
from uagents_core.contrib.protocols.chat import (
    ChatAcknowledgement,
    ChatMessage,
    EndSessionContent,
    Resource,
    ResourceContent,
    StartSessionContent,
    TextContent,
    chat_protocol_spec,
)
from uagents_core.storage import ExternalStorage

from models import generate_image

AGENTVERSE_API_KEY = os.getenv("AGENTVERSE_API_KEY")
STORAGE_URL = os.getenv("AGENTVERSE_URL", "https://agentverse.ai") + "/v1/storage"

if AGENTVERSE_API_KEY is None:
    raise ValueError("You need to provide an AGENTVERSE_API_KEY.")

external_storage = ExternalStorage(api_token=AGENTVERSE_API_KEY, storage_url=STORAGE_URL)


def create_text_chat(text: str) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[TextContent(type="text", text=text)],
    )


def create_end_session_chat() -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[EndSessionContent(type="end-session")],
    )


def create_resource_chat(asset_id: str, uri: str) -> ChatMessage:
    return ChatMessage(
        timestamp=datetime.utcnow(),
        msg_id=uuid4(),
        content=[
            ResourceContent(
                type="resource",
                resource_id=UUID4(asset_id),
                resource=Resource(
                    uri=uri,
                    metadata={
                        "mime_type": "image/png",
                        "role": "generated-image"
                    }
                )
            )
        ]
    )


chat_proto = Protocol(spec=chat_protocol_spec)


@chat_proto.on_message(ChatMessage)
async def handle_message(ctx: Context, sender: str, msg: ChatMessage):
    # Acknowledge the incoming message right away
    await ctx.send(
        sender,
        ChatAcknowledgement(timestamp=datetime.utcnow(), acknowledged_msg_id=msg.msg_id),
    )

    for item in msg.content:
        if isinstance(item, StartSessionContent):
            ctx.logger.info(f"Got a start session message from {sender}")
            continue
        elif isinstance(item, TextContent):
            ctx.logger.info(f"Got a message from {sender}: {item.text}")

            # Use the matched item; msg.content[0] may be a StartSessionContent
            prompt = item.text
            try:
                image_url = generate_image(prompt)

                response = requests.get(image_url)
                if response.status_code == 200:
                    content_type = response.headers.get("Content-Type", "")
                    image_data = response.content

                    try:
                        asset_id = external_storage.create_asset(
                            name=str(ctx.session),
                            content=image_data,
                            mime_type=content_type
                        )
                        ctx.logger.info(f"Asset created with ID: {asset_id}")
                    except RuntimeError as err:
                        ctx.logger.error(f"Asset creation failed: {err}")
                        # Re-raise so the outer handler replies to the user;
                        # asset_id is undefined at this point
                        raise

                    external_storage.set_permissions(asset_id=asset_id, agent_address=sender)
                    ctx.logger.info(f"Asset permissions set to: {sender}")

                    asset_uri = f"agent-storage://{external_storage.storage_url}/{asset_id}"
                    await ctx.send(sender, create_resource_chat(asset_id, asset_uri))
                else:
                    ctx.logger.error("Failed to download image")
                    await ctx.send(
                        sender,
                        create_text_chat(
                            "Sorry, I couldn't process your request. Please try again later."
                        ),
                    )
                    return
            except Exception as err:
                ctx.logger.error(err)
                await ctx.send(
                    sender,
                    create_text_chat(
                        "Sorry, I couldn't process your request. Please try again later."
                    ),
                )
                return

            await ctx.send(sender, create_end_session_chat())
        else:
            ctx.logger.info(f"Got unexpected content from {sender}")


@chat_proto.on_message(ChatAcknowledgement)
async def handle_ack(ctx: Context, sender: str, msg: ChatAcknowledgement):
    ctx.logger.info(f"Got an acknowledgement from {sender} for {msg.acknowledged_msg_id}")
3. Image Generator Agent Setup
The agent.py file initializes your agent and includes the protocols needed to handle user requests.
Note: If you want to add advanced features such as rate limiting or agent health checks, you can refer to the Football Team Agent section in the ASI1 Compatible uAgent guide.
import os
from enum import Enum
from uagents import Agent, Context, Model
from uagents.experimental.quota import QuotaProtocol, RateLimit
from uagents_core.models import ErrorMessage
from chat_proto import chat_proto
from models import ImageRequest, ImageResponse, generate_image
AGENT_SEED = os.getenv("AGENT_SEED", "image-generator-agent-seed-phrase")
AGENT_NAME = os.getenv("AGENT_NAME", "Image Generator Agent")
PORT = 8000
agent = Agent(
    name=AGENT_NAME,
    seed=AGENT_SEED,
    port=PORT,
    mailbox=True,
)

# Include protocol
agent.include(chat_proto, publish_manifest=True)

if __name__ == "__main__":
    agent.run()
Setting up Environment Variables
Make sure to set the following environment variables:
- OPENAI_API_KEY: Your OpenAI API key for DALL-E 3 access
- AGENTVERSE_API_KEY: Your Agentverse API key for storage access
- AGENT_SEED: (Optional) Custom seed for your agent
- AGENT_NAME: (Optional) Custom name for your agent
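Before starting the agent, it can help to confirm that the required variables are present. The following is a minimal sketch using only the standard library; the helper missing_env_vars and the REQUIRED tuple are hypothetical names introduced for illustration, while the variable names themselves come from the list above.

```python
import os
from typing import Iterable, List, Mapping

# Required variables from this guide; AGENT_SEED and AGENT_NAME are optional
REQUIRED = ("OPENAI_API_KEY", "AGENTVERSE_API_KEY")


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running a check like this first reports every missing variable at once, rather than stopping at the first ValueError raised during import.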
Adding a README to your Agent
- Start your agent and connect to Agentverse using the Agent Inspector link in the logs. Please refer to the Mailbox Agents section for the detailed steps for connecting a local agent to Agentverse.
python3 agent.py
[Figure: Agent Logs]
Click the link to open a new window in your browser, click Connect, and then select Mailbox. This connects your agent to Agentverse.
- Once you connect your Agent via Mailbox, click on Agent Profile and navigate to the Overview section of the Agent. Your Agent will appear under local agents on Agentverse.
- Click on Edit and add a good description and name for your Agent so that it can be easily found by the ASI1 LLM. Please refer to the Importance of Good Readme section for more details.
- Make sure the Agent has the right AgentChatProtocol.
Query your Agent
- Look for your agent under local agents on Agentverse.
- Navigate to the Overview tab of the agent and click on Chat with Agent to interact with the agent from the Agentverse Chat Interface.
- Type in your image description, for example: "A serene landscape with mountains and a lake at sunset"
- The agent will generate an image based on your description and send it back through the chat interface.
Note: Currently, the image sharing feature for agents is supported via the Agentverse Chat Interface. Support for image sharing through ASI:One will be available soon.