AWS Bedrock Converse API
Official Documentation: AWS Bedrock Converse API
Last Updated: January 2026
Table of Contents
- Overview
- Authentication & Setup
- Message Format
- Request Structure
- Response Structure
- Streaming with ConverseStream
- Tool Use (Function Calling)
- Prompt Caching
- Python Implementation
- Error Handling
1. Overview
The AWS Bedrock Converse API provides a unified interface for conversational AI that works consistently across all supported foundation models.
Why Use Converse API?
- Single codebase works with multiple models (Claude, Nova, Llama, Mistral, etc.)
- Consistent message format across all models
- Built-in tool calling (function calling) support
- Streaming responses for real-time interactions
- Prompt caching to reduce costs and latency
- Guardrails integration for content safety
Supported Models
| Provider | Models |
|----------|--------|
| Anthropic | Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5 |
| Amazon | Nova Premier, Nova Pro, Nova Lite |
| Meta | Llama 3.x models |
| Mistral | Mistral Large and other Mistral models |
| Cohere | Command R+ and other Command models |
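Because the request shape is identical across providers, switching models is mostly a matter of changing modelId. A minimal sketch (the model IDs below also appear later in this document; substitute models enabled in your account):
import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')

def ask(model_id: str, prompt: str) -> str:
    """Send the same request shape to any Converse-compatible model."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={'maxTokens': 512}
    )
    return response['output']['message']['content'][0]['text']

# Same code path, different providers
print(ask('anthropic.claude-3-5-sonnet-20241022-v2:0', 'Summarize the Converse API in one sentence.'))
print(ask('amazon.nova-lite-v1:0', 'Summarize the Converse API in one sentence.'))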
2. Authentication & Setup
Required Permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "arn:aws:bedrock:*::foundation-model/*"
}
]
}
Python Client Setup with Boto3
import boto3
from botocore.config import Config
# Method 1: Default credentials (from ~/.aws/credentials or environment)
client = boto3.client(
service_name='bedrock-runtime',
region_name='us-east-1'
)
# Method 2: Explicit credentials
client = boto3.client(
service_name='bedrock-runtime',
region_name='us-east-1',
aws_access_key_id='YOUR_ACCESS_KEY',
aws_secret_access_key='YOUR_SECRET_KEY',
aws_session_token='YOUR_SESSION_TOKEN' # Optional
)
# Method 3: Using AWS profile
session = boto3.Session(profile_name='your-profile')
client = session.client(
service_name='bedrock-runtime',
region_name='us-east-1'
)
# Method 4: With custom configuration
config = Config(
region_name='us-east-1',
user_agent_extra='shello-cli/1.0',
read_timeout=300 # Important for streaming
)
client = boto3.client(
service_name='bedrock-runtime',
config=config
)
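To confirm which models your credentials can actually see in a region, the separate bedrock control-plane client (not bedrock-runtime) can list foundation models. A short sketch; note that it requires the bedrock:ListFoundationModels permission, which is not included in the policy above:
import boto3

# Control-plane client: model management APIs live under 'bedrock', not 'bedrock-runtime'
bedrock = boto3.client('bedrock', region_name='us-east-1')

for summary in bedrock.list_foundation_models()['modelSummaries']:
    print(summary['modelId'])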
Supported AWS Regions
BEDROCK_REGIONS = [
'us-east-1', 'us-east-2', 'us-west-1', 'us-west-2',
'ap-south-1', 'ap-northeast-1', 'ap-northeast-2', 'ap-southeast-1', 'ap-southeast-2',
'ca-central-1',
'eu-central-1', 'eu-west-1', 'eu-west-2', 'eu-west-3',
'sa-east-1'
]
3. Message Format
Message Structure
Messages follow a consistent format across all models:
messages = [
{
"role": "user", # or "assistant"
"content": [
{"text": "Your message here"}
]
}
]
Content Types
1. Text Content
{
"role": "user",
"content": [
{"text": "Hello, how are you?"}
]
}
2. Image Content
# Read the image file as raw bytes (boto3 handles the encoding for you)
with open('image.jpg', 'rb') as f:
    image_bytes = f.read()
{
"role": "user",
"content": [
{"text": "What's in this image?"},
{
"image": {
"format": "jpeg", # or "png", "gif", "webp"
"source": {
"bytes": image_bytes
}
}
}
]
}
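A small helper keeps the byte handling and format detection in one place. A sketch that infers the format from the file extension (the helper name is illustrative, not part of the API):
from pathlib import Path

def image_block(path: str) -> dict:
    """Build a Converse image content block from a local file (sketch)."""
    ext = Path(path).suffix.lower().lstrip('.')
    fmt = {'jpg': 'jpeg', 'jpeg': 'jpeg', 'png': 'png', 'gif': 'gif', 'webp': 'webp'}.get(ext)
    if fmt is None:
        raise ValueError(f"Unsupported image format: {ext}")
    with open(path, 'rb') as f:
        return {"image": {"format": fmt, "source": {"bytes": f.read()}}}

message = {
    "role": "user",
    "content": [{"text": "What's in this image?"}, image_block('image.jpg')]
}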
3. Mixed Content (Text + Images)
{
"role": "user",
"content": [
{"text": "Compare these two images:"},
{
"image": {
"format": "jpeg",
"source": {"bytes": image1_bytes}
}
},
{
"image": {
"format": "jpeg",
"source": {"bytes": image2_bytes}
}
},
{"text": "What are the differences?"}
]
}
Multi-Turn Conversations
messages = [
{
"role": "user",
"content": [{"text": "Create a list of 3 pop songs."}]
},
{
"role": "assistant",
"content": [{
"text": "1. 'As It Was' by Harry Styles\n2. 'Easy On Me' by Adele\n3. 'Unholy' by Sam Smith"
}]
},
{
"role": "user",
"content": [{"text": "Make them all by UK artists."}]
}
]
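To continue a conversation programmatically, append the assistant message from each response to the history before adding the next user turn. A minimal sketch:
def continue_conversation(client, model_id, messages, next_user_text):
    """Send the history, record the assistant reply, then queue the next user turn (sketch)."""
    response = client.converse(modelId=model_id, messages=messages)
    # The returned assistant message can go straight back into the history as-is
    messages.append(response['output']['message'])
    messages.append({"role": "user", "content": [{"text": next_user_text}]})
    return messages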
4. Request Structure
Complete Request Example
response = client.converse(
modelId='anthropic.claude-3-sonnet-20240229-v1:0',
messages=messages,
system=[
{"text": "You are a helpful AI assistant."}
],
inferenceConfig={
'maxTokens': 2048,
'temperature': 0.7,
'topP': 0.9,
'stopSequences': ['END', 'STOP']
},
additionalModelRequestFields={
'top_k': 250 # Model-specific parameters
},
toolConfig={
'tools': [...], # Tool definitions
'toolChoice': {'auto': {}}
},
guardrailConfig={
'guardrailIdentifier': 'your-guardrail-id',
'guardrailVersion': '1'
}
)
Request Fields
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| modelId | string | ✅ Yes | Model identifier or ARN |
| messages | array | ✅ Yes | Conversation messages |
| system | array | ❌ No | System prompts/instructions |
| inferenceConfig | object | ❌ No | Inference parameters |
| toolConfig | object | ❌ No | Tool definitions |
| additionalModelRequestFields | object | ❌ No | Model-specific parameters |
| guardrailConfig | object | ❌ No | Guardrail configuration |
| additionalModelResponseFieldPaths | array | ❌ No | Additional response fields |
Model IDs
# Anthropic Claude models
'anthropic.claude-3-5-sonnet-20241022-v2:0'
'anthropic.claude-3-5-haiku-20241022-v1:0'
'anthropic.claude-3-opus-20240229-v1:0'
# Amazon Nova models
'amazon.nova-premier-v1:0'
'amazon.nova-pro-v1:0'
'amazon.nova-lite-v1:0'
# Cross-region inference profiles
'us.anthropic.claude-3-5-sonnet-20241022-v2:0'
'eu.anthropic.claude-3-5-sonnet-20241022-v2:0'
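Cross-region inference profile IDs are the base model ID with a geography prefix. A hedged sketch of deriving the prefix from the configured region; the region-to-prefix mapping below is an assumption, so confirm it against the inference profiles available in your account:
def to_inference_profile(model_id: str, region: str) -> str:
    """Prefix a base model ID with a geography code for cross-region inference (assumed mapping)."""
    if region.startswith('us-'):
        prefix = 'us.'
    elif region.startswith('eu-'):
        prefix = 'eu.'
    elif region.startswith('ap-'):
        prefix = 'apac.'  # assumption: Asia-Pacific profiles use the 'apac.' prefix
    else:
        return model_id  # fall back to the base model ID
    return prefix + model_id

# -> 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'
print(to_inference_profile('anthropic.claude-3-5-sonnet-20241022-v2:0', 'us-east-1'))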
Inference Configuration
inferenceConfig = {
'maxTokens': 2048,       # Maximum tokens to generate (limits and defaults vary by model)
'temperature': 0.7, # 0.0 to 1.0 (higher = more creative)
'topP': 0.9, # Nucleus sampling (0.0 to 1.0)
'stopSequences': ['END'] # Stop generation at these sequences
}
System Prompts
system = [
{
"text": "You are an expert Python developer. "
"Provide clear, concise code examples."
}
]
5. Response Structure
Converse Response
{
'output': {
'message': {
'role': 'assistant',
'content': [
{'text': 'Generated response text here...'}
]
}
},
'stopReason': 'end_turn', # or 'max_tokens', 'stop_sequence', 'tool_use'
'usage': {
'inputTokens': 125,
'outputTokens': 60,
'totalTokens': 185,
'cacheReadInputTokens': 0, # Prompt caching
'cacheWriteInputTokens': 0 # Prompt caching
},
'metrics': {
'latencyMs': 1175
}
}
Stop Reasons
| Stop Reason | Description |
|-------------|-------------|
| end_turn | Model completed its response naturally |
| max_tokens | Reached maximum token limit |
| stop_sequence | Hit a stop sequence |
| tool_use | Model wants to use a tool |
| content_filtered | Content blocked by guardrails |
Extracting Response Text
response = client.converse(
modelId='anthropic.claude-3-sonnet-20240229-v1:0',
messages=messages
)
# Extract the text
response_text = response['output']['message']['content'][0]['text']
# Get token usage
input_tokens = response['usage']['inputTokens']
output_tokens = response['usage']['outputTokens']
total_cost = calculate_cost(model_name, input_tokens, output_tokens)  # full calculator (including cache tokens) is shown under "Cost Calculation" below
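Before using the extracted text it is worth checking stopReason, since a truncated or tool-use response needs different handling. A short sketch continuing the example above:
stop_reason = response['stopReason']
if stop_reason == 'max_tokens':
    print("Warning: response was truncated - raise maxTokens or shorten the prompt")
elif stop_reason == 'tool_use':
    print("Model requested a tool call - see the tool use flow in section 7")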
6. Streaming with ConverseStream
Basic Streaming Example
response = client.converse_stream(
modelId='anthropic.claude-3-sonnet-20240229-v1:0',
messages=messages,
inferenceConfig={'maxTokens': 2048, 'temperature': 0.7}
)
# Process the stream
for event in response['stream']:
if 'messageStart' in event:
print(f"Role: {event['messageStart']['role']}")
elif 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
print(delta['text'], end='', flush=True)
elif 'messageStop' in event:
stop_reason = event['messageStop']['stopReason']
print(f"\nStop reason: {stop_reason}")
elif 'metadata' in event:
usage = event['metadata']['usage']
print(f"\nTokens used: {usage['totalTokens']}")
Stream Event Types
| Event | Description | Fields |
|-------|-------------|--------|
| messageStart | Start of message | role |
| contentBlockStart | Start of content block | contentBlockIndex, start |
| contentBlockDelta | Content chunk | contentBlockIndex, delta |
| contentBlockStop | End of content block | contentBlockIndex |
| messageStop | End of message | stopReason |
| metadata | Usage and metrics | usage, metrics |
Complete Streaming Implementation
def stream_response(client, model_id, messages):
"""Stream response from Bedrock with proper event handling."""
response = client.converse_stream(
modelId=model_id,
messages=messages,
inferenceConfig={'maxTokens': 2048, 'temperature': 0}
)
full_text = ""
input_tokens = 0
output_tokens = 0
try:
for event in response['stream']:
# Handle content deltas
if 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
text_chunk = delta['text']
full_text += text_chunk
yield {'type': 'text', 'text': text_chunk}
# Handle metadata (token usage)
elif 'metadata' in event:
usage = event['metadata']['usage']
input_tokens = usage.get('inputTokens', 0)
output_tokens = usage.get('outputTokens', 0)
yield {
'type': 'usage',
'inputTokens': input_tokens,
'outputTokens': output_tokens,
'totalTokens': usage.get('totalTokens', 0)
}
# Handle stop event
elif 'messageStop' in event:
yield {
'type': 'stop',
'stopReason': event['messageStop']['stopReason']
}
except Exception as e:
yield {'type': 'error', 'error': str(e)}
return full_text
# Usage
for chunk in stream_response(client, model_id, messages):
if chunk['type'] == 'text':
print(chunk['text'], end='', flush=True)
elif chunk['type'] == 'usage':
print(f"\n\nTokens: {chunk['totalTokens']}")
7. Tool Use (Function Calling)
Tool Definition Format
tools = [
{
"toolSpec": {
"name": "get_weather",
"description": "Get the current weather for a location",
"inputSchema": {
"json": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g., 'San Francisco'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
}
]
Tool Configuration
toolConfig = {
"tools": tools,
"toolChoice": {
"auto": {} # Let model decide when to use tools
# OR
# "any": {} # Force model to use a tool
# OR
# "tool": {"name": "get_weather"} # Force specific tool
}
}
Complete Tool Use Flow
import json

def handle_tool_use(client, model_id, messages, tools):
    """Handle tool use with the Bedrock Converse API."""
    # Initial request with tools
    response = client.converse(
        modelId=model_id,
        messages=messages,
        toolConfig={"tools": tools, "toolChoice": {"auto": {}}}
    )
    # Check whether the model wants to use a tool
    if response['stopReason'] == 'tool_use':
        output_message = response['output']['message']
        # Add the full assistant message (text and toolUse blocks) to the history
        messages.append(output_message)
        # Execute every requested tool and collect the results
        tool_results = []
        for block in output_message['content']:
            if 'toolUse' in block:
                tool_use = block['toolUse']
                tool_name = tool_use['name']
                tool_input = tool_use['input']
                tool_use_id = tool_use['toolUseId']
                print(f"Model wants to use tool: {tool_name}")
                print(f"Tool input: {tool_input}")
                # Execute the tool
                result = execute_tool(tool_name, tool_input)
                tool_results.append({
                    "toolResult": {
                        "toolUseId": tool_use_id,
                        "content": [
                            {"text": json.dumps(result)}
                        ]
                    }
                })
        # Send all tool results back to the model in a single user message
        messages.append({"role": "user", "content": tool_results})
        # Continue the conversation with the tool results
        return handle_tool_use(client, model_id, messages, tools)
    # Return the final text response
    return response['output']['message']['content'][0]['text']
def execute_tool(tool_name, tool_input):
"""Execute the actual tool function."""
if tool_name == "get_weather":
location = tool_input['location']
unit = tool_input.get('unit', 'celsius')
# Call actual weather API here
return {
"location": location,
"temperature": 22,
"unit": unit,
"condition": "sunny"
}
return {"error": "Unknown tool"}
Tool Result Format
{
"role": "user",
"content": [
{
"toolResult": {
"toolUseId": "tool_use_id_from_model",
"content": [
{"text": '{"temperature": 22, "condition": "sunny"}'}
],
"status": "success" # or "error"
}
}
]
}
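When a tool fails, send the failure back as a toolResult with status set to "error" instead of raising, so the model can recover or explain the problem. A sketch wrapping the execute_tool helper from the flow above:
import json

def tool_result_block(tool_use_id, tool_name, tool_input):
    """Wrap tool execution so failures flow back to the model as an error result (sketch)."""
    try:
        result = execute_tool(tool_name, tool_input)
        return {
            "toolResult": {
                "toolUseId": tool_use_id,
                "content": [{"text": json.dumps(result)}],
                "status": "success"
            }
        }
    except Exception as exc:
        return {
            "toolResult": {
                "toolUseId": tool_use_id,
                "content": [{"text": f"Tool failed: {exc}"}],
                "status": "error"
            }
        }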
8. Prompt Caching
Prompt caching reduces costs and latency by caching parts of your prompt that don’t change between requests.
How It Works
- Add cachePoint objects to mark cacheable content
- First request: writes to the cache (charged at the cache write rate)
- Subsequent requests: read from the cache (charged at the lower cache read rate)
- Cache TTL: 5 minutes
Adding Cache Points
# System prompt with cache point
system = [
{"text": "You are a helpful AI assistant with extensive knowledge."},
{"cachePoint": {"type": "default"}}
]
# Messages with cache points
messages = [
{
"role": "user",
"content": [
{"text": "Here is a long document to analyze:\n\n" + long_document},
{"cachePoint": {"type": "default"}}
]
},
{
"role": "assistant",
"content": [{"text": "I've read the document. What would you like to know?"}]
},
{
"role": "user",
"content": [
{"text": "What are the main themes?"},
{"cachePoint": {"type": "default"}}
]
}
]
Cache Strategy
Best practices for cache points:
- System prompts - Always cache if they don’t change
- Long context - Cache large documents, code bases, etc.
- Conversation history - Cache earlier turns in long conversations
- Last 2 user messages - Common pattern for multi-turn chats
def add_cache_points(messages, system_prompt=None):
"""Add cache points to optimize costs."""
# Cache system prompt
system = None
if system_prompt:
system = [
{"text": system_prompt},
{"cachePoint": {"type": "default"}}
]
# Find user message indices
user_indices = [i for i, msg in enumerate(messages) if msg['role'] == 'user']
# Add cache points to last 2 user messages
if len(user_indices) >= 2:
for idx in user_indices[-2:]:
messages[idx]['content'].append({"cachePoint": {"type": "default"}})
return messages, system
Cache Usage in Response
response = client.converse(
modelId=model_id,
messages=messages,
system=system
)
usage = response['usage']
print(f"Input tokens: {usage['inputTokens']}")
print(f"Cache write tokens: {usage.get('cacheWriteInputTokens', 0)}")
print(f"Cache read tokens: {usage.get('cacheReadInputTokens', 0)}")
print(f"Output tokens: {usage['outputTokens']}")
Cost Savings Example
# Without caching
# Request 1: 10,000 input tokens @ $3/M = $0.03
# Request 2: 10,000 input tokens @ $3/M = $0.03
# Total: $0.06
# With caching
# Request 1: 10,000 tokens written to cache @ $3.75/M = $0.0375
# Request 2: 10,000 tokens read from cache @ $0.30/M = $0.003
# Total: $0.0405 (32.5% savings)
9. Python Implementation
Complete ShelloBedrockClient Class
import boto3
import json
from typing import List, Dict, Any, Optional, Generator
from botocore.config import Config
from botocore.exceptions import ClientError
class ShelloBedrockClient:
"""AWS Bedrock client for Shello CLI with Converse API support."""
def __init__(
self,
model: str = "anthropic.claude-3-sonnet-20240229-v1:0",
region: str = "us-east-1",
aws_access_key: Optional[str] = None,
aws_secret_key: Optional[str] = None,
aws_session_token: Optional[str] = None,
aws_profile: Optional[str] = None,
endpoint_url: Optional[str] = None,
use_prompt_cache: bool = False,
debug: bool = False
):
self._model = model
self._region = region
self._use_prompt_cache = use_prompt_cache
self._debug = debug
# Initialize boto3 client
self._client = self._create_client(
region, aws_access_key, aws_secret_key,
aws_session_token, aws_profile, endpoint_url
)
def _create_client(self, region, access_key, secret_key,
session_token, profile, endpoint_url):
"""Create boto3 Bedrock Runtime client."""
config = Config(
region_name=region,
user_agent_extra='shello-cli/1.0',
read_timeout=300 # Important for streaming
)
if profile:
session = boto3.Session(profile_name=profile)
return session.client(
'bedrock-runtime',
config=config,
endpoint_url=endpoint_url
)
elif access_key and secret_key:
return boto3.client(
'bedrock-runtime',
aws_access_key_id=access_key,
aws_secret_access_key=secret_key,
aws_session_token=session_token,
config=config,
endpoint_url=endpoint_url
)
else:
# Use default credential chain
return boto3.client(
'bedrock-runtime',
config=config,
endpoint_url=endpoint_url
)
def chat(
self,
messages: List[Dict[str, Any]],
system_prompt: Optional[str] = None,
tools: Optional[List[Dict]] = None,
**kwargs
) -> Dict[str, Any]:
"""Send a chat completion request."""
# Format messages for Bedrock
bedrock_messages = self._format_messages(messages)
# Prepare system prompt
system = None
if system_prompt:
system = [{"text": system_prompt}]
if self._use_prompt_cache:
system.append({"cachePoint": {"type": "default"}})
# Prepare request
request_params = {
'modelId': self._model,
'messages': bedrock_messages
}
if system:
request_params['system'] = system
if tools:
request_params['toolConfig'] = self._format_tool_config(tools)
# Add inference config if provided
if 'temperature' in kwargs or 'max_tokens' in kwargs:
request_params['inferenceConfig'] = {
'maxTokens': kwargs.get('max_tokens', 2048),
'temperature': kwargs.get('temperature', 0.7),
'topP': kwargs.get('top_p', 0.9)
}
try:
response = self._client.converse(**request_params)
return self._parse_response(response)
except ClientError as e:
raise Exception(f"Bedrock API error: {str(e)}")
def chat_stream(
self,
messages: List[Dict[str, Any]],
system_prompt: Optional[str] = None,
tools: Optional[List[Dict]] = None,
**kwargs
) -> Generator[Dict[str, Any], None, None]:
"""Stream chat completion response."""
# Format messages
bedrock_messages = self._format_messages(messages)
# Prepare system prompt
system = None
if system_prompt:
system = [{"text": system_prompt}]
if self._use_prompt_cache:
system.append({"cachePoint": {"type": "default"}})
# Prepare request
request_params = {
'modelId': self._model,
'messages': bedrock_messages
}
if system:
request_params['system'] = system
if tools:
request_params['toolConfig'] = self._format_tool_config(tools)
if 'temperature' in kwargs or 'max_tokens' in kwargs:
request_params['inferenceConfig'] = {
'maxTokens': kwargs.get('max_tokens', 2048),
'temperature': kwargs.get('temperature', 0.7)
}
try:
response = self._client.converse_stream(**request_params)
for event in response['stream']:
if 'contentBlockDelta' in event:
delta = event['contentBlockDelta']['delta']
if 'text' in delta:
yield {'type': 'text', 'text': delta['text']}
elif 'metadata' in event:
usage = event['metadata']['usage']
yield {
'type': 'usage',
'inputTokens': usage.get('inputTokens', 0),
'outputTokens': usage.get('outputTokens', 0),
'cacheReadTokens': usage.get('cacheReadInputTokens', 0),
'cacheWriteTokens': usage.get('cacheWriteInputTokens', 0)
}
elif 'messageStop' in event:
yield {
'type': 'stop',
'stopReason': event['messageStop']['stopReason']
}
except ClientError as e:
yield {'type': 'error', 'error': str(e)}
def _format_messages(self, messages: List[Dict]) -> List[Dict]:
"""Convert Shello messages to Bedrock format."""
bedrock_messages = []
for msg in messages:
role = "user" if msg["role"] == "user" else "assistant"
content = []
if isinstance(msg["content"], str):
content = [{"text": msg["content"]}]
elif isinstance(msg["content"], list):
for item in msg["content"]:
if item["type"] == "text":
content.append({"text": item["text"]})
elif item["type"] == "image":
content.append(self._format_image(item))
bedrock_messages.append({"role": role, "content": content})
return bedrock_messages
def _format_image(self, item: Dict) -> Dict:
"""Format image content for Bedrock."""
return {
"image": {
"format": "jpeg", # Detect from data if needed
"source": {"bytes": item["source"]["data"]}
}
}
def _format_tool_config(self, tools: List[Dict]) -> Dict:
"""Convert Shello tools to Bedrock format."""
bedrock_tools = []
for tool in tools:
bedrock_tools.append({
"toolSpec": {
"name": tool["function"]["name"],
"description": tool["function"]["description"],
"inputSchema": {
"json": tool["function"]["parameters"]
}
}
})
return {
"tools": bedrock_tools,
"toolChoice": {"auto": {}}
}
def _parse_response(self, response: Dict) -> Dict:
"""Parse Bedrock response to standard format."""
return {
"content": response['output']['message']['content'][0]['text'],
"role": response['output']['message']['role'],
"stopReason": response['stopReason'],
"usage": response['usage'],
"metrics": response.get('metrics', {})
}
def set_model(self, model: str):
"""Change the current model."""
self._model = model
def get_current_model(self) -> str:
"""Get the current model name."""
return self._model
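A short usage sketch of the class above; the model ID, prompt, and system prompt are illustrative:
bedrock = ShelloBedrockClient(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",
    region="us-east-1",
    use_prompt_cache=True
)

messages = [{"role": "user", "content": "Explain the Converse API in two sentences."}]

for chunk in bedrock.chat_stream(messages, system_prompt="You are a concise assistant."):
    if chunk['type'] == 'text':
        print(chunk['text'], end='', flush=True)
    elif chunk['type'] == 'usage':
        print(f"\n\nInput: {chunk['inputTokens']}  Output: {chunk['outputTokens']}")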
10. Error Handling
Common Errors
from botocore.exceptions import ClientError
try:
response = client.converse(modelId=model_id, messages=messages)
except ClientError as e:
error_code = e.response['Error']['Code']
error_message = e.response['Error']['Message']
if error_code == 'ValidationException':
# Invalid request parameters
if 'too long' in error_message.lower():
print("Context window exceeded - truncate messages")
else:
print(f"Validation error: {error_message}")
elif error_code == 'ThrottlingException':
# Rate limit exceeded
print("Rate limited - implement exponential backoff")
elif error_code == 'ModelTimeoutException':
# Model took too long to respond
print("Model timeout - retry with shorter context")
elif error_code == 'ModelNotReadyException':
# Model is still loading
print("Model not ready - wait and retry")
elif error_code == 'AccessDeniedException':
# Insufficient permissions
print("Access denied - check IAM permissions")
elif error_code == 'ResourceNotFoundException':
# Model not found
print(f"Model not found: {model_id}")
else:
print(f"Unexpected error: {error_code} - {error_message}")
Retry Logic with Exponential Backoff
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=1):
"""Decorator for retry logic with exponential backoff."""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except ClientError as e:
error_code = e.response['Error']['Code']
# Don't retry on validation errors
if error_code == 'ValidationException':
raise
# Retry on throttling and timeouts
if error_code in ['ThrottlingException', 'ModelTimeoutException']:
if attempt < max_retries - 1:
delay = base_delay * (2 ** attempt)
print(f"Retry {attempt + 1}/{max_retries} after {delay}s")
time.sleep(delay)
continue
raise
raise Exception(f"Max retries ({max_retries}) exceeded")
return wrapper
return decorator
@retry_with_backoff(max_retries=3)
def call_bedrock(client, model_id, messages):
return client.converse(modelId=model_id, messages=messages)
Context Window Management
import json

def truncate_messages(messages, max_tokens=100000):
    """Truncate messages to fit within the context window."""
    # Rough token estimate (4 chars ≈ 1 token)
    total_chars = sum(len(json.dumps(msg)) for msg in messages)
    estimated_tokens = total_chars // 4
    if estimated_tokens <= max_tokens:
        return messages
    # Keep the first message and the two most recent messages
    if len(messages) > 2:
        return [messages[0]] + messages[-2:]
    return messages
Cost Calculation
Pricing (per million tokens)
| Model | Input | Output | Cache Write | Cache Read |
|-------|-------|--------|-------------|------------|
| Claude Sonnet 4.5 | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Opus 4.5 | $5.00 | $25.00 | $6.25 | $0.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $1.25 | $0.10 |
| Nova Pro | $0.80 | $3.20 | - | $0.20 |
| Nova Lite | $0.06 | $0.24 | - | $0.015 |
Cost Calculator
def calculate_cost(
model_name: str,
input_tokens: int,
output_tokens: int,
cache_write_tokens: int = 0,
cache_read_tokens: int = 0
) -> float:
"""Calculate cost for Bedrock API usage."""
# Pricing per million tokens
pricing = {
'claude-sonnet-4.5': {
'input': 3.00, 'output': 15.00,
'cache_write': 3.75, 'cache_read': 0.30
},
'claude-opus-4.5': {
'input': 5.00, 'output': 25.00,
'cache_write': 6.25, 'cache_read': 0.50
},
'claude-haiku-4.5': {
'input': 1.00, 'output': 5.00,
'cache_write': 1.25, 'cache_read': 0.10
},
'nova-pro': {
'input': 0.80, 'output': 3.20,
'cache_write': 0, 'cache_read': 0.20
},
'nova-lite': {
'input': 0.06, 'output': 0.24,
'cache_write': 0, 'cache_read': 0.015
}
}
# Get pricing for model
model_pricing = pricing.get(model_name, pricing['claude-sonnet-4.5'])
# Calculate costs
input_cost = (input_tokens / 1_000_000) * model_pricing['input']
output_cost = (output_tokens / 1_000_000) * model_pricing['output']
cache_write_cost = (cache_write_tokens / 1_000_000) * model_pricing['cache_write']
cache_read_cost = (cache_read_tokens / 1_000_000) * model_pricing['cache_read']
total_cost = input_cost + output_cost + cache_write_cost + cache_read_cost
return round(total_cost, 6)
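Tying this to a Converse response; a short sketch, assuming response holds a converse() result and that the model key matches an entry in the pricing dict:
usage = response['usage']
cost = calculate_cost(
    model_name='claude-sonnet-4.5',
    input_tokens=usage['inputTokens'],
    output_tokens=usage['outputTokens'],
    cache_write_tokens=usage.get('cacheWriteInputTokens', 0),
    cache_read_tokens=usage.get('cacheReadInputTokens', 0)
)
print(f"Request cost: ${cost}")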
References
- AWS Bedrock Converse API Documentation
- Converse API Reference
- ConverseStream API Reference
- Boto3 Bedrock Runtime Documentation
- Tool Use Examples
- Prompt Caching