AWS Bedrock Converse API

Official Documentation: AWS Bedrock Converse API  

Last Updated: January 2026

Table of Contents

  1. Overview
  2. Authentication & Setup
  3. Message Format
  4. Request Structure
  5. Response Structure
  6. Streaming with ConverseStream
  7. Tool Use (Function Calling)
  8. Prompt Caching
  9. Python Implementation
  10. Error Handling

1. Overview

The AWS Bedrock Converse API provides a unified interface for conversational AI that works consistently across all supported foundation models.

Why Use Converse API?

  • Single codebase works with multiple models (Claude, Nova, Llama, Mistral, etc.)
  • Consistent message format across all models
  • Built-in tool calling (function calling) support
  • Streaming responses for real-time interactions
  • Prompt caching to reduce costs and latency
  • Guardrails integration for content safety
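Because the request shape is identical across providers, switching models usually means changing only the modelId. A minimal sketch (the model IDs below are examples; use ones enabled in your account):

import boto3

client = boto3.client('bedrock-runtime', region_name='us-east-1')
messages = [{"role": "user", "content": [{"text": "Summarize the Converse API in one sentence."}]}]

# The same call works for any Converse-compatible model; only modelId changes.
for model_id in ['anthropic.claude-3-5-haiku-20241022-v1:0', 'amazon.nova-lite-v1:0']:
    response = client.converse(modelId=model_id, messages=messages)
    print(model_id, '->', response['output']['message']['content'][0]['text'])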

Supported Models

| Provider | Models |
|----------|--------|
| Anthropic | Claude Sonnet 4.5, Claude Opus 4.5, Claude Haiku 4.5 |
| Amazon | Nova Premier, Nova Pro, Nova Lite |
| Meta | Llama 3.x models |
| Mistral | Mistral Large and other Mistral models |
| Cohere | Command R+ and other Command models |
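Model availability varies by account and region. One way to check what is offered in your region is the bedrock control-plane client (separate from bedrock-runtime); access still has to be granted per model in the Bedrock console:

import boto3

bedrock = boto3.client('bedrock', region_name='us-east-1')

# List foundation models offered in this region
for model in bedrock.list_foundation_models()['modelSummaries']:
    print(model['modelId'])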


2. Authentication & Setup

Required Permissions

 
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
 

Python Client Setup with Boto3

 
import boto3
from botocore.config import Config

# Method 1: Default credentials (from ~/.aws/credentials or environment)
client = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Method 2: Explicit credentials
client = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY',
    aws_session_token='YOUR_SESSION_TOKEN'  # Optional
)

# Method 3: Using AWS profile
session = boto3.Session(profile_name='your-profile')
client = session.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Method 4: With custom configuration
config = Config(
    region_name='us-east-1',
    user_agent_extra='shello-cli/1.0',
    read_timeout=300  # Important for streaming
)
client = boto3.client(
    service_name='bedrock-runtime',
    config=config
)
 

Supported AWS Regions

 
BEDROCK_REGIONS = [
    'us-east-1', 'us-east-2', 'us-west-1', 'us-west-2',
    'ap-south-1', 'ap-northeast-1', 'ap-northeast-2', 'ap-southeast-1', 'ap-southeast-2',
    'ca-central-1',
    'eu-central-1', 'eu-west-1', 'eu-west-2', 'eu-west-3',
    'sa-east-1'
]
 

3. Message Format

Message Structure

Messages follow a consistent format across all models:

 
messages = [
    {
        "role": "user",  # or "assistant"
        "content": [
            {"text": "Your message here"}
        ]
    }
]
 

Content Types

1. Text Content

 
{
    "role": "user",
    "content": [
        {"text": "Hello, how are you?"}
    ]
}
 

2. Image Content

 
# Read image file; the Converse API takes raw bytes (no base64 encoding needed)
with open('image.jpg', 'rb') as f:
    image_bytes = f.read()

message = {
    "role": "user",
    "content": [
        {"text": "What's in this image?"},
        {
            "image": {
                "format": "jpeg",  # or "png", "gif", "webp"
                "source": {
                    "bytes": image_bytes
                }
            }
        }
    ]
}
 

3. Mixed Content (Text + Images)

 
{
    "role": "user",
    "content": [
        {"text": "Compare these two images:"},
        {
            "image": {
                "format": "jpeg",
                "source": {"bytes": image1_bytes}
            }
        },
        {
            "image": {
                "format": "jpeg",
                "source": {"bytes": image2_bytes}
            }
        },
        {"text": "What are the differences?"}
    ]
}
 

Multi-Turn Conversations

 
messages = [
    {
        "role": "user",
        "content": [{"text": "Create a list of 3 pop songs."}]
    },
    {
        "role": "assistant",
        "content": [{
            "text": "1. 'As It Was' by Harry Styles\n2. 'Easy On Me' by Adele\n3. 'Unholy' by Sam Smith"
        }]
    },
    {
        "role": "user",
        "content": [{"text": "Make them all by UK artists."}]
    }
]
 

4. Request Structure

Complete Request Example

 
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=messages,
    system=[
        {"text": "You are a helpful AI assistant."}
    ],
    inferenceConfig={
        'maxTokens': 2048,
        'temperature': 0.7,
        'topP': 0.9,
        'stopSequences': ['END', 'STOP']
    },
    additionalModelRequestFields={
        'top_k': 250  # Model-specific parameters
    },
    toolConfig={
        'tools': [...],  # Tool definitions
        'toolChoice': {'auto': {}}
    },
    guardrailConfig={
        'guardrailIdentifier': 'your-guardrail-id',
        'guardrailVersion': '1'
    }
)
 

Request Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| modelId | string | ✅ Yes | Model identifier or ARN |
| messages | array | ✅ Yes | Conversation messages |
| system | array | ❌ No | System prompts/instructions |
| inferenceConfig | object | ❌ No | Inference parameters |
| toolConfig | object | ❌ No | Tool definitions |
| additionalModelRequestFields | object | ❌ No | Model-specific parameters |
| guardrailConfig | object | ❌ No | Guardrail configuration |
| additionalModelResponseFieldPaths | array | ❌ No | Additional response fields |
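Only modelId and messages are required; every other field can be omitted. A minimal sketch (the model ID is an example):

response = client.converse(
    modelId='anthropic.claude-3-5-haiku-20241022-v1:0',
    messages=[{"role": "user", "content": [{"text": "Hello!"}]}]
)
print(response['output']['message']['content'][0]['text'])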

Model IDs

 
# Anthropic Claude models
'anthropic.claude-3-5-sonnet-20241022-v2:0'
'anthropic.claude-3-5-haiku-20241022-v1:0'
'anthropic.claude-3-opus-20240229-v1:0'

# Amazon Nova models
'amazon.nova-premier-v1:0'
'amazon.nova-pro-v1:0'
'amazon.nova-lite-v1:0'

# Cross-region inference profiles
'us.anthropic.claude-3-5-sonnet-20241022-v2:0'
'eu.anthropic.claude-3-5-sonnet-20241022-v2:0'
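Some models are only reachable through cross-region inference profiles in certain regions. A hypothetical helper that prefixes a base model ID with its region group; the exact prefix mapping and profile availability vary by model and account, so treat this as a sketch:

def to_inference_profile(model_id: str, region: str) -> str:
    """Prefix a base model ID with a region-group prefix (hypothetical helper)."""
    if region.startswith('us-'):
        return f'us.{model_id}'
    if region.startswith('eu-'):
        return f'eu.{model_id}'
    if region.startswith('ap-'):
        return f'apac.{model_id}'
    return model_id  # Fall back to the base model ID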
 

Inference Configuration

 
inferenceConfig = {
    'maxTokens': 2048,        # Maximum tokens to generate
    'temperature': 0.7,       # 0.0 to 1.0 (higher = more creative)
    'topP': 0.9,              # Nucleus sampling (0.0 to 1.0)
    'stopSequences': ['END']  # Stop generation at these sequences
}
 

System Prompts

 
system = [
    {
        "text": "You are an expert Python developer. "
                "Provide clear, concise code examples."
    }
]
 

5. Response Structure

Converse Response

 
{
    'output': {
        'message': {
            'role': 'assistant',
            'content': [
                {'text': 'Generated response text here...'}
            ]
        }
    },
    'stopReason': 'end_turn',  # or 'max_tokens', 'stop_sequence', 'tool_use'
    'usage': {
        'inputTokens': 125,
        'outputTokens': 60,
        'totalTokens': 185,
        'cacheReadInputTokens': 0,      # Prompt caching
        'cacheWriteInputTokens': 0      # Prompt caching
    },
    'metrics': {
        'latencyMs': 1175
    }
}
 

Stop Reasons

| Stop Reason | Description |
|-------------|-------------|
| end_turn | Model completed its response naturally |
| max_tokens | Reached maximum token limit |
| stop_sequence | Hit a stop sequence |
| tool_use | Model wants to use a tool |
| content_filtered | Content blocked by guardrails |
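In practice the stop reason drives control flow. A sketch of the usual branching:

stop_reason = response['stopReason']

if stop_reason == 'tool_use':
    pass  # Run the requested tool and send back a toolResult (see section 7)
elif stop_reason == 'max_tokens':
    pass  # Response was cut off; raise maxTokens or ask the model to continue
elif stop_reason == 'content_filtered':
    pass  # Blocked by guardrails; surface a safe message to the user
else:
    print(response['output']['message']['content'][0]['text'])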

Extracting Response Text

 
response = client.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=messages
)

# Extract the text
response_text = response['output']['message']['content'][0]['text']

# Get token usage
input_tokens = response['usage']['inputTokens']
output_tokens = response['usage']['outputTokens']

# Estimate cost (calculate_cost is defined in the Cost Calculator section below)
total_cost = calculate_cost('claude-sonnet-4.5', input_tokens, output_tokens)
 

6. Streaming with ConverseStream

Basic Streaming Example

 
response = client.converse_stream(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=messages,
    inferenceConfig={'maxTokens': 2048, 'temperature': 0.7}
)

# Process the stream
for event in response['stream']:
    if 'messageStart' in event:
        print(f"Role: {event['messageStart']['role']}")
    elif 'contentBlockDelta' in event:
        delta = event['contentBlockDelta']['delta']
        if 'text' in delta:
            print(delta['text'], end='', flush=True)
    elif 'messageStop' in event:
        stop_reason = event['messageStop']['stopReason']
        print(f"\nStop reason: {stop_reason}")
    elif 'metadata' in event:
        usage = event['metadata']['usage']
        print(f"\nTokens used: {usage['totalTokens']}")
 

Stream Event Types

| Event | Description | Fields |
|-------|-------------|--------|
| messageStart | Start of message | role |
| contentBlockStart | Start of content block | contentBlockIndex, start |
| contentBlockDelta | Content chunk | contentBlockIndex, delta |
| contentBlockStop | End of content block | contentBlockIndex |
| messageStop | End of message | stopReason |
| metadata | Usage and metrics | usage, metrics |

Complete Streaming Implementation

 
def stream_response(client, model_id, messages):
    """Stream response from Bedrock with proper event handling."""
    response = client.converse_stream(
        modelId=model_id,
        messages=messages,
        inferenceConfig={'maxTokens': 2048, 'temperature': 0}
    )
    try:
        for event in response['stream']:
            # Handle content deltas
            if 'contentBlockDelta' in event:
                delta = event['contentBlockDelta']['delta']
                if 'text' in delta:
                    yield {'type': 'text', 'text': delta['text']}
            # Handle metadata (token usage)
            elif 'metadata' in event:
                usage = event['metadata']['usage']
                yield {
                    'type': 'usage',
                    'inputTokens': usage.get('inputTokens', 0),
                    'outputTokens': usage.get('outputTokens', 0),
                    'totalTokens': usage.get('totalTokens', 0)
                }
            # Handle stop event
            elif 'messageStop' in event:
                yield {
                    'type': 'stop',
                    'stopReason': event['messageStop']['stopReason']
                }
    except Exception as e:
        yield {'type': 'error', 'error': str(e)}


# Usage
for chunk in stream_response(client, model_id, messages):
    if chunk['type'] == 'text':
        print(chunk['text'], end='', flush=True)
    elif chunk['type'] == 'usage':
        print(f"\n\nTokens: {chunk['totalTokens']}")
 

7. Tool Use (Function Calling)

Tool Definition Format

 
tools = [
    {
        "toolSpec": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g., 'San Francisco'"
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit"
                        }
                    },
                    "required": ["location"]
                }
            }
        }
    }
]
 

Tool Configuration

 
toolConfig = {
    "tools": tools,
    "toolChoice": {
        "auto": {}  # Let model decide when to use tools
        # OR
        # "any": {}  # Force model to use a tool
        # OR
        # "tool": {"name": "get_weather"}  # Force specific tool
    }
}
 

Complete Tool Use Flow

 
import json


def handle_tool_use(client, model_id, messages, tools):
    """Handle tool use with Bedrock Converse API."""
    # Initial request with tools
    response = client.converse(
        modelId=model_id,
        messages=messages,
        toolConfig={"tools": tools, "toolChoice": {"auto": {}}}
    )
    # Check if model wants to use a tool
    stop_reason = response['stopReason']
    if stop_reason == 'tool_use':
        assistant_message = response['output']['message']
        # Echo the full assistant message (including its toolUse blocks) into history
        messages.append(assistant_message)
        # Execute every requested tool and collect the results
        tool_result_blocks = []
        for block in assistant_message['content']:
            if 'toolUse' in block:
                tool_use = block['toolUse']
                tool_name = tool_use['name']
                tool_input = tool_use['input']
                tool_use_id = tool_use['toolUseId']
                print(f"Model wants to use tool: {tool_name}")
                print(f"Tool input: {tool_input}")
                # Execute the tool
                tool_result = execute_tool(tool_name, tool_input)
                tool_result_blocks.append({
                    "toolResult": {
                        "toolUseId": tool_use_id,
                        "content": [
                            {"text": json.dumps(tool_result)}
                        ]
                    }
                })
        # Send the tool results back as a single user message
        messages.append({"role": "user", "content": tool_result_blocks})
        # Continue the conversation with the tool results
        return handle_tool_use(client, model_id, messages, tools)
    # Return final response
    return response['output']['message']['content'][0]['text']


def execute_tool(tool_name, tool_input):
    """Execute the actual tool function."""
    if tool_name == "get_weather":
        location = tool_input['location']
        unit = tool_input.get('unit', 'celsius')
        # Call actual weather API here
        return {
            "location": location,
            "temperature": 22,
            "unit": unit,
            "condition": "sunny"
        }
    return {"error": "Unknown tool"}
 

Tool Result Format

 
{
    "role": "user",
    "content": [
        {
            "toolResult": {
                "toolUseId": "tool_use_id_from_model",
                "content": [
                    {"text": '{"temperature": 22, "condition": "sunny"}'}
                ],
                "status": "success"  # or "error"
            }
        }
    ]
}
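When a tool call fails, report it back to the model instead of raising, by setting status to "error". A sketch, reusing tool_use_id from the flow above:

error_result = {
    "toolResult": {
        "toolUseId": tool_use_id,
        "content": [{"text": "Weather service unavailable"}],
        "status": "error"  # Tells the model the tool call failed
    }
}
messages.append({"role": "user", "content": [error_result]})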
 

8. Prompt Caching

Prompt caching reduces costs and latency by caching parts of your prompt that don’t change between requests.

How It Works

  • Add cachePoint objects to mark cacheable content
  • First request: Writes to cache (charged at cache write rate)
  • Subsequent requests: Reads from cache (charged at lower cache read rate)
  • Cache TTL: 5 minutes
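A quick way to see this at work is to send the same cached prefix twice and compare the usage counters. A sketch that assumes model_id supports prompt caching and long_reference_text is a large, stable block of context above the model's minimum cacheable length:

system = [
    {"text": "You are a helpful AI assistant.\n\n" + long_reference_text},
    {"cachePoint": {"type": "default"}}
]
messages = [{"role": "user", "content": [{"text": "Give me a one-line summary."}]}]

for attempt in ("first", "second"):
    usage = client.converse(modelId=model_id, messages=messages, system=system)['usage']
    # The first call should report cacheWriteInputTokens; the second, cacheReadInputTokens
    print(attempt, usage.get('cacheWriteInputTokens', 0), usage.get('cacheReadInputTokens', 0))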

Adding Cache Points

 
# System prompt with cache point
system = [
    {"text": "You are a helpful AI assistant with extensive knowledge."},
    {"cachePoint": {"type": "default"}}
]

# Messages with cache points
messages = [
    {
        "role": "user",
        "content": [
            {"text": "Here is a long document to analyze:\n\n" + long_document},
            {"cachePoint": {"type": "default"}}
        ]
    },
    {
        "role": "assistant",
        "content": [{"text": "I've read the document. What would you like to know?"}]
    },
    {
        "role": "user",
        "content": [
            {"text": "What are the main themes?"},
            {"cachePoint": {"type": "default"}}
        ]
    }
]
 

Cache Strategy

Best practices for cache points:

  1. System prompts - Always cache if they don’t change
  2. Long context - Cache large documents, code bases, etc.
  3. Conversation history - Cache earlier turns in long conversations
  4. Last 2 user messages - Common pattern for multi-turn chats
 
A helper that applies this strategy (note it appends cache points to the message objects in place):

def add_cache_points(messages, system_prompt=None):
    """Add cache points to optimize costs."""
    # Cache system prompt
    system = None
    if system_prompt:
        system = [
            {"text": system_prompt},
            {"cachePoint": {"type": "default"}}
        ]
    # Find user message indices
    user_indices = [i for i, msg in enumerate(messages) if msg['role'] == 'user']
    # Add cache points to the last 2 user messages
    if len(user_indices) >= 2:
        for idx in user_indices[-2:]:
            messages[idx]['content'].append({"cachePoint": {"type": "default"}})
    return messages, system
 

Cache Usage in Response

 
response = client.converse(
    modelId=model_id,
    messages=messages,
    system=system
)

usage = response['usage']
print(f"Input tokens: {usage['inputTokens']}")
print(f"Cache write tokens: {usage.get('cacheWriteInputTokens', 0)}")
print(f"Cache read tokens: {usage.get('cacheReadInputTokens', 0)}")
print(f"Output tokens: {usage['outputTokens']}")
 

Cost Savings Example

 
# Without caching
# Request 1: 10,000 input tokens @ $3/M = $0.03
# Request 2: 10,000 input tokens @ $3/M = $0.03
# Total: $0.06

# With caching
# Request 1: 10,000 tokens written to cache @ $3.75/M = $0.0375
# Request 2: 10,000 tokens read from cache @ $0.30/M = $0.003
# Total: $0.0405 (32.5% savings)
 

9. Python Implementation

Complete ShelloBedrockClient Class

 
import boto3
import json
from typing import List, Dict, Any, Optional, Generator
from botocore.config import Config
from botocore.exceptions import ClientError


class ShelloBedrockClient:
    """AWS Bedrock client for Shello CLI with Converse API support."""

    def __init__(
        self,
        model: str = "anthropic.claude-3-sonnet-20240229-v1:0",
        region: str = "us-east-1",
        aws_access_key: Optional[str] = None,
        aws_secret_key: Optional[str] = None,
        aws_session_token: Optional[str] = None,
        aws_profile: Optional[str] = None,
        endpoint_url: Optional[str] = None,
        use_prompt_cache: bool = False,
        debug: bool = False
    ):
        self._model = model
        self._region = region
        self._use_prompt_cache = use_prompt_cache
        self._debug = debug
        # Initialize boto3 client
        self._client = self._create_client(
            region, aws_access_key, aws_secret_key,
            aws_session_token, aws_profile, endpoint_url
        )

    def _create_client(self, region, access_key, secret_key,
                       session_token, profile, endpoint_url):
        """Create boto3 Bedrock Runtime client."""
        config = Config(
            region_name=region,
            user_agent_extra='shello-cli/1.0',
            read_timeout=300  # Important for streaming
        )
        if profile:
            session = boto3.Session(profile_name=profile)
            return session.client(
                'bedrock-runtime',
                config=config,
                endpoint_url=endpoint_url
            )
        elif access_key and secret_key:
            return boto3.client(
                'bedrock-runtime',
                aws_access_key_id=access_key,
                aws_secret_access_key=secret_key,
                aws_session_token=session_token,
                config=config,
                endpoint_url=endpoint_url
            )
        else:
            # Use default credential chain
            return boto3.client(
                'bedrock-runtime',
                config=config,
                endpoint_url=endpoint_url
            )

    def chat(
        self,
        messages: List[Dict[str, Any]],
        system_prompt: Optional[str] = None,
        tools: Optional[List[Dict]] = None,
        **kwargs
    ) -> Dict[str, Any]:
        """Send a chat completion request."""
        # Format messages for Bedrock
        bedrock_messages = self._format_messages(messages)
        # Prepare system prompt
        system = None
        if system_prompt:
            system = [{"text": system_prompt}]
            if self._use_prompt_cache:
                system.append({"cachePoint": {"type": "default"}})
        # Prepare request
        request_params = {
            'modelId': self._model,
            'messages': bedrock_messages
        }
        if system:
            request_params['system'] = system
        if tools:
            request_params['toolConfig'] = self._format_tool_config(tools)
        # Add inference config if provided
        if 'temperature' in kwargs or 'max_tokens' in kwargs:
            request_params['inferenceConfig'] = {
                'maxTokens': kwargs.get('max_tokens', 2048),
                'temperature': kwargs.get('temperature', 0.7),
                'topP': kwargs.get('top_p', 0.9)
            }
        try:
            response = self._client.converse(**request_params)
            return self._parse_response(response)
        except ClientError as e:
            raise Exception(f"Bedrock API error: {str(e)}")

    def chat_stream(
        self,
        messages: List[Dict[str, Any]],
        system_prompt: Optional[str] = None,
        tools: Optional[List[Dict]] = None,
        **kwargs
    ) -> Generator[Dict[str, Any], None, None]:
        """Stream chat completion response."""
        # Format messages
        bedrock_messages = self._format_messages(messages)
        # Prepare system prompt
        system = None
        if system_prompt:
            system = [{"text": system_prompt}]
            if self._use_prompt_cache:
                system.append({"cachePoint": {"type": "default"}})
        # Prepare request
        request_params = {
            'modelId': self._model,
            'messages': bedrock_messages
        }
        if system:
            request_params['system'] = system
        if tools:
            request_params['toolConfig'] = self._format_tool_config(tools)
        if 'temperature' in kwargs or 'max_tokens' in kwargs:
            request_params['inferenceConfig'] = {
                'maxTokens': kwargs.get('max_tokens', 2048),
                'temperature': kwargs.get('temperature', 0.7)
            }
        try:
            response = self._client.converse_stream(**request_params)
            for event in response['stream']:
                if 'contentBlockDelta' in event:
                    delta = event['contentBlockDelta']['delta']
                    if 'text' in delta:
                        yield {'type': 'text', 'text': delta['text']}
                elif 'metadata' in event:
                    usage = event['metadata']['usage']
                    yield {
                        'type': 'usage',
                        'inputTokens': usage.get('inputTokens', 0),
                        'outputTokens': usage.get('outputTokens', 0),
                        'cacheReadTokens': usage.get('cacheReadInputTokens', 0),
                        'cacheWriteTokens': usage.get('cacheWriteInputTokens', 0)
                    }
                elif 'messageStop' in event:
                    yield {
                        'type': 'stop',
                        'stopReason': event['messageStop']['stopReason']
                    }
        except ClientError as e:
            yield {'type': 'error', 'error': str(e)}

    def _format_messages(self, messages: List[Dict]) -> List[Dict]:
        """Convert Shello messages to Bedrock format."""
        bedrock_messages = []
        for msg in messages:
            role = "user" if msg["role"] == "user" else "assistant"
            content = []
            if isinstance(msg["content"], str):
                content = [{"text": msg["content"]}]
            elif isinstance(msg["content"], list):
                for item in msg["content"]:
                    if item["type"] == "text":
                        content.append({"text": item["text"]})
                    elif item["type"] == "image":
                        content.append(self._format_image(item))
            bedrock_messages.append({"role": role, "content": content})
        return bedrock_messages

    def _format_image(self, item: Dict) -> Dict:
        """Format image content for Bedrock."""
        return {
            "image": {
                "format": "jpeg",  # Detect from data if needed
                "source": {"bytes": item["source"]["data"]}
            }
        }

    def _format_tool_config(self, tools: List[Dict]) -> Dict:
        """Convert Shello tools to Bedrock format."""
        bedrock_tools = []
        for tool in tools:
            bedrock_tools.append({
                "toolSpec": {
                    "name": tool["function"]["name"],
                    "description": tool["function"]["description"],
                    "inputSchema": {
                        "json": tool["function"]["parameters"]
                    }
                }
            })
        return {
            "tools": bedrock_tools,
            "toolChoice": {"auto": {}}
        }

    def _parse_response(self, response: Dict) -> Dict:
        """Parse Bedrock response to standard format."""
        return {
            "content": response['output']['message']['content'][0]['text'],
            "role": response['output']['message']['role'],
            "stopReason": response['stopReason'],
            "usage": response['usage'],
            "metrics": response.get('metrics', {})
        }

    def set_model(self, model: str):
        """Change the current model."""
        self._model = model

    def get_current_model(self) -> str:
        """Get the current model name."""
        return self._model
 

10. Error Handling

Common Errors

 
from botocore.exceptions import ClientError

try:
    response = client.converse(modelId=model_id, messages=messages)
except ClientError as e:
    error_code = e.response['Error']['Code']
    error_message = e.response['Error']['Message']
    if error_code == 'ValidationException':
        # Invalid request parameters
        if 'too long' in error_message.lower():
            print("Context window exceeded - truncate messages")
        else:
            print(f"Validation error: {error_message}")
    elif error_code == 'ThrottlingException':
        # Rate limit exceeded
        print("Rate limited - implement exponential backoff")
    elif error_code == 'ModelTimeoutException':
        # Model took too long to respond
        print("Model timeout - retry with shorter context")
    elif error_code == 'ModelNotReadyException':
        # Model is still loading
        print("Model not ready - wait and retry")
    elif error_code == 'AccessDeniedException':
        # Insufficient permissions
        print("Access denied - check IAM permissions")
    elif error_code == 'ResourceNotFoundException':
        # Model not found
        print(f"Model not found: {model_id}")
    else:
        print(f"Unexpected error: {error_code} - {error_message}")
 

Retry Logic with Exponential Backoff

 
import time
from functools import wraps

from botocore.exceptions import ClientError


def retry_with_backoff(max_retries=3, base_delay=1):
    """Decorator for retry logic with exponential backoff."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except ClientError as e:
                    error_code = e.response['Error']['Code']
                    # Don't retry on validation errors
                    if error_code == 'ValidationException':
                        raise
                    # Retry on throttling and timeouts
                    if error_code in ['ThrottlingException', 'ModelTimeoutException']:
                        if attempt < max_retries - 1:
                            delay = base_delay * (2 ** attempt)
                            print(f"Retry {attempt + 1}/{max_retries} after {delay}s")
                            time.sleep(delay)
                            continue
                    raise
            raise Exception(f"Max retries ({max_retries}) exceeded")
        return wrapper
    return decorator


@retry_with_backoff(max_retries=3)
def call_bedrock(client, model_id, messages):
    return client.converse(modelId=model_id, messages=messages)
 

Context Window Management

 
import json


def truncate_messages(messages, max_tokens=100000):
    """Truncate messages to fit within the context window."""
    # Simple token estimation (4 chars ≈ 1 token)
    total_chars = sum(
        len(json.dumps(msg)) for msg in messages
    )
    estimated_tokens = total_chars // 4
    if estimated_tokens <= max_tokens:
        return messages
    # Keep the first message and the 2 most recent messages
    if len(messages) > 2:
        return [messages[0]] + messages[-2:]
    return messages
 

Cost Calculation

Pricing (per million tokens)

| Model | Input | Output | Cache Write | Cache Read |
|-------|-------|--------|-------------|------------|
| Claude Sonnet 4.5 | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Opus 4.5 | $5.00 | $25.00 | $6.25 | $0.50 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $1.25 | $0.10 |
| Nova Pro | $0.80 | $3.20 | - | $0.20 |
| Nova Lite | $0.06 | $0.24 | - | $0.015 |

Cost Calculator

 
def calculate_cost(
    model_name: str,
    input_tokens: int,
    output_tokens: int,
    cache_write_tokens: int = 0,
    cache_read_tokens: int = 0
) -> float:
    """Calculate cost for Bedrock API usage."""
    # Pricing per million tokens
    pricing = {
        'claude-sonnet-4.5': {
            'input': 3.00, 'output': 15.00,
            'cache_write': 3.75, 'cache_read': 0.30
        },
        'claude-opus-4.5': {
            'input': 5.00, 'output': 25.00,
            'cache_write': 6.25, 'cache_read': 0.50
        },
        'claude-haiku-4.5': {
            'input': 1.00, 'output': 5.00,
            'cache_write': 1.25, 'cache_read': 0.10
        },
        'nova-pro': {
            'input': 0.80, 'output': 3.20,
            'cache_write': 0, 'cache_read': 0.20
        },
        'nova-lite': {
            'input': 0.06, 'output': 0.24,
            'cache_write': 0, 'cache_read': 0.015
        }
    }
    # Get pricing for model
    model_pricing = pricing.get(model_name, pricing['claude-sonnet-4.5'])
    # Calculate costs
    input_cost = (input_tokens / 1_000_000) * model_pricing['input']
    output_cost = (output_tokens / 1_000_000) * model_pricing['output']
    cache_write_cost = (cache_write_tokens / 1_000_000) * model_pricing['cache_write']
    cache_read_cost = (cache_read_tokens / 1_000_000) * model_pricing['cache_read']
    total_cost = input_cost + output_cost + cache_write_cost + cache_read_cost
    return round(total_cost, 6)
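For example, feeding it the usage block from a Converse response (a sketch; the pricing key must match one of the entries above):

usage = response['usage']
cost = calculate_cost(
    'claude-sonnet-4.5',
    usage['inputTokens'],
    usage['outputTokens'],
    cache_write_tokens=usage.get('cacheWriteInputTokens', 0),
    cache_read_tokens=usage.get('cacheReadInputTokens', 0)
)
print(f"Request cost: ${cost}")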
 
