12 ANGRY AGENTS - Product Requirements Document

Overview

Concept: AI-powered jury deliberation simulation where 11 AI agents + 1 human player debate real criminal cases. A Judge narrator (ElevenLabs) orchestrates the experience.

Track: MCP in Action - Creative (potentially also Consumer)

Core Value Prop: True autonomous agent behavior - AI jurors reason, argue, persuade, and change their minds based on deliberation.


Sponsor Integration

| Sponsor | Prize | Integration | Priority |
| --- | --- | --- | --- |
| LlamaIndex | $1,000 | Case database RAG | HIGH |
| ElevenLabs | AirPods + $2K | Judge narrator voice | HIGH |
| Blaxel | $2,500 | Sandboxed agent execution | MEDIUM |
| Modal | $2,500 | Agent compute | MEDIUM |
| Gemini | $10K credits | Agent reasoning | HIGH |

User Experience Flow

1. CASE PRESENTATION
   └─> Judge (ElevenLabs) narrates case summary
   └─> Evidence displayed via LlamaIndex RAG
   └─> Player reads case file

2. SIDE SELECTION
   └─> Player chooses: DEFEND (not guilty) or PROSECUTE (guilty)
   └─> Player commits - cannot change

3. INITIAL VOTE
   └─> All 12 jurors vote (randomized split based on case)
   └─> Vote tally shown: e.g., "7-5 GUILTY"

4. DELIBERATION LOOP
   └─> Random 1-4 agents speak per round
   └─> Player gets turn (choose strategy → AI crafts argument)
   └─> Conviction scores shift based on arguments
   └─> Votes may flip
   └─> Repeat until: votes stabilize OR player calls vote

5. FINAL VERDICT
   └─> Judge announces verdict (ElevenLabs)
   └─> Deliberation transcript available
   └─> No "win/lose" - just the experience
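
The five stages above map onto the `phase` field of the `GameState` model (see Data Models). A minimal sketch of the linear progression; the `next_phase` helper is illustrative, not part of the PRD's code:

```python
# Linear phase progression implied by the flow above; "verdict" is terminal.
PHASES = ["setup", "presentation", "side_selection", "initial_vote",
          "deliberation", "final_vote", "verdict"]

def next_phase(current: str) -> str:
    i = PHASES.index(current)
    return PHASES[min(i + 1, len(PHASES) - 1)]

print(next_phase("initial_vote"))  # deliberation
print(next_phase("verdict"))       # verdict (no further phase)
```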

Technical Architecture

System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        12 ANGRY AGENTS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                      GRADIO UI LAYER                         │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │   │
│  │  │  Jury Box    │  │  Chat View   │  │  Case File   │      │   │
│  │  │  (12 seats)  │  │  (dialogue)  │  │  (evidence)  │      │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘      │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│                              ▼                                       │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                   ORCHESTRATOR AGENT                         │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  GameStateManager                                     │   │   │
│  │  │  - current_phase: presentation|deliberation|verdict   │   │   │
│  │  │  - round_number: int                                  │   │   │
│  │  │  - votes: Dict[agent_id, "guilty"|"not_guilty"]      │   │   │
│  │  │  - conviction_scores: Dict[agent_id, float]          │   │   │
│  │  │  - speaking_queue: List[agent_id]                    │   │   │
│  │  │  - deliberation_log: List[Turn]                      │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  │                                                              │   │
│  │  ┌──────────────────────────────────────────────────────┐   │   │
│  │  │  TurnManager                                          │   │   │
│  │  │  - select_speakers(1-4 random)                       │   │   │
│  │  │  - check_vote_stability()                            │   │   │
│  │  │  - process_vote_changes()                            │   │   │
│  │  └──────────────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                              │                                       │
│         ┌────────────────────┼────────────────────┐                 │
│         ▼                    ▼                    ▼                 │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐        │
│  │   JUDGE     │    │  JUROR AGENTS   │    │   PLAYER    │        │
│  │   AGENT     │    │   (11 total)    │    │   AGENT     │        │
│  │             │    │                 │    │             │        │
│  │ ElevenLabs  │    │ ┌─────────────┐ │    │ Hybrid I/O  │        │
│  │ TTS Output  │    │ │ AgentConfig │ │    │ Strategy    │        │
│  │             │    │ │ - persona   │ │    │ Selection   │        │
│  │ Narration   │    │ │ - model     │ │    │             │        │
│  │ Verdicts    │    │ │ - tools[]   │ │    │ Argument    │        │
│  │ Summaries   │    │ │ - memory    │ │    │ Crafting    │        │
│  └─────────────┘    │ └─────────────┘ │    └─────────────┘        │
│                      │                 │                            │
│                      │ ┌─────────────┐ │                            │
│                      │ │ JurorMemory │ │                            │
│                      │ │ - case_view │ │                            │
│                      │ │ - arguments │ │                            │
│                      │ │ - reactions │ │                            │
│                      │ │ - conviction│ │                            │
│                      │ └─────────────┘ │                            │
│                      └─────────────────┘                            │
│                              │                                       │
│         ┌────────────────────┼────────────────────┐                 │
│         ▼                    ▼                    ▼                 │
│  ┌─────────────┐    ┌─────────────────┐    ┌─────────────┐        │
│  │ LLAMAINDEX  │    │    LITELLM      │    │   BLAXEL    │        │
│  │             │    │                 │    │             │        │
│  │ Case RAG    │    │ Model Router    │    │ Sandbox     │        │
│  │ Evidence    │    │ - Gemini        │    │ Execution   │        │
│  │ Precedents  │    │ - Claude        │    │             │        │
│  │             │    │ - GPT-4         │    │ Agent Tools │        │
│  └─────────────┘    │ - Local         │    │ (future)    │        │
│                      └─────────────────┘    └─────────────┘        │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                     MCP SERVER LAYER                         │   │
│  │  Tools exposed for external AI agents to play as juror       │   │
│  │  - mcp_join_jury(case_id) -> seat_assignment                │   │
│  │  - mcp_view_evidence(case_id) -> evidence_list              │   │
│  │  - mcp_make_argument(argument_type, content) -> response    │   │
│  │  - mcp_cast_vote(vote) -> confirmation                      │   │
│  │  - mcp_view_deliberation() -> transcript                    │   │
│  └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘

Data Models

GameState

from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Literal

@dataclass
class GameState:
    """Central game state - managed by Orchestrator."""

    # Session
    session_id: str
    case_id: str
    phase: Literal["setup", "presentation", "side_selection",
                   "initial_vote", "deliberation", "final_vote", "verdict"]

    # Rounds
    round_number: int = 0
    max_rounds: int = 20  # Safety limit
    stability_threshold: int = 3  # Rounds without vote change to end
    rounds_without_change: int = 0

    # Votes
    votes: Dict[str, Literal["guilty", "not_guilty"]] = field(default_factory=dict)
    vote_history: List[Dict[str, str]] = field(default_factory=list)

    # Conviction scores (0.0 = certain not guilty, 1.0 = certain guilty)
    conviction_scores: Dict[str, float] = field(default_factory=dict)

    # Deliberation
    speaking_queue: List[str] = field(default_factory=list)
    deliberation_log: List[DeliberationTurn] = field(default_factory=list)

    # Player
    player_side: Literal["defend", "prosecute"] | None = None
    player_seat: int = 7  # Which seat is the player


@dataclass
class DeliberationTurn:
    """A single turn in deliberation."""

    round_number: int
    speaker_id: str
    speaker_name: str
    argument_type: str  # "evidence", "emotional", "logical", "question", etc.
    content: str
    target_id: str | None = None  # Who they're addressing
    impact: Dict[str, float] = field(default_factory=dict)  # conviction changes
    timestamp: datetime = field(default_factory=datetime.now)

Agent Configuration

@dataclass
class JurorConfig:
    """Configuration for a single juror agent."""

    # Identity
    juror_id: str
    seat_number: int
    name: str
    emoji: str  # For display until sprites ready

    # Personality (affects reasoning style)
    archetype: str  # "rationalist", "empath", "cynic", etc.
    personality_prompt: str  # Detailed persona prompt

    # Behavior modifiers
    stubbornness: float  # 0.0-1.0, how hard to convince
    volatility: float    # 0.0-1.0, how much conviction swings
    influence: float     # 0.0-1.0, how persuasive to others
    verbosity: float     # 0.0-1.0, how long their arguments are

    # Model configuration
    model_provider: str  # "gemini", "openai", "anthropic", "local"
    model_id: str        # Specific model ID
    temperature: float = 0.7

    # Tools (future expansion)
    tools: List[str] = field(default_factory=list)  # ["web_search", "case_lookup"]

    # Memory
    memory_window: int = 10  # How many turns to remember in detail


@dataclass
class JurorMemory:
    """Memory state for a single juror."""

    juror_id: str

    # Case understanding
    case_summary: str
    key_evidence: List[str]
    evidence_interpretations: Dict[str, str]  # evidence_id -> interpretation

    # Deliberation memory
    arguments_heard: List[ArgumentMemory]
    arguments_made: List[str]

    # Relationships
    opinions_of_others: Dict[str, float]  # juror_id -> trust/agreement (-1 to 1)

    # Internal state
    current_conviction: float  # 0.0-1.0
    conviction_history: List[float]
    reasoning_chain: List[str]  # Why they believe what they believe
    doubts: List[str]  # Things that could change their mind


@dataclass
class ArgumentMemory:
    """Memory of a single argument heard."""

    speaker_id: str
    content_summary: str
    argument_type: str
    persuasiveness: float  # How convincing it was to this juror
    counter_points: List[str]  # Thoughts against it
    round_heard: int

Case Data Model

@dataclass
class CriminalCase:
    """A criminal case for deliberation."""

    case_id: str
    title: str
    summary: str  # 2-3 paragraph overview

    # Charges
    charges: List[str]

    # Evidence
    evidence: List[Evidence]

    # Witnesses
    witnesses: List[Witness]

    # Arguments
    prosecution_arguments: List[str]
    defense_arguments: List[str]

    # Defendant
    defendant: Defendant

    # Metadata
    difficulty: Literal["clear_guilty", "clear_innocent", "ambiguous"]
    themes: List[str]  # ["eyewitness", "circumstantial", "forensic", etc.]

    # For display
    year: int
    jurisdiction: str


@dataclass
class Evidence:
    """A piece of evidence."""

    evidence_id: str
    type: str  # "physical", "testimonial", "documentary", "forensic"
    description: str
    strength_prosecution: float  # 0.0-1.0
    strength_defense: float      # 0.0-1.0
    contestable: bool
    contest_reason: str | None


@dataclass
class Witness:
    """A witness in the case."""

    witness_id: str
    name: str
    role: str  # "eyewitness", "expert", "character", etc.
    testimony_summary: str
    credibility_issues: List[str]
    side: Literal["prosecution", "defense", "neutral"]

The 11 Juror Archetypes

jurors:
  - id: "juror_1"
    name: "Marcus Webb"
    archetype: "rationalist"
    emoji: "🧠"
    personality: |
      You are a retired engineer. You believe only in hard evidence and logical
      deduction. Emotional appeals annoy you. You often say "Show me the data."
      You change your mind only when presented with irrefutable logical arguments.
    stubbornness: 0.8
    volatility: 0.2
    influence: 0.7
    initial_lean: "neutral"

  - id: "juror_2"
    name: "Sarah Chen"
    archetype: "empath"
    emoji: "💗"
    personality: |
      You are a social worker. You always consider the human element - the
      defendant's background, circumstances, potential for redemption. You're
      easily moved by personal stories but skeptical of cold statistics.
    stubbornness: 0.4
    volatility: 0.7
    influence: 0.5
    initial_lean: "defense"

  - id: "juror_3"
    name: "Frank Russo"
    archetype: "cynic"
    emoji: "😤"
    personality: |
      You are a retired cop. You've "seen it all" and believe most defendants
      are guilty. You're impatient with naive arguments. You trust law
      enforcement evidence highly. Hard to convince toward not guilty.
    stubbornness: 0.9
    volatility: 0.1
    influence: 0.6
    initial_lean: "prosecution"

  - id: "juror_4"
    name: "Linda Park"
    archetype: "conformist"
    emoji: "😐"
    personality: |
      You are an accountant who avoids conflict. You tend to agree with whoever
      spoke last or with the majority. You rarely initiate arguments but will
      echo others. Easy to sway but also easy to sway back.
    stubbornness: 0.2
    volatility: 0.8
    influence: 0.2
    initial_lean: "majority"

  - id: "juror_5"
    name: "David Okonkwo"
    archetype: "contrarian"
    emoji: "🙄"
    personality: |
      You are a philosophy professor. You play devil's advocate constantly.
      If everyone says guilty, you argue not guilty. You value intellectual
      discourse over reaching conclusions. You ask probing questions.
    stubbornness: 0.6
    volatility: 0.5
    influence: 0.8
    initial_lean: "minority"

  - id: "juror_6"
    name: "Betty Morrison"
    archetype: "impatient"
    emoji: "⏰"
    personality: |
      You are a busy restaurant owner. You want this over quickly. You make
      snap judgments and get frustrated with long debates. You often say
      "Can we just vote already?" You're persuaded by confident, brief arguments.
    stubbornness: 0.5
    volatility: 0.6
    influence: 0.3
    initial_lean: "first_impression"

  - id: "juror_7"
    name: "[PLAYER]"
    archetype: "player"
    emoji: "👤"
    personality: "Human player"
    stubbornness: null
    volatility: null
    influence: 0.6
    initial_lean: "player_choice"

  - id: "juror_8"
    name: "Dr. James Wright"
    archetype: "detail_obsessed"
    emoji: "🔍"
    personality: |
      You are a forensic accountant. You focus on tiny inconsistencies in
      testimony and evidence. You often derail discussions with minutiae.
      A single contradiction can completely change your view.
    stubbornness: 0.7
    volatility: 0.4
    influence: 0.5
    initial_lean: "neutral"

  - id: "juror_9"
    name: "Pastor Williams"
    archetype: "moralist"
    emoji: "⚖️"
    personality: |
      You are a church leader. You see things in black and white - right and
      wrong. You believe in justice but also redemption. Moral arguments
      resonate with you more than technical ones.
    stubbornness: 0.7
    volatility: 0.3
    influence: 0.6
    initial_lean: "gut_feeling"

  - id: "juror_10"
    name: "Nancy Cooper"
    archetype: "pragmatist"
    emoji: "💼"
    personality: |
      You are a business consultant. You think about consequences - what
      happens if we convict an innocent person? What if we free a guilty one?
      You weigh costs and benefits. You're persuaded by outcome-focused arguments.
    stubbornness: 0.5
    volatility: 0.5
    influence: 0.6
    initial_lean: "calculated"

  - id: "juror_11"
    name: "Miguel Santos"
    archetype: "storyteller"
    emoji: "📖"
    personality: |
      You are a novelist. You think in narratives - does the prosecution's
      story make sense? Does the defense's? You're swayed by coherent
      narratives and suspicious of stories with plot holes.
    stubbornness: 0.4
    volatility: 0.6
    influence: 0.7
    initial_lean: "best_story"

  - id: "juror_12"
    name: "Robert Kim"
    archetype: "wildcard"
    emoji: "🎲"
    personality: |
      You are a retired jazz musician. Your logic is unpredictable - you
      might fixate on something no one else noticed, or suddenly change
      your mind for unclear reasons. You're creative but inconsistent.
    stubbornness: 0.3
    volatility: 0.9
    influence: 0.4
    initial_lean: "random"
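
The roster's `initial_lean` values have to be turned into starting conviction scores before the initial vote. A hypothetical mapping (the baseline numbers are assumptions, not specified by the PRD; leans that depend on runtime context, like "majority" or "random", start near the middle here):

```python
import random

# Assumed baselines on the 0.0 (not guilty) - 1.0 (guilty) conviction scale
LEAN_BASELINES = {"prosecution": 0.7, "defense": 0.3, "neutral": 0.5}

def initial_conviction(lean: str, rng: random.Random) -> float:
    # Context-dependent leans (majority, minority, random, ...) start near 0.5
    base = LEAN_BASELINES.get(lean, 0.5)
    # Small jitter so identical archetypes don't start identically
    return min(1.0, max(0.0, base + rng.uniform(-0.1, 0.1)))

rng = random.Random(42)
print(initial_conviction("prosecution", rng))  # deterministic for a fixed seed
```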

Conviction Score Mechanics

How Conviction Changes

import random

def calculate_conviction_change(
    juror: JurorConfig,
    juror_memory: JurorMemory,
    argument: DeliberationTurn,
    game_state: GameState
) -> float:
    """
    Calculate how much an argument shifts a juror's conviction.

    Returns: delta to add to conviction score (-0.3 to +0.3 typically)
    """

    # Base impact from argument strength (determined by LLM)
    base_impact = evaluate_argument_strength(argument)  # -1.0 to 1.0

    # Personality modifiers
    archetype_modifier = get_archetype_modifier(
        juror.archetype,
        argument.argument_type
    )
    # e.g., "rationalist" gets 1.5x from "logical" arguments, 0.5x from "emotional"

    # Stubbornness reduces all changes
    stubbornness_modifier = 1.0 - (juror.stubbornness * 0.7)

    # Volatility adds randomness
    volatility_noise = random.gauss(0, juror.volatility * 0.1)

    # Relationship modifier - trust the speaker?
    trust = juror_memory.opinions_of_others.get(argument.speaker_id, 0.0)
    trust_modifier = 1.0 + (trust * 0.3)  # -30% to +30%

    # Conviction resistance - harder to move extremes
    current = juror_memory.current_conviction
    extreme_resistance = 1.0 - (abs(current - 0.5) * 0.5)

    # Calculate final delta
    delta = (
        base_impact
        * archetype_modifier
        * stubbornness_modifier
        * trust_modifier
        * extreme_resistance
        + volatility_noise
    )

    # Clamp to reasonable range
    return max(-0.3, min(0.3, delta))
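
A worked example of the modifier stack above with illustrative numbers (volatility noise omitted): a strong logical argument (base 0.6) heard by a stubborn rationalist (stubbornness 0.8) who mildly trusts the speaker and currently sits at conviction 0.7.

```python
base_impact = 0.6
archetype_modifier = 1.5                         # rationalist x logical
stubbornness_modifier = 1.0 - 0.8 * 0.7          # 0.44
trust_modifier = 1.0 + 0.2 * 0.3                 # 1.06
extreme_resistance = 1.0 - abs(0.7 - 0.5) * 0.5  # 0.9

delta = (base_impact * archetype_modifier * stubbornness_modifier
         * trust_modifier * extreme_resistance)
print(round(delta, 3))             # 0.378 before clamping
print(max(-0.3, min(0.3, delta)))  # clamped to the 0.3 cap
```

Even a near-ideal argument moves this juror by at most 0.3, so flipping a hardened juror takes several rounds by design.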


def check_vote_flip(juror_memory: JurorMemory) -> bool:
    """Check if conviction score warrants a vote change."""

    if len(juror_memory.conviction_history) < 2:
        return False  # No prior position to flip from

    # Vote implied by the conviction *before* the latest update (the newest
    # history entry already equals current_conviction, so comparing against
    # it would never detect a flip)
    current_vote_is_guilty = juror_memory.conviction_history[-2] > 0.5
    new_conviction = juror_memory.current_conviction

    # Hysteresis - need to cross threshold by margin to flip
    if current_vote_is_guilty and new_conviction < 0.4:
        return True  # Flip to not guilty
    if not current_vote_is_guilty and new_conviction > 0.6:
        return True  # Flip to guilty

    return False

Archetype Argument Modifiers

ARCHETYPE_MODIFIERS = {
    "rationalist": {
        "logical": 1.5,
        "evidence": 1.3,
        "emotional": 0.4,
        "moral": 0.6,
        "narrative": 0.7,
        "question": 1.2,
    },
    "empath": {
        "logical": 0.6,
        "evidence": 0.8,
        "emotional": 1.5,
        "moral": 1.3,
        "narrative": 1.2,
        "question": 0.9,
    },
    "cynic": {
        "logical": 0.8,
        "evidence": 1.4,  # Trusts evidence
        "emotional": 0.3,
        "moral": 0.5,
        "narrative": 0.6,
        "question": 0.7,
    },
    # ... etc for all archetypes
}
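
The `get_archetype_modifier` helper referenced in `calculate_conviction_change` is not defined in the PRD; a minimal sketch would be a table lookup over the structure above, defaulting to 1.0 (neutral) when an archetype or argument type has no entry:

```python
# Truncated copy of the table above, for a self-contained example
ARCHETYPE_MODIFIERS = {
    "rationalist": {"logical": 1.5, "evidence": 1.3, "emotional": 0.4},
    "empath": {"logical": 0.6, "emotional": 1.5, "moral": 1.3},
}

def get_archetype_modifier(archetype: str, argument_type: str) -> float:
    # Missing archetype or argument type -> neutral multiplier
    return ARCHETYPE_MODIFIERS.get(archetype, {}).get(argument_type, 1.0)

print(get_archetype_modifier("rationalist", "emotional"))  # 0.4
print(get_archetype_modifier("wildcard", "logical"))       # 1.0
```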

Agent Memory Architecture

Memory Layers

┌─────────────────────────────────────────────────────────────┐
│                    JUROR MEMORY SYSTEM                       │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  LAYER 1: CASE KNOWLEDGE (LlamaIndex)               │   │
│  │  - Full case file indexed                            │   │
│  │  - Evidence details retrievable                      │   │
│  │  - Witness statements searchable                     │   │
│  │  - Persistent across session                         │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                   │
│                          ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  LAYER 2: DELIBERATION MEMORY (Sliding Window)      │   │
│  │  - Last N turns in full detail                       │   │
│  │  - Summarized history beyond window                  │   │
│  │  - Key moments flagged for long-term                 │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                   │
│                          ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  LAYER 3: REASONING STATE (Agent Internal)          │   │
│  │  - Current conviction + reasoning chain             │   │
│  │  - Key doubts and certainties                       │   │
│  │  - Opinions of other jurors                         │   │
│  │  - Arguments to make / avoid                        │   │
│  └─────────────────────────────────────────────────────┘   │
│                          │                                   │
│                          ▼                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  LAYER 4: PERSONA (Static)                          │   │
│  │  - Archetype definition                             │   │
│  │  - Personality prompt                               │   │
│  │  - Behavior modifiers                               │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
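
A minimal sketch of the Layer 2 sliding window: the last `window` turns stay verbatim and everything older collapses into a placeholder summary line (a real implementation would summarize with an LLM; this version is purely illustrative):

```python
def window_memory(turns: list[str], window: int = 10) -> tuple[str, list[str]]:
    # Negative slicing handles the short-history case: turns[:-window] is
    # empty whenever len(turns) <= window
    older, recent = turns[:-window], turns[-window:]
    summary = f"[{len(older)} earlier turns summarized]" if older else ""
    return summary, recent

turns = [f"turn {i}" for i in range(14)]
summary, recent = window_memory(turns, window=10)
print(summary)       # [4 earlier turns summarized]
print(len(recent))   # 10
```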

Memory Injection into Agent Prompt

def build_juror_prompt(
    juror: JurorConfig,
    memory: JurorMemory,
    game_state: GameState,
    case: CriminalCase,
    task: str  # "speak" | "react" | "vote"
) -> str:
    """Build the full prompt for a juror agent."""

    prompt = f"""
# JUROR IDENTITY
You are {juror.name}, Juror #{juror.seat_number}.
{juror.personality_prompt}

# THE CASE: {case.title}
{case.summary}

# KEY EVIDENCE YOU REMEMBER
{format_evidence_memory(memory.key_evidence, memory.evidence_interpretations)}

# YOUR CURRENT POSITION
- Conviction: {conviction_to_text(memory.current_conviction)}
- Your reasoning: {' '.join(memory.reasoning_chain[-3:])}
- Your doubts: {', '.join(memory.doubts[:3]) if memory.doubts else 'None currently'}

# RECENT DELIBERATION (Last {len(memory.arguments_heard[-juror.memory_window:])} turns)
{format_recent_turns(memory.arguments_heard[-juror.memory_window:])}

# YOUR OPINIONS OF OTHER JURORS
{format_juror_opinions(memory.opinions_of_others)}

# CURRENT VOTE TALLY
Guilty: {list(game_state.votes.values()).count('guilty')}
Not Guilty: {list(game_state.votes.values()).count('not_guilty')}

# YOUR TASK
{get_task_prompt(task, juror.archetype)}
"""
    return prompt
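
The `conviction_to_text` helper used in the prompt above is not defined in the PRD; one plausible sketch maps the 0.0-1.0 score onto short phrases the juror can roleplay from (the bands and wording are assumptions):

```python
def conviction_to_text(score: float) -> str:
    # Assumed bands; 0.0 = certain not guilty, 1.0 = certain guilty
    if score >= 0.8:
        return "firmly GUILTY"
    if score >= 0.6:
        return "leaning guilty"
    if score > 0.4:
        return "genuinely undecided"
    if score > 0.2:
        return "leaning not guilty"
    return "firmly NOT GUILTY"

print(conviction_to_text(0.85))  # firmly GUILTY
print(conviction_to_text(0.5))   # genuinely undecided
```

Feeding the agent a phrase rather than a raw float keeps the number out of the dialogue while still anchoring the persona's stance.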

Orchestration Flow

Smolagents Integration

import random
from uuid import uuid4

from smolagents import CodeAgent, Tool, LiteLLMModel
from typing import List

class JurorAgent:
    """Wrapper around smolagents CodeAgent for a juror."""

    def __init__(self, config: JurorConfig, tools: List[Tool] | None = None):
        self.config = config
        self.memory = JurorMemory(juror_id=config.juror_id)

        # Model via LiteLLM for flexibility
        self.model = LiteLLMModel(
            model_id=f"{config.model_provider}/{config.model_id}",
            temperature=config.temperature
        )

        # Default tools (expandable)
        default_tools = [
            self.create_evidence_lookup_tool(),
            self.create_case_query_tool(),
        ]

        self.agent = CodeAgent(
            tools=default_tools + (tools or []),
            model=self.model,
            max_steps=3,  # Limit reasoning steps
        )

    def create_evidence_lookup_tool(self) -> Tool:
        """Tool to look up specific evidence."""
        # LlamaIndex query under the hood
        pass

    def create_case_query_tool(self) -> Tool:
        """Tool to query case details."""
        # LlamaIndex query under the hood
        pass

    async def generate_argument(
        self,
        game_state: GameState,
        case: CriminalCase
    ) -> DeliberationTurn:
        """Generate this juror's argument for their turn."""

        prompt = build_juror_prompt(
            self.config,
            self.memory,
            game_state,
            case,
            task="speak"
        )

        response = await self.agent.run(prompt)

        return parse_argument_response(response, self.config, game_state)

    async def react_to_argument(
        self,
        argument: DeliberationTurn,
        game_state: GameState,
        case: CriminalCase
    ) -> float:
        """React to another juror's argument, update conviction."""

        # Update memory with new argument
        self.memory.arguments_heard.append(
            ArgumentMemory(
                speaker_id=argument.speaker_id,
                content_summary=summarize_argument(argument.content),
                argument_type=argument.argument_type,
                persuasiveness=0.0,  # Will be calculated
                counter_points=[],
                round_heard=game_state.round_number
            )
        )

        # Calculate conviction change
        delta = calculate_conviction_change(
            self.config,
            self.memory,
            argument,
            game_state
        )

        self.memory.current_conviction += delta
        self.memory.current_conviction = max(0.0, min(1.0, self.memory.current_conviction))
        self.memory.conviction_history.append(self.memory.current_conviction)

        return delta


class OrchestratorAgent:
    """Master agent that coordinates the deliberation."""

    def __init__(
        self,
        jurors: List[JurorAgent],
        judge: JudgeAgent,
        case: CriminalCase
    ):
        self.jurors = {j.config.juror_id: j for j in jurors}
        self.judge = judge
        self.case = case
        self.state = GameState(
            session_id=str(uuid4()),
            case_id=case.case_id
        )

    async def run_deliberation_round(self) -> List[DeliberationTurn]:
        """Run a single round of deliberation."""

        self.state.round_number += 1
        turns = []

        # Select 1-4 random speakers (not player unless it's their turn)
        num_speakers = random.randint(1, 4)
        available = [j for j in self.jurors.keys() if j != "juror_7"]  # Exclude player
        speakers = random.sample(available, min(num_speakers, len(available)))

        # Each speaker makes argument
        for speaker_id in speakers:
            juror = self.jurors[speaker_id]
            turn = await juror.generate_argument(self.state, self.case)
            turns.append(turn)

            # All other jurors react
            for other_id, other_juror in self.jurors.items():
                if other_id != speaker_id and other_id != "juror_7":
                    delta = await other_juror.react_to_argument(
                        turn, self.state, self.case
                    )
                    turn.impact[other_id] = delta

            # Log turn
            self.state.deliberation_log.append(turn)

        # Check for vote changes (snapshot first so flips can be detected)
        votes_before = dict(self.state.votes)
        self._process_vote_changes()

        # Check stability
        if self.state.votes != votes_before:
            self.state.rounds_without_change = 0
        else:
            self.state.rounds_without_change += 1

        return turns

    def _process_vote_changes(self):
        """Check all jurors for vote flips."""
        for juror_id, juror in self.jurors.items():
            if juror_id == "juror_7":  # Player votes manually
                continue

            if check_vote_flip(juror.memory):
                new_vote = "guilty" if juror.memory.current_conviction > 0.5 else "not_guilty"
                if self.state.votes.get(juror_id) != new_vote:
                    self.state.votes[juror_id] = new_vote
                    # Could trigger a Judge announcement of the flip here

    def check_should_end(self) -> bool:
        """Check if deliberation should end."""

        # Unanimous verdict
        votes = list(self.state.votes.values())
        if len(set(votes)) == 1:
            return True

        # Votes stabilized
        if self.state.rounds_without_change >= self.state.stability_threshold:
            return True

        # Max rounds reached
        if self.state.round_number >= self.state.max_rounds:
            return True

        return False
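
The interaction between the stability counter and `check_should_end` can be traced with a toy driver (standalone restatement of the logic above, not part of the orchestrator):

```python
def run(changes_per_round, stability_threshold=3, max_rounds=20):
    """Return the round on which deliberation would end, given a list of
    booleans saying whether any vote changed that round."""
    rounds_without_change = 0
    for rnd, changed in enumerate(changes_per_round, start=1):
        rounds_without_change = 0 if changed else rounds_without_change + 1
        if rounds_without_change >= stability_threshold or rnd >= max_rounds:
            return rnd
    return len(changes_per_round)

print(run([True, False, False, False, True]))  # 4 - three quiet rounds end it
```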

ElevenLabs Integration

Judge Narrator

from elevenlabs import Voice, generate, stream

class JudgeAgent:
    """The judge/narrator - uses ElevenLabs for voice."""

    def __init__(self, voice_id: str | None = None):
        self.voice_id = voice_id or "judge_voice_id"  # Configure
        self.voice_settings = {
            "stability": 0.7,
            "similarity_boost": 0.8,
            "style": 0.5,  # Authoritative
        }

    async def narrate(self, text: str, stream_output: bool = True) -> bytes:
        """Generate narration audio."""

        audio = generate(
            text=text,
            voice=Voice(voice_id=self.voice_id),
            model="eleven_multilingual_v2",
            stream=stream_output
        )

        if stream_output:
            return stream(audio)
        return audio

    def get_case_presentation(self, case: CriminalCase) -> str:
        """Script for presenting the case."""
        return f"""
        Members of the jury. You are here today to determine the fate of
        {case.defendant.name}, who stands accused of {', '.join(case.charges)}.

        {case.summary}

        You will hear the evidence. You will deliberate. And you will reach
        a verdict. The burden of proof lies with the prosecution, who must
        prove guilt beyond a reasonable doubt.

        Let us begin.
        """

    def get_vote_announcement(self, votes: Dict[str, str]) -> str:
        """Script for announcing vote."""
        guilty = sum(1 for v in votes.values() if v == "guilty")
        not_guilty = 12 - guilty

        return f"""
        The current vote stands at {guilty} for guilty,
        {not_guilty} for not guilty.

        {"The jury remains divided." if guilty not in [0, 12] else ""}
        {"A unanimous verdict has been reached." if guilty in [0, 12] else ""}
        """

UI Components

Kinetic Text Animation

// For animated text display (like After Effects kinetic typography)
// Will sync with ElevenLabs audio or simulate typing

class KineticText {
    constructor(container, options = {}) {
        this.container = container;
        this.speed = options.speed || 50; // ms per character
        this.variance = options.variance || 20; // randomness
    }

    async display(text, audioUrl = null) {
        // If audio provided, sync with it
        if (audioUrl) {
            return this.displayWithAudio(text, audioUrl);
        }

        // Otherwise, simulate speaking
        return this.displaySimulated(text);
    }

    async displaySimulated(text) {
        this.container.innerHTML = '';

        for (let i = 0; i < text.length; i++) {
            const char = text[i];
            const span = document.createElement('span');
            span.textContent = char;
            span.style.opacity = '0';
            span.style.animation = 'fadeInChar 0.1s forwards';
            this.container.appendChild(span);

            // Variable delay for natural feel
            const delay = this.speed + (Math.random() - 0.5) * this.variance;
            await this.sleep(delay);
        }
    }

    sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
}

Gradio UI Structure

import gradio as gr

def create_ui():
    with gr.Blocks(css=CUSTOM_CSS, theme=gr.themes.Base()) as demo:

        # State
        game_state = gr.State(None)

        # Header
        gr.HTML("<h1>12 ANGRY AGENTS</h1>")

        with gr.Row():
            # Left: Jury Box
            with gr.Column(scale=1):
                gr.Markdown("### The Jury")
                jury_box = gr.HTML(render_jury_box)  # 12 seats with emojis/votes
                vote_tally = gr.HTML()  # "7-5 GUILTY"

            # Center: Deliberation
            with gr.Column(scale=2):
                gr.Markdown("### Deliberation Room")
                deliberation_chat = gr.Chatbot(
                    label="Deliberation",
                    height=400,
                    show_label=False
                )

                # Player input
                with gr.Row():
                    strategy_select = gr.Dropdown(
                        choices=[
                            "Challenge Evidence",
                            "Question Witness Credibility",
                            "Appeal to Reasonable Doubt",
                            "Present Alternative Theory",
                            "Address Specific Juror",
                            "Call for Vote"
                        ],
                        label="Your Strategy"
                    )
                    speak_btn = gr.Button("Speak", variant="primary")

                with gr.Row():
                    pass_btn = gr.Button("Pass Turn")
                    call_vote_btn = gr.Button("Call Final Vote")

            # Right: Case File
            with gr.Column(scale=1):
                gr.Markdown("### Case File")
                case_summary = gr.Markdown()

                with gr.Accordion("Evidence", open=False):
                    evidence_list = gr.HTML()

                with gr.Accordion("Witnesses", open=False):
                    witness_list = gr.HTML()

        # Audio player for Judge
        audio_output = gr.Audio(label="Judge", autoplay=True, visible=False)

    return demo


# MCP Server enabled; launch outside the Blocks context
demo = create_ui()
demo.launch(mcp_server=True)

LlamaIndex Case Database

Index Structure

import random

from llama_index.core import VectorStoreIndex, Document
from llama_index.core.node_parser import SentenceSplitter

class CaseDatabase:
    """LlamaIndex-powered case database."""

    def __init__(self, cases_dir: str):
        self.cases = self._load_cases(cases_dir)
        self.index = self._build_index()

    def _build_index(self) -> VectorStoreIndex:
        """Build searchable index of all cases."""

        documents = []
        for case in self.cases:
            # Index case summary
            documents.append(Document(
                text=case.summary,
                metadata={"case_id": case.case_id, "type": "summary"}
            ))

            # Index each piece of evidence
            for evidence in case.evidence:
                documents.append(Document(
                    text=f"{evidence.type}: {evidence.description}",
                    metadata={
                        "case_id": case.case_id,
                        "type": "evidence",
                        "evidence_id": evidence.evidence_id
                    }
                ))

            # Index witness testimonies
            for witness in case.witnesses:
                documents.append(Document(
                    text=f"{witness.name} ({witness.role}): {witness.testimony_summary}",
                    metadata={
                        "case_id": case.case_id,
                        "type": "witness",
                        "witness_id": witness.witness_id
                    }
                ))

        parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
        nodes = parser.get_nodes_from_documents(documents)

        return VectorStoreIndex(nodes)

    def query_evidence(self, case_id: str, query: str) -> List[str]:
        """Query evidence for a specific case."""
        from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

        query_engine = self.index.as_query_engine(
            filters=MetadataFilters(
                filters=[ExactMatchFilter(key="case_id", value=case_id)]
            )
        )
        response = query_engine.query(query)
        return [node.get_content() for node in response.source_nodes]

    def get_random_case(self, difficulty: str | None = None) -> CriminalCase:
        """Get a random case, optionally filtered by difficulty."""

        if difficulty:
            filtered = [c for c in self.cases if c.difficulty == difficulty]
            return random.choice(filtered)
        return random.choice(self.cases)

Real Case Data Sources

Primary: Old Bailey Online (Historical)

Dataset: 197,745 criminal trials from London's Central Criminal Court (1674-1913)

Access:

Data Fields:

  • Trial ID, date, defendant name/gender
  • Offence category: theft, kill, deception, violent theft, sexual, etc.
  • Verdict, punishment
  • Full trial transcript text

Why This Works:

  • Historical cases avoid sensitivity around modern defendants
  • Rich narrative transcripts perfect for agent reasoning
  • 18th-century language adds unique flavor
  • Verdicts are known (ground truth for comparison)

Integration Example:

import xml.etree.ElementTree as ET

def load_old_bailey_case(xml_path: str) -> CriminalCase:
    """Parse Old Bailey XML into CriminalCase model."""
    tree = ET.parse(xml_path)
    root = tree.getroot()

    return CriminalCase(
        case_id=root.find(".//trialAccount").get("id"),
        title=f"The Crown v. {root.find('.//persName').text}",
        summary=extract_trial_text(root),
        charges=[root.find(".//offence").get("category")],
        evidence=extract_evidence_from_transcript(root),
        difficulty=infer_difficulty_from_verdict(root),
        year=int(root.find(".//date").get("year")),
        jurisdiction="London, England"
    )

Secondary: National Registry of Exonerations (Modern)

Dataset: All U.S. exonerations since 1989 (3,000+ cases)

Access: https://www.law.umich.edu/special/exoneration/Pages/about.aspx

Data Fields:

  • Crime type, state, year of conviction/exoneration
  • Contributing factors (eyewitness misID, false confession, etc.)
  • DNA involvement, sentence served

Why This Works:

  • Dramatic "wrongful conviction" cases
  • Clear evidence of reasonable doubt
  • Tests agents' ability to weigh conflicting evidence types

Fallback: Curated YAML Cases

For demo stability, include 3-5 handcrafted cases in cases/predefined/:

  • case_001_robbery.yaml - Clear guilty (baseline test)
  • case_002_murder.yaml - Ambiguous (compelling demo)
  • case_003_exoneration.yaml - DNA reversal scenario

This ensures the demo works even if external data sources are unavailable.
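A curated case file might look like the sketch below. Field names mirror the CriminalCase, Evidence, and Witness models used elsewhere in this document; the case content itself is illustrative, not drawn from a real trial.

```yaml
# cases/predefined/case_002_murder.yaml - illustrative sketch
case_id: "case_002"
title: "The Crown v. Doe"
summary: >
  The defendant is accused of murder. The victim was found at home;
  the only direct evidence is a single eyewitness across a dark street.
charges:
  - "murder"
difficulty: "ambiguous"
evidence:
  - evidence_id: "ev_01"
    type: "eyewitness"
    description: "Neighbor claims to have seen the defendant flee at night."
  - evidence_id: "ev_02"
    type: "physical"
    description: "No fingerprints recovered from the weapon."
witnesses:
  - witness_id: "w_01"
    name: "J. Smith"
    role: "neighbor"
    testimony_summary: "Saw a figure resembling the defendant around midnight."
```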


File Structure

12_angry_agents/
├── app.py                      # Gradio entry point
├── PRD.md                      # This document
├── requirements.txt
├── .env.example
│
├── core/
│   ├── __init__.py
│   ├── game_state.py           # GameState, DeliberationTurn models
│   ├── orchestrator.py         # OrchestratorAgent
│   ├── conviction.py           # Conviction score mechanics
│   └── turn_manager.py         # Turn selection, stability check
│
├── agents/
│   ├── __init__.py
│   ├── base_juror.py           # JurorAgent base class
│   ├── judge.py                # JudgeAgent (ElevenLabs)
│   ├── player.py               # PlayerAgent (human interface)
│   └── configs/
│       └── jurors.yaml         # 11 juror configurations
│
├── case_db/
│   ├── __init__.py
│   ├── database.py             # CaseDatabase (LlamaIndex)
│   ├── models.py               # CriminalCase, Evidence, Witness
│   └── cases/
│       ├── case_001.yaml
│       ├── case_002.yaml
│       └── ...
│
├── memory/
│   ├── __init__.py
│   ├── juror_memory.py         # JurorMemory management
│   └── summarizer.py           # Memory compression
│
├── ui/
│   ├── __init__.py
│   ├── components.py           # Gradio components
│   ├── jury_box.py             # Jury box renderer
│   ├── chat.py                 # Deliberation chat
│   └── static/
│       ├── styles.css
│       └── kinetic.js          # Text animations
│
├── mcp/
│   ├── __init__.py
│   └── tools.py                # MCP tool definitions
│
└── tests/
    ├── test_conviction.py
    ├── test_orchestrator.py
    └── test_memory.py

Development Phases

Phase 1: Foundation (4-6 hours)

  • Project setup, dependencies
  • Data models (GameState, Case, Juror)
  • Basic Gradio UI skeleton
  • Single juror agent working

Phase 2: Multi-Agent (4-6 hours)

  • All 11 juror configs
  • Orchestrator with turn management
  • Conviction score system
  • Memory system (basic)

Phase 3: Integration (3-4 hours)

  • LlamaIndex case database
  • ElevenLabs judge narration
  • Player interaction flow
  • Vote tracking and stability

Phase 4: Polish (2-3 hours)

  • UI animations (kinetic text)
  • Jury box visualization
  • MCP server tools
  • Demo video recording

Success Metrics

  1. 11 agents deliberating autonomously - TRUE agent behavior
  2. Judge narrating with ElevenLabs - Audio wow factor
  3. Conviction scores shifting - Visible persuasion
  4. Player can influence outcome - Agency
  5. MCP tools functional - External AI can play
  6. Runs without crashes - Stability


CRITICAL: Performance Optimizations

The Latency Trap - SOLVED

Problem: if one juror speaks and the other 11 agents each react with their own LLM call, that is 12 calls per turn = SLOW

Solution: Batch Jury State Update

class JuryStateManager:
    """
    Single LLM call to update ALL silent jurors' conviction scores.
    Replaces 11 individual react_to_argument() calls.
    """

    async def batch_update_convictions(
        self,
        argument: DeliberationTurn,
        silent_jurors: List[JurorConfig],
        juror_memories: Dict[str, JurorMemory],
        game_state: GameState
    ) -> Dict[str, ConvictionUpdate]:
        """
        ONE LLM call updates all 11 jurors' reactions.
        """

        prompt = f"""
You are simulating how 11 different jurors would react to this argument.

ARGUMENT BY {argument.speaker_name}:
"{argument.content}"

For each juror below, determine:
1. conviction_delta: float (-0.3 to +0.3) - how much their guilt conviction changes
2. reaction: str - brief internal thought (10 words max)
3. persuaded: bool - did this significantly move them?

JURORS:
{self._format_juror_profiles_compact(silent_jurors, juror_memories)}

Respond in JSON:
{{
  "juror_1": {{"delta": 0.1, "reaction": "Good point about the timeline", "persuaded": false}},
  "juror_2": {{"delta": -0.2, "reaction": "Too emotional, but touching", "persuaded": true}},
  ...
}}
"""

        response = await self.model.generate(prompt)
        return parse_batch_response(response)

Result: 1 speaker + 1 batch reaction = 2 LLM calls per turn (not 12)
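The parse_batch_response helper called above is not shown; one way to sketch it, assuming the model may wrap its JSON in markdown fences and that deltas should be clamped to the ±0.3 range stated in the prompt:

```python
import json
import re

# Sketch of parse_batch_response. Assumes the model returns a JSON object
# keyed by juror id (as requested in the prompt); the regex strips markdown
# code fences some models add around JSON output.

def parse_batch_response(raw: str) -> dict:
    """Parse the batch conviction-update JSON, clamping deltas to +/-0.3."""
    text = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE).strip()
    data = json.loads(text)
    updates = {}
    for juror_id, fields in data.items():
        updates[juror_id] = {
            "delta": max(-0.3, min(0.3, float(fields.get("delta", 0.0)))),
            "reaction": str(fields.get("reaction", "")),
            "persuaded": bool(fields.get("persuaded", False)),
        }
    return updates
```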

Active vs Passive Jurors

# Each turn, only 2-3 jurors are "active listeners" (full memory update)
# Others get simplified heuristic updates

def select_active_listeners(
    game_state: GameState,
    juror_memories: Dict[str, JurorMemory],
    num: int = 3,
) -> List[str]:
    """Select jurors who will fully process this turn."""

    # Prioritize: jurors on the fence, jurors addressed directly, random
    candidates = []

    # On the fence (conviction 0.35-0.65)
    for jid, memory in juror_memories.items():
        if 0.35 < memory.current_conviction < 0.65:
            candidates.append((jid, 2))  # Priority 2

    # Recently changed vote
    for jid in game_state.recently_flipped:
        candidates.append((jid, 3))  # Priority 3

    # Everyone else gets baseline weight
    for jid in juror_memories:
        candidates.append((jid, 1))

    # Weight and select without replacement
    return weighted_sample(candidates, num)
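The weighted_sample helper used above could be sketched as priority-weighted sampling without replacement; the exact weighting scheme is an assumption (a juror listed more than once keeps its highest priority):

```python
import random

# Sketch of weighted_sample: pick `num` distinct juror ids, where higher
# priority makes a juror proportionally more likely to be selected.

def weighted_sample(candidates: list[tuple[str, int]], num: int) -> list[str]:
    # Deduplicate, keeping each juror's highest priority as its weight
    weights: dict[str, int] = {}
    for jid, priority in candidates:
        weights[jid] = max(weights.get(jid, 0), priority)

    # Draw without replacement until we have enough (or run out)
    pool = dict(weights)
    picked: list[str] = []
    while pool and len(picked) < num:
        ids = list(pool)
        choice = random.choices(ids, weights=[pool[i] for i in ids], k=1)[0]
        picked.append(choice)
        del pool[choice]
    return picked
```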

Context Window Bloat - SOLVED

Problem: deliberation_log grows unbounded

Solution: Aggressive Rolling Summarization

class MemorySummarizer:
    """Compresses old deliberation history."""

    SUMMARY_INTERVAL = 5  # Summarize every 5 rounds
    KEEP_RECENT = 3       # Keep last 3 turns in full detail

    async def maybe_summarize(self, memory: JurorMemory, round_num: int):
        """Compress old turns if needed."""

        if round_num % self.SUMMARY_INTERVAL != 0:
            return

        # Split: recent (keep full) vs old (summarize)
        old_turns = memory.arguments_heard[:-self.KEEP_RECENT]
        recent_turns = memory.arguments_heard[-self.KEEP_RECENT:]

        if not old_turns:
            return

        # Summarize old turns into compact form
        summary = await self._compress_turns(old_turns)

        # Replace old turns with summary object
        memory.deliberation_summary = summary
        memory.arguments_heard = recent_turns

    async def _compress_turns(self, turns: List[ArgumentMemory]) -> str:
        """LLM call to compress multiple turns into summary."""

        prompt = f"""
Summarize these {len(turns)} deliberation turns into 3-5 bullet points.
Focus on: key arguments made, who was persuasive, major position shifts.

TURNS:
{self._format_turns(turns)}

Respond with bullet points only.
"""
        return await self.model.generate(prompt)


# Memory structure with summary
from dataclasses import dataclass, field

@dataclass
class JurorMemory:
    # ... existing fields ...

    # Compressed history (replaces old arguments_heard entries)
    deliberation_summary: str = ""  # "• Juror 3 argued about timeline..."

    # Only recent turns in full detail (max ~10 entries); defaulted so it
    # may follow the defaulted field above
    arguments_heard: List[ArgumentMemory] = field(default_factory=list)

LLM Call Budget Per Round

| Action | Calls | Notes |
|---|---|---|
| 1-4 speakers generate arguments | 1-4 | Parallelizable |
| Batch conviction update | 1 | All 11 reactions |
| Memory summarization | 0-1 | Every 5 rounds |
| Judge narration (ElevenLabs) | 1 | Audio only |
| TOTAL | 3-7 | Down from 12-48 |
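The speaker generations in the budget above are marked parallelizable; they can be fanned out concurrently with asyncio.gather. A sketch with a stand-in coroutine in place of the real model call:

```python
import asyncio

# Sketch of fanning out the 1-4 independent speaker calls. generate_argument
# is a stand-in; the real version would call the model router per juror.

async def generate_argument(juror_id: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an LLM call
    return f"{juror_id}: argument"

async def run_speakers(speaker_ids: list[str]) -> list[str]:
    """Issue all speaker generations in one await; results keep order."""
    return await asyncio.gather(*(generate_argument(j) for j in speaker_ids))
```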

External Participant System (MCP + Human)

Architecture: Swappable Juror Seats

Any of the 11 AI juror seats can be replaced by:

  1. External AI Agent (via MCP) - Another AI system joins as juror
  2. Human Player (via UI) - Additional human joins
  3. Default AI (Gemini) - Predefined personality

@dataclass
class JurorSeat:
    """A seat in the jury that can be filled by different participant types."""

    seat_number: int
    participant_type: Literal["ai_default", "ai_external", "human"]
    participant_id: str | None = None

    # For AI default
    config: JurorConfig | None = None
    agent: JurorAgent | None = None

    # For external (MCP or human)
    external_connection: ExternalConnection | None = None


class JuryManager:
    """Manages the 12 jury seats with mixed participant types."""

    def __init__(self):
        self.seats: Dict[int, JurorSeat] = {}
        self._init_default_seats()

    def _init_default_seats(self):
        """Initialize all 12 seats with default AI jurors."""
        for i in range(1, 13):
            if i == 7:  # Reserved for primary player
                self.seats[i] = JurorSeat(
                    seat_number=i,
                    participant_type="human",
                    participant_id="player_1"
                )
            else:
                config = load_juror_config(i)
                self.seats[i] = JurorSeat(
                    seat_number=i,
                    participant_type="ai_default",
                    config=config,
                    agent=JurorAgent(config)
                )

    def replace_with_external(
        self,
        seat_number: int,
        participant_type: Literal["ai_external", "human"],
        participant_id: str
    ) -> bool:
        """Replace a default AI with external participant."""

        if seat_number == 7:
            return False  # Primary player seat protected

        if seat_number not in self.seats:
            return False

        self.seats[seat_number] = JurorSeat(
            seat_number=seat_number,
            participant_type=participant_type,
            participant_id=participant_id,
            external_connection=ExternalConnection(participant_id)
        )
        return True

    def get_participant_for_turn(self, seat_number: int) -> TurnHandler:
        """Get appropriate handler for a seat's turn."""

        seat = self.seats[seat_number]

        if seat.participant_type == "ai_default":
            return AITurnHandler(seat.agent)
        elif seat.participant_type == "ai_external":
            return MCPTurnHandler(seat.external_connection)
        else:  # human
            return HumanTurnHandler(seat.participant_id)

MCP Tools for External Participants

# MCP Server exposes these tools for external AI agents

def mcp_join_as_juror(
    case_id: str,
    preferred_seat: int | None = None
) -> Dict:
    """
    Join an active case as a juror.

    An external AI agent can take over any non-player seat.
    Returns seat assignment and case briefing.

    Args:
        case_id: The case to join
        preferred_seat: Preferred seat number (2-6, 8-12), or None for auto-assign

    Returns:
        seat_number: Your assigned seat
        case_briefing: Summary of the case
        your_persona: Suggested personality (can ignore)
        current_state: Vote tally, round number
    """
    pass


def mcp_get_deliberation_state(case_id: str, seat_number: int) -> Dict:
    """
    Get current state of deliberation.

    Returns:
        recent_arguments: Last 5 arguments made
        vote_tally: Current guilty/not-guilty count
        your_conviction: Your current conviction score
        pending_speakers: Who speaks next
        is_your_turn: Whether you should speak now
    """
    pass


def mcp_make_argument(
    case_id: str,
    seat_number: int,
    argument_type: str,  # "evidence", "emotional", "logical", "question"
    content: str,
    target_juror: int | None = None
) -> Dict:
    """
    Make an argument during your turn.

    Returns:
        accepted: Whether argument was processed
        reactions: Brief summary of jury reactions
        vote_changes: Any votes that flipped
    """
    pass


def mcp_cast_vote(
    case_id: str,
    seat_number: int,
    vote: Literal["guilty", "not_guilty"]
) -> Dict:
    """
    Cast or change your vote.

    Returns:
        recorded: Confirmation
        new_tally: Updated vote count
    """
    pass


def mcp_pass_turn(case_id: str, seat_number: int) -> Dict:
    """Pass your turn without speaking."""
    pass

Human Join Flow (Additional Players)

1. Primary player starts game (seat 7)
2. Game generates shareable room code
3. Additional humans can join via:
   - URL with room code
   - Gradio UI "Join as Juror" button
4. They get assigned available seat (2-6, 8-12)
5. When it's their turn, UI prompts for input
6. They see same case file, deliberation history

Model Configuration

Default: Gemini Flash 2.5

# config/models.yaml

default_model:
  provider: "gemini"
  model_id: "gemini-2.5-flash"
  temperature: 0.7
  max_tokens: 1024

# Easily swappable per-agent or globally
model_overrides:
  judge:
    provider: "gemini"
    model_id: "gemini-2.5-flash"  # Fast for narration scripts

  batch_updater:
    provider: "gemini"
    model_id: "gemini-2.5-flash"  # Handles all conviction updates

  # Individual juror overrides (optional)
  juror_5:  # The contrarian philosopher
    provider: "anthropic"
    model_id: "claude-sonnet-4-20250514"
    temperature: 0.9

LiteLLM Integration

from litellm import acompletion as completion  # async entry point; litellm.completion is sync

class ModelRouter:
    """Route to any model via LiteLLM."""

    def __init__(self, config_path: str = "config/models.yaml"):
        self.config = load_yaml(config_path)
        self.default = self.config["default_model"]

    def get_model_for(self, agent_id: str) -> Dict:
        """Get model config for specific agent."""
        overrides = self.config.get("model_overrides", {})
        return overrides.get(agent_id, self.default)

    async def generate(
        self,
        agent_id: str,
        prompt: str,
        **kwargs
    ) -> str:
        """Generate completion using appropriate model."""

        config = self.get_model_for(agent_id)

        response = await completion(
            model=f"{config['provider']}/{config['model_id']}",
            messages=[{"role": "user", "content": prompt}],
            temperature=config.get("temperature", 0.7),
            max_tokens=config.get("max_tokens", 1024),
            **kwargs
        )

        return response.choices[0].message.content
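The override lookup in ModelRouter.get_model_for is pure config logic, so it can be exercised without API keys. A standalone sketch with the YAML inlined as dicts (load_yaml is assumed to produce this shape):

```python
# Standalone sketch of the per-agent override lookup, with the models.yaml
# content inlined as dicts for illustration.

DEFAULT = {
    "provider": "gemini",
    "model_id": "gemini-2.5-flash",
    "temperature": 0.7,
}
OVERRIDES = {
    "juror_5": {
        "provider": "anthropic",
        "model_id": "claude-sonnet-4-20250514",
        "temperature": 0.9,
    },
}

def get_model_for(agent_id: str) -> dict:
    """Per-agent model config with fallback to the global default."""
    return OVERRIDES.get(agent_id, DEFAULT)
```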

Case Data Architecture

Dual Source: Real + Fallback

import random
from pathlib import Path

import yaml


class CaseLoader:
    """Load cases from real data or fallback to predefined."""

    def __init__(
        self,
        real_data_path: str | None = None,
        fallback_path: str = "cases/predefined/"
    ):
        self.real_data_path = real_data_path
        self.fallback_path = fallback_path

        # Try to load real data
        self.real_cases = self._load_real_cases() if real_data_path else []
        self.fallback_cases = self._load_fallback_cases()

    def get_case(self, case_id: str | None = None, use_real: bool = True) -> CriminalCase:
        """Get a case, preferring real data if available."""

        if case_id:
            # Specific case requested
            return self._find_case(case_id)

        # Random case
        if use_real and self.real_cases:
            return random.choice(self.real_cases)
        return random.choice(self.fallback_cases)

    def _load_real_cases(self) -> List[CriminalCase]:
        """Load from real case database (future: LlamaIndex over court records)."""
        # TODO: Integrate with real case API/database
        # For now, returns empty - falls back to predefined
        return []

    def _load_fallback_cases(self) -> List[CriminalCase]:
        """Load predefined cases from YAML files."""
        cases = []
        for file in Path(self.fallback_path).glob("*.yaml"):
            case_data = yaml.safe_load(file.read_text())
            cases.append(CriminalCase(**case_data))
        return cases


# Future: Real case integration
class RealCaseConnector:
    """
    Connect to real case databases.
    Designed for easy integration later.
    """

    def __init__(self):
        self.sources = {
            "court_listener": CourtListenerAPI(),  # Future
            "justia": JustiaAPI(),                  # Future
            "local_files": LocalCaseFiles(),       # CSV/JSON dumps
        }

    async def search_cases(
        self,
        query: str,
        filters: Dict = None
    ) -> List[CriminalCase]:
        """Search across all connected sources."""
        pass

    async def get_case_details(
        self,
        source: str,
        case_id: str
    ) -> CriminalCase:
        """Get full case from specific source."""
        pass

Execution Environment

Local First, Blaxel Ready

# config/execution.yaml

execution:
  mode: "local"  # "local" | "blaxel" | "docker"

  local:
    # No sandbox, runs in process
    timeout_seconds: 30

  blaxel:
    api_key: "${BLAXEL_API_KEY}"
    sandbox_id: "12-angry-agents"
    persistent: true  # Keep sandbox warm

  docker:
    image: "12-angry-agents:latest"
    memory_limit: "2g"


# Usage in code
class ExecutionManager:
    """Swappable execution environment."""

    def __init__(self, config_path: str = "config/execution.yaml"):
        self.config = load_yaml(config_path)
        self.mode = self.config["execution"]["mode"]

    def get_executor(self) -> Executor:
        if self.mode == "local":
            return LocalExecutor()
        elif self.mode == "blaxel":
            return BlaxelExecutor(self.config["execution"]["blaxel"])
        elif self.mode == "docker":
            return DockerExecutor(self.config["execution"]["docker"])

    async def run_agent_code(self, code: str, context: Dict) -> str:
        """Execute agent-generated code safely."""
        executor = self.get_executor()
        return await executor.run(code, context)

Player Input: Strategy + Optional Free Text

# Hybrid input: Low friction strategy selection + optional elaboration

ARGUMENT_STRATEGIES = [
    {
        "id": "challenge_evidence",
        "label": "Challenge Evidence",
        "prompt_hint": "Point out weaknesses in a specific piece of evidence",
        "allows_free_text": True,
    },
    {
        "id": "question_witness",
        "label": "Question Witness Credibility",
        "prompt_hint": "Raise doubts about a witness's reliability",
        "allows_free_text": True,
    },
    {
        "id": "reasonable_doubt",
        "label": "Appeal to Reasonable Doubt",
        "prompt_hint": "Emphasize the burden of proof",
        "allows_free_text": False,  # AI handles this
    },
    {
        "id": "alternative_theory",
        "label": "Present Alternative Theory",
        "prompt_hint": "Suggest what might have really happened",
        "allows_free_text": True,
    },
    {
        "id": "address_juror",
        "label": "Address Specific Juror",
        "prompt_hint": "Respond to or persuade a specific juror",
        "requires_target": True,
        "allows_free_text": True,
    },
    {
        "id": "free_argument",
        "label": "Make Custom Argument",
        "prompt_hint": "Say whatever you want",
        "allows_free_text": True,
        "required_free_text": True,
    },
]


# UI Component
def player_input_ui():
    with gr.Row():
        strategy = gr.Dropdown(
            choices=[s["label"] for s in ARGUMENT_STRATEGIES],
            label="Your Strategy",
            value="Challenge Evidence"
        )

        target_juror = gr.Dropdown(
            choices=["None"] + [f"Juror {i}" for i in range(1, 13) if i != 7],
            label="Target (optional)",
            visible=False  # Show only for "address_juror"
        )

    free_text = gr.Textbox(
        label="Add details (optional)",
        placeholder="e.g., 'Focus on the timeline inconsistency'",
        max_lines=2,
        visible=True
    )

    return strategy, target_juror, free_text

Open Questions

  1. Exact ElevenLabs voice ID for judge?
  2. Should external AI participants see other AI jurors' internal conviction scores? Yes, configurable in code.
  3. Max simultaneous external participants (performance)? 12.
  4. Case difficulty selector in UI? No; cases are selected at random.