Antigravity Notebook - Implementation Plan

Architecture Overview

A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.

Technology Stack

  • Backend: FastAPI (Python)
  • Database: PostgreSQL (via Docker Compose)
  • ML Model: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
  • Storage: Local filesystem for latent tensors
  • Frontend: Streamlit
  • Containerization: Docker Compose

Database Schema (PostgreSQL)

Tables

notebooks

CREATE TABLE notebooks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

sources

CREATE TABLE sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
    filename VARCHAR(255),
    url TEXT,
    content_hash VARCHAR(64),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);
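The content_hash column is VARCHAR(64), which fits a hex SHA-256 digest for deduplicating re-uploaded sources. A minimal sketch of computing it at ingestion time (the function name is an assumption):

```python
import hashlib

def content_hash(content: bytes) -> str:
    """Hex SHA-256 digest of raw source content: 64 hex characters,
    matching the VARCHAR(64) content_hash column."""
    return hashlib.sha256(content).hexdigest()
```

Comparing the digest of incoming content against existing rows lets the ingestion service skip re-compressing identical sources.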

latent_tensors

CREATE TABLE latent_tensors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
    tensor_path VARCHAR(512) NOT NULL,
    segment_index INTEGER NOT NULL,
    token_count INTEGER NOT NULL,
    original_text_length INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);

chat_messages

CREATE TABLE chat_messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    sources_used JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Directory Structure

antigravity/
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py                 # FastAPI application entry
β”‚   β”œβ”€β”€ database.py             # Database connection & models
β”‚   β”œβ”€β”€ config.py               # Configuration management
β”‚   β”œβ”€β”€ models/
β”‚   β”‚   β”œβ”€β”€ clara.py            # CLaRa model wrapper
β”‚   β”‚   └── schemas.py          # Pydantic schemas
β”‚   β”œβ”€β”€ services/
β”‚   β”‚   β”œβ”€β”€ ingestion.py        # Multi-modal ingestion (PDF, URL, Text)
β”‚   β”‚   β”œβ”€β”€ context_manager.py  # "Infinite Context" logic
β”‚   β”‚   β”œβ”€β”€ storage.py          # Tensor storage management
β”‚   β”‚   └── retrieval.py        # CLaRa retrieval & ranking
β”‚   └── routers/
β”‚       β”œβ”€β”€ notebooks.py        # Notebook CRUD
β”‚       β”œβ”€β”€ sources.py          # Source management
β”‚       └── chat.py             # Chat with notebook context
β”œβ”€β”€ frontend/
β”‚   └── app_notebook.py         # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/         # Local storage for tensors

Core Components

1. CLaRa Model Wrapper (models/clara.py)

  • Load apple/CLaRa-7B-Instruct from HuggingFace
  • Implement compression (text β†’ latent tensors)
  • Implement generation (latent tensors β†’ text)
  • Implement ranking (score tensors against query)
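The 16x compression figure from the overview implies simple budget arithmetic for the wrapper's callers. A hedged helper (the name, ratio parameter, and round-up behavior are assumptions, not part of the CLaRa API):

```python
import math

def latent_token_count(input_tokens: int, compression_ratio: int = 16) -> int:
    """Estimate how many latent tokens a chunk occupies after compression,
    assuming the plan's 16x ratio; rounds up so a partial group still
    costs one latent slot."""
    return math.ceil(input_tokens / compression_ratio)
```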

2. Context Manager (services/context_manager.py)

The "Brain" of NotebookLM functionality

Key Logic:

  1. Fetch all latent tensors for a notebook
  2. Calculate total token count
  3. If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
  4. If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
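The steps above can be sketched as pure selection logic (the segment tuple shape and relevance scores are assumptions; a real implementation would obtain scores from CLaRa's ranking):

```python
from typing import List, Tuple

# (segment_id, token_count, relevance_score) -- shape is an assumption
Segment = Tuple[str, int, float]

def select_context(segments: List[Segment], budget: int = 32_000) -> List[Segment]:
    """Whole-notebook stacking when everything fits the token budget,
    otherwise greedy top-N by relevance score until the budget is spent."""
    total = sum(tokens for _, tokens, _ in segments)
    if total <= budget:
        return segments  # stack ALL tensors for whole-notebook reasoning
    selected, used = [], 0
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if used + seg[1] <= budget:
            selected.append(seg)
            used += seg[1]
    return selected
```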

3. Multi-Modal Ingestion (services/ingestion.py)

  • PDF: PyPDF2/pdfplumber β†’ text extraction β†’ CLaRa compression
  • URL: BeautifulSoup/Trafilatura β†’ web scraping β†’ CLaRa compression
  • Text: Direct input β†’ CLaRa compression
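All three paths share one shape: extract text, then compress. A minimal dispatcher sketch (the extractor registry is a placeholder; real pdf/url entries would wrap pdfplumber or Trafilatura):

```python
from typing import Callable, Dict

def extract_text(source_type: str, payload: str,
                 extractors: Dict[str, Callable[[str], str]]) -> str:
    """Route a source to its extractor; mirrors the CHECK constraint
    on sources.source_type ('pdf', 'url', 'text')."""
    if source_type not in extractors:
        raise ValueError(f"unsupported source_type: {source_type!r}")
    return extractors[source_type](payload)

# 'text' sources need no extraction; pdf/url handlers would be registered here
DEFAULT_EXTRACTORS: Dict[str, Callable[[str], str]] = {"text": lambda s: s}
```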

Chunking Strategy:

  • Split text into 2048-token chunks (pre-compression)
  • Compress each chunk β†’ latent tensor
  • Store in latent_tensors/ directory
  • Record metadata in PostgreSQL
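A whitespace-token approximation of the 2048-token chunker (a real version would count tokens with the CLaRa tokenizer; this sketch is an assumption):

```python
from typing import List

def chunk_tokens(text: str, chunk_size: int = 2048) -> List[List[str]]:
    """Split whitespace tokens into fixed-size chunks for compression.
    Approximation only: real token counts come from the model tokenizer."""
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```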

4. Storage Service (services/storage.py)

  • Save/load latent tensors to/from filesystem
  • Path structure: data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt
  • Track tensor metadata in database
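The path convention above, as a small pathlib helper (passing IDs as strings is an assumption; the database uses UUIDs):

```python
from pathlib import Path

def tensor_path(root: Path, notebook_id: str, source_id: str, idx: int) -> Path:
    """Build {root}/{notebook_id}/{source_id}/segment_{idx}.pt, matching
    the layout under data/latent_tensors/."""
    return root / notebook_id / source_id / f"segment_{idx}.pt"
```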

5. Chat Router (routers/chat.py)

  • Accept notebook_id + query
  • Use ContextManager to prepare context
  • Generate response with CLaRa
  • Return answer + citation metadata (which sources were used)

API Endpoints

Notebooks

  • POST /notebooks/ - Create notebook
  • GET /notebooks/ - List notebooks
  • GET /notebooks/{id} - Get notebook details
  • DELETE /notebooks/{id} - Delete notebook

Sources

  • POST /notebooks/{id}/sources/upload - Upload PDF
  • POST /notebooks/{id}/sources/url - Add URL source
  • POST /notebooks/{id}/sources/text - Add text source
  • GET /notebooks/{id}/sources - List sources
  • DELETE /sources/{id} - Delete source

Chat

  • POST /notebooks/{id}/chat - Query notebook
  • GET /notebooks/{id}/messages - Get chat history

Streamlit UI (app_notebook.py)

Layout

Left Sidebar:

  • Notebook selector dropdown
  • "Create New Notebook" button
  • List of sources in current notebook
  • "Add Source" section:
    • File uploader (PDF)
    • Text input (URL)
    • Text area (direct text)
  • Memory usage meter: {used_tokens} / 32k tokens
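The meter is simple arithmetic over the token_count values stored at ingestion; a sketch (the exact display format is an assumption):

```python
def memory_meter(used_tokens: int, budget: int = 32_000) -> str:
    """Render the sidebar meter text, capped at 100% when the notebook
    exceeds the stacking budget."""
    pct = min(100, round(100 * used_tokens / budget))
    return f"{used_tokens:,} / {budget // 1000}k tokens ({pct}%)"
```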

Main Stage:

  • Chat interface (chat history + input box)
  • Citations display: Each assistant message shows [Source: filename.pdf] tags
  • Expandable source preview on citation click

Implementation Phases

Phase 1: Backend Foundation

  1. Docker Compose setup (PostgreSQL)
  2. Database models & migrations
  3. CLaRa model wrapper (basic load + compress)
  4. Storage service (save/load tensors)

Phase 2: Ingestion Pipeline

  1. PDF ingestion
  2. URL scraping
  3. Text chunking & compression
  4. Source management API

Phase 3: Context Manager (Critical)

  1. Implement whole-context stacking
  2. Implement retrieval-based selection
  3. Add token counting logic

Phase 4: Chat System

  1. Chat router with notebook context
  2. Citation extraction
  3. Chat history storage

Phase 5: Streamlit UI

  1. Notebook management interface
  2. Source upload/management
  3. Chat interface
  4. Citation display

Key Technical Challenges

Challenge 1: CLaRa Model Memory

  • CLaRa-7B is ~14GB in fp16
  • Solution: Load model once at startup, keep in GPU memory
  • Use torch.cuda.amp for mixed precision

Challenge 2: Tensor Storage Efficiency

  • Latent tensors can be large (100s of MBs per document)
  • Solution: Use torch.save() with compression
  • Implement lazy loading (only load when needed)
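Lazy loading can be a small cache keyed by path, with the loader injected (in production it would be torch.load, a third-party call, so this sketch keeps it pluggable):

```python
from functools import lru_cache
from typing import Any, Callable

def make_lazy_loader(load: Callable[[str], Any], maxsize: int = 64):
    """Wrap a loader (e.g. torch.load) in an LRU cache so a tensor is
    read from disk only on first use and old entries are evicted."""
    return lru_cache(maxsize=maxsize)(load)
```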

Challenge 3: Context Budget Management

  • Need accurate token counting for latent tensors
  • Solution: Store token_count in database during ingestion
  • Pre-calculate total context before generation

Challenge 4: Citation Tracking

  • Must track which source each latent segment came from
  • Solution: Maintain source_map in ContextManager
  • Include source IDs in generation metadata
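A minimal source_map sketch: map each latent segment's position in the stacked context back to its source ID so citations can be recovered after generation (the data shapes here are assumptions):

```python
from typing import Dict, List, Tuple

def build_source_map(segments: List[Tuple[str, str]]) -> Dict[int, str]:
    """segments: (segment_id, source_id) in stacking order.
    Returns position-in-context -> source_id for citation lookup."""
    return {pos: source_id for pos, (_, source_id) in enumerate(segments)}

def cited_sources(source_map: Dict[int, str], used_positions: List[int]) -> List[str]:
    """Unique source IDs for the positions the model actually drew on,
    preserving first-use order; feeds chat_messages.sources_used."""
    seen: List[str] = []
    for pos in used_positions:
        sid = source_map[pos]
        if sid not in seen:
            seen.append(sid)
    return seen
```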

Success Criteria

  βœ… Users can create notebooks
  βœ… Users can upload PDFs, paste URLs, add text
  βœ… System compresses content into latent tensors
  βœ… Chat queries reason across ALL sources simultaneously
  βœ… Responses include source citations
  βœ… Context manager intelligently handles >32k token notebooks
  βœ… UI shows memory usage meter

Performance Targets

  • Ingestion: <30s for 50-page PDF
  • Chat response: <10s for full notebook query
  • Support 10-20 average-sized books per notebook

Next Steps

  1. Set up project structure
  2. Implement Docker Compose + database
  3. Build CLaRa wrapper
  4. Implement ingestion pipeline
  5. Build context manager
  6. Create APIs
  7. Build Streamlit UI