Antigravity Notebook - Implementation Plan

Architecture Overview

A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.

Technology Stack

  • Backend: FastAPI (Python)
  • Database: PostgreSQL (via Docker Compose)
  • ML Model: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
  • Storage: Local filesystem for latent tensors
  • Frontend: Streamlit
  • Containerization: Docker Compose

Database Schema (PostgreSQL)

Tables

notebooks

CREATE TABLE notebooks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

sources

CREATE TABLE sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
    filename VARCHAR(255),
    url TEXT,
    content_hash VARCHAR(64),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);
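The content_hash column is VARCHAR(64), which fits a hex SHA-256 digest for deduplicating re-uploaded sources. A minimal sketch of computing it at ingestion time (the function name is an assumption):

```python
import hashlib

def content_hash(content: bytes) -> str:
    """Hex SHA-256 digest of raw source content: 64 hex characters,
    matching the VARCHAR(64) content_hash column."""
    return hashlib.sha256(content).hexdigest()
```

Comparing the digest of incoming content against existing rows lets the ingestion service skip re-compressing identical sources.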

latent_tensors

CREATE TABLE latent_tensors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
    tensor_path VARCHAR(512) NOT NULL,
    segment_index INTEGER NOT NULL,
    token_count INTEGER NOT NULL,
    original_text_length INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);

chat_messages

CREATE TABLE chat_messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    sources_used JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Directory Structure

antigravity/
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ main.py                 # FastAPI application entry
β”‚   β”œβ”€β”€ database.py             # Database connection & models
β”‚   β”œβ”€β”€ config.py               # Configuration management
β”‚   β”œβ”€β”€ models/
β”‚   β”‚   β”œβ”€β”€ clara.py            # CLaRa model wrapper
β”‚   β”‚   └── schemas.py          # Pydantic schemas
β”‚   β”œβ”€β”€ services/
β”‚   β”‚   β”œβ”€β”€ ingestion.py        # Multi-modal ingestion (PDF, URL, Text)
β”‚   β”‚   β”œβ”€β”€ context_manager.py  # "Infinite Context" logic
β”‚   β”‚   β”œβ”€β”€ storage.py          # Tensor storage management
β”‚   β”‚   └── retrieval.py        # CLaRa retrieval & ranking
β”‚   └── routers/
β”‚       β”œβ”€β”€ notebooks.py        # Notebook CRUD
β”‚       β”œβ”€β”€ sources.py          # Source management
β”‚       └── chat.py             # Chat with notebook context
β”œβ”€β”€ frontend/
β”‚   └── app_notebook.py         # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/         # Local storage for tensors

Core Components

1. CLaRa Model Wrapper (models/clara.py)

  • Load apple/CLaRa-7B-Instruct from HuggingFace
  • Implement compression (text β†’ latent tensors)
  • Implement generation (latent tensors β†’ text)
  • Implement ranking (score tensors against query)
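The 16x compression figure from the overview implies simple budget arithmetic for the wrapper's callers. A hedged helper (the name, ratio parameter, and round-up behavior are assumptions, not part of the CLaRa API):

```python
import math

def latent_token_count(input_tokens: int, compression_ratio: int = 16) -> int:
    """Estimate how many latent tokens a chunk occupies after compression,
    assuming the plan's 16x ratio; rounds up so a partial group still
    costs one latent slot."""
    return math.ceil(input_tokens / compression_ratio)
```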

2. Context Manager (services/context_manager.py)

The "Brain" of NotebookLM functionality

Key Logic:

  1. Fetch all latent tensors for a notebook
  2. Calculate total token count
  3. If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
  4. If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
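The steps above can be sketched as pure selection logic (the segment tuple shape and relevance scores are assumptions; a real implementation would obtain scores from CLaRa's ranking):

```python
from typing import List, Tuple

# (segment_id, token_count, relevance_score) -- shape is an assumption
Segment = Tuple[str, int, float]

def select_context(segments: List[Segment], budget: int = 32_000) -> List[Segment]:
    """Whole-notebook stacking when everything fits the token budget,
    otherwise greedy top-N by relevance score until the budget is spent."""
    total = sum(tokens for _, tokens, _ in segments)
    if total <= budget:
        return segments  # stack ALL tensors for whole-notebook reasoning
    selected, used = [], 0
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if used + seg[1] <= budget:
            selected.append(seg)
            used += seg[1]
    return selected
```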

3. Multi-Modal Ingestion (services/ingestion.py)

  • PDF: PyPDF2/pdfplumber β†’ text extraction β†’ CLaRa compression
  • URL: BeautifulSoup/Trafilatura β†’ web scraping β†’ CLaRa compression
  • Text: Direct input β†’ CLaRa compression
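All three paths share one shape: extract text, then compress. A minimal dispatcher sketch (the extractor registry is a placeholder; real pdf/url entries would wrap pdfplumber or Trafilatura):

```python
from typing import Callable, Dict

def extract_text(source_type: str, payload: str,
                 extractors: Dict[str, Callable[[str], str]]) -> str:
    """Route a source to its extractor; mirrors the CHECK constraint
    on sources.source_type ('pdf', 'url', 'text')."""
    if source_type not in extractors:
        raise ValueError(f"unsupported source_type: {source_type!r}")
    return extractors[source_type](payload)

# 'text' sources need no extraction; pdf/url handlers would be registered here
DEFAULT_EXTRACTORS: Dict[str, Callable[[str], str]] = {"text": lambda s: s}
```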

Chunking Strategy:

  • Split text into 2048-token chunks (pre-compression)
  • Compress each chunk β†’ latent tensor
  • Store in latent_tensors/ directory
  • Record metadata in PostgreSQL
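A whitespace-token approximation of the 2048-token chunker (a real version would count tokens with the CLaRa tokenizer; this sketch is an assumption):

```python
from typing import List

def chunk_tokens(text: str, chunk_size: int = 2048) -> List[List[str]]:
    """Split whitespace tokens into fixed-size chunks for compression.
    Approximation only: real token counts come from the model tokenizer."""
    tokens = text.split()
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```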

4. Storage Service (services/storage.py)

  • Save/load latent tensors to/from filesystem
  • Path structure: data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt
  • Track tensor metadata in database
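The path convention above, as a small pathlib helper (passing IDs as strings is an assumption; the database uses UUIDs):

```python
from pathlib import Path

def tensor_path(root: Path, notebook_id: str, source_id: str, idx: int) -> Path:
    """Build {root}/{notebook_id}/{source_id}/segment_{idx}.pt, matching
    the layout under data/latent_tensors/."""
    return root / notebook_id / source_id / f"segment_{idx}.pt"
```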

5. Chat Router (routers/chat.py)

  • Accept notebook_id + query
  • Use ContextManager to prepare context
  • Generate response with CLaRa
  • Return answer + citation metadata (which sources were used)

API Endpoints

Notebooks

  • POST /notebooks/ - Create notebook
  • GET /notebooks/ - List notebooks
  • GET /notebooks/{id} - Get notebook details
  • DELETE /notebooks/{id} - Delete notebook

Sources

  • POST /notebooks/{id}/sources/upload - Upload PDF
  • POST /notebooks/{id}/sources/url - Add URL source
  • POST /notebooks/{id}/sources/text - Add text source
  • GET /notebooks/{id}/sources - List sources
  • DELETE /sources/{id} - Delete source

Chat

  • POST /notebooks/{id}/chat - Query notebook
  • GET /notebooks/{id}/messages - Get chat history

Streamlit UI (app_notebook.py)

Layout

Left Sidebar:

  • Notebook selector dropdown
  • "Create New Notebook" button
  • List of sources in current notebook
  • "Add Source" section:
    • File uploader (PDF)
    • Text input (URL)
    • Text area (direct text)
  • Memory usage meter: {used_tokens} / 32k tokens
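The meter is simple arithmetic over the token_count values stored at ingestion; a sketch (the exact display format is an assumption):

```python
def memory_meter(used_tokens: int, budget: int = 32_000) -> str:
    """Render the sidebar meter text, capped at 100% when the notebook
    exceeds the stacking budget."""
    pct = min(100, round(100 * used_tokens / budget))
    return f"{used_tokens:,} / {budget // 1000}k tokens ({pct}%)"
```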

Main Stage:

  • Chat interface (chat history + input box)
  • Citations display: Each assistant message shows [Source: filename.pdf] tags
  • Expandable source preview on citation click

Implementation Phases

Phase 1: Backend Foundation

  1. Docker Compose setup (PostgreSQL)
  2. Database models & migrations
  3. CLaRa model wrapper (basic load + compress)
  4. Storage service (save/load tensors)

Phase 2: Ingestion Pipeline

  1. PDF ingestion
  2. URL scraping
  3. Text chunking & compression
  4. Source management API

Phase 3: Context Manager (Critical)

  1. Implement whole-context stacking
  2. Implement retrieval-based selection
  3. Add token counting logic

Phase 4: Chat System

  1. Chat router with notebook context
  2. Citation extraction
  3. Chat history storage

Phase 5: Streamlit UI

  1. Notebook management interface
  2. Source upload/management
  3. Chat interface
  4. Citation display

Key Technical Challenges

Challenge 1: CLaRa Model Memory

  • CLaRa-7B is ~14GB in fp16
  • Solution: Load model once at startup, keep in GPU memory
  • Use torch.cuda.amp for mixed precision

Challenge 2: Tensor Storage Efficiency

  • Latent tensors can be large (100s of MBs per document)
  • Solution: Use torch.save() with compression
  • Implement lazy loading (only load when needed)
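Lazy loading can be a small cache keyed by path, with the loader injected (in production it would be torch.load, a third-party call, so this sketch keeps it pluggable):

```python
from functools import lru_cache
from typing import Any, Callable

def make_lazy_loader(load: Callable[[str], Any], maxsize: int = 64):
    """Wrap a loader (e.g. torch.load) in an LRU cache so a tensor is
    read from disk only on first use and old entries are evicted."""
    return lru_cache(maxsize=maxsize)(load)
```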

Challenge 3: Context Budget Management

  • Need accurate token counting for latent tensors
  • Solution: Store token_count in database during ingestion
  • Pre-calculate total context before generation

Challenge 4: Citation Tracking

  • Must track which source each latent segment came from
  • Solution: Maintain source_map in ContextManager
  • Include source IDs in generation metadata
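A minimal source_map sketch: map each latent segment's position in the stacked context back to its source ID so citations can be recovered after generation (the data shapes here are assumptions):

```python
from typing import Dict, List, Tuple

def build_source_map(segments: List[Tuple[str, str]]) -> Dict[int, str]:
    """segments: (segment_id, source_id) in stacking order.
    Returns position-in-context -> source_id for citation lookup."""
    return {pos: source_id for pos, (_, source_id) in enumerate(segments)}

def cited_sources(source_map: Dict[int, str], used_positions: List[int]) -> List[str]:
    """Unique source IDs for the positions the model actually drew on,
    preserving first-use order; feeds chat_messages.sources_used."""
    seen: List[str] = []
    for pos in used_positions:
        sid = source_map[pos]
        if sid not in seen:
            seen.append(sid)
    return seen
```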

Success Criteria

  βœ… Users can create notebooks
  βœ… Users can upload PDFs, paste URLs, add text
  βœ… System compresses content into latent tensors
  βœ… Chat queries reason across ALL sources simultaneously
  βœ… Responses include source citations
  βœ… Context manager intelligently handles >32k token notebooks
  βœ… UI shows memory usage meter

Performance Targets

  • Ingestion: <30s for 50-page PDF
  • Chat response: <10s for full notebook query
  • Support 10-20 average-sized books per notebook

Next Steps

  1. Set up project structure
  2. Implement Docker Compose + database
  3. Build CLaRa wrapper
  4. Implement ingestion pipeline
  5. Build context manager
  6. Create APIs
  7. Build Streamlit UI