# Antigravity Notebook - Implementation Plan
## Architecture Overview
A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.
## Technology Stack
- **Backend**: FastAPI (Python)
- **Database**: PostgreSQL (via Docker Compose)
- **ML Model**: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
- **Storage**: Local filesystem for latent tensors
- **Frontend**: Streamlit
- **Containerization**: Docker Compose
## Database Schema (PostgreSQL)
### Tables
#### notebooks
```sql
CREATE TABLE notebooks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
#### sources
```sql
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
filename VARCHAR(255),
url TEXT,
content_hash VARCHAR(64),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
```
#### latent_tensors
```sql
CREATE TABLE latent_tensors (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
tensor_path VARCHAR(512) NOT NULL,
segment_index INTEGER NOT NULL,
token_count INTEGER NOT NULL,
original_text_length INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
```
#### chat_messages
```sql
CREATE TABLE chat_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
content TEXT NOT NULL,
sources_used JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Directory Structure
```
antigravity/
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── backend/
│   ├── main.py                 # FastAPI application entry
│   ├── database.py             # Database connection & models
│   ├── config.py               # Configuration management
│   ├── models/
│   │   ├── clara.py            # CLaRa model wrapper
│   │   └── schemas.py          # Pydantic schemas
│   ├── services/
│   │   ├── ingestion.py        # Multi-modal ingestion (PDF, URL, text)
│   │   ├── context_manager.py  # "Infinite context" logic
│   │   ├── storage.py          # Tensor storage management
│   │   └── retrieval.py        # CLaRa retrieval & ranking
│   └── routers/
│       ├── notebooks.py        # Notebook CRUD
│       ├── sources.py          # Source management
│       └── chat.py             # Chat with notebook context
├── frontend/
│   └── app_notebook.py         # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/         # Local storage for tensors
```
## Core Components
### 1. CLaRa Model Wrapper (`models/clara.py`)
- Load apple/CLaRa-7B-Instruct from HuggingFace
- Implement compression (text → latent tensors)
- Implement generation (latent tensors → text)
- Implement ranking (score tensors against query)
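Since the exact `transformers` API surface for CLaRa isn't pinned down in this plan, a minimal sketch of the wrapper's interface can still be agreed on up front. The names `LatentSegment`, `CompressorModel`, and `ToyClara` below are hypothetical; `ToyClara` is an in-memory stand-in so the rest of the pipeline can be exercised without a GPU or model weights:

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class LatentSegment:
    """A compressed representation of one text chunk."""
    tensor: Sequence[float]   # stand-in for a real torch.Tensor
    token_count: int          # pre-compression token count


class CompressorModel(Protocol):
    """Interface the rest of the backend codes against (hypothetical names)."""
    def compress(self, text: str) -> LatentSegment: ...
    def generate(self, segments: list, query: str) -> str: ...
    def rank(self, segments: list, query: str) -> list: ...


class ToyClara:
    """In-memory stand-in: mimics 16x compression without a real model."""

    def compress(self, text: str) -> LatentSegment:
        words = text.split()
        # Fake 16x compression: keep every 16th "token" as a float feature.
        return LatentSegment(tensor=[float(len(w)) for w in words[::16]],
                             token_count=len(words))

    def generate(self, segments: list, query: str) -> str:
        return f"answer to {query!r} over {len(segments)} segments"

    def rank(self, segments: list, query: str) -> list:
        # Toy heuristic: longer segments score higher; the real wrapper
        # would score latents against the query with the model.
        return [float(s.token_count) for s in segments]
```

The real `clara.py` would swap `ToyClara` for a class that loads `apple/CLaRa-7B-Instruct` and keeps the same three-method interface, so services depending on `CompressorModel` don't change.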
### 2. Context Manager (`services/context_manager.py`)
**The "Brain" of NotebookLM functionality**
Key Logic:
1. Fetch all latent tensors for a notebook
2. Calculate total token count
3. If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
4. If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
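The four steps above can be sketched in plain Python. `Segment` and `select_segments` are illustrative names; scores would come from CLaRa ranking, and the greedy budget fill is one reasonable selection policy, not the only one:

```python
from dataclasses import dataclass

BUDGET = 32_000  # token budget from the plan


@dataclass
class Segment:
    source_id: str
    token_count: int


def select_segments(segments, scores, budget=BUDGET):
    """Return all segments if the notebook fits the budget (whole-notebook
    reasoning); otherwise greedily pack the highest-scored segments."""
    total = sum(s.token_count for s in segments)
    if total <= budget:
        return list(segments)
    ranked = sorted(zip(scores, segments), key=lambda p: p[0], reverse=True)
    picked, used = [], 0
    for score, seg in ranked:
        if used + seg.token_count <= budget:
            picked.append(seg)
            used += seg.token_count
    return picked
```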
### 3. Multi-Modal Ingestion (`services/ingestion.py`)
- **PDF**: PyPDF2/pdfplumber → text extraction → CLaRa compression
- **URL**: BeautifulSoup/Trafilatura → web scraping → CLaRa compression
- **Text**: Direct input → CLaRa compression
Chunking Strategy:
- Split text into 2048-token chunks (pre-compression)
- Compress each chunk → latent tensor
- Store in `latent_tensors/` directory
- Record metadata in PostgreSQL
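The pre-compression split is a simple fixed-size window over the token list. A sketch (the 2048 default matches the plan; `chunk_tokens` is an illustrative name):

```python
def chunk_tokens(tokens, chunk_size=2048):
    """Split a token sequence into fixed-size chunks before compression.
    The last chunk may be shorter; each chunk becomes one latent tensor,
    with its index recorded as segment_index in PostgreSQL."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```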
### 4. Storage Service (`services/storage.py`)
- Save/load latent tensors to/from filesystem
- Path structure: `data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt`
- Track tensor metadata in database
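The path scheme above is mechanical enough to pin down now. A sketch with `pathlib` (`tensor_path` is an illustrative helper name; the root directory matches the plan's layout):

```python
from pathlib import Path

DATA_ROOT = Path("data/latent_tensors")  # matches the planned directory tree


def tensor_path(notebook_id: str, source_id: str, segment_index: int,
                root: Path = DATA_ROOT) -> Path:
    """Build data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt;
    the real storage service would torch.save() to this path."""
    return root / notebook_id / source_id / f"segment_{segment_index}.pt"
```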
### 5. Chat Router (`routers/chat.py`)
- Accept `notebook_id` + `query`
- Use ContextManager to prepare context
- Generate response with CLaRa
- Return answer + citation metadata (which sources were used)
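The chat flow can be sketched framework-free, with the collaborators injected so it is testable without FastAPI or a model. `answer_query` and its parameters are hypothetical names; the real router would wire in the ContextManager and CLaRa wrapper:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source_id: str
    token_count: int


def answer_query(notebook_id, query, fetch_segments, select, generate):
    """One chat turn: fetch all latents -> select context -> generate ->
    attach citation metadata (which sources contributed)."""
    segments = fetch_segments(notebook_id)   # all latents for this notebook
    context = select(segments, query)        # ContextManager step
    answer = generate(context, query)        # CLaRa generation
    sources_used = sorted({s.source_id for s in context})
    return {"answer": answer, "sources_used": sources_used}
```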
## API Endpoints
### Notebooks
- `POST /notebooks/` - Create notebook
- `GET /notebooks/` - List notebooks
- `GET /notebooks/{id}` - Get notebook details
- `DELETE /notebooks/{id}` - Delete notebook
### Sources
- `POST /notebooks/{id}/sources/upload` - Upload PDF
- `POST /notebooks/{id}/sources/url` - Add URL source
- `POST /notebooks/{id}/sources/text` - Add text source
- `GET /notebooks/{id}/sources` - List sources
- `DELETE /sources/{id}` - Delete source
### Chat
- `POST /notebooks/{id}/chat` - Query notebook
- `GET /notebooks/{id}/messages` - Get chat history
## Streamlit UI (`app_notebook.py`)
### Layout
**Left Sidebar:**
- Notebook selector dropdown
- "Create New Notebook" button
- List of sources in current notebook
- "Add Source" section:
- File uploader (PDF)
- Text input (URL)
- Text area (direct text)
- Memory usage meter: `{used_tokens} / 32k tokens`
**Main Stage:**
- Chat interface (chat history + input box)
- Citations display: Each assistant message shows `[Source: filename.pdf]` tags
- Expandable source preview on citation click
## Implementation Phases
### Phase 1: Backend Foundation
1. Docker Compose setup (PostgreSQL)
2. Database models & migrations
3. CLaRa model wrapper (basic load + compress)
4. Storage service (save/load tensors)
### Phase 2: Ingestion Pipeline
1. PDF ingestion
2. URL scraping
3. Text chunking & compression
4. Source management API
### Phase 3: Context Manager (Critical)
1. Implement whole-context stacking
2. Implement retrieval-based selection
3. Add token counting logic
### Phase 4: Chat System
1. Chat router with notebook context
2. Citation extraction
3. Chat history storage
### Phase 5: Streamlit UI
1. Notebook management interface
2. Source upload/management
3. Chat interface
4. Citation display
## Key Technical Challenges
### Challenge 1: CLaRa Model Memory
- CLaRa-7B is ~14GB in fp16
- Solution: Load model once at startup, keep in GPU memory
- Use `torch.cuda.amp` for mixed precision
### Challenge 2: Tensor Storage Efficiency
- Latent tensors can be large (100s of MBs per document)
- Solution: Use `torch.save()` with compression
- Implement lazy loading (only load when needed)
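Lazy loading with a bounded cache can be as small as a decorated loader. A sketch using `functools.lru_cache`; the `LOADS` counter is instrumentation for the example only, and real code would return `torch.load(path)` instead of a placeholder string:

```python
from functools import lru_cache

LOADS = {"count": 0}  # instrumentation for this sketch only


@lru_cache(maxsize=64)  # keep only recently used tensors in memory
def load_tensor(path: str):
    """Load a latent tensor on first use; repeated chat turns over the same
    notebook hit the cache instead of re-reading from disk."""
    LOADS["count"] += 1
    return f"tensor@{path}"  # real code: torch.load(path)
```

Cache size should be tuned against tensor size and available RAM; `lru_cache` evicts least-recently-used entries automatically once `maxsize` is reached.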
### Challenge 3: Context Budget Management
- Need accurate token counting for latent tensors
- Solution: Store `token_count` in database during ingestion
- Pre-calculate total context before generation
### Challenge 4: Citation Tracking
- Must track which source each latent segment came from
- Solution: Maintain `source_map` in ContextManager
- Include source IDs in generation metadata
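The `source_map` bookkeeping reduces to two lookups: stacked-context position to source, and source to positions. A sketch (`build_source_map` and `Segment` are illustrative names):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    source_id: str
    token_count: int


def build_source_map(segments):
    """Return (position -> source_id, source_id -> [positions]) so an
    answer can carry [Source: ...] citations back to the UI."""
    by_pos = {i: s.source_id for i, s in enumerate(segments)}
    by_source = defaultdict(list)
    for i, s in enumerate(segments):
        by_source[s.source_id].append(i)
    return by_pos, dict(by_source)
```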
## Success Criteria
- ✅ Users can create notebooks
- ✅ Users can upload PDFs, paste URLs, add text
- ✅ System compresses content into latent tensors
- ✅ Chat queries reason across ALL sources simultaneously
- ✅ Responses include source citations
- ✅ Context manager intelligently handles >32k token notebooks
- ✅ UI shows memory usage meter
## Performance Targets
- Ingestion: <30s for 50-page PDF
- Chat response: <10s for full notebook query
- Support 10-20 average-sized books per notebook
## Next Steps
1. Set up project structure
2. Implement Docker Compose + database
3. Build CLaRa wrapper
4. Implement ingestion pipeline
5. Build context manager
6. Create APIs
7. Build Streamlit UI