# Antigravity Notebook - Implementation Plan
## Architecture Overview
A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.
## Technology Stack
- **Backend**: FastAPI (Python)
- **Database**: PostgreSQL (via Docker Compose)
- **ML Model**: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
- **Storage**: Local filesystem for latent tensors
- **Frontend**: Streamlit
- **Containerization**: Docker Compose
## Database Schema (PostgreSQL)
### Tables
#### notebooks
```sql
CREATE TABLE notebooks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
#### sources
```sql
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
filename VARCHAR(255),
url TEXT,
content_hash VARCHAR(64),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
```
#### latent_tensors
```sql
CREATE TABLE latent_tensors (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
tensor_path VARCHAR(512) NOT NULL,
segment_index INTEGER NOT NULL,
token_count INTEGER NOT NULL,
original_text_length INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
```
#### chat_messages
```sql
CREATE TABLE chat_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
content TEXT NOT NULL,
sources_used JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Directory Structure
```
antigravity/
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── backend/
│   ├── main.py                 # FastAPI application entry
│   ├── database.py             # Database connection & models
│   ├── config.py               # Configuration management
│   ├── models/
│   │   ├── clara.py            # CLaRa model wrapper
│   │   └── schemas.py          # Pydantic schemas
│   ├── services/
│   │   ├── ingestion.py        # Multi-modal ingestion (PDF, URL, text)
│   │   ├── context_manager.py  # "Infinite context" logic
│   │   ├── storage.py          # Tensor storage management
│   │   └── retrieval.py        # CLaRa retrieval & ranking
│   └── routers/
│       ├── notebooks.py        # Notebook CRUD
│       ├── sources.py          # Source management
│       └── chat.py             # Chat with notebook context
├── frontend/
│   └── app_notebook.py         # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/         # Local storage for tensors
```
## Core Components
### 1. CLaRa Model Wrapper (`models/clara.py`)
- Load apple/CLaRa-7B-Instruct from HuggingFace
- Implement compression (text → latent tensors)
- Implement generation (latent tensors → text)
- Implement ranking (score tensors against query)
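Since the exact `transformers` API surface for CLaRa isn't pinned down in this plan, a minimal sketch of the wrapper's interface can still be agreed on up front. The names `LatentSegment`, `CompressorModel`, and `ToyClara` below are hypothetical; `ToyClara` is an in-memory stand-in so the rest of the pipeline can be exercised without a GPU or model weights:

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass
class LatentSegment:
    """A compressed representation of one text chunk."""
    tensor: Sequence[float]   # stand-in for a real torch.Tensor
    token_count: int          # pre-compression token count


class CompressorModel(Protocol):
    """Interface the rest of the backend codes against (hypothetical names)."""
    def compress(self, text: str) -> LatentSegment: ...
    def generate(self, segments: list, query: str) -> str: ...
    def rank(self, segments: list, query: str) -> list: ...


class ToyClara:
    """In-memory stand-in: mimics 16x compression without a real model."""

    def compress(self, text: str) -> LatentSegment:
        words = text.split()
        # Fake 16x compression: keep every 16th "token" as a float feature.
        return LatentSegment(tensor=[float(len(w)) for w in words[::16]],
                             token_count=len(words))

    def generate(self, segments: list, query: str) -> str:
        return f"answer to {query!r} over {len(segments)} segments"

    def rank(self, segments: list, query: str) -> list:
        # Toy heuristic: longer segments score higher; the real wrapper
        # would score latents against the query with the model.
        return [float(s.token_count) for s in segments]
```

The real `clara.py` would swap `ToyClara` for a class that loads `apple/CLaRa-7B-Instruct` and keeps the same three-method interface, so services depending on `CompressorModel` don't change.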
### 2. Context Manager (`services/context_manager.py`)
**The "Brain" of NotebookLM functionality**
Key Logic:
1. Fetch all latent tensors for a notebook
2. Calculate total token count
3. If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
4. If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
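The four steps above can be sketched in plain Python. `Segment` and `select_segments` are illustrative names; scores would come from CLaRa ranking, and the greedy budget fill is one reasonable selection policy, not the only one:

```python
from dataclasses import dataclass

BUDGET = 32_000  # token budget from the plan


@dataclass
class Segment:
    source_id: str
    token_count: int


def select_segments(segments, scores, budget=BUDGET):
    """Return all segments if the notebook fits the budget (whole-notebook
    reasoning); otherwise greedily pack the highest-scored segments."""
    total = sum(s.token_count for s in segments)
    if total <= budget:
        return list(segments)
    ranked = sorted(zip(scores, segments), key=lambda p: p[0], reverse=True)
    picked, used = [], 0
    for score, seg in ranked:
        if used + seg.token_count <= budget:
            picked.append(seg)
            used += seg.token_count
    return picked
```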
### 3. Multi-Modal Ingestion (`services/ingestion.py`)
- **PDF**: PyPDF2/pdfplumber → text extraction → CLaRa compression
- **URL**: BeautifulSoup/Trafilatura → web scraping → CLaRa compression
- **Text**: Direct input → CLaRa compression
Chunking Strategy:
- Split text into 2048-token chunks (pre-compression)
- Compress each chunk → latent tensor
- Store in `latent_tensors/` directory
- Record metadata in PostgreSQL
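The pre-compression split is a simple fixed-size window over the token list. A sketch (the 2048 default matches the plan; `chunk_tokens` is an illustrative name):

```python
def chunk_tokens(tokens, chunk_size=2048):
    """Split a token sequence into fixed-size chunks before compression.
    The last chunk may be shorter; each chunk becomes one latent tensor,
    with its index recorded as segment_index in PostgreSQL."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
```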
### 4. Storage Service (`services/storage.py`)
- Save/load latent tensors to/from filesystem
- Path structure: `data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt`
- Track tensor metadata in database
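The path scheme above is mechanical enough to pin down now. A sketch with `pathlib` (`tensor_path` is an illustrative helper name; the root directory matches the plan's layout):

```python
from pathlib import Path

DATA_ROOT = Path("data/latent_tensors")  # matches the planned directory tree


def tensor_path(notebook_id: str, source_id: str, segment_index: int,
                root: Path = DATA_ROOT) -> Path:
    """Build data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt;
    the real storage service would torch.save() to this path."""
    return root / notebook_id / source_id / f"segment_{segment_index}.pt"
```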
### 5. Chat Router (`routers/chat.py`)
- Accept `notebook_id` + `query`
- Use ContextManager to prepare context
- Generate response with CLaRa
- Return answer + citation metadata (which sources were used)
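The chat flow can be sketched framework-free, with the collaborators injected so it is testable without FastAPI or a model. `answer_query` and its parameters are hypothetical names; the real router would wire in the ContextManager and CLaRa wrapper:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source_id: str
    token_count: int


def answer_query(notebook_id, query, fetch_segments, select, generate):
    """One chat turn: fetch all latents -> select context -> generate ->
    attach citation metadata (which sources contributed)."""
    segments = fetch_segments(notebook_id)   # all latents for this notebook
    context = select(segments, query)        # ContextManager step
    answer = generate(context, query)        # CLaRa generation
    sources_used = sorted({s.source_id for s in context})
    return {"answer": answer, "sources_used": sources_used}
```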
## API Endpoints
### Notebooks
- `POST /notebooks/` - Create notebook
- `GET /notebooks/` - List notebooks
- `GET /notebooks/{id}` - Get notebook details
- `DELETE /notebooks/{id}` - Delete notebook
### Sources
- `POST /notebooks/{id}/sources/upload` - Upload PDF
- `POST /notebooks/{id}/sources/url` - Add URL source
- `POST /notebooks/{id}/sources/text` - Add text source
- `GET /notebooks/{id}/sources` - List sources
- `DELETE /sources/{id}` - Delete source
### Chat
- `POST /notebooks/{id}/chat` - Query notebook
- `GET /notebooks/{id}/messages` - Get chat history
## Streamlit UI (`app_notebook.py`)
### Layout
**Left Sidebar:**
- Notebook selector dropdown
- "Create New Notebook" button
- List of sources in current notebook
- "Add Source" section:
- File uploader (PDF)
- Text input (URL)
- Text area (direct text)
- Memory usage meter: `{used_tokens} / 32k tokens`
**Main Stage:**
- Chat interface (chat history + input box)
- Citations display: Each assistant message shows `[Source: filename.pdf]` tags
- Expandable source preview on citation click
## Implementation Phases
### Phase 1: Backend Foundation
1. Docker Compose setup (PostgreSQL)
2. Database models & migrations
3. CLaRa model wrapper (basic load + compress)
4. Storage service (save/load tensors)
### Phase 2: Ingestion Pipeline
1. PDF ingestion
2. URL scraping
3. Text chunking & compression
4. Source management API
### Phase 3: Context Manager (Critical)
1. Implement whole-context stacking
2. Implement retrieval-based selection
3. Add token counting logic
### Phase 4: Chat System
1. Chat router with notebook context
2. Citation extraction
3. Chat history storage
### Phase 5: Streamlit UI
1. Notebook management interface
2. Source upload/management
3. Chat interface
4. Citation display
## Key Technical Challenges
### Challenge 1: CLaRa Model Memory
- CLaRa-7B is ~14GB in fp16
- Solution: Load model once at startup, keep in GPU memory
- Use `torch.cuda.amp` for mixed precision
### Challenge 2: Tensor Storage Efficiency
- Latent tensors can be large (100s of MBs per document)
- Solution: Use `torch.save()` with compression
- Implement lazy loading (only load when needed)
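Lazy loading with a bounded cache can be as small as a decorated loader. A sketch using `functools.lru_cache`; the `LOADS` counter is instrumentation for the example only, and real code would return `torch.load(path)` instead of a placeholder string:

```python
from functools import lru_cache

LOADS = {"count": 0}  # instrumentation for this sketch only


@lru_cache(maxsize=64)  # keep only recently used tensors in memory
def load_tensor(path: str):
    """Load a latent tensor on first use; repeated chat turns over the same
    notebook hit the cache instead of re-reading from disk."""
    LOADS["count"] += 1
    return f"tensor@{path}"  # real code: torch.load(path)
```

Cache size should be tuned against tensor size and available RAM; `lru_cache` evicts least-recently-used entries automatically once `maxsize` is reached.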
### Challenge 3: Context Budget Management
- Need accurate token counting for latent tensors
- Solution: Store `token_count` in database during ingestion
- Pre-calculate total context before generation
### Challenge 4: Citation Tracking
- Must track which source each latent segment came from
- Solution: Maintain `source_map` in ContextManager
- Include source IDs in generation metadata
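The `source_map` bookkeeping reduces to two lookups: stacked-context position to source, and source to positions. A sketch (`build_source_map` and `Segment` are illustrative names):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    source_id: str
    token_count: int


def build_source_map(segments):
    """Return (position -> source_id, source_id -> [positions]) so an
    answer can carry [Source: ...] citations back to the UI."""
    by_pos = {i: s.source_id for i, s in enumerate(segments)}
    by_source = defaultdict(list)
    for i, s in enumerate(segments):
        by_source[s.source_id].append(i)
    return by_pos, dict(by_source)
```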
## Success Criteria
- ✅ Users can create notebooks
- ✅ Users can upload PDFs, paste URLs, add text
- ✅ System compresses content into latent tensors
- ✅ Chat queries reason across ALL sources simultaneously
- ✅ Responses include source citations
- ✅ Context manager intelligently handles >32k token notebooks
- ✅ UI shows memory usage meter
## Performance Targets
- Ingestion: <30s for 50-page PDF
- Chat response: <10s for full notebook query
- Support 10-20 average-sized books per notebook
## Next Steps
1. Set up project structure
2. Implement Docker Compose + database
3. Build CLaRa wrapper
4. Implement ingestion pipeline
5. Build context manager
6. Create APIs
7. Build Streamlit UI