Antigravity Notebook - Implementation Plan
Architecture Overview
A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.
Technology Stack
- Backend: FastAPI (Python)
- Database: PostgreSQL (via Docker Compose)
- ML Model: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
- Storage: Local filesystem for latent tensors
- Frontend: Streamlit
- Containerization: Docker Compose
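A minimal docker-compose.yml for this stack might look like the following sketch. The service names, credentials, ports, and volume layout are illustrative assumptions, not taken from the plan:

```yaml
# Hypothetical compose file: one PostgreSQL service plus the FastAPI backend.
# Credentials and ports are placeholders; use .env values in practice.
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: antigravity
      POSTGRES_PASSWORD: antigravity
      POSTGRES_DB: antigravity
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
  backend:
    build: ./backend
    depends_on:
      - db
    ports:
      - "8000:8000"
    volumes:
      - ./data:/app/data   # latent tensor storage shared with the host
volumes:
  pgdata:
```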
Database Schema (PostgreSQL)
Tables
notebooks
CREATE TABLE notebooks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
sources
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
filename VARCHAR(255),
url TEXT,
content_hash VARCHAR(64),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
latent_tensors
CREATE TABLE latent_tensors (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
tensor_path VARCHAR(512) NOT NULL,
segment_index INTEGER NOT NULL,
token_count INTEGER NOT NULL,
original_text_length INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSONB
);
chat_messages
CREATE TABLE chat_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
content TEXT NOT NULL,
sources_used JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
Directory Structure
antigravity/
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── backend/
│   ├── main.py                # FastAPI application entry
│   ├── database.py            # Database connection & models
│   ├── config.py              # Configuration management
│   ├── models/
│   │   ├── clara.py           # CLaRa model wrapper
│   │   └── schemas.py         # Pydantic schemas
│   ├── services/
│   │   ├── ingestion.py       # Multi-modal ingestion (PDF, URL, Text)
│   │   ├── context_manager.py # "Infinite Context" logic
│   │   ├── storage.py         # Tensor storage management
│   │   └── retrieval.py       # CLaRa retrieval & ranking
│   └── routers/
│       ├── notebooks.py       # Notebook CRUD
│       ├── sources.py         # Source management
│       └── chat.py            # Chat with notebook context
├── frontend/
│   └── app_notebook.py        # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/        # Local storage for tensors
Core Components
1. CLaRa Model Wrapper (models/clara.py)
- Load apple/CLaRa-7B-Instruct from HuggingFace
- Implement compression (text → latent tensors)
- Implement generation (latent tensors → text)
- Implement ranking (score tensors against query)
2. Context Manager (services/context_manager.py)
The "Brain" of NotebookLM functionality
Key Logic:
- Fetch all latent tensors for a notebook
- Calculate total token count
- If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
- If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
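The budget rule above can be sketched in Python. The function name, the dict shape of a tensor record, and the greedy fill against the 32k budget are illustrative assumptions; the real scores would come from CLaRa's ranking:

```python
# Sketch of the context-selection rule: stack everything when the notebook
# fits the budget, otherwise greedily keep the highest-ranked tensors.
CONTEXT_BUDGET = 32_000  # tokens, per the plan's 32k window

def select_context(tensors, query_scores, budget=CONTEXT_BUDGET):
    """Return the tensor records to stack for generation.

    tensors: list of dicts with a "token_count" key.
    query_scores: one relevance score per tensor (same order), e.g. from
    CLaRa retrieval. Ignored when everything fits in the budget.
    """
    total = sum(t["token_count"] for t in tensors)
    if total <= budget:
        # Whole-notebook reasoning: stack ALL tensors.
        return list(tensors)
    # Over budget: take highest-scoring tensors until the budget is full.
    ranked = sorted(zip(tensors, query_scores), key=lambda p: p[1], reverse=True)
    selected, used = [], 0
    for tensor, _score in ranked:
        if used + tensor["token_count"] <= budget:
            selected.append(tensor)
            used += tensor["token_count"]
    return selected
```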
3. Multi-Modal Ingestion (services/ingestion.py)
- PDF: PyPDF2/pdfplumber → text extraction → CLaRa compression
- URL: BeautifulSoup/Trafilatura → web scraping → CLaRa compression
- Text: Direct input → CLaRa compression
Chunking Strategy:
- Split text into 2048-token chunks (pre-compression)
- Compress each chunk → latent tensor
- Store in the latent_tensors/ directory
- Record metadata in PostgreSQL
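The pre-compression split can be sketched as a simple fixed-size chunker. This operates on an already-tokenized sequence; the tokenizer itself and the function name are assumptions for illustration:

```python
# Illustrative chunker: split a token sequence into consecutive chunks of at
# most 2048 tokens (the plan's pre-compression chunk size). Each chunk would
# then be compressed into one latent tensor.
def chunk_tokens(token_ids, chunk_size=2048):
    """Yield consecutive slices of at most chunk_size tokens."""
    for start in range(0, len(token_ids), chunk_size):
        yield token_ids[start:start + chunk_size]
```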
4. Storage Service (services/storage.py)
- Save/load latent tensors to/from filesystem
- Path structure: data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt
- Track tensor metadata in database
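The path scheme and lazy loading can be sketched together. The class and function names are hypothetical, and the injected load_fn stands in for torch.load so the sketch stays framework-agnostic:

```python
# Sketch of the storage layout: build_tensor_path mirrors the
# data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt scheme, and
# LazyTensorStore defers loading a segment until it is actually requested.
from pathlib import Path

def build_tensor_path(root, notebook_id, source_id, segment_index):
    return (Path(root) / "latent_tensors" / str(notebook_id)
            / str(source_id) / f"segment_{segment_index}.pt")

class LazyTensorStore:
    def __init__(self, root, load_fn):
        self.root = root
        self.load_fn = load_fn  # e.g. torch.load in the real service
        self._cache = {}

    def get(self, notebook_id, source_id, segment_index):
        path = build_tensor_path(self.root, notebook_id, source_id, segment_index)
        if path not in self._cache:  # hit disk at most once per segment
            self._cache[path] = self.load_fn(path)
        return self._cache[path]
```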
5. Chat Router (routers/chat.py)
- Accept notebook_id + query
- Use ContextManager to prepare context
- Generate response with CLaRa
- Return answer + citation metadata (which sources were used)
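The handler's overall shape can be sketched with the model and context manager stubbed out. The method names (prepare_context, generate) and the segment dict shape are assumptions; the real router would wire these to the CLaRa wrapper and ContextManager services:

```python
# Sketch of the chat flow: prepare context for the notebook, generate from
# the selected latent segments, and return the answer with citation metadata.
def answer_query(notebook_id, query, context_manager, model):
    segments = context_manager.prepare_context(notebook_id, query)
    answer = model.generate(query, [s["tensor"] for s in segments])
    citations = sorted({s["source_id"] for s in segments})
    return {"answer": answer, "sources_used": citations}
```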
API Endpoints
Notebooks
POST /notebooks/ - Create notebook
GET /notebooks/ - List notebooks
GET /notebooks/{id} - Get notebook details
DELETE /notebooks/{id} - Delete notebook
Sources
POST /notebooks/{id}/sources/upload - Upload PDF
POST /notebooks/{id}/sources/url - Add URL source
POST /notebooks/{id}/sources/text - Add text source
GET /notebooks/{id}/sources - List sources
DELETE /sources/{id} - Delete source
Chat
POST /notebooks/{id}/chat - Query notebook
GET /notebooks/{id}/messages - Get chat history
Streamlit UI (app_notebook.py)
Layout
Left Sidebar:
- Notebook selector dropdown
- "Create New Notebook" button
- List of sources in current notebook
- "Add Source" section:
- File uploader (PDF)
- Text input (URL)
- Text area (direct text)
- Memory usage meter: {used_tokens} / 32k tokens
Main Stage:
- Chat interface (chat history + input box)
- Citations display: each assistant message shows [Source: filename.pdf] tags
- Expandable source preview on citation click
Implementation Phases
Phase 1: Backend Foundation
- Docker Compose setup (PostgreSQL)
- Database models & migrations
- CLaRa model wrapper (basic load + compress)
- Storage service (save/load tensors)
Phase 2: Ingestion Pipeline
- PDF ingestion
- URL scraping
- Text chunking & compression
- Source management API
Phase 3: Context Manager (Critical)
- Implement whole-context stacking
- Implement retrieval-based selection
- Add token counting logic
Phase 4: Chat System
- Chat router with notebook context
- Citation extraction
- Chat history storage
Phase 5: Streamlit UI
- Notebook management interface
- Source upload/management
- Chat interface
- Citation display
Key Technical Challenges
Challenge 1: CLaRa Model Memory
- CLaRa-7B is ~14GB in fp16
- Solution: Load model once at startup, keep in GPU memory
- Use torch.cuda.amp for mixed precision
Challenge 2: Tensor Storage Efficiency
- Latent tensors can be large (100s of MBs per document)
- Solution: Use torch.save() with compression
- Implement lazy loading (only load when needed)
Challenge 3: Context Budget Management
- Need accurate token counting for latent tensors
- Solution: Store token_count in database during ingestion
- Pre-calculate total context before generation
Challenge 4: Citation Tracking
- Must track which source each latent segment came from
- Solution: Maintain a source_map in ContextManager
- Include source IDs in generation metadata
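The source_map idea can be sketched as follows. The segment dict shape mirrors the latent_tensors table (source_id, segment_index) but is otherwise an assumption for illustration:

```python
# Sketch of citation tracking: while assembling context, record which source
# each selected latent segment came from so the response can cite sources.
def build_source_map(selected_segments):
    """Map source_id -> list of segment indices that contributed."""
    source_map = {}
    for seg in selected_segments:
        source_map.setdefault(seg["source_id"], []).append(seg["segment_index"])
    return source_map
```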
Success Criteria
- Users can create notebooks
- Users can upload PDFs, paste URLs, and add text
- System compresses content into latent tensors
- Chat queries reason across ALL sources simultaneously
- Responses include source citations
- Context manager intelligently handles >32k token notebooks
- UI shows memory usage meter
Performance Targets
- Ingestion: <30s for 50-page PDF
- Chat response: <10s for full notebook query
- Support 10-20 average-sized books per notebook
Next Steps
- Set up project structure
- Implement Docker Compose + database
- Build CLaRa wrapper
- Implement ingestion pipeline
- Build context manager
- Create APIs
- Build Streamlit UI