# Antigravity Notebook - Implementation Plan
## Architecture Overview
A NotebookLM clone using Apple's CLaRa-7B-Instruct for latent compression, enabling "infinite context" by compressing text 16x and reasoning across multiple sources simultaneously.

## Technology Stack
- **Backend**: FastAPI (Python)
- **Database**: PostgreSQL (via Docker Compose)
- **ML Model**: apple/CLaRa-7B-Instruct (HuggingFace Transformers)
- **Storage**: Local filesystem for latent tensors
- **Frontend**: Streamlit
- **Containerization**: Docker Compose
## Database Schema (PostgreSQL)

### Tables

#### notebooks
```sql
CREATE TABLE notebooks (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

#### sources
```sql
CREATE TABLE sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    source_type VARCHAR(10) NOT NULL CHECK (source_type IN ('pdf', 'url', 'text')),
    filename VARCHAR(255),
    url TEXT,
    content_hash VARCHAR(64),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);
```

#### latent_tensors
```sql
CREATE TABLE latent_tensors (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
    tensor_path VARCHAR(512) NOT NULL,
    segment_index INTEGER NOT NULL,
    token_count INTEGER NOT NULL,
    original_text_length INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata JSONB
);
```

#### chat_messages
```sql
CREATE TABLE chat_messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    notebook_id UUID REFERENCES notebooks(id) ON DELETE CASCADE,
    role VARCHAR(20) NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    sources_used JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
## Directory Structure
```
antigravity/
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── backend/
│   ├── main.py                 # FastAPI application entry
│   ├── database.py             # Database connection & models
│   ├── config.py               # Configuration management
│   ├── models/
│   │   ├── clara.py            # CLaRa model wrapper
│   │   └── schemas.py          # Pydantic schemas
│   ├── services/
│   │   ├── ingestion.py        # Multi-modal ingestion (PDF, URL, Text)
│   │   ├── context_manager.py  # "Infinite Context" logic
│   │   ├── storage.py          # Tensor storage management
│   │   └── retrieval.py        # CLaRa retrieval & ranking
│   └── routers/
│       ├── notebooks.py        # Notebook CRUD
│       ├── sources.py          # Source management
│       └── chat.py             # Chat with notebook context
├── frontend/
│   └── app_notebook.py         # Streamlit NotebookLM UI
└── data/
    └── latent_tensors/         # Local storage for tensors
```
## Core Components

### 1. CLaRa Model Wrapper (`models/clara.py`)
- Load apple/CLaRa-7B-Instruct from HuggingFace
- Implement compression (text → latent tensors)
- Implement generation (latent tensors → text)
- Implement ranking (score tensors against query)
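The wrapper's surface can be pinned down up front as a typed interface. This is a sketch only: the method names (`compress`, `generate`, `rank`) follow the bullets above, and `LatentTensor` is a stand-in alias, not the actual CLaRa API.

```python
from typing import List, Protocol, runtime_checkable

# Stand-in for a torch.Tensor of compressed hidden states (assumption).
LatentTensor = object


@runtime_checkable
class ClaraModel(Protocol):
    """Interface the rest of the backend assumes from models/clara.py."""

    def compress(self, text: str) -> LatentTensor:
        """Text -> latent tensor (the 16x compression step)."""

    def generate(self, latents: List[LatentTensor], query: str) -> str:
        """Stacked latent tensors + query -> answer text."""

    def rank(self, latents: List[LatentTensor], query: str) -> List[float]:
        """Score each tensor's relevance to the query."""
```

Any concrete wrapper built on HuggingFace Transformers can then be checked against this Protocol with `isinstance`.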
### 2. Context Manager (`services/context_manager.py`)
**The "Brain" of NotebookLM functionality**

Key Logic:
1. Fetch all latent tensors for a notebook
2. Calculate total token count
3. If <= 32k tokens: Stack ALL tensors (whole-notebook reasoning)
4. If > 32k tokens: Use CLaRa retrieval to rank and select top-N tensors
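The four steps above reduce to a single selection function. A sketch, with assumptions: `score_fn` stands in for CLaRa's ranking call, and the dict shape mirrors `latent_tensors` rows rather than any final implementation.

```python
from typing import Callable, Dict, List


def select_context(
    segments: List[Dict],                     # each: {"id": ..., "token_count": int}
    query: str,
    score_fn: Callable[[Dict, str], float],   # stand-in for CLaRa ranking
    budget: int = 32_000,
) -> List[Dict]:
    """Pick which latent segments go into the generation context."""
    total = sum(s["token_count"] for s in segments)
    if total <= budget:
        # Whole-notebook reasoning: stack everything.
        return segments
    # Over budget: rank by relevance, then fill the budget greedily.
    ranked = sorted(segments, key=lambda s: score_fn(s, query), reverse=True)
    chosen, used = [], 0
    for seg in ranked:
        if used + seg["token_count"] <= budget:
            chosen.append(seg)
            used += seg["token_count"]
    return chosen
```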
### 3. Multi-Modal Ingestion (`services/ingestion.py`)
- **PDF**: PyPDF2/pdfplumber → text extraction → CLaRa compression
- **URL**: BeautifulSoup/Trafilatura → web scraping → CLaRa compression
- **Text**: Direct input → CLaRa compression

Chunking Strategy:
- Split text into 2048-token chunks (pre-compression)
- Compress each chunk → latent tensor
- Store in `latent_tensors/` directory
- Record metadata in PostgreSQL
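The pre-compression chunking step is fixed-size slicing over token ids; the CLaRa tokenizer that produces those ids is assumed and not shown here.

```python
from typing import List


def chunk_tokens(token_ids: List[int], chunk_size: int = 2048) -> List[List[int]]:
    """Split a token-id sequence into fixed-size chunks (last one may be short)."""
    return [
        token_ids[i:i + chunk_size]
        for i in range(0, len(token_ids), chunk_size)
    ]
```

Each chunk then goes through compression independently, yielding one latent tensor (and one `latent_tensors` row) per chunk.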
### 4. Storage Service (`services/storage.py`)
- Save/load latent tensors to/from filesystem
- Path structure: `data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt`
- Track tensor metadata in database
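The path convention above can be captured in one small helper (a sketch, assuming ids are passed as plain strings):

```python
from pathlib import Path


def tensor_path(root: Path, notebook_id: str, source_id: str, segment_index: int) -> Path:
    """Build data/latent_tensors/{notebook_id}/{source_id}/segment_{idx}.pt."""
    return root / "latent_tensors" / notebook_id / source_id / f"segment_{segment_index}.pt"
```

The storage service would `mkdir -p` the parent directory before `torch.save()` and record the resulting path in the `latent_tensors.tensor_path` column.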
### 5. Chat Router (`routers/chat.py`)
- Accept `notebook_id` + `query`
- Use ContextManager to prepare context
- Generate response with CLaRa
- Return answer + citation metadata (which sources were used)
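The "answer + citation metadata" response shape might look like this; the field names (`answer`, `sources_used`) echo the plan and the `chat_messages` table, but are illustrative rather than a fixed API contract.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChatResponse:
    answer: str
    sources_used: List[str] = field(default_factory=list)  # source ids/filenames


def build_response(answer: str, selected_segments: List[dict]) -> ChatResponse:
    """Assemble the chat reply, deduplicating source ids in selection order."""
    seen, sources = set(), []
    for seg in selected_segments:
        sid = seg["source_id"]
        if sid not in seen:
            seen.add(sid)
            sources.append(sid)
    return ChatResponse(answer=answer, sources_used=sources)
```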
## API Endpoints

### Notebooks
- `POST /notebooks/` - Create notebook
- `GET /notebooks/` - List notebooks
- `GET /notebooks/{id}` - Get notebook details
- `DELETE /notebooks/{id}` - Delete notebook

### Sources
- `POST /notebooks/{id}/sources/upload` - Upload PDF
- `POST /notebooks/{id}/sources/url` - Add URL source
- `POST /notebooks/{id}/sources/text` - Add text source
- `GET /notebooks/{id}/sources` - List sources
- `DELETE /sources/{id}` - Delete source

### Chat
- `POST /notebooks/{id}/chat` - Query notebook
- `GET /notebooks/{id}/messages` - Get chat history
## Streamlit UI (`app_notebook.py`)

### Layout
**Left Sidebar:**
- Notebook selector dropdown
- "Create New Notebook" button
- List of sources in current notebook
- "Add Source" section:
  - File uploader (PDF)
  - Text input (URL)
  - Text area (direct text)
- Memory usage meter: `{used_tokens} / 32k tokens`

**Main Stage:**
- Chat interface (chat history + input box)
- Citations display: each assistant message shows `[Source: filename.pdf]` tags
- Expandable source preview on citation click
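The memory meter reduces to one small helper whose fraction the sidebar can feed into a Streamlit progress bar. A minimal sketch; the 32k budget matches the context manager, and the function name is an assumption.

```python
def memory_meter(used_tokens: int, budget: int = 32_000) -> tuple:
    """Return (fill fraction for a progress bar, label text for the sidebar)."""
    frac = min(used_tokens / budget, 1.0)  # clamp so an over-budget notebook shows 100%
    return frac, f"{used_tokens:,} / {budget // 1000}k tokens"
```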
## Implementation Phases

### Phase 1: Backend Foundation
1. Docker Compose setup (PostgreSQL)
2. Database models & migrations
3. CLaRa model wrapper (basic load + compress)
4. Storage service (save/load tensors)

### Phase 2: Ingestion Pipeline
1. PDF ingestion
2. URL scraping
3. Text chunking & compression
4. Source management API

### Phase 3: Context Manager (Critical)
1. Implement whole-context stacking
2. Implement retrieval-based selection
3. Add token counting logic

### Phase 4: Chat System
1. Chat router with notebook context
2. Citation extraction
3. Chat history storage

### Phase 5: Streamlit UI
1. Notebook management interface
2. Source upload/management
3. Chat interface
4. Citation display
## Key Technical Challenges

### Challenge 1: CLaRa Model Memory
- CLaRa-7B is ~14GB in fp16
- Solution: load the model once at startup and keep it in GPU memory
- Use `torch.cuda.amp` for mixed precision

### Challenge 2: Tensor Storage Efficiency
- Latent tensors can be large (100s of MBs per document)
- Solution: use `torch.save()` with compression
- Implement lazy loading (only load tensors when needed)

### Challenge 3: Context Budget Management
- Need accurate token counting for latent tensors
- Solution: store `token_count` in the database during ingestion
- Pre-calculate the total context size before generation

### Challenge 4: Citation Tracking
- Must track which source each latent segment came from
- Solution: maintain a `source_map` in the ContextManager
- Include source IDs in generation metadata
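The `source_map` for Challenge 4 can be as simple as a position-to-source lookup built while stacking the context; the structures here are illustrative assumptions, not the actual ContextManager internals.

```python
from typing import Dict, List


def build_source_map(segments: List[dict]) -> Dict[int, str]:
    """Map each segment's position in the stacked context to its source id."""
    return {pos: seg["source_id"] for pos, seg in enumerate(segments)}


def cite(source_map: Dict[int, str], used_positions: List[int]) -> List[str]:
    """Resolve segment positions used during generation into source ids."""
    return sorted({source_map[pos] for pos in used_positions})
```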
## Success Criteria
✅ Users can create notebooks
✅ Users can upload PDFs, paste URLs, add text
✅ System compresses content into latent tensors
✅ Chat queries reason across ALL sources simultaneously
✅ Responses include source citations
✅ Context manager intelligently handles >32k token notebooks
✅ UI shows memory usage meter
## Performance Targets
- Ingestion: <30s for a 50-page PDF
- Chat response: <10s for a full-notebook query
- Support 10-20 average-sized books per notebook

## Next Steps
1. Set up project structure
2. Implement Docker Compose + database
3. Build CLaRa wrapper
4. Implement ingestion pipeline
5. Build context manager
6. Create APIs
7. Build Streamlit UI