Upload folder using huggingface_hub


Files changed (7)

README.md +259 -0
config.json +26 -0
model.py +297 -0
model.safetensors +3 -0
pytorch_model.bin +3 -0
test_results.csv +19 -0
test_results.json +182 -0

README.md ADDED

	@@ -0,0 +1,259 @@

+---
+language:
+- en
+license: apache-2.0
+library_name: pytorch
+tags:
+- text-classification
+- fiction-detection
+- byte-level
+- cnn
+datasets:
+- HuggingFaceTB/cosmopedia
+- BEE-spoke-data/gutenberg-en-v1-clean
+- common-pile/arxiv_abstracts
+- ccdv/cnn_dailymail
+metrics:
+- accuracy
+- f1
+- roc_auc
+model-index:
+- name: TinyByteCNN-Fiction-Detector
+  results:
+  - task:
+      type: text-classification
+      name: Fiction vs Non-Fiction Classification
+    dataset:
+      name: Custom Fiction/Non-Fiction Dataset
+      type: custom
+      split: validation
+    metrics:
+    - type: accuracy
+      value: 99.91
+      name: Validation Accuracy
+    - type: f1
+      value: 99.91
+      name: F1 Score
+    - type: roc_auc
+      value: 99.99
+      name: ROC AUC
+---
+# TinyByteCNN Fiction vs Non-Fiction Detector
+A lightweight, byte-level CNN model for detecting fiction vs non-fiction text with 99.91% validation accuracy.
+## Model Description
+TinyByteCNN is a compact (~0.94M-parameter) byte-level convolutional neural network for binary classification of fiction vs non-fiction text. It operates directly on UTF-8 byte sequences, so it needs no tokenizer and accepts any UTF-8 input; note, however, that it was trained on English text only (see Limitations).
+### Architecture Highlights
+- **Model Size**: 942,313 parameters (~3.6MB)
+- **Input**: Raw UTF-8 bytes (max 4,096 bytes, on the order of 700 words of typical English prose)
+- **Architecture**: Depthwise-separable 1D CNN with Squeeze-Excitation
+- **Receptive Field**: ~2.8KB (2,825 bytes) covering multi-paragraph context
+- **Key Features**:
+  - Stride-2 stem plus 4 stages that each open with a stride-2 block (32x total downsampling: 4,096 bytes → 128 positions)
+  - Dilated convolutions for larger receptive field
+  - SE attention modules for channel recalibration
+  - Global average + max pooling head
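The 32x downsampling and ~2.8KB receptive-field figures can be checked with a short calculation. The sketch below is our own reconstruction from the layer settings in `model.py` (stem: kernel 5, stride 2; every block: depthwise kernel 7, stride 2 on the first block of each stage, per-stage dilations `[1, 2]`, `[1, 2]`, `[1, 2, 4]`, `[1, 2, 8]`). It counts only the convolutions; the SE modules and the pooling head already mix information across the whole sequence.

```python
# Layer settings transcribed from model.py as (kernel, stride, dilation)
STEM = (5, 2, 1)
STAGE_DILATIONS = [[1, 2], [1, 2], [1, 2, 4], [1, 2, 8]]

def conv_layers():
    """Yield (kernel, stride, dilation) for the stem and every depthwise conv."""
    yield STEM
    for dilations in STAGE_DILATIONS:
        for i, d in enumerate(dilations):
            # The first block of each stage downsamples with stride 2
            yield (7, 2 if i == 0 else 1, d)

def receptive_field():
    """Return (receptive field in input bytes, total stride) of the conv stack."""
    rf, jump = 1, 1
    for k, stride, d in conv_layers():
        # A kernel spanning (k - 1) * d positions widens the field by that many input strides
        rf += (k - 1) * d * jump
        jump *= stride
    return rf, jump

def output_length(seq_len):
    """Length after the stack; with 'same' padding each stride-2 conv yields ceil(L / 2)."""
    for _, stride, _ in conv_layers():
        seq_len = -(-seq_len // stride)  # ceiling division
    return seq_len

rf, total_stride = receptive_field()
print(rf, total_stride, output_length(4096))  # → 2825 32 128
```

Under these settings the convolutional receptive field is 2,825 bytes, and a 4,096-byte input reaches the pooling head as 128 positions.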
+## Intended Uses & Limitations
+### Intended Uses
+- Automated content categorization for libraries and archives
+- Fiction/non-fiction filtering for content platforms
+- Educational content classification
+- Writing style analysis
+- Content recommendation systems
+### Limitations
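+- **Personal narratives**: May misclassify personal journal entries and memoirs as fiction (~97% fiction confidence was observed on some journal entries, although the personal travel-journal sample in the curated tests below was classified correctly as non-fiction)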
+- **Mixed content**: Struggles with creative non-fiction and narrative journalism
+- **Length**: Optimized for 512-4096 byte inputs; anything beyond the first 4,096 bytes is silently truncated, so longer texts should be chunked
+- **Language**: Primarily trained on English text
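Since `preprocess_text` keeps only the first 4,096 bytes, one way to handle longer documents is to score fixed-size windows and combine the results. A minimal sketch, assuming mean aggregation of the non-fiction probability is acceptable (the card does not prescribe an aggregation rule); `chunk_bytes` and `classify_long_text` are hypothetical helpers, not part of `model.py`.

```python
from typing import Callable, List

def chunk_bytes(text: str, max_len: int = 4096) -> List[str]:
    """Split text into pieces whose UTF-8 encoding is at most max_len bytes each."""
    if max_len < 4:
        raise ValueError("max_len must fit a 4-byte UTF-8 character")
    data = text.encode("utf-8")
    chunks, start = [], 0
    while start < len(data):
        end = min(start + max_len, len(data))
        # Never split inside a character: UTF-8 continuation bytes look like 0b10xxxxxx
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks

def classify_long_text(text: str, score_fn: Callable[[str], float], max_len: int = 4096) -> float:
    """Mean P(Non-Fiction) over byte windows; score_fn maps one chunk to a probability."""
    chunks = chunk_bytes(text, max_len)
    if not chunks:
        raise ValueError("text is empty")
    return sum(score_fn(chunk) for chunk in chunks) / len(chunks)

# Hypothetical wiring with model.py, given an already loaded `model`:
# from model import classify_text
# p = classify_long_text(long_text, lambda c: classify_text(c, model)["probability_nonfiction"])
```

Averaging treats every window equally; a short trailing window counts as much as a full one, which may or may not suit your data.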
+## Training Data
+The model was trained on a diverse dataset of 85,000 samples (60k train, 15k validation, 10k test) drawn from:
+### Fiction Sources (50%)
+1. **Cosmopedia Stories** (HuggingFaceTB/cosmopedia)
+   - Synthetic fiction stories
+   - License: Apache 2.0
+2. **Project Gutenberg** (BEE-spoke-data/gutenberg-en-v1-clean)
+   - Classic literature
+   - License: Public Domain
+3. **Reddit WritingPrompts**
+   - Community-generated creative writing
+   - Via synthetic alternatives
+### Non-Fiction Sources (50%)
+1. **Cosmopedia Educational** (HuggingFaceTB/cosmopedia)
+   - Textbooks, WikiHow, educational blogs
+   - License: Apache 2.0
+2. **Scientific Papers** (common-pile/arxiv_abstracts)
+   - Academic abstracts and introductions
+   - License: Various (permissive)
+3. **News Articles** (ccdv/cnn_dailymail)
+   - CNN and Daily Mail articles
+   - License: Apache 2.0
+## Training Procedure
+### Preprocessing
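+- Unicode NFC normalization
+- Line-ending normalization (`\r\n` → `\n`)
+- Whitespace normalization: any run of three or more whitespace characters (including newlines) is collapsed to two spaces
+- UTF-8 byte encoding
+- Truncation to 4,096 bytes, with zero-padding for shorter inputs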
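These steps mirror `preprocess_text` in `model.py`. The stdlib-only sketch below (`to_byte_ids` is a hypothetical name; it returns a list instead of a tensor) reproduces them to show their effect on a small input: NFC composes `e` + U+0301 into the two-byte `é`, and because `\s` also matches newlines, a run of three line breaks becomes two spaces.

```python
import re
import unicodedata
from typing import List

def to_byte_ids(text: str, max_len: int = 4096) -> List[int]:
    """Stdlib re-implementation of preprocess_text, returning a plain list of byte values."""
    text = unicodedata.normalize("NFC", text)  # compose e.g. 'e' + U+0301 into 'é'
    text = text.replace("\r\n", "\n")          # normalize Windows line endings
    text = re.sub(r"\s{3,}", "  ", text)       # any run of 3+ whitespace chars -> two spaces
    data = text.encode("utf-8", errors="ignore")[:max_len]
    return list(data) + [0] * (max_len - len(data))  # zero-pad to max_len

print(to_byte_ids("Cafe\u0301\n\n\nOpen", max_len=16))
# → [67, 97, 102, 195, 169, 32, 32, 79, 112, 101, 110, 0, 0, 0, 0, 0]
```

Byte value 0 serves both as padding and as the encoding of a literal NUL character, so the two are indistinguishable to the model.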
+### Training Hyperparameters
+- **Optimizer**: AdamW (lr=3e-3, betas=(0.9, 0.98), weight_decay=0.01)
+- **Schedule**: Cosine decay with 5% warmup
+- **Batch Size**: 32
+- **Epochs**: 10
+- **Label Smoothing**: 0.05
+- **Gradient Clipping**: 1.0
+- **Device**: Apple M-series (MPS)
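The card gives the schedule only as cosine decay with 5% warmup. A minimal sketch, assuming linear warmup to the peak rate of 3e-3, per-step updates, and decay to zero over 60,000 / 32 × 10 = 18,750 optimizer steps; the warmup shape and final learning rate are assumptions, not documented settings.

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 3e-3, warmup_frac: float = 0.05) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

steps_per_epoch = math.ceil(60_000 / 32)  # 1,875 batches of 32
total = steps_per_epoch * 10              # 18,750 steps over 10 epochs
warmup = int(0.05 * total)                # 937 warmup steps
print(lr_at(0, total), lr_at(warmup, total), lr_at(total, total))
```

In PyTorch, such a function could drive `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: lr_at(s, total) / 3e-3)`, since `LambdaLR` multiplies the optimizer's base learning rate by the returned factor.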
+## Evaluation Results
+### Validation Set (15,000 samples)
+| Metric | Value |
+|--------|-------|
+| Accuracy | 99.91% |
+| F1 Score | 0.9991 |
+| ROC AUC | 0.9999 |
+| Loss | 0.1194 |
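A validation loss of 0.1194 alongside 99.91% accuracy is consistent with label smoothing. A worked check, assuming binary cross-entropy on symmetrically smoothed targets `y * (1 - eps) + eps / 2` with `eps = 0.05` (the card states the smoothing value, not its exact formulation): the targets become 0.025 and 0.975, cross-entropy against a soft target is minimized when the predicted probability equals that target, and the minimum is the binary entropy of 0.975.

```python
import math

def smoothed_target(label: int, eps: float = 0.05) -> float:
    """Symmetric binary label smoothing: 0 -> eps/2, 1 -> 1 - eps/2."""
    return label * (1 - eps) + eps / 2

def bce(p: float, target: float) -> float:
    """Binary cross-entropy between a predicted probability p and a soft target."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

t = smoothed_target(1)  # 0.975
floor = bce(t, t)       # lowest achievable loss per sample under this assumption
print(f"target={t:.3f}, loss floor={floor:.4f}")  # → target=0.975, loss floor=0.1169
```

Under that assumption the loss cannot fall below about 0.117, so 0.1194 is close to the floor; it also suggests why confident predictions in the test results cluster near 97-98% rather than approaching 100%. Treat the reported confidence as a smoothed score, not a calibrated probability.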
+### Test Samples by Category (12 curated samples)
+| Category | Samples | Accuracy | Avg Confidence |
+|----------|---------|----------|----------------|
+| General Fiction | 3 | 100% | 91.4% |
+| Textbook | 3 | 100% | 97.9% |
+| News Articles | 3 | 100% | 97.8% |
+| Journal Articles | 3 | 100% | 97.6% |
+| **Overall** | **12** | **100%** | **96.2%** |
+All 12 samples were classified correctly, including the three journal types (financial news, scientific research, and a personal travel log). The bundled `test_results.csv` and `test_results.json` also include six further fiction samples (three children's stories, three fantasy stories), all classified correctly with 96.0-97.4% confidence.
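The per-category figures can be recomputed from the bundled `test_results.csv`. A minimal stdlib sketch; deriving the category from the `sample_id` prefix (e.g. `news_2` → `news`) is an assumption based on the file's naming, since its `category` column only distinguishes Fiction from Non-Fiction. `summarize` and `summarize_csv` are hypothetical helpers.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def summarize(rows: Iterable[dict]) -> Dict[str, Tuple[int, float, float]]:
    """Map category -> (sample count, accuracy, mean confidence)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["sample_id"].rsplit("_", 1)[0]].append(row)  # 'news_2' -> 'news'
    summary = {}
    for category, items in groups.items():
        # str() accepts both the CSV string "True" and the JSON boolean true
        correct = sum(str(r["correct"]) == "True" for r in items)
        mean_conf = sum(float(r["confidence"]) for r in items) / len(items)
        summary[category] = (len(items), correct / len(items), mean_conf)
    return summary

def summarize_csv(path: str = "test_results.csv") -> Dict[str, Tuple[int, float, float]]:
    """Read the results file shipped with the model and summarize it."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f))
```

Applied to the bundled file, `summarize_csv()` covers all 18 rows, including the children's and fantasy fiction samples that the 12-sample table does not list.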
+### Detailed Test Results
+#### ✅ All 12 Samples Correctly Classified
+**Fiction Samples (3/3):**
+1. Lighthouse keeper narrative → Fiction (79.8% conf)
+2. Time travel story → Fiction (97.2% conf)
+3. Detective mystery → Fiction (97.3% conf)
+**Textbook Samples (3/3):**
+1. Photosynthesis (Biology) → Non-Fiction (97.8% conf)
+2. Fundamental theorem (Calculus) → Non-Fiction (97.8% conf)
+3. Market equilibrium (Economics) → Non-Fiction (97.9% conf)
+**News Articles (3/3):**
+1. Federal Reserve decision → Non-Fiction (97.8% conf)
+2. City homeless initiative → Non-Fiction (97.9% conf)
+3. Exoplanet discovery → Non-Fiction (97.9% conf)
+**Journal Articles (3/3):**
+1. Wall Street Journal (Financial) → Non-Fiction (97.7% conf)
+2. Nature Scientific Reports → Non-Fiction (97.7% conf)
+3. Personal Travel Journal → Non-Fiction (97.5% conf)
+## How to Use
+### PyTorch
+```python
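+import torch
+# model.py from this repository must be importable (e.g. in the working directory)
+from model import TinyByteCNN, preprocess_text
+# Load model from a Hub repo id or a local directory containing model.safetensors
+model = TinyByteCNN.from_pretrained("username/tinybytecnn-fiction-detector")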
+model.eval()
+# Prepare text
+text = "Your text here..."
+input_bytes = preprocess_text(text)  # Returns tensor of shape [1, 4096]
+# Predict
+with torch.no_grad():
+    logits = model(input_bytes)
+    probability = torch.sigmoid(logits).item()
+    if probability > 0.5:
+        print(f"Non-Fiction (confidence: {probability:.1%})")
+    else:
+        print(f"Fiction (confidence: {1-probability:.1%})")
+```
+### Batch Processing
+```python
+import torch
+from model import preprocess_text
+def classify_texts(texts, model, batch_size=32):
+    model.eval()
+    results = []
+    for i in range(0, len(texts), batch_size):
+        batch = texts[i:i+batch_size]
+        # preprocess_text returns [1, 4096]; concatenate (not stack) to get [B, 4096]
+        inputs = torch.cat([preprocess_text(t) for t in batch], dim=0)
+        with torch.no_grad():
+            probs = torch.sigmoid(model(inputs)).tolist()  # P(Non-Fiction) per text
+        for text, prob in zip(batch, probs):
+            results.append({
+                'text': text[:100] + '...' if len(text) > 100 else text,
+                'class': 'Non-Fiction' if prob > 0.5 else 'Fiction',
+                'confidence': prob if prob > 0.5 else 1 - prob
+            })
+    return results
+```
+## Training Infrastructure
+- **Hardware**: Apple M-series with 8GB MPS memory limit
+- **Training Time**: ~20 minutes
+- **Framework**: PyTorch 2.0+
+## Environmental Impact
+- **Hardware Type**: Apple Silicon M-series
+- **Hours used**: 0.33
+- **Carbon Emitted**: Minimal (ARM-based efficiency; ~10W average × 0.33 hours ≈ 3.3 Wh of energy)
+## Citation
+```bibtex
+@misc{tinybytecnn-fiction-2024,
+  title={TinyByteCNN Fiction vs Non-Fiction Detector},
+  author={Mitchell Currie},
+  year={2024},
+  howpublished={Hugging Face model repository},
+  url={https://huggingface.co/username/tinybytecnn-fiction-detector}
+}
+```
+## Acknowledgments
+This model uses data from:
+- HuggingFace Team (Cosmopedia dataset)
+- Project Gutenberg
+- Common Pile contributors
+- CNN/Daily Mail dataset creators
+## License
+Apache 2.0
+## Contact
+For questions or issues, please open an issue on the [model repository](https://huggingface.co/username/tinybytecnn-fiction-detector).

config.json ADDED

	@@ -0,0 +1,26 @@

+{
+  "architectures": ["TinyByteCNN"],
+  "model_type": "byte_cnn",
+  "task": "text-classification",
+  "num_labels": 2,
+  "id2label": {
+    "0": "Fiction",
+    "1": "Non-Fiction"
+  },
+  "label2id": {
+    "Fiction": 0,
+    "Non-Fiction": 1
+  },
+  "max_seq_len": 4096,
+  "vocab_size": 256,
+  "embed_dim": 32,
+  "widths": [128, 192, 256, 320],
+  "use_gn": false,
+  "head_drop": 0.1,
+  "stochastic_depth": 0.05,
+  "num_parameters": 942313,
+  "torch_dtype": "float32",
+  "validation_accuracy": 99.91,
+  "validation_f1": 0.9991,
+  "validation_auc": 0.9999
+}
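The `num_parameters` value can be reproduced from the other config fields. The stdlib-only sketch below is our own transcription of the layer shapes in `model.py` (BatchNorm variant, i.e. `use_gn: false`; SE reduction ratio 8 with at least 4 hidden units); `sep_res_block` and `count_parameters` are hypothetical helpers, not part of the repository. With the real model, `sum(p.numel() for p in model.parameters())` should give the same total.

```python
def sep_res_block(c_in: int, c_out: int, k: int = 7, stride: int = 1, se_ratio: int = 8) -> int:
    """Trainable parameters of one SepResBlock (BatchNorm variant)."""
    n = c_in * k          # depthwise conv (groups=c_in), no bias
    n += 2 * c_in         # bn1 weight + bias
    n += c_in * c_out     # pointwise 1x1 conv, no bias
    n += 2 * c_out        # bn2 weight + bias
    m = max(c_out // se_ratio, 4)
    n += (c_out * m + m) + (m * c_out + c_out)  # SE fc1 and fc2, with biases
    if stride != 1 or c_in != c_out:
        n += c_in * c_out  # 1x1 projection shortcut, no bias
    return n

def count_parameters(vocab_size=256, embed_dim=32, widths=(128, 192, 256, 320)) -> int:
    """Total trainable parameters of TinyByteCNN for the given config values."""
    blocks_per_stage = (2, 2, 3, 3)
    n = vocab_size * embed_dim          # byte embedding table
    n += embed_dim * widths[0] * 5      # stem conv, kernel 5, no bias
    n += 2 * widths[0]                  # bn0 weight + bias
    c_prev = widths[0]
    for blocks, c in zip(blocks_per_stage, widths):
        for i in range(blocks):
            n += sep_res_block(c_prev, c, stride=2 if i == 0 else 1)
            c_prev = c
    n += 2 * widths[-1] + 1             # head: Linear(avg + max features -> 1 logit)
    return n

print(count_parameters())  # → 942313
```

BatchNorm running statistics are buffers rather than parameters, so they are excluded; at 4 bytes per float32 parameter, 942,313 parameters occupy about 3.77 MB (3.6 MiB), matching the ~3.6MB in the model card.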

model.py ADDED

	@@ -0,0 +1,297 @@

+"""
+TinyByteCNN Model for Fiction vs Non-Fiction Classification
+"""
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import numpy as np
+import unicodedata
+import re
+from typing import Union, List
+class SE(nn.Module):
+    """Squeeze-Excitation module"""
+    def __init__(self, c, r=8):
+        super().__init__()
+        m = max(c // r, 4)
+        self.fc1 = nn.Linear(c, m)
+        self.fc2 = nn.Linear(m, c)
+    def forward(self, x):
+        # x: [B, C, T]
+        s = x.mean(dim=-1)  # [B, C]
+        s = F.silu(self.fc1(s))
+        s = torch.sigmoid(self.fc2(s))  # [B, C]
+        return x * s.unsqueeze(-1)
+class SepResBlock(nn.Module):
+    """Separable Residual Block with SE attention"""
+    def __init__(self, c_in, c_out, k=7, stride=1, dilation=1, use_gn=False, se_ratio=8, drop=0.0):
+        super().__init__()
+        Norm = (lambda c: nn.GroupNorm(32, c)) if use_gn else nn.BatchNorm1d
+        self.dw = nn.Conv1d(c_in, c_in, k, stride=stride, dilation=dilation,
+                           padding=((k-1)//2)*dilation, groups=c_in, bias=False)
+        self.bn1 = Norm(c_in)
+        self.pw = nn.Conv1d(c_in, c_out, 1, bias=False)
+        self.bn2 = Norm(c_out)
+        self.se = SE(c_out, se_ratio)
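+        # The config calls this rate "stochastic_depth", but nn.Dropout zeroes individual
+        # activations of the residual branch rather than dropping the whole branch per sample
+        self.drop = nn.Dropout(p=drop)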
+        self.proj = None
+        if stride != 1 or c_in != c_out:
+            self.proj = nn.Conv1d(c_in, c_out, 1, stride=stride, bias=False)
+    def forward(self, x):
+        y = self.dw(x)
+        y = F.silu(self.bn1(y))
+        y = self.pw(y)
+        y = self.bn2(y)
+        y = self.se(y)
+        if self.proj is not None:
+            x = self.proj(x)
+        y = self.drop(y)
+        return F.silu(x + y)
+class TinyByteCNN(nn.Module):
+    """TinyByteCNN for Fiction vs Non-Fiction Classification"""
+    def __init__(self, config=None):
+        super().__init__()
+        # Default configuration
+        if config is None:
+            config = type('Config', (), {
+                'vocab_size': 256,
+                'embed_dim': 32,
+                'widths': [128, 192, 256, 320],
+                'use_gn': False,
+                'head_drop': 0.1,
+                'stochastic_depth': 0.05
+            })()
+        self.config = config
+        # Embedding layer for bytes
+        self.embed = nn.Embedding(config.vocab_size, config.embed_dim)
+        # Stem convolution
+        self.stem = nn.Conv1d(config.embed_dim, config.widths[0], 5, stride=2, padding=2, bias=False)
+        self.bn0 = nn.BatchNorm1d(config.widths[0]) if not config.use_gn else nn.GroupNorm(32, config.widths[0])
+        # Build stages
+        cfg = [
+            (2, config.widths[0], [1, 2]),
+            (2, config.widths[1], [1, 2]),
+            (3, config.widths[2], [1, 2, 4]),
+            (3, config.widths[3], [1, 2, 8])
+        ]
+        stages = []
+        c_prev = config.widths[0]
+        for blocks, c, ds in cfg:
+            for i in range(blocks):
+                stride = 2 if i == 0 else 1
+                d = ds[i]
+                stages.append(SepResBlock(c_prev, c, k=7, stride=stride, dilation=d,
+                                        use_gn=config.use_gn, drop=config.stochastic_depth))
+                c_prev = c
+        self.stages = nn.Sequential(*stages)
+        # Classification head
+        self.head = nn.Sequential(
+            nn.Dropout(p=config.head_drop),
+            nn.Linear(2 * config.widths[-1], 1)
+        )
+    def forward(self, x_bytes):
+        """
+        Args:
+            x_bytes: [B, T] uint8 tensor of byte values
+        Returns:
+            logits: [B] tensor of binary classification logits
+        """
+        x = self.embed(x_bytes.long())  # [B, T, E]
+        x = x.transpose(1, 2).contiguous()  # [B, E, T]
+        x = F.silu(self.bn0(self.stem(x)))  # [B, C0, T/2]
+        x = self.stages(x)  # [B, C, T/32]
+        # Global pooling
+        avg = x.mean(dim=-1)
+        mx = x.amax(dim=-1)
+        feats = torch.cat([avg, mx], dim=1)
+        logits = self.head(feats).squeeze(1)
+        return logits
+    @classmethod
+    def from_pretrained(cls, path_or_repo, use_safetensors=True):
+        """Load a pretrained model from a directory, a single weights file, or a Hub repo id.
+        Prefers model.safetensors (when use_safetensors=True) and falls back to pytorch_model.bin.
+        """
+        import os
+        import json
+        from pathlib import Path
+        def load_safetensors(path, config=None):
+            from safetensors.torch import load_file
+            model = cls(config)  # config=None -> default architecture (matches config.json)
+            model.load_state_dict(load_file(str(path)))
+            return model
+        def load_checkpoint(path):
+            # weights_only=False unpickles arbitrary objects: only load files you trust
+            checkpoint = torch.load(path, weights_only=False, map_location='cpu')
+            config = checkpoint.get('config', None)
+            if isinstance(config, dict):
+                config = type('Config', (), config)()
+            model = cls(config)
+            model.load_state_dict(checkpoint.get('model_state_dict', checkpoint))
+            return model
+        if os.path.isdir(path_or_repo):
+            # Directory path - look for model files
+            base_path = Path(path_or_repo)
+            safetensors_path = base_path / "model.safetensors"
+            pytorch_path = base_path / "pytorch_model.bin"
+            if use_safetensors and safetensors_path.exists():
+                config = None
+                config_path = base_path / "config.json"
+                if config_path.exists():
+                    with open(config_path) as f:
+                        config = type('Config', (), json.load(f))()
+                return load_safetensors(safetensors_path, config)
+            if pytorch_path.exists():
+                return load_checkpoint(pytorch_path)
+            raise FileNotFoundError(f"No model.safetensors or pytorch_model.bin in {base_path}")
+        if os.path.isfile(path_or_repo):
+            if str(path_or_repo).endswith('.safetensors'):
+                return load_safetensors(path_or_repo)
+            return load_checkpoint(path_or_repo)
+        # Otherwise treat the argument as a Hugging Face Hub repo id
+        from huggingface_hub import hf_hub_download
+        if use_safetensors:
+            try:
+                return load_safetensors(hf_hub_download(repo_id=path_or_repo, filename="model.safetensors"))
+            except Exception:
+                pass  # Fall back to pytorch format
+        return load_checkpoint(hf_hub_download(repo_id=path_or_repo, filename="pytorch_model.bin"))
+    def save_pretrained(self, save_path):
+        """Save weights and config to save_path/pytorch_model.bin"""
+        import os
+        os.makedirs(save_path, exist_ok=True)
+        # Store the config as a plain dict: instances of the ad-hoc class built with
+        # type('Config', ...) cannot be pickled, so torch.save would fail on them
+        config = {k: getattr(self.config, k) for k in dir(self.config) if not k.startswith('_')}
+        torch.save({
+            'model_state_dict': self.state_dict(),
+            'config': config
+        }, os.path.join(save_path, 'pytorch_model.bin'))
+def preprocess_text(text: str, max_len: int = 4096) -> torch.Tensor:
+    """
+    Preprocess text to bytes for model input
+    Args:
+        text: Input text string
+        max_len: Maximum sequence length (default 4096)
+    Returns:
+        Tensor of shape [1, max_len] containing byte values
+    """
+    # Unicode NFC normalize
+    text = unicodedata.normalize('NFC', text)
+    # Replace \r\n → \n
+    text = text.replace('\r\n', '\n')
+    # Collapse any run of 3+ whitespace chars (including newlines) into two spaces
+    text = re.sub(r'\s{3,}', '  ', text)
+    # Convert to bytes
+    text_bytes = text.encode('utf-8', errors='ignore')
+    # Pad or truncate to max_len
+    input_ids = np.zeros(max_len, dtype=np.uint8)
+    input_ids[:min(len(text_bytes), max_len)] = list(text_bytes[:max_len])
+    return torch.from_numpy(input_ids).unsqueeze(0)  # Add batch dimension
+def classify_text(text: Union[str, List[str]], model=None, device='cpu'):
+    """
+    Classify text as fiction or non-fiction
+    Args:
+        text: Single string or list of strings to classify
+        model: Pre-loaded model (optional)
+        device: Device to run on ('cpu', 'cuda', 'mps')
+    Returns:
+        Dictionary with predictions and confidence scores
+    """
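+    if model is None:
+        import os
+        # Default to the directory holding this file, where the repo's model.safetensors and config.json live
+        model = TinyByteCNN.from_pretrained(os.path.dirname(os.path.abspath(__file__)))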
+    model = model.to(device)
+    model.eval()
+    # Handle single text or batch
+    if isinstance(text, str):
+        texts = [text]
+    else:
+        texts = text
+    results = []
+    for t in texts:
+        input_ids = preprocess_text(t).to(device)
+        with torch.no_grad():
+            logits = model(input_ids)
+            prob = torch.sigmoid(logits).item()
+        pred_class = "Non-Fiction" if prob > 0.5 else "Fiction"
+        confidence = prob if prob > 0.5 else (1 - prob)
+        results.append({
+            'text': t[:100] + '...' if len(t) > 100 else t,
+            'prediction': pred_class,
+            'confidence': confidence,
+            'probability_nonfiction': prob
+        })
+    return results[0] if isinstance(text, str) else results
+if __name__ == "__main__":
+    # Example usage
+    sample_text = "The detective's coffee had gone cold hours ago, but she hardly noticed."
+    # Load the weights stored next to this file, then classify
+    import os
+    model = TinyByteCNN.from_pretrained(os.path.dirname(os.path.abspath(__file__)))
+    result = classify_text(sample_text, model)
+    print(f"Text: {result['text']}")
+    print(f"Prediction: {result['prediction']}")
+    print(f"Confidence: {result['confidence']:.1%}")

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e788bf5427b996650f8e657b05615078bdb3f0e778f23eb5059a2566b92e8a2a
+size 3821900

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c21fe2faa7c6707c40c83b6c866dbd93f9437e2a67c558330408d4085448f1b6
+size 3862846

test_results.csv ADDED

	@@ -0,0 +1,19 @@

+sample_id,category,true_label,predicted_label,confidence,probability_nonfiction,correct,text_preview
+general_fiction_1,Fiction,Fiction,Fiction,0.7979161292314529,0.20208387076854706,True,"The old lighthouse keeper squinted through the salt-stained window, watching the storm gather streng..."
+general_fiction_2,Fiction,Fiction,Fiction,0.9722247179597616,0.02777528204023838,True,"Marcus never believed in second chances until the morning he woke up in his childhood bedroom, seven..."
+general_fiction_3,Fiction,Fiction,Fiction,0.9732040446251631,0.02679595537483692,True,"The detective's coffee had gone cold hours ago, but she hardly noticed. The case files spread across..."
+childrens_stories_1,Fiction,Fiction,Fiction,0.9714287109673023,0.028571289032697678,True,Benny the bunny had a very important problem. His favorite carrot was stuck at the top of the talles...
+childrens_stories_2,Fiction,Fiction,Fiction,0.972626393660903,0.027373606339097023,True,"Princess Luna loved to paint, but there was one big problem - all her paintings came to life at midn..."
+childrens_stories_3,Fiction,Fiction,Fiction,0.9603868946433067,0.03961310535669327,True,Tommy's grandpa had a secret. Hidden in his workshop was a pair of goggles that could let you see in...
+fantasy_stories_1,Fiction,Fiction,Fiction,0.9737914837896824,0.02620851621031761,True,The ancient runes on Kaelen's sword began to glow with an otherworldly blue light as she approached ...
+fantasy_stories_2,Fiction,Fiction,Fiction,0.9677380956709385,0.03226190432906151,True,"Elara discovered she could weave moonlight into solid form quite by accident, during the Festival of..."
+fantasy_stories_3,Fiction,Fiction,Fiction,0.9729745481163263,0.027025451883673668,True,"In the dragon markets of Valengard, memories were currency and dreams could be bottled like wine. Th..."
+textbook_1,Non-Fiction,Non-Fiction,Non-Fiction,0.9782394766807556,0.9782394766807556,True,The process of photosynthesis can be divided into two main stages: the light-dependent reactions and...
+textbook_2,Non-Fiction,Non-Fiction,Non-Fiction,0.9783691167831421,0.9783691167831421,True,The fundamental theorem of calculus establishes the relationship between differentiation and integra...
+textbook_3,Non-Fiction,Non-Fiction,Non-Fiction,0.9790801405906677,0.9790801405906677,True,"Market equilibrium occurs at the intersection of supply and demand curves, where the quantity demand..."
+news_1,Non-Fiction,Non-Fiction,Non-Fiction,0.9778554439544678,0.9778554439544678,True,The Federal Reserve announced Tuesday its decision to maintain interest rates at their current level...
+news_2,Non-Fiction,Non-Fiction,Non-Fiction,0.9786907434463501,0.9786907434463501,True,"City officials unveiled a comprehensive plan Wednesday to address the growing homeless crisis, alloc..."
+news_3,Non-Fiction,Non-Fiction,Non-Fiction,0.9789323210716248,0.9789323210716248,True,Scientists at the European Space Observatory have discovered three potentially habitable exoplanets ...
+journal_entries_1,Non-Fiction,Non-Fiction,Non-Fiction,0.9772427678108215,0.9772427678108215,True,"Wall Street Journal - March 15, 2024: Technology stocks led a broad market rally Thursday as investo..."
+journal_entries_2,Non-Fiction,Non-Fiction,Non-Fiction,0.9768512845039368,0.9768512845039368,True,"Nature Scientific Reports - September 2024: In this study, we investigated the correlation between m..."
+journal_entries_3,Non-Fiction,Non-Fiction,Non-Fiction,0.9745306968688965,0.9745306968688965,True,"Personal Travel Journal - January 8, 2024: Day 3 in Kyoto. Visited Fushimi Inari shrine early this m..."

test_results.json ADDED

	@@ -0,0 +1,182 @@

+[
+  {
+    "category": "Fiction",
+    "sample_id": "general_fiction_1",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.7979161292314529,
+    "probability_nonfiction": 0.20208387076854706,
+    "correct": true,
+    "text_preview": "The old lighthouse keeper squinted through the salt-stained window, watching the storm gather streng..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "general_fiction_2",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9722247179597616,
+    "probability_nonfiction": 0.02777528204023838,
+    "correct": true,
+    "text_preview": "Marcus never believed in second chances until the morning he woke up in his childhood bedroom, seven..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "general_fiction_3",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9732040446251631,
+    "probability_nonfiction": 0.02679595537483692,
+    "correct": true,
+    "text_preview": "The detective's coffee had gone cold hours ago, but she hardly noticed. The case files spread across..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "childrens_stories_1",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9714287109673023,
+    "probability_nonfiction": 0.028571289032697678,
+    "correct": true,
+    "text_preview": "Benny the bunny had a very important problem. His favorite carrot was stuck at the top of the talles..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "childrens_stories_2",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.972626393660903,
+    "probability_nonfiction": 0.027373606339097023,
+    "correct": true,
+    "text_preview": "Princess Luna loved to paint, but there was one big problem - all her paintings came to life at midn..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "childrens_stories_3",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9603868946433067,
+    "probability_nonfiction": 0.03961310535669327,
+    "correct": true,
+    "text_preview": "Tommy's grandpa had a secret. Hidden in his workshop was a pair of goggles that could let you see in..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "fantasy_stories_1",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9737914837896824,
+    "probability_nonfiction": 0.02620851621031761,
+    "correct": true,
+    "text_preview": "The ancient runes on Kaelen's sword began to glow with an otherworldly blue light as she approached ..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "fantasy_stories_2",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9677380956709385,
+    "probability_nonfiction": 0.03226190432906151,
+    "correct": true,
+    "text_preview": "Elara discovered she could weave moonlight into solid form quite by accident, during the Festival of..."
+  },
+  {
+    "category": "Fiction",
+    "sample_id": "fantasy_stories_3",
+    "true_label": "Fiction",
+    "predicted_label": "Fiction",
+    "confidence": 0.9729745481163263,
+    "probability_nonfiction": 0.027025451883673668,
+    "correct": true,
+    "text_preview": "In the dragon markets of Valengard, memories were currency and dreams could be bottled like wine. Th..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "textbook_1",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9782394766807556,
+    "probability_nonfiction": 0.9782394766807556,
+    "correct": true,
+    "text_preview": "The process of photosynthesis can be divided into two main stages: the light-dependent reactions and..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "textbook_2",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9783691167831421,
+    "probability_nonfiction": 0.9783691167831421,
+    "correct": true,
+    "text_preview": "The fundamental theorem of calculus establishes the relationship between differentiation and integra..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "textbook_3",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9790801405906677,
+    "probability_nonfiction": 0.9790801405906677,
+    "correct": true,
+    "text_preview": "Market equilibrium occurs at the intersection of supply and demand curves, where the quantity demand..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "news_1",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9778554439544678,
+    "probability_nonfiction": 0.9778554439544678,
+    "correct": true,
+    "text_preview": "The Federal Reserve announced Tuesday its decision to maintain interest rates at their current level..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "news_2",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9786907434463501,
+    "probability_nonfiction": 0.9786907434463501,
+    "correct": true,
+    "text_preview": "City officials unveiled a comprehensive plan Wednesday to address the growing homeless crisis, alloc..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "news_3",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9789323210716248,
+    "probability_nonfiction": 0.9789323210716248,
+    "correct": true,
+    "text_preview": "Scientists at the European Space Observatory have discovered three potentially habitable exoplanets ..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "journal_entries_1",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9772427678108215,
+    "probability_nonfiction": 0.9772427678108215,
+    "correct": true,
+    "text_preview": "Wall Street Journal - March 15, 2024: Technology stocks led a broad market rally Thursday as investo..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "journal_entries_2",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9768512845039368,
+    "probability_nonfiction": 0.9768512845039368,
+    "correct": true,
+    "text_preview": "Nature Scientific Reports - September 2024: In this study, we investigated the correlation between m..."
+  },
+  {
+    "category": "Non-Fiction",
+    "sample_id": "journal_entries_3",
+    "true_label": "Non-Fiction",
+    "predicted_label": "Non-Fiction",
+    "confidence": 0.9745306968688965,
+    "probability_nonfiction": 0.9745306968688965,
+    "correct": true,
+    "text_preview": "Personal Travel Journal - January 8, 2024: Day 3 in Kyoto. Visited Fushimi Inari shrine early this m..."
+  }
+]