---
license: apache-2.0
tags:
- nsa
- byte-level
- fine-tuned
- alpaca
widget:
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: What is 2+2? Assistant:"
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: Write a short poem about AI. Assistant:"
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: Explain quantum computing in one sentence. Assistant:"
inference: true
---

# NSA-117M-Byte-SFT

A fine-tuned version of the NSA-117M byte-level model, trained on the Alpaca instruction-following dataset.

## Model Details

- **Base model**: seconds-0/nsa-117m-byte
- **Training**: 20,000 steps on the Alpaca dataset
- **Tokenizer**: Byte-level (vocabulary size 256)
- **Architecture**: Native Sparse Attention (NSA)

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the NSA architecture and byte-level
# tokenizer are implemented in custom code shipped with the repository.
model = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

# The model was fine-tuned on prompts in this System/User/Assistant format.
prompt = (
    "System: You are a helpful assistant. Answer briefly and clearly.\n"
    "User: What is the capital of France?\n"
    "Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Details

- Final loss: 8.46
- Hardware: NVIDIA H100 80GB
- Training time: ~1.3 hours
- LoRA rank: 16

## Note

This model uses byte-level tokenization (roughly one token per UTF-8 byte), so sequences are longer and outputs may look unusual compared to models that use standard subword tokenizers.
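
For intuition, a byte-level vocabulary of 256 simply maps each UTF-8 byte of the text to one token ID. The snippet below is an illustrative sketch of that mapping using plain Python; it does not use the model's custom tokenizer class, and the repository's actual tokenizer (loaded via `trust_remote_code`) may handle special tokens or offsets differently.

```python
# Illustrative only: byte-level tokenization maps each UTF-8 byte to an ID in [0, 255].
# The model's real tokenizer may differ in details; this just shows why sequences
# are roughly one token per character for ASCII text.
text = "What is 2+2?"
token_ids = list(text.encode("utf-8"))
print(token_ids)       # [87, 104, 97, 116, 32, 105, 115, 32, 50, 43, 50, 63]
print(len(token_ids))  # 12 tokens for a 12-character ASCII prompt

decoded = bytes(token_ids).decode("utf-8")
print(decoded)         # "What is 2+2?"
```

In practice this means `max_new_tokens=50` in the usage example above corresponds to roughly 50 characters of generated text rather than 50 words.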