---
license: apache-2.0
tags:
- nsa
- byte-level
- fine-tuned
- alpaca
widget:
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: What is 2+2? Assistant:"
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: Write a short poem about AI. Assistant:"
- text: "System: You are a helpful assistant. Answer briefly and clearly. User: Explain quantum computing in one sentence. Assistant:"
inference: true
---

# NSA-117M-Byte-SFT

A fine-tuned version of the NSA-117M byte-level model, trained on the Alpaca instruction-following dataset.

## Model Details

- **Base model**: seconds-0/nsa-117m-byte
- **Training**: 20,000 steps on the Alpaca dataset
- **Tokenizer**: Byte-level (vocabulary size 256)
- **Architecture**: Native Sparse Attention (NSA)

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code is required because the NSA architecture and byte-level
# tokenizer are implemented in custom code shipped with the repository.
model = AutoModelForCausalLM.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("seconds-0/nsa-117m-byte-sft", trust_remote_code=True)

# The model was fine-tuned on prompts in this System/User/Assistant format.
prompt = (
    "System: You are a helpful assistant. Answer briefly and clearly.\n"
    "User: What is the capital of France?\n"
    "Assistant:"
)

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50, temperature=0.7, do_sample=True)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Training Details

- Final loss: 8.46
- Hardware: NVIDIA H100 80GB
- Training time: ~1.3 hours
- LoRA rank: 16

## Note

This model uses byte-level tokenization (roughly one token per UTF-8 byte), so sequences are longer and outputs may look unusual compared to models that use standard subword tokenizers.
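
For intuition, a byte-level vocabulary of 256 simply maps each UTF-8 byte of the text to one token ID. The snippet below is an illustrative sketch of that mapping using plain Python; it does not use the model's custom tokenizer class, and the repository's actual tokenizer (loaded via `trust_remote_code`) may handle special tokens or offsets differently.

```python
# Illustrative only: byte-level tokenization maps each UTF-8 byte to an ID in [0, 255].
# The model's real tokenizer may differ in details; this just shows why sequences
# are roughly one token per character for ASCII text.
text = "What is 2+2?"
token_ids = list(text.encode("utf-8"))
print(token_ids)       # [87, 104, 97, 116, 32, 105, 115, 32, 50, 43, 50, 63]
print(len(token_ids))  # 12 tokens for a 12-character ASCII prompt

decoded = bytes(token_ids).decode("utf-8")
print(decoded)         # "What is 2+2?"
```

In practice this means `max_new_tokens=50` in the usage example above corresponds to roughly 50 characters of generated text rather than 50 words.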