---
license: apache-2.0
tags:
- music
- text-generation
- instruction-tuning
- lora
- preview
- untrained
- qwen3.5
- touchgrass
datasets:
- synthetic
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# TouchGrass-3B 🎵

**Status: PREVIEW - UNTRAINED MODEL**

This is a **preview repository** for TouchGrass-3B, a lightweight music AI assistant fine-tuned from Qwen3.5-3B-Instruct. **This model has NOT been trained yet**: it contains randomly initialized LoRA adapters and is not ready for inference.

## ⚠️ Important Notice

- **Model is UNTRAINED**: The LoRA adapters are randomly initialized. Performance will be no better than the base Qwen3.5-3B-Instruct model.
- **For demonstration purposes only**: This repository contains the complete codebase and configuration for training the model.
- **Expected performance after training**: 94-95% accuracy on music-specific tasks. This is a projection based on the architecture design and synthetic data pipeline, not a measured result.

## 🎯 Model Overview

TouchGrass is a specialized music AI assistant built by fine-tuning Qwen3.5 models with:

- **Music Tokenizer Extension**: 21+ music-specific tokens (guitar, piano, drums, vocals, theory, DJ, tablature, chords, etc.)
- **Five Specialized Modules**:
  - 🎸 Tab & Chord Generation (guitar tabs, chord diagrams)
  - 🎹 Music Theory Engine (scales, intervals, progressions)
  - 👂 Ear Training (interval identification, solfege exercises)
  - 😌 EQ Adapter (frustration detection, emotional adaptation)
  - ✍️ Songwriting Assistant (progressions, lyrics, hooks)
- **LoRA Fine-Tuning**: Parameter-efficient fine-tuning with low-rank adapters
- **Multi-Task Learning**: Weighted losses (LM: 1.0, EQ: 0.1, Music: 0.05)

## 📊 Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3.5-3B-Instruct |
| Model Size | ~3.5B parameters (with LoRA) |
| Vocab Size | 32,000 (Qwen3.5 + music tokens) |
| Max Sequence Length | 4,096 tokens |
| LoRA Rank | 16 (configurable) |
| Training Data | Synthetic music QA (10 categories, 80+ templates) |
| Training Steps | 50,000 (planned) |
| Batch Size | 8-16 (depending on GPU) |
| Learning Rate | 2e-4 (with warmup) |

## 🏗️ Architecture

The model extends Qwen3.5 with:

1. **Custom tokenizer** with music domain tokens
2. **Five LoRA-adapted modules** inserted at transformer layers
3. **Multi-task heads** for music-specific predictions
4. **Emotional intelligence** via the EQ adapter

## 🚀 Usage (After Training)

### HuggingFace Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from TouchGrass.configuration_touchgrass import TouchGrassConfig
from TouchGrass.tokenization_touchgrass import TouchGrassTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("your-username/TouchGrass-3B")
tokenizer = TouchGrassTokenizer.from_pretrained("your-username/TouchGrass-3B")

# Generate with instrument context
prompt = "[GUITAR][BEGINNER] How do I play an F major chord?"
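# Note: the [GUITAR]/[BEGINNER] context markers above are assumed to be
# registered as additional special tokens by the music tokenizer extension
# (tokenizer/music_token_extension.py). A hedged sketch of how such tokens
# are typically registered with a HuggingFace tokenizer (the token list
# here is illustrative, not the repository's actual list):
#
#     tokenizer.add_tokens(["[GUITAR]", "[BEGINNER]"], special_tokens=True)
#     model.resize_token_embeddings(len(tokenizer))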
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

### Ollama (After Training)

```bash
# Create Modelfile (provided in repository)
ollama create touchgrass-3b -f ollama_3b_modelfile

# Run inference
ollama run touchgrass-3b "How do I build a chord progression in C major?"
```

## 📁 Repository Structure

This repository contains all necessary files for training:

```
touchgrass-3b/
├── configuration_touchgrass.py   # HuggingFace config class
├── tokenization_touchgrass.py    # HuggingFace tokenizer wrapper
├── train.py                      # Main training script
├── configs/
│   ├── touchgrass_3b_config.py   # Model architecture config
│   ├── touchgrass_7b_config.py   # 7B config (for reference)
│   └── training_config.py        # Training hyperparameters
├── tokenizer/
│   └── music_token_extension.py  # Music token definitions
├── models/                       # Five specialized modules
│   ├── tab_chord_module.py
│   ├── music_theory_module.py
│   ├── ear_training_module.py
│   ├── eq_adapter.py
│   └── songwriting_module.py
├── data/                         # Data pipeline
│   ├── music_qa_generator.py
│   ├── chat_formatter.py
│   └── dataset_loader.py
├── training/
│   ├── losses.py
│   ├── trainer.py
│   └── train.py
├── inference/
│   └── inference.py
├── benchmarks/
│   ├── evaluate_music_modules.py
│   └── evaluate_inference.py
├── tests/                        # Comprehensive test suite
├── ollama_3b_modelfile           # Ollama configuration
├── README.md                     # Full documentation
└── PREVIEW_README.md             # This preview notice
```

## 🧪 Testing

Run the test suite:

```bash
cd touchgrass-3b
python -m pytest tests/ -v
```

## 📚 Documentation

See [README.md](README.md) for complete documentation including:

- Installation instructions
- Training guide
- Inference examples
- Module specifications
- Data generation details
- Troubleshooting

## ⚙️ Training (When Resources Available)

1. **Generate synthetic data**:

   ```bash
   python -c "from data.music_qa_generator import MusicQAGenerator; MusicQAGenerator().generate_dataset(num_samples=10000, output_path='data/music_qa.jsonl')"
   ```

2. **Start training**:

   ```bash
   python train.py --config configs/touchgrass_3b_config.py --data data/music_qa.jsonl --output_dir ./checkpoints
   ```

3. **Convert to HuggingFace format**:

   ```bash
   python -c "from configuration_touchgrass import TouchGrassConfig; from tokenization_touchgrass import TouchGrassTokenizer; config = TouchGrassConfig.from_pretrained('./checkpoints'); tokenizer = TouchGrassTokenizer.from_pretrained('./checkpoints'); config.save_pretrained('./model'); tokenizer.save_pretrained('./model')"
   ```

4. **Push to HuggingFace**:

   ```bash
   huggingface-cli login
   huggingface-cli upload your-username/TouchGrass-3B ./model --repo-type model
   ```

## 🤝 Contributing

This is a preview. Contributions are welcome for:

- Improving synthetic data quality
- Adding more music categories
- Optimizing training efficiency
- Extending to more instruments

## 📄 License

Apache 2.0

## 🙏 Acknowledgments

- Built upon [Qwen3.5](https://huggingface.co/Qwen) by Alibaba Cloud
- Inspired by the need for accessible music education AI
- Special thanks to the open-source music technology community

---

**⚠️ REMINDER**: This is an UNTRAINED PREVIEW model. Do not use it for production inference without completing the training process.
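For reference, the multi-task loss weights listed under Model Overview (LM: 1.0, EQ: 0.1, Music: 0.05) reduce to a simple weighted sum over per-task losses. A minimal sketch, assuming each module head produces a scalar loss; the task names and function below are illustrative, not the actual implementation in `training/losses.py`:

```python
# Hedged sketch of the multi-task loss weighting described in this card.
# Weights come from the card; names are illustrative, not from the repo.
LOSS_WEIGHTS = {"lm": 1.0, "eq": 0.1, "music": 0.05}

def combined_loss(task_losses: dict) -> float:
    """Return the weighted sum of per-task scalar losses."""
    return sum(LOSS_WEIGHTS[task] * loss for task, loss in task_losses.items())

# The LM objective dominates; the EQ and music heads act as light auxiliaries.
total = combined_loss({"lm": 2.0, "eq": 1.0, "music": 1.0})
print(round(total, 4))  # 1.0*2.0 + 0.1*1.0 + 0.05*1.0 = 2.15
```

The small auxiliary weights keep the language-modeling objective dominant while still providing gradient signal to the EQ and music heads.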