It's a DELLA! 👼
First vibe test positive, the magic is there... ✨
Glad to hear it, I haven't tested this one as much yet.
Can't wait to finetune 24B models eventually
Townsfolk were talking about some kind of repetition bug, prompting development of Goetia v1.4 -- what's that all about? 👀
Not sure, but v1.4 is probably a ways off. I'm trying to save up for a 3090 and working long hours. I have made some new datasets and found a way to improve Cthulhu/Raven/Morpheus with longer-context Q&A pairs (~2k tokens instead of 500). My initial test set for this long-form style performed quite well.
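For anyone building a similar long-form set: a quick way to check whether a Q&A pair lands in the ~2k-token range is a character-count heuristic. A minimal sketch, assuming the rough ~4-characters-per-token rule of thumb (real counts need the model's actual tokenizer, and the function names here are mine, not from any library):

```python
# Hypothetical helper: bucket Q&A pairs by approximate token count.
# The ~4 chars/token ratio is an assumption; use the real tokenizer for accuracy.

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def is_long_form(question: str, answer: str, min_tokens: int = 1500) -> bool:
    """True if the combined pair is in the ~2k-token range rather than ~500."""
    return approx_tokens(question) + approx_tokens(answer) >= min_tokens

pairs = [
    ("Short question?", "Short answer."),
    ("Long question... " * 50, "Long answer... " * 400),
]
long_pairs = [p for p in pairs if is_long_form(*p)]
```

Filtering a raw dataset through something like `is_long_form` is a cheap first pass before doing an exact token count on the survivors.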
So, a proper Cthulhu/Goetia 24B dataset is under construction, but we'll likely see the smaller versions first.
Attempts to finetune 12B on an 8GB card have failed, but I can now do MoE merges for Mistral 7B and Llama 8B. These seem promising. I have more ideas for MoE finetunes/merges coming up.
I was also working on a custom moe_karcher method, but lost the script in a BSOD
Update: The script has been recovered! Running a test of it now
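For the curious: "Karcher" here refers to the Karcher (Fréchet) mean, the point minimizing squared geodesic distance to a set of points on a manifold. A minimal pure-Python sketch of the standard log-map/exp-map iteration on unit vectors — purely illustrative of the math, not mergekit's code or the recovered script:

```python
import math

# Small vector helpers (plain lists, no numpy, for illustration only)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _norm(a): return math.sqrt(_dot(a, a))
def _scale(a, s): return [x * s for x in a]
def _add(a, b): return [x + y for x, y in zip(a, b)]
def _normalize(a):
    n = _norm(a)
    return [x / n for x in a]

def log_map(p, q):
    """Tangent vector at p pointing toward q on the unit sphere."""
    c = max(-1.0, min(1.0, _dot(p, q)))
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0] * len(p)
    # component of q orthogonal to p, rescaled to geodesic distance
    u = _add(q, _scale(p, -c))
    return _scale(u, theta / _norm(u))

def exp_map(p, v):
    """Move from p along tangent vector v, staying on the unit sphere."""
    n = _norm(v)
    if n < 1e-12:
        return p[:]
    return _add(_scale(p, math.cos(n)), _scale(v, math.sin(n) / n))

def karcher_mean(points, iters=50):
    """Iterate: average the log-map tangents, exp-map back, repeat."""
    mean = _normalize(points[0])
    for _ in range(iters):
        tangents = [log_map(mean, _normalize(q)) for q in points]
        avg = _scale([sum(col) for col in zip(*tangents)], 1.0 / len(points))
        if _norm(avg) < 1e-10:
            break
        mean = exp_map(mean, avg)
    return mean
```

The same fixed-point idea generalizes from unit vectors to normalized weight tensors, which is presumably what a karcher-style merge method exploits.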
What's your total VRAM? At least 16-24GB is enough to help with finetuning.
You also need a lot of system RAM (32-64GB) and a pagefile on an SSD with enough spare write cycles; I recommend at least 50-100GB.
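Those numbers are easy to sanity-check with back-of-the-envelope arithmetic: weights take roughly params × bytes-per-element. A sketch (this ignores overhead, and mergekit's lazy shard loading keeps real peak usage lower than the worst case):

```python
# Back-of-the-envelope memory estimate for a merge: weights only,
# at params * bytes-per-element. Overhead and activations ignored.

BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}

def model_gb(n_params_billion: float, dtype: str = "float32") -> float:
    """Approximate size of one model's weights in GB."""
    return n_params_billion * 1e9 * BYTES[dtype] / 1e9

# A SLERP merge of two 8B models in float32 touches roughly:
two_models = 2 * model_gb(8, "float32")   # 64 GB of input weights
output = model_gb(8, "bfloat16")          # 16 GB written out
```

Two 8B inputs in float32 alone are ~64GB of weights, which is why RAM plus a generous pagefile matters even when VRAM is small.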
I don't use Mac, but here is a supposed way to install it.
If you can get it to run a simple SLERP merge let me know, and I can send you more advanced scripts to test.
Save this as config.yaml and try to run it (recreate kekthulhu). Some of these commands won't work on Mac, so you may have to ask a 'smart' LLM for help:

mergekit-yaml C:\mergekit-main\config.yaml C:\mergekit-main\merged_model_output --copy-tokenizer --allow-crimes --out-shard-size 5B --trust-remote-code --lazy-unpickle --random-seed 420 --cuda
```yaml
base_model: SicariusSicariiStuff/Assistant_Pepe_8B
architecture: MistralForCausalLM
merge_method: slerp
dtype: float32
out_dtype: bfloat16
slices:
  - sources:
      - model: Naphula/Cthulhu-8B-v1.4
        layer_range: [0, 32]
      - model: SicariusSicariiStuff/Assistant_Pepe_8B
        layer_range: [0, 32]
parameters:
  t: 0.5
tokenizer:
  source: union
#chat_template: auto
```
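As background on what `merge_method: slerp` with `t: 0.5` does: each pair of weight tensors is interpolated along the arc between them rather than along a straight line. A minimal sketch of the formula on plain vectors — illustrative only, not mergekit's implementation (a common fallback to plain lerp for near-parallel vectors is included):

```python
import math

def slerp(v0, v1, t):
    """Spherical linear interpolation between two vectors.
    Falls back to plain lerp when the vectors are nearly colinear."""
    dot = sum(a * b for a, b in zip(v0, v1))
    n0 = math.sqrt(sum(a * a for a in v0))
    n1 = math.sqrt(sum(b * b for b in v1))
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:  # nearly parallel: linear interpolation is fine
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t = 0.5, as in the config, lands halfway along the arc between the models
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

The arc-based interpolation preserves vector magnitude better than a straight average, which is the usual argument for SLERP over linear merging.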
To run Arcee AI's MergeKit on a Macintosh, first clone the repository with `git clone https://github.com/arcee-ai/mergekit.git`, then navigate to the directory with `cd mergekit` and install the package with `pip install -e .`. After installation, you can use the main script `mergekit-yaml` to perform merges by specifying a YAML configuration file and an output path.
Setting Up MergeKit on macOS

Prerequisites
- Python: Ensure you have Python installed. You can download it from the official Python website.
- Git: Install Git to clone the MergeKit repository. You can download it from the Git website.

Installation Steps
1. Clone the repository. Open your terminal and run the following command:

```bash
git clone https://github.com/arcee-ai/mergekit.git
```

2. Navigate to the MergeKit directory:

```bash
cd mergekit
```

3. Install MergeKit using pip:

```bash
pip install -e .
```

Running MergeKit
1. Prepare your configuration: create a YAML configuration file for your merge. You can use a template or create one from scratch.
2. Run the merge command using the following syntax:

```bash
mergekit-yaml path/to/your/config.yml ./output-model-directory [--cuda] [--lazy-unpickle] [--allow-crimes] [... other options]
```

Replace path/to/your/config.yml with the path to your configuration file and ./output-model-directory with your desired output directory.
Thanks for sharing these steps, I will try them out when I'm back home. I should be able to use around 50GB as VRAM, though I don't know how well that translates to this use case. Will keep you posted once I'm at it 👍
If you manage to get MergeKit running, then you will want to replace graph.py with this version:
https://huggingface.co/spaces/Naphula/model_tools/blob/main/graph_v18.py
(you would want to edit some of the settings to recalibrate it for your VRAM ceiling)
Not sure if it works on Mac, but if so, you should be able to breeze through big merges in minutes instead of hours with 50GB of VRAM (assuming you have storage space on the SSD for them; it takes forever to transfer from HDD)
I’m still on sick leave, but beginning next week, I hope to finally be able to try it out! I'd love to be able to contribute in a meaningful way, even if I don’t really know what I’m doing 🤪