---
library_name: transformers
base_model:
- canopylabs/orpheus-3b-0.1-ft
---

# Malaysian canopylabs/orpheus-3b-0.1-ft

Finetune of [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft) on standard Malay with minimal Mandarin.

## Training session

Finetuned on [Mesolitica/TTS](https://huggingface.co/datasets/mesolitica/TTS) so the model can generate Malay speech, with minimal Mandarin.

## How we train

1. LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`.
2. Rank 128 with alpha 256, i.e. a scaling factor (alpha / rank) of 2.0, but during merging we use a ratio of 1.5.
3. Multipacking with proper SDPA causal masking to prevent cross-document contamination and to ensure correct position ids.
4. Chunked CE (cross-entropy) loss to reduce memory.

Wandb at https://wandb.ai/huseinzol05/malay-orpheus-3b-0.1-ft-lora-128/workspace?nw=nwuserhuseinzol05

## Example

Load the model,

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from snac import SNAC
import torch
import IPython.display as ipd

def redistribute_codes(row):
    # Keep only whole 7-token audio frames.
    row_length = row.size(0)
    new_length = (row_length // 7) * 7
    trimmed_row = row[:new_length]
    # Shift token ids so that audio codes start at 0.
    code_list = [t - 128266 for t in trimmed_row]

    layer_1 = []
    layer_2 = []
    layer_3 = []
    # Each frame holds 1 code for layer 1, 2 for layer 2 and 4 for layer 3,
    # and every slot in the frame is offset by a further 4096.
    for i in range(len(code_list) // 7):
        layer_1.append(code_list[7*i][None])
        layer_2.append(code_list[7*i+1][None]-4096)
        layer_3.append(code_list[7*i+2][None]-(2*4096))
        layer_3.append(code_list[7*i+3][None]-(3*4096))
        layer_2.append(code_list[7*i+4][None]-(4*4096))
        layer_3.append(code_list[7*i+5][None]-(5*4096))
        layer_3.append(code_list[7*i+6][None]-(6*4096))

    with torch.no_grad():
        codes = [torch.concat(layer_1), torch.concat(layer_2), torch.concat(layer_3)]
        for i in range(len(codes)):
            # Clamp negative codes (e.g. from non-audio tokens) to 0, then add a batch dimension.
            codes[i][codes[i] < 0] = 0
            codes[i] = codes[i][None]
        audio_hat = snac_model.decode(codes)
    return audio_hat.cpu()[0, 0]

snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
snac_model = snac_model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained('mesolitica/Malaysian-orpheus-3b-0.1-ft')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/Malaysian-orpheus-3b-0.1-ft',
    torch_dtype = torch.bfloat16
).cuda()

speakers = [
    'Husein',
    'Shafiqah Idayu',
    'Anwar Ibrahim',
    'KP'
]

speaker = speakers[0]
text = 'Nama saya Husein, saya tak suka nasi ayam dan tak suka mandi, Xiàn zài wǒ yǒu bing chilling Wǒ hěn xǐ huān bing chilling.'
prompt = f'<|begin_of_text|>{speaker}: {text}<|eot_id|>'
input_ids = tokenizer(prompt, add_special_tokens = False, return_tensors = 'pt').to('cuda')

with torch.no_grad():
    generated_ids = model.generate(
        **input_ids,
        max_new_tokens=1200,
        do_sample=True,
        temperature=0.9,
        top_p=0.8,
        repetition_penalty=1.1,
        num_return_sequences=1,
        eos_token_id=128258,
    )

# Drop the prompt tokens and decode only the newly generated audio tokens.
row = generated_ids[0, input_ids['input_ids'].shape[1]:]
y_ = redistribute_codes(row)
ipd.Audio(y_, rate = 24000)
```

## Source code

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/orpheus
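For readers unfamiliar with the multipacking step listed under "How we train", here is a minimal, dependency-free sketch of the idea. It is not the training code from the repository above. The helper names `pack_position_ids` and `pack_causal_mask` are hypothetical, and it assumes that position ids restart at 0 for every packed document and that a token may only attend to earlier tokens of its own document.

```python
def pack_position_ids(doc_lengths):
    """Position ids for a packed sequence (assumed to restart at 0 per document)."""
    position_ids = []
    for length in doc_lengths:
        position_ids.extend(range(length))
    return position_ids


def pack_causal_mask(doc_lengths):
    """Boolean mask[q][k]: True where query token q may attend to key token k."""
    # Tag every token with the index of the document it belongs to.
    doc_ids = []
    for doc_index, length in enumerate(doc_lengths):
        doc_ids.extend([doc_index] * length)
    total = len(doc_ids)
    # Allow attention only within the same document and only to the past.
    return [
        [doc_ids[q] == doc_ids[k] and k <= q for k in range(total)]
        for q in range(total)
    ]


# Two documents of 3 and 2 tokens packed into one sequence of 5 tokens.
doc_lengths = [3, 2]
position_ids = pack_position_ids(doc_lengths)
mask = pack_causal_mask(doc_lengths)

print(position_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed mask is block-diagonal and lower-triangular inside each block, so the second document never sees the first. A boolean mask of this shape, converted to a tensor, follows the convention of `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention.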