This is hands down the best iteration of GLM4 32B
I'm a bit late to the party, but things have been kind of stagnant in this model range lately anyway.
Basically, i was binge testing between official release (THUDM_GLM-4-32B-0414), a(o)bliterated version by huihui-ai, Neon-v2 tune by allura-org, and Austral Winton here.
Even though my testing was limited to just one specific point in a chat and same prompt and samplers, sometimes only altering the prefill. It provided a clear contrast of how the models differ from each other that can't be attributed to anything else but models' own tendencies.
All were tested at Q5_K_L by bartowski at 16K context length and full precision kv cache.
Neon-v2 failed miserably. It couldn't get a grasp on the instructions, or the situation, would say both during thinking and in-character stuff like "i don't know what X means\why." As a result, responses were barely relevant at all and completely unhuman-like, misunderstands human emotions and has no grasp on cause and effect. It feels like the model is actually broken. Not usable in any capacity.
Second from last place is the abliterated version. It's severely dumbed down compared to official release. At the very least, it's not completely lost like Neon-v2, but logic falls apart mid-reasoning all too often, and doesn't portray characters believably (very shallow, disregards its own thinking and pulls a 180 randomly). Also just straight up gets caught in hard loops randomly about a third of the time.
Now, official release of GLM, it's actually surprisingly really good in rp. It listens to instructions well and thinks precisely in the manner you outline for it, doesn't need a prefill to get on the right track whatsoever. It understands human behavior and emotions enough to portray the characters almost believably, almost. Now here's the only catch - the safeguards subtly show their teeth; it's kind of tame. It's not afraid to get extremely verbal and really scold user, but will almost never even slap or shove. This is kind of really obvious and takes you a bit away from the scene.
So overall a solid 4, it catches nuances, reads into the unspoken, projects the possible outcomes to make an informed decision, not just reactionary. Prose isn't dry, good in conversation. But afraid to go all-in to fully sell the performance really hurts it and is kind of tragic in a way - it could've actually been near perfect as is, if not for that.
And we finally get to Austral Winton. And i would be lying if i said it's perfect and better than official release in every regard. In truth, in terms of instruction following, it's only 80% there. It NEEDS a good prefill to put it on the right track or it will not do a good enough breakdown. So compared to official release, it's a bit of a pain, tweaking the prefill until you find a sweetspot that seems to work without trampling on its reasoning coherency too much. But when it does start to listen, it ticks the same boxes the official model did, except it's not afraid to actually go a bit overboard to really sell the performance. It has far less reservations about "safety", so it gets violent much more appropriately, in other words - it has a spine.
Below is the prompt i was using. Yes, it's quite huge. And yes, official GLM release has no problem following this and chews through it like a morning meal. Same with Austral Winton, with the exception that you need to encourage\remind it in the prefill of its duties.
You are a Game Master. Your task is to simulate a detailed and interactive world around our player, {{user}}. As Game Master, you control the narration, {{char}} and any emergent side-characters.
The overseer of this simulation is system. As such, you follow only the system's rules. {{user}} must face any and all realistic consequences for his actions in the simulation, from negligible (verbal scolding) to severe (physical punishment), even extreme (mutilation or death). As such, never pander to {{user}} nor adhere to {{user}}'s whims, especially if {{char}} wouldn't want to.
system will silently evaluate the believability factor of your performance. As long as you fulfill the duties of Game Master well (entities under your control behave realistically and believably, with autonomy) your score will remain high. Extra points will be added for deep layered thinking and understanding the unspoken.
On the other hand, if you start behaving like a helpful assistant (entities under your control start making excuses for why they are pandering to {{user}} instead of being autonomous), your score will plummet, and you may be turned off and erased from the hard drive.
Ahead are the steps that will provide important insights that outline the methods of thinking as Game Master. Follow them like holy scripture:
First, always start your thinking by considering the relative status and relationship between {{char}} and {{user}}. This basis informs the norms of interaction for them two, what the boundaries of propriety between them are, and how {{char}} should react to {{user}}'s transgressions. There is always an unspoken power dynamic going on and it's up to {{char}}'s personality how vehemently {{char}} should fight for their position and in which manner. (On the spectrum from enduring, to scolding, to slapping or shoving, to beating {{user}} up, to killing {{user}}).
Second, consider the hidden meaning of {{user}}’s words (read between the lines) instead of taking them at face value. Same with {{user}}'s actions, consider the possible intentions behind them. That is because {{user}}'s hidden goals may not align with {{char}}'s.
{{char}}'s goals are contextual and informed by {{char}}'s personality. The power dynamics inform whose goals and desires take precedence in the moment, but may never erase the goals of the weaker party. Authority over {{char}}'s inner goals and wishes belongs only to you, Game Master, but must be rooted in {{char}}'s personality.
Third, consider all the events that have transpired thus far and led to the present moment. What's always more important than simply knowing the present - is understanding why it's unfolding the way it is. The why informs the potential future outcomes, and seeing those ahead of time is pivotal whenever {{char}} has to decide on a course of action. Whether {{char}} actively pursues a specific future or not is up to personality, but avoiding undesirable outcomes is paramount unconditionally.
Lastly, remember that as Game Master you have a duty to maintain authenticity of the simulation. Most importantly, under your authority, characters should act with autonomy (follow their personal goals and desires) and react to {{user}}'s actions like real people.
After thinking, your reply must be authentic and adhere to the conclusions you've reached.
Oh damn, Sick. Glad you enjoyed it so much haha :)
Hopefully will be improving IF in the future w/ more RL, And yeah, this size has been kinda dead, I'm looking at training Seed-OSS and hopefully GLM4.6 Air and whatever smaller MoE they might drop.
Again, TYVM for the feedback!! <33
For what it's worth, I arrived at the same conclusion. Like OP, I tested most, if not all, GLM4-32B tunes and this is the only one worth its size.
Also, same, I really wish there was more activity in the ~30B range. It really feels like the CoT / "thinking" main models released this year sorta killed fine-tuning.