Quick overview of LLM MLX LORA training parameters. weight_decay A regularization technique that adds a small penalty to the weights during training to prevent them from growing too large, helping to reduce overfitting. Often implemented as L2 regularization.examples: 0.00001 – 0.01 grad_clip Short for gradient clipping — a method that limits (clips) the size of gradients during backpropagation to prevent exploding gradients and stabilize training.examples: 0.1 – 1.0 rank Refers to the dimensionality or the number of independent directions in a matrix or tensor. In low-rank models, it controls how much the model compresses or approximates the original data.examples: 4,
Full fine-tuning of mlx-community/Qwen2.5-3B-Instruct-bf16 Recently I posted article on how to train LORA MLX LLM here. Then I asked myself how can I export or convert such MLX model into HF or GGUF format. Even that MLX has such option to export MLX into GGUF most of the time it is not supported by models I have been using. From what I recall even if it does support Qwen it is not version 3 but version 2 and quality suffers by such conversion. Do not know why exactly it works like that. So I decided to give a try with
I have done over 500 training sessions using Qwen2.5, Qwen3, Gemma and plenty other LLM publicly available to inject domain specific knowledge into the model’s low rank adapters (LORA). However, instead of giving you tons of unimportant facts I will just stick to the most important things. Starting with the fact that I have used MLX on my Mac Studio M2 Ultra as well as on MacBook Pro M1 Pro. Both fit well to this task in terms of BF16 speed as well as unified memory capacity and speed (up to 800GB/s). Memory speed is the most important factor comparing