AI/ML

LLM training parameters explained

A quick overview of LoRA training parameters for LLM fine-tuning with MLX.

weight_decay: A regularization technique that adds a small penalty to the weights during training to keep them from growing too large, which helps reduce overfitting. Often implemented as L2 regularization. Examples: 0.00001 – 0.01
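A rough sketch of how weight decay enters an update step, using plain NumPy and made-up values; real optimizers such as AdamW apply the same idea inside their update rule.

```python
import numpy as np

def sgd_step_with_weight_decay(w, grad, lr=1e-3, weight_decay=1e-4):
    """One gradient step plus a small decay pulling every weight toward zero."""
    w = w - lr * grad               # usual gradient descent step
    w = w - lr * weight_decay * w   # decoupled weight decay (AdamW-style penalty)
    return w

w = np.array([0.5, -1.2, 3.0])
g = np.array([0.1, -0.2, 0.05])
print(sgd_step_with_weight_decay(w, g))
```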
grad_clip: Short for gradient clipping, a method that limits (clips) the size of gradients during backpropagation to prevent exploding gradients and stabilize training. Examples: 0.1 – 1.0
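A minimal sketch of clipping by global norm, assuming NumPy and illustrative values: if the combined L2 norm of all gradients exceeds max_norm, every gradient is rescaled so the total norm equals max_norm.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale all gradients when their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads, total_norm

grads = [np.array([3.0, 4.0]), np.array([1.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm, clipped)   # norm ≈ 5.10, so every gradient is scaled down by ~5.1x
```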
rank: The dimensionality, or number of independent directions, in a matrix or tensor. In LoRA it sets the inner dimension of the adapter matrices and controls how closely the low-rank update can approximate a full weight update. Examples: 4, 8, 16, or 32
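In LoRA the rank is the inner dimension of the two small adapter matrices; a NumPy sketch with made-up sizes shows why a small rank keeps the trainable parameter count tiny compared to the full weight matrix.

```python
import numpy as np

d_out, d_in, rank = 4096, 4096, 8        # illustrative sizes

A = np.random.randn(rank, d_in) * 0.01   # down-projection (trainable)
B = np.zeros((d_out, rank))              # up-projection (trainable), starts at zero

delta_W = B @ A                          # low-rank update added to the frozen weight
print(delta_W.shape)                     # (4096, 4096)
print(A.size + B.size)                   # 65,536 trainable values vs ~16.8M in the full matrix
```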
scale: A multiplier used to adjust the magnitude of values; in LoRA it scales the adapter's output before it is added to the frozen weights, controlling how strongly the adapter influences the model. Examples: 0.5 – 2.0
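Assuming scale here refers to the LoRA scaling factor, a small NumPy sketch of how it multiplies the adapter path before it is added to the frozen layer's output (names and sizes are made up):

```python
import numpy as np

def lora_forward(x, W_frozen, A, B, scale=1.0):
    """Base layer output plus the scaled low-rank adapter output."""
    return x @ W_frozen.T + scale * (x @ A.T @ B.T)

x = np.random.randn(2, 16)               # batch of 2, hidden size 16
W = np.random.randn(16, 16)              # frozen base weight
A = np.random.randn(4, 16) * 0.01        # rank-4 down-projection
B = np.random.randn(16, 4) * 0.01        # rank-4 up-projection
print(lora_forward(x, W, A, B, scale=2.0).shape)   # (2, 16)
```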
dropout: A regularization method that randomly “drops out” (sets to zero) a fraction of neurons during training, forcing the network to learn more robust and generalizable patterns. Examples: 0.1 – 0.5
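A minimal inverted-dropout sketch in NumPy: during training a random fraction p of values is zeroed and the survivors are scaled up by 1 / (1 - p) so the expected activation stays the same; at evaluation time dropout is a no-op.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.2, training=True):
    """Inverted dropout: zero out a fraction p of values, rescale the survivors."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p       # keep each element with probability 1 - p
    return x * mask / (1.0 - p)           # rescale so the expected value is unchanged

x = np.ones((2, 5))
print(dropout(x, p=0.4))                  # roughly 40% of entries zeroed, the rest ≈ 1.67
```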