LoRA / QLoRA: Definition

LoRA (Low-Rank Adaptation) is a technique for customizing a large model without retraining all of its billions of parameters. Instead, it freezes the original model and trains a tiny set of new weights that nudge its behavior toward your task. QLoRA goes further by first compressing the base model with quantization, so the whole job can run on far cheaper hardware.

This matters because full fine-tuning is expensive, slow, and produces a giant new model to store and serve. LoRA adapters are small, fast to train, and easy to swap, so one base model can carry many task-specific adapters. QLoRA makes that practical on a single modest GPU, putting domain customization within reach of teams that could not otherwise afford it.

At arosplatforms we treat LoRA and QLoRA as the pragmatic middle ground between prompting and full fine-tuning. When prompt engineering and retrieval are not enough, we use them to specialize a model on a client's domain at a fraction of the cost, while keeping the underlying base model swappable as the field moves.

AI

LoRA / QLoRA

Related terms

Have a use for this in your business?