arosplatforms™ AI consultancy

Models & training

LoRA / QLoRA

Efficient fine-tuning methods that adapt a large model by training small add-on weights instead of the whole thing.

LoRA (Low-Rank Adaptation) is a technique for customizing a large model without retraining all of its billions of parameters. Instead, it freezes the original weights and trains a small pair of low-rank matrices alongside selected layers; their product is added to the frozen weights to nudge the model's behavior toward your task. QLoRA (Quantized LoRA) goes further by first quantizing the frozen base model, typically to 4-bit precision, so the whole job needs far less GPU memory and can run on much cheaper hardware.
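To make the "freeze the model, train small add-on weights" idea concrete, here is a minimal sketch of a single LoRA-adapted linear layer in plain Python. It is an illustration rather than how any particular library implements it: the class name `LoRALinear`, the helper `matvec`, the tiny 4x4 weight matrix, and the `rank` and `alpha` values are all assumptions chosen for readability. The layer computes `W x + (alpha / rank) * B (A x)`, where `W` stays frozen and only `A` and `B` would be trained.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a low-rank update B @ A.

    Hypothetical illustration class, not a real library API.
    """

    def __init__(self, W, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W                  # frozen: not updated during fine-tuning
        self.scale = alpha / rank   # common LoRA scaling factor
        # A starts small and random, B starts at zero, so the adapter
        # contributes nothing until training changes B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)]
                  for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        update = matvec(self.B, matvec(self.A, x))  # low-rank path
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self):
        rank, d_in = len(self.A), len(self.A[0])
        d_out = len(self.B)
        return rank * d_in + d_out * rank


# Toy 4x4 weight matrix standing in for one layer of a large model.
W = [
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 1.0, 3.0, 2.0],
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [1.0, 0.0, -1.0, 2.0]

print(layer.forward(x) == matvec(W, x))  # True: untrained adapter is a no-op
print(layer.trainable_params())          # 8 trainable values vs 16 in W
```

Because `B` starts at zero, the adapted layer initially behaves exactly like the base layer; training only ever touches `A` and `B`. Swapping tasks then means swapping those two small matrices while `W` stays put.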

This matters because full fine-tuning is expensive, slow, and produces a full-size copy of the model to store and serve for every task. LoRA adapters are a small fraction of the base model's size, fast to train, and easy to swap, so one base model can carry many task-specific adapters. QLoRA makes that practical on a single modest GPU, putting domain customization within reach of teams that could not otherwise afford it.
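A rough back-of-the-envelope calculation shows where the savings come from. The sketch below uses illustrative numbers that are assumptions, not measurements: a 4096 x 4096 weight matrix, a LoRA rank of 8, and a 7-billion-parameter base model. It counts weights only and ignores activations, optimizer state, and the small overhead that quantization adds, so real memory use will be higher.

```python
def full_params(d_out, d_in):
    """Trainable values when fine-tuning the whole d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable values in a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_out + d_in)


def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Illustrative layer size and rank (assumptions for this example).
d, r = 4096, 8
full = full_params(d, d)
lora = lora_params(d, d, r)
print(full)                        # 16777216
print(lora)                        # 65536
print(f"{lora / full:.2%}")        # 0.39%

# Illustrative 7-billion-parameter base model (assumption).
n = 7_000_000_000
print(weight_memory_gb(n, 16))     # 14.0 GB at 16-bit precision
print(weight_memory_gb(n, 4))      # 3.5 GB at 4-bit precision
```

Under these assumptions the adapter for one layer trains well under 1% of the values that full fine-tuning would, and 4-bit quantization cuts the memory needed to hold the base weights to a quarter of the 16-bit figure.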

At arosplatforms we treat LoRA and QLoRA as the pragmatic middle ground between prompting and full fine-tuning. When prompt engineering and retrieval are not enough, we use them to specialize a model on a client's domain at a fraction of the cost, while keeping the underlying base model swappable as the field moves.
