Preparing a model for PEFT

Let's start by importing the relevant peft functions we'll need.

from peft import LoraConfig, get_peft_model

A model contains a lot of matrices that are grouped into "modules". We can look at which modules a model is made of to decide which matrices we want to train.

>>> model.modules
<bound method Module.modules of CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 2048)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-15): 16 x CodeGenBlock(
        (ln_1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (out_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): CodeGenMLP(
          (fc_in): Linear(in_features=2048, out_features=8192, bias=True)
          (fc_out): Linear(in_features=8192, out_features=2048, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2048, out_features=51200, bias=True)
)>

Module types include:

  • Dropout, a layer that sets its inputs to zero with probability p. Useful to reduce overfitting.
  • Embedding, a layer that converts tokens (there are 51200 different possible tokens) into vectors of size 2048.
  • Linear, a matrix of size in_features × out_features. If bias is True, it also adds a learnable vector to its output (see the short example after this list).
  • LayerNorm, a layer that normalizes its input vector to zero mean and unit variance, then applies a learnable scale and shift.
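
A quick illustrative snippet (the shapes match this model, but the snippet itself is an assumption, not part of the original walkthrough) showing what Linear and LayerNorm do to a batch of vectors:

import torch

x = torch.randn(4, 2048)                         # a batch of 4 vectors of size 2048

linear = torch.nn.Linear(2048, 8192, bias=True)
y = linear(x)                                    # y = x @ W.T + b, shape (4, 8192)

norm = torch.nn.LayerNorm(2048)
z = norm(x)                                      # each vector normalized to zero mean and unit variance, then scaled and shifted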

For this model, we see that we have modules called qkv_proj, out_proj, fc_in, fc_out and lm_head that can be trained (Linear means the module is a matrix with trainable coefficients).
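
Rather than reading the printout by hand, we can also list the Linear modules programmatically (this helper loop is an assumption, not part of the original walkthrough):

import torch

# Print the name and weight shape of every Linear module, to help choose target_modules.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name, tuple(module.weight.shape))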

We then create a configuration that defines the training parameters.

config = LoraConfig(
    r=8,  # LoRA dimension: the rank of the matrices that will be added to the weights
    lora_alpha=32,  # scaling factor for the LoRA update (the update is scaled by lora_alpha / r)
    target_modules=["qkv_proj", "out_proj"],  # the modules that we want to train
    lora_dropout=0.05,  # dropout applied to the LoRA layers during training, used to reduce overfitting
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
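
To check how few parameters we are actually training, PEFT models expose print_trainable_parameters() (the exact numbers depend on the model and on the configuration above):

model.print_trainable_parameters()
# trainable params: ... || all params: ... || trainable%: ...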

We are using the LoRA algorithm (the same low-rank adaptation that QLoRA builds on) to cut down the number of trainable parameters and speed up training; the LoRA parameters are the ones we picked in the configuration above.
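
As a rough sketch of the idea (the shapes and initialization below follow the standard LoRA formulation and are illustrative, not taken from this walkthrough): instead of updating a frozen weight matrix W directly, LoRA learns two small matrices B and A whose product is a rank-r update, scaled by lora_alpha / r.

import torch

# Illustrative shapes for one 2048 x 2048 Linear layer of this model.
d_out, d_in, r, lora_alpha = 2048, 2048, 8, 32

W = torch.randn(d_out, d_in)  # frozen pretrained weight, never updated
A = torch.randn(r, d_in)      # trainable low-rank factor
B = torch.zeros(d_out, r)     # trainable low-rank factor, starts at zero so the update starts at zero

delta_W = (lora_alpha / r) * (B @ A)  # rank-r update added to the weights
W_effective = W + delta_W             # what the layer effectively uses in the forward pass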