Preparing a model for PEFT

Let's start by importing the relevant peft functions we'll need.

from peft import LoraConfig, get_peft_model

A model contains a lot of matrices that are grouped into "modules". We can look at which modules a model is made of to decide which matrices we want to train.

>>> model.modules
<bound method Module.modules of CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 2048)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-15): 16 x CodeGenBlock(
        (ln_1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (out_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): CodeGenMLP(
          (fc_in): Linear(in_features=2048, out_features=8192, bias=True)
          (fc_out): Linear(in_features=8192, out_features=2048, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2048, out_features=51200, bias=True)
)>

Module types include:

  • Dropout, a layer that sets its inputs to zero with probability p. Useful to reduce overfitting.
  • Embedding, a layer that converts tokens (there are 51200 different possible tokens) into vectors of size 2048.
  • Linear, a matrix of size in_features × out_features. If bias is True, it also adds a learnable vector to its output (see the short example after this list).
  • LayerNorm, a layer that normalizes its input vector to zero mean and unit variance, then applies a learnable scale and shift.
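
A quick illustrative snippet (the shapes match this model, but the snippet itself is an assumption, not part of the original walkthrough) showing what Linear and LayerNorm do to a batch of vectors:

import torch

x = torch.randn(4, 2048)                         # a batch of 4 vectors of size 2048

linear = torch.nn.Linear(2048, 8192, bias=True)
y = linear(x)                                    # y = x @ W.T + b, shape (4, 8192)

norm = torch.nn.LayerNorm(2048)
z = norm(x)                                      # each vector normalized to zero mean and unit variance, then scaled and shifted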

For this model, we see that we have modules called qkv_proj, out_proj, fc_in, fc_out and lm_head that can be trained (Linear means the module is a matrix with trainable coefficients).
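
Rather than reading the printout by hand, we can also list the Linear modules programmatically (this helper loop is an assumption, not part of the original walkthrough):

import torch

# Print the name and weight shape of every Linear module, to help choose target_modules.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name, tuple(module.weight.shape))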

We then create a configuration that defines the training parameters.

config = LoraConfig(
    r=8,  # LoRA dimension: the rank of the matrices that will be added to the weights
    lora_alpha=32,  # scaling factor for the LoRA update (the update is scaled by lora_alpha / r)
    target_modules=["qkv_proj", "out_proj"],  # the modules that we want to train
    lora_dropout=0.05,  # dropout applied to the LoRA layers during training, used to reduce overfitting
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
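
To check how few parameters we are actually training, PEFT models expose print_trainable_parameters() (the exact numbers depend on the model and on the configuration above):

model.print_trainable_parameters()
# trainable params: ... || all params: ... || trainable%: ...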

We are using the LoRA algorithm (the same low-rank adaptation that QLoRA builds on) to cut down the number of trainable parameters and speed up training; the LoRA parameters are the ones we picked in the configuration above.
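
As a rough sketch of the idea (the shapes and initialization below follow the standard LoRA formulation and are illustrative, not taken from this walkthrough): instead of updating a frozen weight matrix W directly, LoRA learns two small matrices B and A whose product is a rank-r update, scaled by lora_alpha / r.

import torch

# Illustrative shapes for one 2048 x 2048 Linear layer of this model.
d_out, d_in, r, lora_alpha = 2048, 2048, 8, 32

W = torch.randn(d_out, d_in)  # frozen pretrained weight, never updated
A = torch.randn(r, d_in)      # trainable low-rank factor
B = torch.zeros(d_out, r)     # trainable low-rank factor, starts at zero so the update starts at zero

delta_W = (lora_alpha / r) * (B @ A)  # rank-r update added to the weights
W_effective = W + delta_W             # what the layer effectively uses in the forward pass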