Preparing a model for PEFT
Let's start by importing the relevant peft functions we'll need.
from peft import LoraConfig, get_peft_model
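The rest of this section assumes a model has already been loaded. A minimal sketch with transformers, assuming a CodeGen checkpoint from the Hugging Face Hub (the checkpoint name is an assumption for illustration; the printout below comes from a larger CodeGen variant):

from transformers import AutoModelForCausalLM

# Example checkpoint only (an assumption, not from the original text); substitute your own.
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")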
A model contains a lot of matrices that are grouped into "modules". We can inspect which modules a model is made of to decide which matrices we want to train.
>>> model.modules
<bound method Module.modules of CodeGenForCausalLM(
  (transformer): CodeGenModel(
    (wte): Embedding(51200, 2048)
    (drop): Dropout(p=0.0, inplace=False)
    (h): ModuleList(
      (0-15): 16 x CodeGenBlock(
        (ln_1): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
        (attn): CodeGenAttention(
          (attn_dropout): Dropout(p=0.0, inplace=False)
          (resid_dropout): Dropout(p=0.0, inplace=False)
          (qkv_proj): Linear(in_features=2048, out_features=6144, bias=False)
          (out_proj): Linear(in_features=2048, out_features=2048, bias=False)
        )
        (mlp): CodeGenMLP(
          (fc_in): Linear(in_features=2048, out_features=8192, bias=True)
          (fc_out): Linear(in_features=8192, out_features=2048, bias=True)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.0, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=2048, out_features=51200, bias=True)
)>
Module types include:
- Dropout, a layer that sets its inputs to zero with probability p. Useful to reduce overfitting.
- Embedding, a layer that converts tokens (there are 51200 different possible tokens) into a vector of size 2048.
- Linear, a matrix of size in_features × out_features (see the short sketch after this list). If bias is True, it also adds a learnable vector to its output.
- LayerNorm, a layer that normalizes its input by subtracting the mean and dividing by the standard deviation, followed by a learnable affine transform.
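As a quick sanity check (a standalone sketch, not part of the original code), a Linear module really is just one weight matrix plus an optional learnable bias vector:

import torch.nn as nn

linear = nn.Linear(in_features=2048, out_features=6144, bias=False)
print(linear.weight.shape)  # torch.Size([6144, 2048]) -- a single trainable matrix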
For this model, we see that we have modules called qkv_proj, out_proj, fc_in, fc_out and lm_head that can be trained (Linear means it is a matrix with trainable coefficients).
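If scrolling through the printed architecture is tedious, a short loop can list the Linear modules, and hence the candidate target_modules, directly. A sketch assuming the model loaded earlier:

import torch.nn as nn

# Print the name of every Linear sub-module in the model.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        print(name)  # e.g. transformer.h.0.attn.qkv_proj, ..., lm_head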
We then create a configuration that defines the training parameters.
config = LoraConfig(
    r=8,  # LoRA dimension: the rank of the low-rank matrices added to the weights
    lora_alpha=32,  # scaling factor applied to the LoRA update
    target_modules=["qkv_proj", "out_proj"],  # the modules that we want to train
    lora_dropout=0.05,  # dropout applied inside the LoRA layers during training, to reduce overfitting
    bias="none",  # do not train any bias parameters
    task_type="CAUSAL_LM"
)
model = get_peft_model(model, config)
We are using the LoRA algorithm (here in its quantized QLoRA variant) to speed up training; the LoRA parameters are picked in the config above.
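Concretely, LoRA freezes each targeted matrix W and learns a low-rank update, so the effective weight becomes W + (lora_alpha / r) · B · A, where B has shape (out_features, r) and A has shape (r, in_features); only A and B are trained. After get_peft_model you can confirm that only a small fraction of the parameters is trainable (the exact numbers depend on the checkpoint):

model.print_trainable_parameters()
# prints something like: trainable params: ... || all params: ... || trainable%: ...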