How to repurpose networks without fine-tuning
Let's say you have a neural network that generates code, but you want to use it to classify code. How do you do this?
Well, one possibility is to take the network, remove its last layer, and replace it with something else. You might recall from Transformer decoding methods that the last layer produces probabilities for the next token. We can instead grab the raw values just before that last layer and use them for something else.
We assume you have a tokenizer and a model already created, as shown in How do you produce text from a text generation model.
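If you don't have them handy, here is a minimal sketch using Hugging Face transformers; the model name is only a stand-in, and the model you actually load determines the size of the vectors you'll see below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Salesforce/codegen-350M-mono"  # example only; use whichever model the earlier post used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()  # inference only, no dropout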
def embeddings(prompt, device="cpu"):
    # No gradients needed: we only want a forward pass, not training.
    with torch.no_grad():
        input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
        outputs = model(input_ids, output_hidden_states=True)
        # Hidden state of the last layer, for the last token of the prompt.
        return outputs.hidden_states[-1][0][-1].to("cpu")
Note that we call the model directly, as a function, because we want the raw outputs, with no sampling or token generation, just as we saw in Transformer decoding methods.
>>> embeddings("hello, world").shape
torch.Size([2048])
We get a 2048-dimensional vector as output.
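That number is simply the model's hidden dimension. Assuming a Hugging Face transformers model, you can read it from the config; it is 2048 for the model used here, and other models will report a different size.
>>> model.config.hidden_size
2048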
Now we can plug this vector into another classification or regression model and do whatever we want with it. For example, you might want to build a classifier for text. Let's say you have texts, a list of texts to classify, and labels, a list of labels for those texts. Then you can write this:
X = []
Y = []
for text, label in zip(texts, labels):
    Y.append(label)
    X.append(embeddings(text).numpy())  # convert from PyTorch tensors to numpy arrays
# In this example, I'm using a RidgeClassifier, but you can use anything,
# like a RandomForest or a CatBoostClassifier.
from sklearn.linear_model import RidgeClassifier
classifier = RidgeClassifier()
classifier.fit(X, Y) # train the model
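If you want a rough idea of how well this works, you can hold out part of the data before fitting. Here is a sketch using scikit-learn's train_test_split; the split ratio and the eval_classifier variable are just illustrative, not part of the recipe above.
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
eval_classifier = RidgeClassifier().fit(X_train, Y_train)
print(eval_classifier.score(X_test, Y_test))  # mean accuracy on the held-out texts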
def predict(text):
    embedding = embeddings(text).numpy()
    predicted_label = classifier.predict([embedding])[0]
    return predicted_label
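For example, with a made-up snippet:
label = predict("def add(a, b): return a + b")  # returns one of the labels you trained with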
And you have turned your model into a classifier!