Choosing the next token

When generating natural language, there are usually many possible next words that make sense in a sentence. Code is more rigid, and the list of plausible next tokens is much smaller.

So you need a different Transformer decoding method.

Using the greedy approach (always picking the most likely token) leads to repetitive text. The same is true of a sampling approach at low temperatures (below 0.3):

describe('parseBooleanVariables', () => {
  it('should parse boolean variables correctly', () => {
    const config = {
      ENV_VARIABLE_1: 'true',
      ENV_VARIABLE_2: 'false',
      ENV_VARIABLE_3: 'true',
      ENV_VARIABLE_4: 'false',
      ENV_VARIABLE_5: 'true',
      ENV_VARIABLE_6: 'false',
      ENV_VARIABLE_7: 'true',
      // ... goes on infinitely ...

But using a sampling approach at higher temperatures leads to code that is "too creative" (bad):

import {NativeConfig} from 'react-native-config';
import {GenericEnvironment} from '..';

export const parseBooleanVariables = (
  config: NativeConfig,
): GenericEnvironment => {
  const configWithActualBooleans = {...config} as GenericEnvironment;

  Object.keys(config).map(key => {
    if (config[key] === 'true') {
      configWithActualBooleans[key] = true;
    } else if (config[key] === 'false') {
      configWithActualBooleans[key] = false;
    }
  });

  return configWithActualBooleans;
};

------
import {parseBooleanVariables} from './parseBooleanVariables';

describe('parseBooleanVariables', () => {
  it('should return true for all KadLind upgrading Rome Alzheimer ... garbage output
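For reference, the two baseline strategies boil down to a couple of lines each. This is a minimal sketch, assuming a Hugging Face causal LM where `model` and `prompt_as_tokens` are set up the same way as in my implementation further down:

import torch

logits = model(**prompt_as_tokens).logits[0][-1]

# Greedy: always take the single most likely token.
greedy_token = torch.argmax(logits)

# Sampling: draw from the softmax of the temperature-scaled logits.
temperature = 0.8
probs = torch.nn.functional.softmax(logits / temperature, dim=-1)
sampled_token = torch.multinomial(probs, num_samples=1)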

What we want is the code of the greedy approach, but without the repetition. To do this, we can penalize tokens in the probability distribution that were recently generated multiple times. So, by the time "ENV" is produced for the fourth time in a few lines, its probability has been lowered enough that it is no longer the top token, and something else is produced.
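As a toy illustration with made-up numbers: suppose "ENV" starts out as the top token but has already been generated three times. Dividing its logit by the accumulated penalty is enough to demote it below the alternatives:

import torch

logits = torch.tensor([3.0, 2.8, 2.5])            # "ENV" is the top token
scores = torch.tensor([1.0 + 3 * 1.2, 1.0, 1.0])  # "ENV" was generated 3 times

print(torch.nn.functional.softmax(logits, dim=-1))           # "ENV" wins
print(torch.nn.functional.softmax(logits / scores, dim=-1))  # "ENV" is demoted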

This is described in more depth in the CTRL paper: https://arxiv.org/pdf/1909.05858.pdf

My implementation looks like this: we add a special temperature-like variable that is applied only to repeated tokens.

import torch

# `model` and `tokenizer` are assumed to be loaded earlier, and
# `prompt_as_tokens` is the tokenized prompt on the same device.
past_tokens = []
device = "mps"

def generate_next_token(temperature=0.2, repetition_penalty=1.2):
    global past_tokens
    outputs = model(**prompt_as_tokens)
    # Logits for the last position, scaled by the temperature.
    logits = outputs.logits[0][-1] / temperature

    # One divisor per vocabulary entry: 1 for unseen tokens, plus
    # `repetition_penalty` for each time a token was already generated.
    scores = torch.ones(logits.shape[0]).to(device)
    for token in past_tokens:
        scores[token] += repetition_penalty

    logits = logits / scores

    probability_distribution = torch.nn.functional.softmax(logits, dim=-1)

    # Sample one token from the penalized distribution.
    predicted_token = torch.multinomial(probability_distribution, num_samples=1)

    # Add the new token to the list of past tokens.
    past_tokens.append(predicted_token)
    new_tok = tokenizer.decode(predicted_token, skip_special_tokens=True)
    return new_tok
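
Here is a sketch of how the function can be wired into a generation loop; the prompt and the stopping condition are just illustrative:

prompt = "export const parseBooleanVariables = ("  # illustrative prompt
for _ in range(128):
    prompt_as_tokens = tokenizer(prompt, return_tensors="pt").to(device)
    new_tok = generate_next_token()
    if new_tok == "":  # EOS decodes to an empty string with skip_special_tokens
        break
    prompt += new_tok
print(prompt)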