Understanding Code with OpenAI

Goal: Determine which model to use for bam.AI

Idea 1: Use Codex by training it on BAM code

  • Codex is deprecated and no longer available, but OpenAI's fine-tuning API can be used to train a model on BAM code instead.

Idea 2: Use model fine-tuning with OpenAI

Pricing from https://openai.com/pricing:

Model      Training               Usage
Ada        $0.0004 / 1K tokens    $0.0016 / 1K tokens
Babbage    $0.0006 / 1K tokens    $0.0024 / 1K tokens
Curie      $0.0030 / 1K tokens    $0.0120 / 1K tokens
Davinci    $0.0300 / 1K tokens    $0.1200 / 1K tokens

https://platform.openai.com/docs/guides/fine-tuning
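
As a rough sketch of Idea 2, the snippet below uploads a JSONL training file and starts a fine-tuning job with the current OpenAI Python SDK. The file name `bam_functions.jsonl` and the base model are placeholder assumptions; the exact data format and the available base models are described in the fine-tuning guide above.

```python
# Minimal fine-tuning sketch (assumes the openai Python SDK v1.x and a
# prepared JSONL training set; file name and base model are placeholders).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the prepared training data.
training_file = client.files.create(
    file=open("bam_functions.jsonl", "rb"),  # hypothetical BAM training set
    purpose="fine-tune",
)

# 2. Launch the fine-tuning job on a chosen base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # assumption: pick the base model that fits the budget
)

print(job.id, job.status)
```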

Alternatives to OpenAI

If we want to choose a text-embedding model to calculate distances and display matching functions, as is currently done:

Text-embedding models are ranked in a leaderboard here:
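
Whichever model is chosen, the distance-based lookup itself stays the same. Below is a rough sketch of that flow with the OpenAI embeddings endpoint; the model name, the example query, and the stored functions are assumptions for illustration.

```python
# Sketch: embed code snippets and rank them by cosine distance to a query.
# Model name, query, and stored functions are illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of code or a query."""
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # 8191-token limit, see below
        input=text,
    )
    return np.array(response.data[0].embedding)

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank stored function embeddings against a natural-language query.
query_vec = embed("function that parses a BAM config file")
functions = {
    "parse_config": embed("def parse_config(path): ..."),
    "load_model": embed("def load_model(name): ..."),
}
ranked = sorted(functions, key=lambda name: cosine_distance(query_vec, functions[name]))
print(ranked)
```

A leaderboard model exposing the same embedding interface could be swapped in behind `embed` without changing the distance logic.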

Common Limitations of Models

  • There is often a token limit: 512 or even fewer tokens for most models, compared to 8191 for text-embedding-ada-002 and possibly 32k for ChatGPT. A quick token count, as sketched below, can show whether a snippet fits.
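
To check a snippet against one of these limits, the token count can be computed locally, e.g. with tiktoken; the 512 threshold below stands in for the assumed limit of a smaller embedding model.

```python
# Sketch: count tokens to see whether a snippet fits a model's context limit.
import tiktoken

# cl100k_base is the encoding used by text-embedding-ada-002 and ChatGPT models.
encoding = tiktoken.get_encoding("cl100k_base")

def fits(code: str, limit: int) -> bool:
    """Return True if the snippet stays within the given token limit."""
    return len(encoding.encode(code)) <= limit

snippet = "def parse_config(path):\n    return open(path).read()"
print(fits(snippet, 512), fits(snippet, 8191))
```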

How to Overcome These Limitations?

We could consider splitting the code properly using the AST. This is somewhat complicated and slow, and some functions are probably still longer than 500 tokens.
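
A rough sketch of that splitting, assuming the code is Python and using only the standard-library ast module; a parser such as tree-sitter would be needed for other languages, and oversized functions would still have to be split further.

```python
# Sketch: split a Python source file into one chunk per top-level
# function or class using the standard-library ast module.
import ast

def split_into_chunks(source: str) -> list[str]:
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text of the node.
            chunks.append(ast.get_source_segment(source, node))
    return chunks

if __name__ == "__main__":
    with open("example_module.py") as f:  # hypothetical source file
        for chunk in split_into_chunks(f.read()):
            print(len(chunk), "characters:", chunk.splitlines()[0])
```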

Actions Taken Following the Article