
Use sub-files to add more context to the prompt

Strategy

Use the AST to resolve the sub-files of the file to test, i.e. the files it imports. See the ADR "Explore AST with babel vs typescript": we chose TypeScript to resolve the sub-files.
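As an illustration of this strategy, here is a minimal sketch that lists the relative imports of a file with the TypeScript compiler API. It assumes that "sub-files" means the modules imported through a relative path, and it only handles static `import ... from` declarations. The function name `getRelativeImports` and the example file content are hypothetical, not taken from the actual implementation.

```typescript
import * as ts from "typescript";

// Collect the module specifiers of relative imports ("./x", "../y") in a source text.
// Assumption: only static `import ... from "..."` declarations matter for this sketch.
function getRelativeImports(fileName: string, sourceText: string): string[] {
  const sourceFile = ts.createSourceFile(
    fileName,
    sourceText,
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
    ts.ScriptKind.TSX,
  );

  const specifiers: string[] = [];
  for (const statement of sourceFile.statements) {
    if (
      ts.isImportDeclaration(statement) &&
      ts.isStringLiteral(statement.moduleSpecifier) &&
      statement.moduleSpecifier.text.startsWith(".")
    ) {
      specifiers.push(statement.moduleSpecifier.text);
    }
  }
  return specifiers;
}

// Hypothetical file content, for illustration only.
const exampleSource = `
import React from "react";
import { Text } from "react-native";
import { useStock } from "./hooks/useStock";
import { formatQuantity } from "../utils/formatQuantity";

export const StockLabel = () => <Text>{formatQuantity(useStock())}</Text>;
`;

const subFiles = getRelativeImports("StockLabel.tsx", exampleSource);
console.log(subFiles);
```

Package imports such as `react` are skipped on purpose: only local files are likely to add project-specific context to the prompt. Turning each specifier into a path on disk (extensions, index files, path aliases) is left out of this sketch.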

Why?

To improve test generation by giving the model more context. See: Hub Test Generation.

Results

Testing strategy

  • Our test data: files of the warehouse package in the react-native-enablers repo.
  • Idea: generate tests with AI for all warehouse files which already have a test (so we can compare the generated tests with the human-written tests)
  • Methodology:
    • Get all files to test
    • For each file to test, get all the sub-files imported in the file
    • Create two prompts: (A) without the sub-files (previous version) and (B) with the sub-files (new version)
    • Make the request to OpenAI/GPT for each file
    • Clean the response (i.e. remove the backquotes or verbatim markers)
    • Replace the render method with renderWithProviders from utils (calling render directly is a known common mistake from the AI)
    • Run jest in the warehouse folder to know the number of successful tests
  • Limitations: we crop the prompt to ≃3000 tokens to avoid exceeding the maximum number of tokens per request
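The prompt construction and post-processing steps above could be sketched as follows. This is a sketch under assumptions, not the code used for the benchmark: the prompt wording, the helper names (`buildPrompt`, `cleanResponse`, `replaceRender`) and the rule of roughly 4 characters per token are all hypothetical, and a real tokenizer would crop more precisely.

```typescript
// A file under test, or one of the sub-files it imports.
interface SourceFileContent {
  path: string;
  content: string;
}

// Assumption: about 4 characters per token, used only to approximate the crop.
const APPROX_CHARS_PER_TOKEN = 4;
const MAX_PROMPT_TOKENS = 3000;

// Three backquotes, built programmatically so this example can live inside a fence.
const FENCE = "`".repeat(3);

// Variant A: call without sub-files. Variant B: pass the resolved sub-files.
function buildPrompt(file: SourceFileContent, subFiles: SourceFileContent[] = []): string {
  const parts = [
    `Write a jest test file for ${file.path}:`,
    file.content,
    ...subFiles.map((sub) => `Content of imported file ${sub.path}:\n${sub.content}`),
  ];
  // Crop to the approximate token budget.
  return parts.join("\n\n").slice(0, MAX_PROMPT_TOKENS * APPROX_CHARS_PER_TOKEN);
}

// Remove the code fence lines that the model may wrap around its answer.
function cleanResponse(response: string): string {
  return response
    .split("\n")
    .filter((line) => !line.trim().startsWith(FENCE))
    .join("\n")
    .trim();
}

// Swap direct calls to render(...) for renderWithProviders(...).
// Import statements are not rewritten by this sketch.
function replaceRender(testCode: string): string {
  return testCode.replace(/\brender\(/g, "renderWithProviders(");
}

// Hypothetical inputs, for illustration only.
const file = { path: "Stock.tsx", content: "export const Stock = () => null;" };
const sub = { path: "./useStock.ts", content: "export const useStock = () => 3;" };

const promptA = buildPrompt(file);
const promptB = buildPrompt(file, [sub]);
const rawAnswer = [FENCE + "tsx", "render(<Stock />);", FENCE].join("\n");
const cleaned = replaceRender(cleanResponse(rawAnswer));
```

Because the crop is applied at the end of the prompt, the sub-files are the first content to be truncated when the budget is exceeded, which keeps the file under test intact.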

Benchmark

| Experiment | Success rate (after cleaning) | Success rate (after manually replacing render with renderWithProviders) |
|---|---|---|
| A: Without sub-files in the prompt | 2/72 (2.8%) | 27/72 (37.5%) |
| B: With sub-files in the prompt | 5/63 (7.9%) | 29/59 (49.2%) |

Legend (X/Y):

  • X is the number of tests written by AI that passed,
  • Y is the total number of tests for which jest was able to parse the file.
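For reference, the percentages in the benchmark follow directly from the X/Y counts. The small helper below is only an illustration of that arithmetic; the name `successRate` is hypothetical.

```typescript
// Success rate as a percentage with one decimal place.
function successRate(passed: number, parsed: number): string {
  return `${((passed / parsed) * 100).toFixed(1)}%`;
}

// Counts taken from the benchmark above.
const rates = {
  withoutSubFilesAfterCleaning: successRate(2, 72), // "2.8%"
  withoutSubFilesAfterRenderFix: successRate(27, 72), // "37.5%"
  withSubFilesAfterCleaning: successRate(5, 63), // "7.9%"
  withSubFilesAfterRenderFix: successRate(29, 59), // "49.2%"
};

console.log(rates);
```

Note that the denominators differ between experiments, so the rates compare passing tests among parsable files only, not among all generated files.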

For the full test procedure and benchmark, see the ai-research repo.

Analysis of the results

👍 A better success rate with sub-files

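The generated tests pass more often when the sub-files are in the prompt: 29/59 (49.2%) versus 27/72 (37.5%) after replacing render with renderWithProviders, and 5/63 versus 2/72 after cleaning only.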

👎 Fewer tests in total

With sub-files, more of the generated test files are not syntactically valid, and so can't be parsed by jest (63 and 59 parsable tests with sub-files, versus 72 without). Hypothesis of cause:

  • We give the file and its sub-files in the same way in the prompt, so the AI tries to create a test file for each of the sub-files. This results in multiple test files written in the same answer, and therefore saved in the same file, which leads to duplicated imports in the generated test file (and breaks the file).
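To make this hypothesis checkable, a generated file could be scanned for identifiers that are imported more than once, since a duplicate import binding is a syntax error in an ES module. The sketch below is a rough regex-based heuristic, not part of the benchmark: the name `findDuplicateImportBindings` and the sample content are hypothetical, and only single-line static imports are recognised.

```typescript
// Return the identifiers bound by more than one import statement.
// Heuristic: understands only single-line `import ... from "..."` statements.
function findDuplicateImportBindings(code: string): string[] {
  const importRegex = /^import\s+(.+?)\s+from\s+["'][^"']+["'];?\s*$/gm;
  const seen = new Set<string>();
  const duplicates = new Set<string>();

  let match: RegExpExecArray | null;
  while ((match = importRegex.exec(code)) !== null) {
    // Drop a leading `type` keyword from type-only imports.
    const clause = match[1].replace(/^type\s+/, "");
    // Split `Default, { a, b as c }` or `* as ns` into local binding names.
    const names = clause
      .replace(/[{}]/g, ",")
      .split(",")
      .map((part) => part.trim())
      .filter((part) => part.length > 0)
      .map((part) => part.split(/\s+as\s+/).pop() as string);

    for (const name of names) {
      if (seen.has(name)) {
        duplicates.add(name);
      } else {
        seen.add(name);
      }
    }
  }
  return Array.from(duplicates);
}

// Hypothetical answer in which the model wrote two test files back to back.
const generated = [
  'import React from "react";',
  'import { render } from "@testing-library/react-native";',
  'import { Stock } from "./Stock";',
  'import React from "react";',
  'import { render } from "@testing-library/react-native";',
  'import { useStock } from "./useStock";',
].join("\n");

const duplicates = findDuplicateImportBindings(generated);
console.log(duplicates);
```

A non-empty result would suggest that the answer contains several test files concatenated together, which matches the failure mode described above.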