How to improve the prompt
Strategy
Improve the prompt for Test Generation (see Hub Test Generation). Target: improve the rate of tests which are Right-the-first-time (i.e. that do not need any intervention from the user to make them pass).
Improvements
1. Following the examples
Observation: the generated test does not fully follow the examples (it does not use the renderWithProviders method, does not always follow the style of the example tests...).
Action: update the prompt to push GPT to follow the examples more closely:
- You are a React-Native developer with 20 years of experience.
- Your task is to write a test of a react-native component.
+ You are an experienced React Native developer, and your task is
+ to write a test for a specific component following the exact methods,
+ structure, and conventions shown in the examples provided below.
2. Manually correct recurring errors
Observation: GPT still makes the same errors, e.g. using render instead of
renderWithProviders and destructuring the returned object from
renderWithProviders instead of using the screen object.
Action: add explicit instructions in the prompt to address these issues:
+ Your tests MUST ALWAYS use the provided renderWithProviders method
+ instead of the render method.
+ You MUST also NOT destructure the screen object.
Note: I had to use CAPITALS to force GPT to respect these criteria.
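These two rules can also be checked automatically on GPT's output. The helper below is a hypothetical post-processing guard (it is not part of the pipeline described here) that flags the two recurring errors in a generated test:
import re

def find_recurring_errors(generated_test: str) -> list[str]:
    """Flag the two errors GPT keeps making despite the CAPITALIZED instructions."""
    errors = []
    # `render(...)` called directly instead of the provided `renderWithProviders(...)`
    if re.search(r"\brender\(", generated_test):
        errors.append("uses render instead of renderWithProviders")
    # `const { ... } = renderWithProviders(...)` instead of using the `screen` object
    if re.search(r"const\s*\{[^}]*\}\s*=\s*renderWithProviders\(", generated_test):
        errors.append("destructures the result of renderWithProviders instead of using screen")
    return errors
Such a check could be used to re-prompt GPT or simply to measure how often the instructions are ignored.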
3. Isolate the file to test
Observation: when passing the sub-files imported by the file to test, GPT sometimes returns a test for each of these files. This creates duplicated imports in the generated test file and thus results in broken files.
Action: separate the sub-files from the main file to test in the prompt:
context_files = get_imports(filepath, project_root)
context_files_content = ""
for path in context_files:
    # Prefix each sub-file with its path so GPT knows where it comes from
    with open(path, "r") as file:
        context_files_content += f"// {path}\n"
        context_files_content += file.read()

prompt = f"""
...
Examples of good tests :
{examples_of_tests}
------------
Now the component to test :
// {filepath}
{file_to_test_content}
------------
Finally here are some context files to help you understand the
component to test:
{context_files_content}
"""
Results
All tests are generated with the sub-files as context. The generated tests are all passed through the clean method (see Use sub-files to add more context to the prompt). We do NOT perform any extra human action afterwards.
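The two counts below come straight from Jest. A minimal sketch of how they can be collected, assuming Jest is run on the project containing the generated tests (the report file name and CLI wrapper are illustrative):
import json
import subprocess

# Jest exits with a non-zero code when some tests fail, hence check=False
subprocess.run(
    ["npx", "jest", "--json", "--outputFile=jest-report.json", "--silent"],
    check=False,
)

with open("jest-report.json", "r") as report_file:
    report = json.load(report_file)

total = report["numTotalTests"]    # "Number of tests found by jest"
passed = report["numPassedTests"]  # "Number of passing tests"
print(f"{passed}/{total} passing ({100 * passed / total:.1f} %)")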
|  | Number of tests found by jest | Number of passing tests | Percentage |
| --- | --- | --- | --- |
| BEFORE (previous prompt) | 63 | 5 | 7.9 % |
| NOW (new prompt) | 85 | 28 | 32.9 % |
Analysis
- More tests are found by jest (i.e. we have fewer invalid TypeScript files!)
- More tests are passing
- A lot of tests still do not pass because of incorrect mocks or methods that do not exist
Next steps
- Fine-tuning with BAM examples to make the model learn our way of doing mocks?
  - Which projects have better tests for training? => ask Pierre Z.
  - Which projects can we use for testing? They must be different from the training ones.
- Comparing AI-generated tests and human-written tests
  - Re-using the model for comparing code from the search feature, to evaluate the distance between the two files?
- Loop improvement: ask GPT to correct its own code
  - Problem: we already reach the token limit with the first message; how can we reduce it to be able to send a second message in the same chat? (see the token-counting sketch after this list)
  - Fallback to the ChatGPT web interface?
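To investigate the token-limit problem, a first step could be to measure the size of the current prompt. A sketch using the tiktoken library; the model name is an assumption and should match the model actually called:
import tiktoken

def count_tokens(prompt: str, model: str = "gpt-3.5-turbo") -> int:
    """Rough size of the prompt, to see how much room is left for a follow-up message."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(prompt))

# e.g. count_tokens(prompt) with the prompt built in improvement 3
print(count_tokens("Your tests MUST ALWAYS use the provided renderWithProviders method."))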