# Evaluation of the test generation
## Problem

How can we make sure the Hub Test Generation is always improving?
## Control points
- We need a score to compare two generated tests
## Short Answer

- By checking each generated test against a set of hard-coded dev standards, we can give it a score: the test earns points every time it respects one of our standards
## Strategy/Solution

We use an OpenAI function call with a prompt to compute the score. The dev standards are hard-coded in the function's parameters:
```python
function_to_provide_score = {
    "name": "provide_score_to_user",
    "description": "Provide the score to the user",
    "parameters": {
        "type": "object",
        "properties": {
            "score_for_correct_render_function": {
                "type": "number", "minimum": 0, "maximum": 2,
                "description": "The test should use the renderWithProviders function (0 - 2)",
            },
            # [...]
            "score_for_following_the_minimum_code_example": {
                "type": "number", "minimum": 0, "maximum": 5,
                "description": "The test should follow the minimum code example (0 - 5)",
            },
        },
    },
}
```
To validate the approach, we scored a bad, a medium, and a good test to check how they ranked. Here is the result: "Total bad: 50.0 medium: 62.5 good: 95.8"
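The totals above can be expressed as percentages because each standard declares its own `maximum` in the schema, so raw points can be normalized against the achievable total. A minimal sketch, assuming only the two standards shown in the schema above (the raw scores passed in are made-up illustration values):

```python
# Per-standard maxima, mirroring the "maximum" fields of the function schema above
maxima = {
    "score_for_correct_render_function": 2,
    "score_for_following_the_minimum_code_example": 5,
}

def normalize(raw_scores: dict, maxima: dict) -> float:
    """Convert raw per-standard points into a percentage of the achievable total."""
    earned = sum(raw_scores.get(name, 0) for name in maxima)
    achievable = sum(maxima.values())
    return round(100 * earned / achievable, 1)

# Made-up raw scores: 1.5/2 + 2.0/5 = 3.5 out of 7 achievable points
print(normalize({"score_for_correct_render_function": 1.5,
                 "score_for_following_the_minimum_code_example": 2.0}, maxima))  # → 50.0
```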
## Limitations

- The dev standards must be hard-coded
- The score is not deterministic: the same test can receive different scores on two different runs
- It evaluates only the syntax (not whether the test is runnable, correct, ...)
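One way to dampen the non-determinism is to score the same test several times and average the results. A sketch, assuming a hypothetical `evaluate_test` callable that wraps the API call described below (stubbed here with fixed values for illustration):

```python
from statistics import mean

def averaged_score(evaluate_test, test_code: str, runs: int = 3) -> float:
    """Score the same test several times and average, to smooth run-to-run noise."""
    scores = [evaluate_test(test_code) for _ in range(runs)]
    return round(mean(scores), 1)

# Stub standing in for the real API-backed evaluator (illustration only)
fake_scores = iter([60.0, 65.0, 62.5])
print(averaged_score(lambda _: next(fake_scores), "it('...')", runs=3))  # → 62.5
```

Averaging trades extra API calls for a more stable comparison between two generated tests.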
## Example

Here is the prompt + API call used with the function above:
prompt = """
Here is a test written by an intern. I would like to give them feedback on how good it is and provide a score.
To generate the score you will, for each dev standard, evaluate the test and give a score (float number is allowed).
Here is a minimal code example for the test:
'''
import { renderWithProviders } from "#testing/renderWithProviders";
import { fireEvent, screen, waitFor } from "@testing-library/react-native";
describe("MyComponent", () => {
it("displays a title", () => {
renderWithProviders(<MyComponent />);
expect(screen.getByText("MyComponent Title")).toBeOnTheScreen();
expect(screen).toMatchComponentSnapshot();
});
it("fires a callback when button is pressed", () => {
const mockCallback = jest.fn();
renderWithProviders(<MyComponent onButtonPress={mockCallback} />);
fireEvent.press(screen.getByRole("button", { name: "Button" }));
expect(mockCallback).toHaveBeenCalled();
});
});
'''
Here is the test:
'''
""" + generated_test + """
'''
"""
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are a helpful developer who is expert in writing tests for React Native components. You are going to help the user to evaluate some tests.",
},
{"role": "user", "content": prompt},
],
functions=[function_to_provide_score],
function_call={"name": "provide_score_to_user"}
)
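Because the function call is forced, the scores come back as a JSON string in the response's `function_call.arguments` field. A sketch of extracting and totalling them (the response dict below is fabricated to match the shape of the legacy 0.x SDK output, not a real API result):

```python
import json

def extract_scores(response) -> dict:
    """Parse the JSON arguments the model passed to provide_score_to_user."""
    arguments = response["choices"][0]["message"]["function_call"]["arguments"]
    return json.loads(arguments)

# Fabricated response, shaped like the legacy SDK output (illustration only)
fake_response = {
    "choices": [{
        "message": {
            "function_call": {
                "name": "provide_score_to_user",
                "arguments": '{"score_for_correct_render_function": 2, '
                             '"score_for_following_the_minimum_code_example": 4.5}',
            }
        }
    }]
}

scores = extract_scores(fake_response)
print(sum(scores.values()))  # → 6.5
```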