
Evaluation of the test generation

Problem

How do we make sure the Hub Test Generation is always improving?

Control points

  • We need a score to compare two generated tests

Short Answer

  • By checking the generated test against a set of hard-coded dev standards, we can give each test a score: the generated test earns points each time it respects one of our standards

Strategy/Solution

We use an OpenAI function call with a prompt to compute the score. The dev standards are hard-coded in the function's parameters.

function_to_provide_score = {
    "name": "provide_score_to_user",
    "description": "Provide the score to the user",
    "parameters": {
        "type": "object",
        "properties": {
            "score_for_correct_render_function": {
                "type": "number", "minimum": 0, "maximum": 2,
                "description": "The test should use the renderWithProvider function (0 - 2)",
            },

            # [...]

            "score_for_following_the_minimum_code_example": {
                "type": "number", "minimum": 0, "maximum": 5,
                "description": "The test should follow the minimum code example (0 - 5)",
            },
        },
    },
}

To make sure this works, we ran it on a bad, a medium, and a good test to compare the scores they get. Here is the result: "Total bad: 50.0 medium: 62.5 good: 95.8"
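The totals above can be derived from the per-standard scores. Here is a minimal sketch of how such an aggregation could work, assuming each standard's maximum is read from the schema; the helper name `total_score` and the sample values are hypothetical:

```python
# Hypothetical aggregation: convert per-standard points into a 0-100 total,
# using the maxima declared in the function schema.
def total_score(scores: dict, maxima: dict) -> float:
    """scores and maxima map standard name -> points; returns a percentage."""
    earned = sum(scores[name] for name in maxima)
    possible = sum(maxima.values())
    return round(100 * earned / possible, 1)

# Maxima taken from the schema above; the scores are illustrative only.
maxima = {
    "score_for_correct_render_function": 2,
    "score_for_following_the_minimum_code_example": 5,
}
scores = {
    "score_for_correct_render_function": 1.5,
    "score_for_following_the_minimum_code_example": 3.0,
}
print(total_score(scores, maxima))  # 4.5 / 7 -> 64.3
```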

Limitations

  • The dev standards must be hard-coded
  • The score is not deterministic: the same test can get different scores on two different runs
  • It evaluates only the syntax (not whether the test is runnable, correct, ...)
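One way to mitigate the non-determinism noted above is to score the same test several times and average the results. This is a sketch, not part of the current implementation; `averaged_score` and the stub scorer are hypothetical:

```python
import statistics

def averaged_score(score_fn, test_code: str, runs: int = 3) -> float:
    """Call the (non-deterministic) scorer several times and average the
    totals, reducing run-to-run variance at the cost of extra API calls."""
    return statistics.mean(score_fn(test_code) for _ in range(runs))

# Illustration with a stub scorer standing in for the OpenAI call:
stub_results = iter([60.0, 64.0, 62.0])
print(averaged_score(lambda _: next(stub_results), "generated test"))  # 62.0
```

Averaging does not remove the variance, but it narrows the spread enough to compare two generation strategies more reliably.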

Example

Here is the prompt and API call that work with the above function

prompt = """
Here is a test written by an intern. I would like to give them feedback on how good it is and provide a score.
To generate the score you will, for each dev standard, evaluate the test and give a score (float numbers are allowed).

Here is a minimal code example for the test:
'''
import { renderWithProviders } from "#testing/renderWithProviders";
import { fireEvent, screen, waitFor } from "@testing-library/react-native";

describe("MyComponent", () => {
  it("displays a title", () => {
    renderWithProviders(<MyComponent />);

    expect(screen.getByText("MyComponent Title")).toBeOnTheScreen();

    expect(screen).toMatchComponentSnapshot();
  });

  it("fires a callback when button is pressed", () => {
    const mockCallback = jest.fn();
    renderWithProviders(<MyComponent onButtonPress={mockCallback} />);

    fireEvent.press(screen.getByRole("button", { name: "Button" }));

    expect(mockCallback).toHaveBeenCalled();
  });
});
'''

Here is the test:
'''
""" + generated_test + """
'''
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful developer who is expert in writing tests for React Native components. You are going to help the user to evaluate some tests.",
        },
        {"role": "user", "content": prompt},
    ],
    functions=[function_to_provide_score],
    function_call={"name": "provide_score_to_user"},
)