# Evaluation of the test generation
## Problem

How can we make sure the Hub Test Generation is always improving?
## Control points
- We need a score to compare two generated tests
## Short Answer

- By checking each generated test against a set of hard-coded dev standards, we can give it a score: the test earns points every time it respects one of our standards
## Strategy/Solution

We use an OpenAI function call with a prompt to compute the score. The dev standards are hard-coded in the function's parameters:
```python
function_to_provide_score = {
    "name": "provide_score_to_user",
    "description": "Provide the score to the user",
    "parameters": {
        "type": "object",
        "properties": {
            "score_for_correct_render_function": {
                "type": "number", "minimum": 0, "maximum": 2,
                "description": "The test should use the renderWithProviders function (0 - 2)",
            },
            # [...]
            "score_for_following_the_minimum_code_example": {
                "type": "number", "minimum": 0, "maximum": 5,
                "description": "The test should follow the minimum code example (0 - 5)",
            },
        },
    },
}
```
To validate the approach, we scored a bad, a medium, and a good test to check how they ranked. Here is the result: "Total bad: 50.0 medium: 62.5 good: 95.8"
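The totals above can be expressed as percentages because each standard declares its own `maximum` in the schema, so raw points can be normalized against the achievable total. A minimal sketch, assuming only the two standards shown in the schema above (the raw scores passed in are made-up illustration values):

```python
# Per-standard maxima, mirroring the "maximum" fields of the function schema above
maxima = {
    "score_for_correct_render_function": 2,
    "score_for_following_the_minimum_code_example": 5,
}

def normalize(raw_scores: dict, maxima: dict) -> float:
    """Convert raw per-standard points into a percentage of the achievable total."""
    earned = sum(raw_scores.get(name, 0) for name in maxima)
    achievable = sum(maxima.values())
    return round(100 * earned / achievable, 1)

# Made-up raw scores: 1.5/2 + 2.0/5 = 3.5 out of 7 achievable points
print(normalize({"score_for_correct_render_function": 1.5,
                 "score_for_following_the_minimum_code_example": 2.0}, maxima))  # → 50.0
```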
## Limitations

- The dev standards must be hard-coded
- The score is not deterministic: the same test can receive different scores on two different runs
- It evaluates only the syntax (not whether the test is runnable, correct, ...)
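One way to dampen the non-determinism is to score the same test several times and average the results. A sketch, assuming a hypothetical `evaluate_test` callable that wraps the API call described below (stubbed here with fixed values for illustration):

```python
from statistics import mean

def averaged_score(evaluate_test, test_code: str, runs: int = 3) -> float:
    """Score the same test several times and average, to smooth run-to-run noise."""
    scores = [evaluate_test(test_code) for _ in range(runs)]
    return round(mean(scores), 1)

# Stub standing in for the real API-backed evaluator (illustration only)
fake_scores = iter([60.0, 65.0, 62.5])
print(averaged_score(lambda _: next(fake_scores), "it('...')", runs=3))  # → 62.5
```

Averaging trades extra API calls for a more stable comparison between two generated tests.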
## Example

Here is the prompt + API call used with the function above:
prompt = """
Here is a test written by an intern. I would like to give them feedback on how good it is and provide a score.
To generate the score you will, for each dev standard, evaluate the test and give a score (float number is allowed).
Here is a minimal code example for the test:
'''
import { renderWithProviders } from "#testing/renderWithProviders";
import { fireEvent, screen, waitFor } from "@testing-library/react-native";
describe("MyComponent", () => {
it("displays a title", () => {
renderWithProviders(<MyComponent />);
expect(screen.getByText("MyComponent Title")).toBeOnTheScreen();
expect(screen).toMatchComponentSnapshot();
});
it("fires a callback when button is pressed", () => {
const mockCallback = jest.fn();
renderWithProviders(<MyComponent onButtonPress={mockCallback} />);
fireEvent.press(screen.getByRole("button", { name: "Button" }));
expect(mockCallback).toHaveBeenCalled();
});
});
'''
Here is the test:
'''
""" + generated_test + """
'''
"""
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are a helpful developer who is expert in writing tests for React Native components. You are going to help the user to evaluate some tests.",
},
{"role": "user", "content": prompt},
],
functions=[function_to_provide_score],
function_call={"name": "provide_score_to_user"}
)
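Because the function call is forced, the scores come back as a JSON string in the response's `function_call.arguments` field. A sketch of extracting and totalling them (the response dict below is fabricated to match the shape of the legacy 0.x SDK output, not a real API result):

```python
import json

def extract_scores(response) -> dict:
    """Parse the JSON arguments the model passed to provide_score_to_user."""
    arguments = response["choices"][0]["message"]["function_call"]["arguments"]
    return json.loads(arguments)

# Fabricated response, shaped like the legacy SDK output (illustration only)
fake_response = {
    "choices": [{
        "message": {
            "function_call": {
                "name": "provide_score_to_user",
                "arguments": '{"score_for_correct_render_function": 2, '
                             '"score_for_following_the_minimum_code_example": 4.5}',
            }
        }
    }]
}

scores = extract_scores(fake_response)
print(sum(scores.values()))  # → 6.5
```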