
Collect examples by extracting fix commits

Global Goal

Create a tool that detects bugs, based primarily on the previous bug fixes made in a project. See Implementation Roadmap.

Question

Can we extract all the fixes made on a project by using the commit messages?

Answer

Yes, we can.

How?

We can filter commits in Python by their messages, keeping only those that contain "bug" or "fix". We then go through the file diffs of each kept commit: the +/- markers show what was removed and what was added to fix the bug. We also keep the commit message to know which bug these diffs fix.

import re
from dataclasses import dataclass
from ai_research.shared.code_datasets import load_repository, Commit

@dataclass
class FileDiff:
    filename: str
    content: str

@dataclass
class PartialCommit:
    message: str
    files: list[FileDiff]

# load the repo (load_repository is imported directly above)
repo = load_repository(
    "https://github.com/pass-culture/pass-culture-app-native",
    rev="master")

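# a function to filter fix commits
def is_bug_fix_commit(commit: Commit) -> bool:
    # \b matches whole words, including at the start of the message
    # or next to punctuation (e.g. "fix: crash on login")
    pattern = r'\b(fix|bug)\b'
    return bool(re.search(pattern, commit.message, re.IGNORECASE))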

fix_commits_changes: list[PartialCommit] = []

for commit_hash in repo.commits:
    commit = repo.commits[commit_hash]
    # filter fix commits
    if not is_bug_fix_commit(commit):
        continue
    # create commit object with message
    fix_commit_changes = PartialCommit(message=commit.message, files=[])
    # add file diffs content for each file of the commit
    for file_path in commit.diffs:
        fix_commit_changes.files.append(FileDiff(
            filename=file_path,
            content=commit.diffs[file_path]))
    fix_commits_changes.append(fix_commit_changes)
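
The script above stores each diff as raw text. To know what was removed and what was added, the +/- markers still have to be read. Below is a minimal sketch of that step. It assumes that the content of a FileDiff is a unified diff for a single file (which this page does not confirm). The names DiffLines and split_diff, and the sample diff, are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DiffLines:
    """Hypothetical container for the two sides of a fix."""
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def split_diff(content: str) -> DiffLines:
    """Separate added and removed lines, assuming a single-file unified diff."""
    result = DiffLines()
    in_hunk = False
    for line in content.splitlines():
        if line.startswith("@@"):
            # a hunk header: everything before the first one is file metadata
            # ("--- a/file", "+++ b/file") and must not be counted as changes
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            result.added.append(line[1:])
        elif in_hunk and line.startswith("-"):
            result.removed.append(line[1:])
        # context lines (leading space) and "\ No newline..." are ignored
    return result


# made-up diff, for illustration only
sample_diff = """\
--- a/src/price.ts
+++ b/src/price.ts
@@ -1,3 +1,3 @@
 export function total(price: number, qty: number) {
-  return price + qty
+  return price * qty
 }
"""

lines = split_diff(sample_diff)
print("removed:", lines.removed)
print("added:", lines.added)
```

Calling split_diff(file_diff.content) on each FileDiff collected above would then give, for every fix commit, the lines that carried the bug and the lines that replaced them.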