Collect examples by extracting fix commits
Global Goal
Create a tool that detects bugs, relying mainly on a project's previous bug fixes. See Implementation Roadmap
Question
Can we extract all the fixes made on a project by using the commit messages?
Answer
Yes, we can.
How?
We can filter commits in Python by their messages, keeping only the ones that
contain "bug" or "fix". Then we parse the file diffs of each kept commit, which
gives us the changed files with +/- markers showing what was removed and what
was added to fix the bug. We also keep the commit message to know which bug
these diffs are fixing.
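The message filter described above can be tried in isolation on a few made-up messages. The sketch below does not use the project code: the word-boundary pattern, the helper name `looks_like_fix`, and the sample messages are all assumptions chosen for illustration.

```python
import re

# Assumption: match "fix" or "bug" as whole words, case-insensitively.
# "fix:" at the start of a message counts; "prefix" and "debug" do not.
FIX_PATTERN = re.compile(r"\b(fix|bug)\b", re.IGNORECASE)


def looks_like_fix(message: str) -> bool:
    """Return True when the message contains 'fix' or 'bug' as a whole word."""
    return FIX_PATTERN.search(message) is not None


# Hypothetical commit messages, for illustration only.
sample_messages = [
    "fix: crash when the offer list is empty",
    "Fix a bug in date parsing",
    "add prefix to log lines",
    "debug logging for the login screen",
    "update dependencies",
]

# Keep only the messages that look like bug fixes.
kept = [message for message in sample_messages if looks_like_fix(message)]
```

A keyword filter like this is only a heuristic: it misses fixes whose messages never mention either word, and it keeps commits that merely talk about a bug. The project script that applies the idea to a real repository follows.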
import re
from dataclasses import dataclass
from ai_research.shared.code_datasets import load_repository, Commit

@dataclass
class FileDiff:
    filename: str
    content: str

@dataclass
class PartialCommit:
    message: str
    files: list[FileDiff]

# load the repo
repo = load_repository(
    "https://github.com/pass-culture/pass-culture-app-native",
    rev="master")

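# a function to filter fix commits
def is_bug_fix_commit(commit: Commit) -> bool:
    # \b matches "fix" or "bug" as whole words, including at the start or end
    # of the message and before punctuation such as "fix:" or "fix("
    pattern = r'\b(fix|bug)\b'
    return bool(re.search(pattern, commit.message, re.IGNORECASE))
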
fix_commits_changes: list[PartialCommit] = []
for commit_hash in repo.commits:
    commit = repo.commits[commit_hash]
    # filter fix commits
    if not is_bug_fix_commit(commit):
        continue
    # create commit object with message
    fix_commit_changes = PartialCommit(
        message=commit.message,
        files=[])
    # add file diffs content for each file of the commit
    for file_path in commit.diffs:
        fix_commit_changes.files.append(FileDiff(
            filename=file_path,
            content=commit.diffs[file_path]))
    fix_commits_changes.append(fix_commit_changes)
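
Once the diffs are collected, the +/- markers can be used to separate what was removed from what was added. The following is a minimal sketch, assuming each diff is a standard unified diff for a single file (with `@@` hunk headers); the format actually returned by `load_repository` is not shown here, and `DiffLines`, `split_diff`, and the sample diff are hypothetical names and data introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DiffLines:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)


def split_diff(diff_text: str) -> DiffLines:
    """Separate the '+' and '-' lines of a unified diff.

    Assumption: the text is a unified diff with '@@' hunk headers.
    Lines before the first hunk (such as '---' and '+++' file headers)
    are ignored.
    """
    result = DiffLines()
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file section starts with header lines
        elif line.startswith("@@"):
            in_hunk = True  # hunk header: content lines follow
        elif in_hunk and line.startswith("+"):
            result.added.append(line[1:])
        elif in_hunk and line.startswith("-"):
            result.removed.append(line[1:])
    return result


# Made-up diff, for illustration only.
sample_diff = """\
diff --git a/src/price.ts b/src/price.ts
--- a/src/price.ts
+++ b/src/price.ts
@@ -1,3 +1,3 @@
 export function total(items) {
-  return items.length * price
+  return items.reduce((sum, item) => sum + item.price, 0)
 }
"""

changes = split_diff(sample_diff)
```

Under the same assumption about the diff format, calling `split_diff(file_diff.content)` on each `FileDiff` would give the before and after lines of a fix, which can then be paired with the commit message.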