Use a GPT crawler to generate knowledge

Problem

How can we easily generate knowledge to feed our custom bot or app using https://github.com/BuilderIO/gpt-crawler?

Short Answer

Point the crawler at a documentation website (for example, the BAM list of articles) and have it generate knowledge from every article.

How?

Edit the config file: set the initial URL, the pattern of URLs to crawl from that page, and the selector of the element that holds the text on each page. Then start the crawler with Node: it fetches every matching URL and writes a JSON file containing all the compiled text.
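To make the crawl step concrete, here is a minimal TypeScript sketch of the general idea: start from one URL, keep the text of each page, and only follow links that match the pattern, up to a page cap. It is not gpt-crawler's implementation (which fetches real pages); the in-memory site, the `FakePage` type and the `crawl` function are hypothetical, and `matches` only understands a trailing `**`, unlike a full glob matcher.

```typescript
// Hypothetical in-memory "website": URL -> page text and outgoing links.
// A real crawler fetches and parses HTML; this only models the control flow.
interface FakePage {
  text: string; // text found inside the element the selector points at
  links: string[]; // links found on the page
}

interface CrawlResult {
  url: string;
  text: string;
}

// Simplified glob: a trailing "**" means "any suffix", otherwise exact match.
function matches(url: string, pattern: string): boolean {
  return pattern.endsWith("**")
    ? url.startsWith(pattern.slice(0, -2))
    : url === pattern;
}

// Breadth-first crawl from startUrl, following only links that match `match`.
function crawl(
  site: Map<string, FakePage>,
  startUrl: string,
  match: string,
  maxPagesToCrawl: number,
): CrawlResult[] {
  const queue: string[] = [startUrl];
  const seen = new Set<string>([startUrl]);
  const results: CrawlResult[] = [];

  while (queue.length > 0 && results.length < maxPagesToCrawl) {
    const url = queue.shift()!;
    const page = site.get(url);
    if (!page) continue; // page not reachable: skip it
    results.push({ url, text: page.text });
    for (const link of page.links) {
      if (!seen.has(link) && matches(link, match)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return results;
}

const demoSite = new Map<string, FakePage>([
  ["https://example.com/docs/a", { text: "Page A", links: ["https://example.com/docs/b", "https://example.com/blog/x"] }],
  ["https://example.com/docs/b", { text: "Page B", links: ["https://example.com/docs/a"] }],
  ["https://example.com/blog/x", { text: "Blog post", links: [] }],
]);

// Only the two /docs/ pages are collected; the /blog/ link does not match.
console.log(crawl(demoSite, "https://example.com/docs/a", "https://example.com/docs/**", 200));
```

The `seen` set prevents pages that link back to each other from being visited twice, and `maxPagesToCrawl` plays the same role as the config field of that name.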

It works very well on documentation sites.

Limitations

  • It cannot simply crawl every link that matches a pattern; it always needs an initial URL
  • It doesn't work if the first URL has no text on it
  • You need to find an HTML element (the selector) that contains the text on every page
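Because the selector has to exist on every page, including the first one, it can help to check a candidate class on a few sample pages before crawling. The sketch below is a rough heuristic under stated assumptions: `pagesMissingClass` is a hypothetical helper, it works on raw HTML strings you have already saved, and it only recognises double-quoted `class="..."` attributes rather than running a real CSS selector.

```typescript
// Heuristic: list the sample pages whose HTML has no element with the given class.
// Only double-quoted class="..." attributes are recognised; a real check would
// run the CSS selector in a browser or an HTML parser.
function pagesMissingClass(
  pages: Record<string, string>, // url -> raw HTML (hypothetical sample data)
  className: string,
): string[] {
  // Escape regex metacharacters so the class name is matched literally.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The class must be a whole whitespace-separated token inside the attribute,
  // so "theme-doc-markdown" does not match "theme-doc-markdown-extra".
  const pattern = new RegExp(`class="(?:[^"]*\\s)?${escaped}(?:\\s[^"]*)?"`);
  return Object.entries(pages)
    .filter(([, html]) => !pattern.test(html))
    .map(([url]) => url);
}

const samplePages: Record<string, string> = {
  "https://example.com/docs/a": '<article class="theme-doc-markdown markdown"><p>Hi</p></article>',
  "https://example.com/docs/b": '<div class="landing"><a href="/docs/a">Start</a></div>',
};

// Lists only the /docs/b page, which would need a different selector.
console.log(pagesMissingClass(samplePages, "theme-doc-markdown"));
```

An empty result suggests the class is a reasonable `selector` candidate for those pages; any URL that is listed, especially the starting one, points at the limitations above.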

Example

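```typescript
import { Config } from "./src/config"; // Config type shipped with gpt-crawler

export const config: Config = {
  // Page the crawl starts from; it must itself contain text under `selector`
  url: "https://reactnavigation.org/docs/getting-started",
  // Glob pattern: only links matching it are followed
  match: "https://reactnavigation.org/docs/**",
  // CSS selector of the element holding each page's text
  selector: `.theme-doc-markdown`,
  maxPagesToCrawl: 200,
  outputFileName: "output.json",
};
```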
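Once the crawl has finished, a quick look at the generated file can confirm that the selector actually captured text before you feed it to a bot. This sketch assumes each entry of the JSON array has `title`, `url` and `html` fields, with `html` holding the extracted text; check that against your own `output.json`. `summarizeOutput` is a hypothetical helper, not part of gpt-crawler.

```typescript
// Assumed shape of each entry in output.json; verify against your own file.
interface CrawledPage {
  title: string;
  url: string;
  html: string; // assumed: text extracted from the `selector` element
}

interface OutputSummary {
  pageCount: number;
  totalChars: number; // characters of extracted text across all pages
  emptyUrls: string[]; // pages where the selector yielded no text
}

function summarizeOutput(json: string): OutputSummary {
  const pages = JSON.parse(json) as CrawledPage[];
  const emptyUrls = pages
    .filter((page) => page.html.trim() === "")
    .map((page) => page.url);
  const totalChars = pages.reduce((sum, page) => sum + page.html.length, 0);
  return { pageCount: pages.length, totalChars, emptyUrls };
}

const sampleOutput = JSON.stringify([
  { title: "Getting started", url: "https://example.com/docs/a", html: "Install the package." },
  { title: "Landing", url: "https://example.com/docs/b", html: "   " },
]);

console.log(summarizeOutput(sampleOutput));
```

For a real run, pass it the file contents, for example `readFileSync("output.json", "utf8")` from `node:fs`. A non-empty `emptyUrls` list usually means the selector does not match on those pages.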