Ghostbuster works by passing documents through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to determine whether the target document was AI-generated.
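The pipeline described above can be illustrated with a minimal, pure-Python sketch. It is not Ghostbuster's actual code, and it rests on several assumptions. Each weaker model is represented only by a list of per-token log-probabilities, and here these are synthetic. The vector operations (`add`, `sub`, `mul`) and scalar reductions (`mean`, `max`, `min`, `var`, `l2`) are an illustrative feature space. The "structured search" is reduced to one level of pairwise combination, followed by greedy forward selection scored on training accuracy. All function names (`candidate_features`, `train_logreg`, `forward_select`, `fit`, `predict`, `synthetic_doc`) are hypothetical.

```python
import math
import random
from itertools import combinations

# Illustrative (assumed) vector operations combining two per-token log-prob sequences.
VECTOR_OPS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
}


def _mean(v):
    return sum(v) / len(v)


def _var(v):
    m = _mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)


# Illustrative (assumed) scalar reductions turning a sequence into one feature value.
SCALAR_OPS = {"mean": _mean, "max": max, "min": min, "var": _var,
              "l2": lambda v: math.sqrt(sum(x * x for x in v))}


def candidate_features(doc):
    """doc maps a hypothetical weak-model name to its per-token log-probs."""
    vectors = dict(doc)
    for m1, m2 in combinations(sorted(doc), 2):
        for op_name, op in VECTOR_OPS.items():
            vectors[f"{op_name}({m1},{m2})"] = op(doc[m1], doc[m2])
    return {f"{s}({v})": f(vec) for v, vec in vectors.items()
            for s, f in SCALAR_OPS.items()}


def _score(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_logreg(X, y, epochs=100, lr=0.1):
    """Plain SGD logistic regression; z is clipped to avoid exp overflow."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = max(-30.0, min(30.0, _score(w, b, x)))
            g = 1.0 / (1.0 + math.exp(-z)) - t  # d(log loss)/dz
            w = [wj - lr * g * xj for wj, xj in zip(w, x)]
            b -= lr * g
    return w, b


def forward_select(rows, y, names, max_features=3):
    """Greedily add the feature that most improves training accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        best = None
        for name in names:
            if name in selected:
                continue
            X = [[r[c] for c in selected + [name]] for r in rows]
            w, b = train_logreg(X, y)
            acc = sum((_score(w, b, x) > 0) == bool(t) for x, t in zip(X, y)) / len(y)
            if acc > best_acc:
                best_acc, best = acc, name
        if best is None:  # no remaining feature improves accuracy
            break
        selected.append(best)
    return selected


def fit(docs, labels, max_features=3):
    rows = [candidate_features(d) for d in docs]
    names = sorted(rows[0])
    # Standardize each feature; constant features get a scale of 1.
    stats = {n: (_mean([r[n] for r in rows]),
                 math.sqrt(_var([r[n] for r in rows])) or 1.0) for n in names}
    z = [{n: (r[n] - stats[n][0]) / stats[n][1] for n in names} for r in rows]
    selected = forward_select(z, labels, names, max_features)
    w, b = train_logreg([[r[n] for n in selected] for r in z], labels)
    return selected, stats, w, b


def predict(model, doc):
    """Return True if this sketch classifies doc as AI-generated."""
    selected, stats, w, b = model
    feats = candidate_features(doc)
    x = [(feats[n] - stats[n][0]) / stats[n][1] for n in selected]
    return _score(w, b, x) > 0


def synthetic_doc(rng, ai, n_tokens=30):
    """Hypothetical data: assumes weak models give AI text higher log-probs."""
    base = -1.0 if ai else -3.0
    return {"unigram": [base + rng.gauss(0, 0.5) for _ in range(n_tokens)],
            "trigram": [base + 0.5 + rng.gauss(0, 0.5) for _ in range(n_tokens)]}
```

For example, with `rng = random.Random(0)`, calling `fit([synthetic_doc(rng, i % 2 == 1) for i in range(40)], [i % 2 for i in range(40)])` returns the names of the selected features together with the trained weights. `predict` then applies those same features to a new document. Scoring the selection on training accuracy keeps the sketch short. A more careful version would score candidate features on a held-out split to reduce overfitting.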
Ghostbuster’s training data, which consists of news articles, student essays, and creative writing, is not
representative of all writing styles or topics and contains predominantly British and American English
text. If you wish to apply Ghostbuster to real-world cases of potentially prohibited use of text
generation, such as identifying ChatGPT-written student essays, be aware that Ghostbuster is
particularly likely to make incorrect predictions in the following cases: