Ghostbuster works by passing a document through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to determine whether the document was AI-generated.
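
The pipeline described above can be illustrated with a small, self-contained sketch. This is not Ghostbuster's actual code. The model names (`unigram`, `small_lm`), the toy per-token probabilities, the particular vector and scalar operations, the greedy forward selection, and the hand-rolled logistic regression are all assumptions chosen only to show the general shape of "combine weak-model features, search over combinations, train a classifier".

```python
import itertools
import math

# Toy data (made up): per-token probabilities from two hypothetical weak models.
# Label 1 = AI-generated, 0 = human-written.
DOCS = [
    ({"unigram": [0.10, 0.20, 0.10, 0.30], "small_lm": [0.80, 0.90, 0.85, 0.70]}, 1),
    ({"unigram": [0.20, 0.10, 0.30, 0.20], "small_lm": [0.75, 0.80, 0.90, 0.85]}, 1),
    ({"unigram": [0.15, 0.25, 0.20, 0.10], "small_lm": [0.90, 0.70, 0.80, 0.80]}, 1),
    ({"unigram": [0.10, 0.30, 0.20, 0.20], "small_lm": [0.20, 0.10, 0.30, 0.15]}, 0),
    ({"unigram": [0.25, 0.15, 0.10, 0.20], "small_lm": [0.10, 0.25, 0.20, 0.30]}, 0),
    ({"unigram": [0.20, 0.20, 0.15, 0.30], "small_lm": [0.30, 0.15, 0.10, 0.20]}, 0),
]

# Assumed operation sets: combine two probability vectors, then reduce to a scalar.
VECTOR_OPS = {
    "add": lambda a, b: [x + y for x, y in zip(a, b)],
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "div": lambda a, b: [x / y for x, y in zip(a, b)],
}
SCALAR_OPS = {"mean": lambda v: sum(v) / len(v), "max": max, "min": min}


def candidate_features(models):
    """Enumerate named scalar features over single models and ordered model pairs."""
    feats = {}
    for m in models:
        for s_name, s in SCALAR_OPS.items():
            feats[f"{s_name}({m})"] = lambda d, m=m, s=s: s(d[m])
    for a, b in itertools.permutations(models, 2):
        for v_name, v in VECTOR_OPS.items():
            for s_name, s in SCALAR_OPS.items():
                name = f"{s_name}({v_name}({a},{b}))"
                feats[name] = lambda d, a=a, b=b, v=v, s=s: s(v(d[a], d[b]))
    return feats


def _score(w, x):
    """Linear score: weights dotted with features, plus bias stored in w[-1]."""
    return w[-1] + sum(wj * xj for wj, xj in zip(w, x))


def train_logreg(X, y, epochs=500, lr=0.5):
    """Tiny logistic regression trained with per-sample gradient steps."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = max(-30.0, min(30.0, _score(w, xi)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w


def accuracy(names, feats, docs):
    """Training accuracy of a classifier fit on the named features."""
    X = [[feats[n](d) for n in names] for d, _ in docs]
    y = [label for _, label in docs]
    w = train_logreg(X, y)
    return sum((1 if _score(w, x) > 0 else 0) == yi for x, yi in zip(X, y)) / len(y)


def forward_select(feats, docs, max_features=3):
    """Greedy search: keep adding the feature that most improves accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        scored = [(accuracy(selected + [n], feats, docs), n)
                  for n in feats if n not in selected]
        score, name = max(scored)
        if selected and score <= best:
            break  # no remaining feature improves the classifier
        selected.append(name)
        best = score
    return selected, best


features = candidate_features(["unigram", "small_lm"])
selected, best = forward_select(features, DOCS)
print("selected features:", selected)
print("training accuracy:", best)
```

A real system would score candidate features on held-out data rather than training accuracy, and would draw the token probabilities from actual language models instead of hard-coded lists.
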
Ghostbuster’s training data, which consists of news, student essay, and creative writing data, is not
representative of all writing styles or topics and contains predominantly British and American English
text. If you wish to apply Ghostbuster to real-world cases of potential off-limits usage of text
generation, such as identifying ChatGPT-written student essays, be wary that incorrect predictions by
Ghostbuster are particularly likely in the following cases:
No AI-generated text detector is 100% accurate; we strongly discourage incorporating Ghostbuster into
any system that automatically penalizes students or other writers for alleged use of text generation
without human intervention.
Privacy: Please be aware that all inputs to Ghostbuster are sent to the OpenAI API, and we also save the
inputs for internal testing purposes. Though we will not distribute the data publicly, we cannot
guarantee the privacy of any inputs to Ghostbuster.