ghostbuster

How does it work?

Ghostbuster works by passing documents through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to predict whether the target document was AI-generated.
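The three stages described above can be illustrated with a minimal, self-contained sketch. This is not Ghostbuster's actual implementation: the unigram and bigram models are hypothetical stand-ins for the weaker language models, the operation sets are illustrative, greedy forward selection is just one simple form of structured search, the small logistic regression is one possible classifier, and the documents and labels are made-up toy data.

```python
import itertools
import math
from collections import Counter

def fit_unigram(corpus):
    """Hypothetical weak model 1: add-one smoothed unigram probabilities."""
    counts = Counter(t for doc in corpus for t in doc.split())
    total, vocab = sum(counts.values()), len(counts) + 1
    return lambda toks: [(counts[t] + 1) / (total + vocab) for t in toks]

def fit_bigram(corpus):
    """Hypothetical weak model 2: add-one smoothed bigram probabilities."""
    pairs, firsts = Counter(), Counter()
    for doc in corpus:
        toks = ["<s>"] + doc.split()
        pairs.update(zip(toks, toks[1:]))
        firsts.update(toks[:-1])
    vocab = len({t for doc in corpus for t in doc.split()}) + 1
    def probs(toks):
        prev = ["<s>"] + toks[:-1]
        return [(pairs[(p, t)] + 1) / (firsts[p] + vocab) for p, t in zip(prev, toks)]
    return probs

# Illustrative building blocks: combine two probability vectors, then reduce to a scalar.
VECTOR_OPS = {"sub": lambda a, b: [x - y for x, y in zip(a, b)],
              "div": lambda a, b: [x / y for x, y in zip(a, b)]}
REDUCERS = {"mean": lambda v: sum(v) / len(v), "max": max, "min": min}

def candidate_features(models):
    """Enumerate named features: each model reduced alone, plus each ordered
    model pair combined by a vector op and then reduced."""
    feats = {}
    for (m, f), (r, red) in itertools.product(models.items(), REDUCERS.items()):
        feats[f"{m}-{r}"] = lambda toks, f=f, red=red: red(f(toks))
    for (a, fa), (b, fb) in itertools.permutations(models.items(), 2):
        for (o, op), (r, red) in itertools.product(VECTOR_OPS.items(), REDUCERS.items()):
            feats[f"{a}-{o}-{b}-{r}"] = (
                lambda toks, fa=fa, fb=fb, op=op, red=red: red(op(fa(toks), fb(toks))))
    return feats

def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in exp
    return 1 / (1 + math.exp(-x))

def train_logreg(X, y, steps=200, lr=0.5):
    """Logistic regression via batch gradient descent on standardized features."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    scale = lambda row: [(row[j] - mu[j]) / sd[j] for j in range(d)]
    Z, w, b = [scale(row) for row in X], [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda row: sigmoid(sum(wi * zi for wi, zi in zip(w, scale(row))) + b)

def greedy_select(feats, docs, labels, max_feats=3):
    """Greedy forward selection: repeatedly add the feature that most improves
    training accuracy, stopping when no candidate helps."""
    table = {name: [fn(doc.split()) for doc in docs] for name, fn in feats.items()}
    chosen, best_acc = [], 0.0
    while len(chosen) < max_feats:
        scored = []
        for name in table:
            if name in chosen:
                continue
            cols = chosen + [name]
            X = [[table[c][i] for c in cols] for i in range(len(docs))]
            predict = train_logreg(X, labels)
            hits = sum((predict(x) >= 0.5) == bool(t) for x, t in zip(X, labels))
            scored.append((hits / len(docs), name))
        if not scored or max(scored)[0] <= best_acc:
            break
        best_acc, name = max(scored)
        chosen.append(name)
    return chosen, best_acc

# Made-up toy data, for illustration only (label 1 = "AI-generated").
reference = ["the cat sat on the mat", "the dog sat on the rug", "a cat and a dog"]
docs = ["the cat sat on the rug", "the dog sat on the mat", "a dog and a cat",
        "rain fell sideways over crooked chimneys", "my aunt collects broken clocks",
        "nobody warned us about the geese"]
labels = [1, 1, 1, 0, 0, 0]

models = {"uni": fit_unigram(reference), "bi": fit_bigram(reference)}
feats = candidate_features(models)
chosen, acc = greedy_select(feats, docs, labels)
print("selected features:", chosen, "training accuracy:", acc)
```

The sketch scores features by training accuracy only to stay short; a realistic setup would score candidates on held-out data to avoid selecting features that overfit.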

Disclaimer

IMPORTANT: As of January 2024, OpenAI has deprecated the davinci and ada endpoints used in the original paper. As such, Ghostbuster now uses babbage-002 and davinci-002, which result in worse performance. We are working on improving the model, but please keep this in mind when assessing predictions.

Ghostbuster’s training data, which consists of news, student essay, and creative writing data, is not representative of all writing styles or topics and contains predominantly British and American English text. If you wish to apply Ghostbuster to real-world cases of potential off-limits usage of text generation, such as identifying ChatGPT-written student essays, be wary that incorrect predictions by Ghostbuster are particularly likely in the following cases:

For shorter text

In domains that are further from those on which Ghostbuster was trained (e.g., text messages)

For text in varieties of English besides American and British English, or in non-English languages

For text written by non-native speakers of English

For AI-generated text that has been edited or paraphrased by a human

No AI-generated text detector is 100% accurate; we strongly discourage incorporation of Ghostbuster into any systems that automatically penalize students or other writers for alleged usage of text generation without human intervention.

Privacy: Please be aware that all inputs to Ghostbuster are sent to the OpenAI API, and we also save the inputs for internal testing purposes. Though we will not distribute the data publicly, we cannot guarantee the privacy of any inputs to Ghostbuster.