>>52533572
ChatGPT and GPT-4 are what are known as large language models, or LLMs (Neuro-sama is also powered by an LLM). Researchers have discovered that by training an AI on a shit-ton of human-written text, you can make it fairly intelligent. Generally, the more text data you can train it on, the smarter it gets. ChatGPT's success has led to a gold rush where every big tech company is trying to build their own custom datasets by automatically downloading text from every big website they can find in order to train their own LLMs. This process of automatically downloading text, and sometimes images or videos, from websites is known as data scraping.