While this might be a boon for AI research, it will be a disaster for privacy and the rights of users on Reddit and other sites that have been used for AI training. Large-language models vacuum massive amounts of data as part of their training, and Reddit users could have their every word added to ChatGPT and potentially exposed to the world. More guardrails are needed to ensure training models respect our privacy.
If we want AI tools to continue to grow in sophistication, then we have to accept that the models have to be trained on a large volume of high-quality data. Instead of a "wild-west" situation where designers scrape online data without authorization, a recent turn towards licensing deals will ensure that training data is managed in a conscientious way that is fair to all parties involved.