>Generative Data Refinement: Just Ask for Better Data
https://arxiv.org/pdf/2509.08653
Google DeepMind used a /pol/ dataset to test detoxifying data. Basically, they want to be able to sanitize any training data and remove anything they don't want in there. They'll rewrite the data before training on it.