>>62165152
Is this a 14-page paper about submitting images of a cat tagged as "portrait of a dog" to datasets?
It's pretty stupid, because by now nobody should be relying on pre-existing image-text pairs anyway. Datasets should only contain images whose text descriptions were either written manually by whoever plans to work with them or generated by another AI tagger, not pulled from pre-existing alt text.
Even if they planned to add some magical pixels that alter a dataset (which this method doesn't seem to do), the solution is easy: just convert all images to JPEG before training to get rid of the alpha channel.
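
Rough sketch of the conversion step, assuming Pillow is installed. The function name and the quality=90 default are made up for illustration. It flattens any transparency onto white and re-encodes as JPEG, so the alpha channel is gone and the pixel values get slightly perturbed by the lossy compression. No guarantee this beats a perturbation that was built to survive re-encoding.

```python
from io import BytesIO

from PIL import Image


def to_jpeg_bytes(img, quality=90):
    """Re-encode a Pillow image as JPEG, dropping any alpha channel.

    Hypothetical helper: name and default quality are illustrative only.
    """
    if img.mode in ("RGBA", "LA", "P"):
        # Flatten transparency onto a white background before encoding,
        # since JPEG has no alpha channel.
        rgba = img.convert("RGBA")
        background = Image.new("RGB", rgba.size, (255, 255, 255))
        background.paste(rgba, mask=rgba.split()[3])
        img = background
    else:
        img = img.convert("RGB")

    buf = BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```

Run it over every file before it goes into the training set, e.g. `to_jpeg_bytes(Image.open(path))`, and write the returned bytes out as a .jpg.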