A major AI training data set contains millions of examples of personal data
Summary
New research reveals that DataComp CommonPool, a major open-source AI image training set, contains millions of images showing sensitive personal documents such as passports, credit cards, and birth certificates. The discovery raises serious privacy concerns and highlights the urgent need for stricter data vetting and ethical standards in AI training practices.