Data Labeling Is the Hot New Thing in AI

IEEE Spectrum - AI
Aug 1, 2025 13:00
Matthew S. Smith

Summary

Meta’s $14.3 billion investment in Scale AI, a leader in data labeling, has sparked industry-wide concern as competitors like OpenAI and Google rush to end their contracts with Scale to protect their proprietary training methods. The move highlights the growing importance and complexity of high-quality data labeling in developing advanced AI models, as organizations recognize that better-labeled data is crucial for improving AI performance and efficiency.

Earlier this summer Meta made a US $14.3 billion bet on a company most people had never heard of before: Scale AI. The deal, which gave Meta a 49 percent stake, sent Meta’s competitors, including OpenAI and Google, scrambling to exit their contracts with Scale AI for fear it might give Meta insight into how they train and fine-tune their AI models.

Scale AI is a leader in data labeling for AI models. It’s an industry that, at its core, does what it says on the tin. The most basic example can be found in the thumbs-up and thumbs-down icons you’ve likely seen if you’ve ever used ChatGPT. One labels a reply as positive; the other, negative.

But as AI models grow, both in model size and popularity, this seemingly simple task has grown into a beast every organization looking to train or tune a model must manage.

“The vast majority of compute is used on pre-training data that’s of poor quality,” says Sara Hooker, a vice president of research at Cohere Labs. “We need to mitigate that, to improv