
LLM Benchmarking Shows Capabilities Doubling Every 7 Months
Summary
A new benchmarking approach from METR, a Berkeley-based AI evaluation nonprofit, measures LLM progress by comparing model performance against the time skilled humans need to complete tasks of varying complexity. Their findings show that the length of tasks models can complete reliably, measured in human completion time, is doubling roughly every seven months, an exponential trend that underscores the need for evaluation methods that keep pace with rapidly advancing AI. Progress at this rate has significant implications for both the development and oversight of AI systems.
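To make the exponential claim concrete, here is a minimal Python sketch of the arithmetic: with a seven-month doubling time, the task-time horizon grows fourfold in 14 months and sixteenfold in 28 months. The one-hour baseline task is an illustrative assumption, not a METR figure.

```python
# Minimal sketch of the reported scaling trend: the length of tasks
# (measured in human completion time) that a model can finish reliably
# is assumed to double every 7 months. Baseline value is hypothetical.
DOUBLING_MONTHS = 7.0

def projected_horizon_minutes(baseline_minutes: float, months_elapsed: float) -> float:
    """Task-time horizon after months_elapsed, under a fixed doubling time."""
    return baseline_minutes * 2 ** (months_elapsed / DOUBLING_MONTHS)

if __name__ == "__main__":
    baseline = 60.0  # hypothetical: model reliably handles 1-hour human tasks today
    for months in (0, 7, 14, 28):
        horizon = projected_horizon_minutes(baseline, months)
        print(f"+{months:2d} months: ~{horizon:.0f} minutes")
```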