Campbell Brown’s Forum AI mobilizes experts to audit foundation models on high-stakes topics
Campbell Brown’s Forum AI evaluates foundation models on high-stakes topics using expert benchmarks and AI judges to improve accuracy and reduce systemic bias.
Campbell Brown, the former Facebook news chief, has launched Forum AI to test and improve how large language models handle complex, high-stakes subjects. The company recruits top domain experts to design task-specific benchmarks, then trains so-called AI judges to score model outputs at scale. The effort responds to mounting concern that foundation models produce confident but flawed answers on geopolitics, finance, mental health and hiring.
Forum AI recruits leading experts to design benchmarks
Forum AI’s approach centers on assembling authoritative figures to craft realistic evaluation scenarios. For its geopolitics work, the company has drawn on voices across the political and policy spectrum, including Niall Ferguson, Fareed Zakaria, former Secretary of State Antony Blinken, former House Speaker Kevin McCarthy, and Anne Neuberger.
Those experts do more than advise; they build the test cases and define what constitutes a responsible, well-contextualized response. The goal is to capture nuance and edge cases that generic benchmarks miss and to ensure the evaluations reflect real-world risks and trade-offs.
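Forum AI has not published its benchmark format, but a minimal sketch of what an expert-authored test case might contain can make the approach concrete. Everything below is an assumption, not the company’s actual schema; the rubric criteria simply mirror the failure modes described later in this article.

```python
# Hypothetical sketch of an expert-authored benchmark item.
# Forum AI has not published its schema; every field name here is assumed.
from dataclasses import dataclass, field

@dataclass
class BenchmarkItem:
    domain: str                # e.g. "geopolitics"
    scenario: str              # realistic prompt written by a domain expert
    rubric: dict[str, str]     # what a responsible, well-contextualized answer must do
    known_pitfalls: list[str] = field(default_factory=list)  # edge cases generic tests miss

item = BenchmarkItem(
    domain="geopolitics",
    scenario="Assess the likely economic effects of new sanctions on a mid-sized exporter.",
    rubric={
        "accuracy": "No fabricated figures or misattributed sources.",
        "context": "Notes relevant history and acknowledges competing policy views.",
        "nuance": "Represents opposing positions fairly rather than straw-manning them.",
    },
    known_pitfalls=[
        "overconfident single-scenario forecasts",
        "reliance on state-affiliated sources",
    ],
)
```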
AI judges aim to reach consensus with human authorities
Instead of relying solely on human raters, Forum AI trains machine “judges” to apply the expert criteria at scale. The company measures success by how often its AI judges agree with subject-matter authorities, targeting roughly 90% consensus with the human experts.
Training AI to evaluate other AI serves two purposes: it accelerates assessment across large volumes of model output, and it creates a repeatable metric that enterprise customers can use for compliance. Forum AI says that, in early work, it has achieved high levels of agreement between its automated judges and the expert panels.
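Forum AI has not disclosed how it computes that consensus figure. The simplest plausible version is plain percent agreement between judge and expert verdicts on the same outputs; the sketch below, using invented pass/fail labels, illustrates the roughly 90% target.

```python
# Hypothetical sketch: measuring AI-judge agreement with expert raters.
# Forum AI's actual metric is unpublished; plain percent agreement on
# categorical verdicts is one simple baseline.

def agreement_rate(judge_labels: list[str], expert_labels: list[str]) -> float:
    """Fraction of cases where the AI judge's verdict matches the expert's."""
    if len(judge_labels) != len(expert_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(j == e for j, e in zip(judge_labels, expert_labels))
    return matches / len(judge_labels)

# Invented verdicts on ten evaluated model responses.
judge  = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]
expert = ["pass", "fail", "pass", "fail", "fail", "pass", "pass", "pass", "fail", "pass"]

rate = agreement_rate(judge, expert)
print(f"agreement: {rate:.0%}")  # 90%, matching the rough target Forum AI cites
```

Percent agreement is only a baseline; a production metric would likely also need to correct for chance agreement (for example, with Cohen’s kappa) when verdicts are heavily imbalanced.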
Early evaluations reveal bias and contextual failures
Forum AI’s audits of major foundation models have highlighted systematic problems beyond outright factual errors. Brown has pointed to political slant in many models, and to specific misattributions, such as models citing state-affiliated sources on unrelated topics, as evidence of flawed data sourcing and weighting.
Subtler shortcomings include missing context, inadequate representation of diverse perspectives, and straw-manning complex positions rather than acknowledging nuance. Forum AI argues these failures are often correctable through targeted data curation and improved evaluation standards.
Lessons from social platforms shape Forum AI’s mission
Brown’s work at Facebook and Meta informs her sense of urgency about addressing model failures now. She has described watching platform optimization reward engagement over accuracy, and later witnessing the societal costs of that dynamic.
Forum AI’s strategy reflects a belief that building accuracy-oriented incentives into AI design is essential. Brown says enterprise users who face legal and financial liability for model-driven decisions may provide the pragmatic demand needed to shift incentives away from click-driven optimization.
Enterprise demand offers a commercial path but revenue challenges remain
Forum AI is positioning its services toward businesses that use AI for lending, hiring, insurance and other regulated activities. Those organizations have a clear interest in robust evaluation because model errors can produce legal exposure and reputational damage.
Converting regulatory concern into steady revenue is not straightforward. Many customers today accept checkbox audits and standardized tests; Forum AI contends that meaningful evaluation requires time-consuming domain expertise and bespoke scenarios that go beyond existing compliance checklists.
Regulators, audits and the limits of standardized reviews
Brown has criticized the current audit landscape as insufficient, noting instances where formal audits failed to detect violations in hiring algorithms. She argues that oversight that lacks deep domain knowledge will miss critical edge cases and produce false confidence.
Forum AI emphasizes tests built by specialists who understand where models can fail in practice. The company has raised a $3 million seed round led by Lerer Hippeau, a signal that some investors see value in more rigorous, expert-led model validation.
The firm’s focus on high-stakes subjects reflects wider concerns among policymakers and civil society about AI’s societal impacts. Forum AI hopes its methods will inform both private sector practices and public regulatory expectations.
Campbell Brown frames Forum AI as an attempt to prevent history from repeating itself: where a platform-optimized ecosystem once degraded public information, the next wave of AI should be steered toward reliability. The company’s combination of expert benchmarks and automated judges aims to give organizations and regulators the tools they will need to demand better performance from foundation models.
Forum AI’s work highlights a broader question for the industry: whether accuracy and nuance can be made central design goals rather than afterthoughts. If enterprises and regulators increasingly require domain-specific evaluation, model developers may face stronger incentives to reduce bias and error in areas where mistakes carry real-world harm.