Tuesday, June 23, 2026

In the Weights launches leaderboard measuring which AI models recall people

by Kim Stewart

In the Weights: New Site Scores How Well AI Models Remember People

In the Weights measures how AI models recall individuals, querying systems like GPT, Gemini and Grok to assign a “strength” score that ranks which names live in model parameters.

In the Weights launched as a public experiment built around a simple question: how well do large language models remember a person without using web search or external tools? The site, created by Thomas Dimson and Joey Flynn, probes multiple models and aggregates their responses into a single strength score. The result is a searchable leaderboard and profile pages that purport to show who is encoded in the numerical “weights” of AI systems.

Founders and the idea behind In the Weights

Dimson and Flynn developed the project after leaving OpenAI, which had acquired their design studio, Global Illumination. They said they wanted a small, playful experiment to explore how people are represented inside modern models rather than on the open web. The founders describe “being in the weights” as having your existence reflected in the statistical patterns these systems learned during training.

How In the Weights measures AI recall

The site sends name-based queries to a range of models, from popular systems to lesser-known ones, and requests up to 10 short results for each name. The responses are clustered by similarity, and the clusters feed a composite strength score intended to reflect how consistently a model can produce identifiable information about a person. The methodology uses only outputs generated without search tools, isolating what is stored in a model’s parameters from what it can fetch in real time.
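The site has not published its exact scoring formula, so the following is only a minimal Python sketch of the general idea, under loudly stated assumptions. Simple token-overlap (Jaccard) similarity with greedy clustering stands in for whatever similarity measure the site actually uses, and the composite score (consistency times coverage, ranging from 0 to 1) is hypothetical. It is not the formula behind scores such as 641. The names `tokens`, `jaccard`, `cluster`, `strength` and `MAX_RESULTS`, and the model names in the demo, are illustrative.

```python
"""Hypothetical sketch of a recall "strength" score; NOT In the Weights' actual method."""

import string

# Assumption: cap answers per model at 10, mirroring the "up to 10" in the article.
MAX_RESULTS = 10


def tokens(text: str) -> frozenset[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return frozenset(w for w in words if w)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for the site's unknown measure."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(answers: list[tuple[str, str]], threshold: float = 0.5) -> list[list[tuple[str, str]]]:
    """Greedy single pass: each (model, text) answer joins the first cluster whose
    seed answer is at least `threshold` similar; otherwise it seeds a new cluster."""
    clusters: list[tuple[frozenset[str], list[tuple[str, str]]]] = []
    for model, text in answers:
        toks = tokens(text)
        for seed, members in clusters:
            if jaccard(seed, toks) >= threshold:
                members.append((model, text))
                break
        else:  # no existing cluster was similar enough
            clusters.append((toks, [(model, text)]))
    return [members for _, members in clusters]


def strength(per_model: dict[str, list[str]], threshold: float = 0.5) -> float:
    """Hypothetical composite in [0, 1]: consistency (share of all answers in the
    largest cluster) times coverage (share of queried models contributing to it)."""
    answers = [(m, t) for m, texts in per_model.items() for t in texts[:MAX_RESULTS]]
    if not answers:
        return 0.0
    largest = max(cluster(answers, threshold), key=len)
    consistency = len(largest) / len(answers)
    coverage = len({m for m, _ in largest}) / len(per_model)
    return consistency * coverage


if __name__ == "__main__":
    demo = {  # made-up answers from hypothetical models, for illustration only
        "model_a": ["Italian opera tenor", "Italian opera tenor, Modena"],
        "model_b": ["Italian opera tenor"],
        "model_c": ["Ambiguous: several people share this name"],
    }
    print(strength(demo))
```

With the demo answers, three of the four answers fall into one cluster (consistency 0.75) and two of the three models contribute to it (coverage 2/3), so the sketch prints 0.5. Multiplying by coverage is one way to reward names that several models recall independently. A real system would likely use semantic embeddings rather than word overlap, since two accurate descriptions can share few words.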

The project’s designers say the “weights” term refers to the numerical parameters that shape model behavior, and the metric aims to capture whether those parameters encode recognizably human identities. That framing highlights a conceptual shift: as more information surfaces through chatbots and LLM-driven interfaces, being retrievable from a model’s internal memory has a different cultural meaning than appearing in search-engine results.

Leaderboard highlights and notable scores

Early results have produced surprising placements and a shifting leaderboard that invites direct comparison across names. For example, a tech writer sampled by the site received a score of 641, which the platform reports places the name in roughly the top 6 percent of entries. At the top of the board, the site lists actor Macaulay Culkin and famed opera singer Luciano Pavarotti with near-identical scores above that mark, underscoring how public figures dominate recall.

Profiles show which models produced which descriptions and allow side-by-side inspection of outputs across systems. That transparency lets observers see when multiple models converge on the same basic biographical details and when scores rise or fall as the site re-queries models and updates clusters.

Model disagreements and hallucination flags

A notable part of the project is its effort to flag potential hallucinations and ambiguous answers from specific models. In one instance included on the platform, a variant of GPT produced an answer describing a name as “ambiguous,” suggesting multiple individuals could match the query rather than confidently identifying a single person. The site highlights such discrepancies to signal where a model’s internal representation is weak or uncertain.

Those differences illuminate a core limitation of the approach: models trained on distinct datasets, with different cutoffs and architectures, will remember and prioritize information differently. Aggregating outputs into a single score simplifies that variation, so the metric is best read as a relative indicator rather than a precise measure of factual prominence.

Reception, criticism and visual identity

Public reaction to In the Weights has been lively, with the founders reporting far more interest than they had expected for a lighthearted experiment. Some commentators dismissed the work as merely asking multiple chatbots the same question and pooling the answers, while others found the concept both intriguing and unnerving. The site’s retro, Nintendo-inspired visual design has been singled out as a playful foil to the seriousness of measuring digital remembrance.

Dimson and Flynn acknowledged that the project was partly about sparking conversation and said the reception so far shows an appetite for tools that expose how models encode human lives. Critics also noted that presenting scores and rankings invites competitive comparisons that may not reflect real-world influence or accuracy.

Implications for search, bias and digital legacy

In the Weights underscores a broader transition in how people encounter information: as conversational AI increasingly mediates discovery, what a model “remembers” can shape public perception. That shift raises questions about bias and representation, since underrepresented individuals may be absent or mischaracterized inside model weights. The creators said they plan to probe which models favor certain kinds of people and which notable figures appear absent despite public significance.

Beyond tests of memory, the project points toward new debates over digital legacy and the ethics of algorithmic remembrance. Tools that reduce personal histories to scores risk misinterpreting absence as insignificance, and researchers say any automated measure should be paired with careful analysis of training data, demographic skew and the potential for harm.

The In the Weights experiment is ongoing, and the founders say their next steps include deeper analysis of model-by-model differences, bias patterns and cases where a public figure “should” have a formal encyclopedic entry but does not. As conversational systems play a bigger role in how people are discovered, projects like this aim to make the invisible architecture of recall more visible and subject to scrutiny.


The Calgary Tribune
The voice of Alberta to the world