Testing better online behavior is harder than it should be
What 31 researchers learned about the barriers — and fixes — to testing prosocial tech.
Everyone agrees we need better online spaces — but what actually works to make them more prosocial?
That’s the deceptively simple question at the heart of a new paper: “Independently testing prosocial interventions: Methods and recommendations from 31 researchers.” Despite years of concern over online harms, public research into what improves online behavior remains rare. Not because people aren’t trying — but because nearly everything is stacked against them.
Why we know so little
Most experimental testing happens inside platforms, where results remain proprietary or selectively disclosed. Meanwhile, independent researchers tend to focus on identifying harm — it’s faster, cheaper, and often easier to study than fixes.
But the biggest barrier is one of method: if you’re not a platform, conducting ecologically valid research — that is, studying behavior in realistic settings — is incredibly difficult.
“Only a handful of experiments have been conducted outside industry settings in ways that meet all the criteria for ecological validity.” — Grüning et al., 2025
Ecological validity means more than asking people what they’d do — it means observing what they actually do while navigating a feed, reacting to posts, or engaging with others.
Prosocial design research methods
When researchers can’t access platforms, they rely on workarounds. The paper outlines three main routes to studying prosocial design in the wild:
Browser extensions, like those used in the Prosocial Ranking Challenge (PRC), can alter what users see on real platforms (see the sketch after this list). The PRC tested whether alternative ranking models could reduce toxicity, using randomized conditions, behavior tracking, and post-study surveys. It was rigorous, impactful, and extremely resource-intensive.
Simulated social media environments, like Truman or Social Media TestDrive, are emerging as a more scalable option. These platforms let researchers test interventions in dynamic, controlled environments that feel realistic. They may not capture every nuance of real-world interaction, but they enable large-scale behavioral experiments with fewer constraints.
Observational studies using public platform data help researchers study how users respond to existing features. But shrinking data access and the difficulty of establishing causality limit their impact.
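To make the browser-extension route more concrete, here is a minimal TypeScript sketch of a content script that re-ranks visible posts. It is not code from the PRC: the selectors, the toxicityScore heuristic, and the condition assignment are hypothetical placeholders, and a real study would add consent flows, platform-specific selectors, and behavior logging.

```typescript
// Minimal content-script sketch: re-rank visible feed posts by a
// hypothetical prosocial score. All selectors and scoring are placeholders.

type Condition = "control" | "reranked";

// Randomly assign each participant to an experimental arm and persist it.
function assignCondition(): Condition {
  const saved = sessionStorage.getItem("study-condition");
  if (saved === "control" || saved === "reranked") return saved;
  const condition: Condition = Math.random() < 0.5 ? "control" : "reranked";
  sessionStorage.setItem("study-condition", condition);
  return condition;
}

// Stand-in heuristic; a real study might call a hosted toxicity classifier.
function toxicityScore(postText: string): number {
  return postText.includes("!!!") ? 0.9 : 0.1;
}

function rerankFeed(): void {
  // ".feed" and ".post" are placeholder selectors, not any platform's real markup.
  const feed = document.querySelector(".feed");
  if (!feed) return;
  const posts = Array.from(feed.querySelectorAll<HTMLElement>(".post"));
  const sorted = [...posts].sort(
    (a, b) => toxicityScore(a.innerText) - toxicityScore(b.innerText)
  );
  // Only touch the DOM when the order actually changes, to avoid observer loops.
  if (sorted.every((post, i) => post === posts[i])) return;
  sorted.forEach((post) => feed.appendChild(post)); // least toxic first
}

if (assignCondition() === "reranked") {
  rerankFeed();
  // Re-rank again whenever the platform injects new posts into the page.
  new MutationObserver(rerankFeed).observe(document.body, {
    childList: true,
    subtree: true,
  });
}
```

Even this toy version hints at why the approach is resource-intensive: every platform redesign can break the selectors, and the scoring, logging, and consent machinery all have to be built and maintained by the research team.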
“Every method we explored represents a trade-off between ecological validity and feasibility.” — Grüning et al., 2025
Two other methods, though not emphasized in the paper, offer promise:
Sock puppet research uses simulated accounts to test how platforms treat different behaviors — including prosocial ones like bridging engagement or non-toxic dialogue. While often used to expose harms, it holds potential for positive testing, too.
Crowdsourced testing through tools like Mechanical Turk helps researchers test mock-ups or prototype designs. These methods lack ecological validity but allow for rapid design iteration and early-stage feedback.
Studying algorithms while protecting privacy
Privacy-enhancing technologies (PETs) such as OpenMined's PySyft (for secure, remote analysis) and OpenDP (for differential privacy) merit further experimentation. Piloted via the Christchurch Call Initiative on Algorithmic Outcomes (CCIAO), this approach enabled researchers to audit platform algorithms while keeping user data private.
This PET-based “structured transparency” model opens the door to studying not just harms, but positive changes. For example, it could also measure what happens when platforms change ranking to promote more civil, diverse, or bridging content. If middleware lets users choose different recommendation logics, PETs could help us evaluate the effects — safely, scalably, and ethically.
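As a rough illustration of the differential-privacy idea behind tools like OpenDP (this is not OpenDP's API), the sketch below adds calibrated Laplace noise to an aggregate metric, for instance the average toxicity of content shown under a new ranking, so researchers learn the trend without learning any individual user's value. The metric, the bounds, and the epsilon value are illustrative assumptions.

```typescript
// Illustrative Laplace mechanism: release a noisy aggregate instead of raw
// per-user values. Libraries like OpenDP implement this idea with far more
// rigor; this sketch only shows the core mechanics.

// Draw from a Laplace(0, scale) distribution via inverse transform sampling.
function sampleLaplace(scale: number): number {
  const u = Math.random() - 0.5; // uniform in (-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

// Differentially private mean of per-user scores clipped to [0, 1].
// Changing one user's data shifts the mean by at most 1/n, so the noise
// scale is sensitivity / epsilon (smaller epsilon = stronger privacy).
function privateMean(scores: number[], epsilon: number): number {
  const n = scores.length;
  const clipped = scores.map((s) => Math.min(1, Math.max(0, s)));
  const mean = clipped.reduce((sum, s) => sum + s, 0) / n;
  const sensitivity = 1 / n;
  return mean + sampleLaplace(sensitivity / epsilon);
}

// Hypothetical per-user average toxicity under a candidate ranking change.
const toxicityByUser = [0.12, 0.4, 0.08, 0.3, 0.22];
console.log(privateMean(toxicityByUser, 0.5)); // noisy estimate, not raw data
```

In a structured-transparency setup, a query like this would run where the data lives, on platform infrastructure or via a remote-analysis layer such as PySyft, and only the noised aggregate would ever reach the researcher.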
Hope for DSA’s Article 40
These workarounds are all the more important given declining transparency at most major platforms, from Meta shutting down CrowdTangle to X cutting off academic access in 2023. It's why all eyes are on Europe's Digital Services Act (DSA) and its Article 40, which establishes formal access rights for vetted researchers:
Article 40(12) mandates access to public data.
Article 40(4), launching July 2025, expands access to non-public data under structured conditions.
Whether Article 40 reshapes the research landscape depends on how it's implemented.
Sharing & advancing collectively
Infrastructure is starting to catch up. The Prosocial Design Network (PDN), in partnership with Jigsaw, launched a working group to open-source tools for browser-extension research. Their Awesome Web Research Tools repository compiles methods, designs, and reusable software.
PDN also hosts a curated library of interventions, ranked by evidence strength, and a Slack community of 400+ people — creating the connective tissue needed for repeatable, scalable prosocial research.
Despite real progress in tools and networks, the imbalance remains clear: huge investments are made to study the harms of tech — and now AI — alongside strong support for regulation and litigation.
But there’s still surprisingly little support for testing what drives cohesion, reduces polarization, or fosters healthier digital spaces.
The Council on Tech and Social Cohesion’s Blueprint for Prosocial Tech Design Governance, authored by Dr. Lisa Schirch, highlights the imperative for public funders, platforms, and policymakers to rebalance their focus — by supporting safe, independent experimentation into what makes digital life better.
There’s no shortage of good ideas about how to improve social media. But testing them — seeing what really changes behavior at scale — requires access, infrastructure, and funding that still doesn’t match the urgency of the challenge.
If we want online spaces that foster connection and cohesion rather than division and harm, we need to make public-interest research on prosocial design far easier to do — and far harder to ignore.
Lena Slachmuijlder is Senior Advisor at Search for Common Ground and co-chairs the Council on Tech and Social Cohesion.

