Why Most Protests Fail

Audited Protest Outcomes

Many protest movements create an opening. Far fewer make that opening last. The deeper question is whether the regime stays coordinated long enough to outlast the moment.

Protest pressure collides with regime structure. When barriers are low, the structure fractures and light bleeds through. When barriers stack, the wave dissipates against the surface.

Prologue

Iran is where the question began, because the desire for change was visible while the regime still held the state.

I started with Iran. The current war pushed an older question back into view. In January 2026, the Islamic Republic killed tens of thousands of its own citizens in the streets. The international response that followed — American and Israeli strikes against the regime’s military infrastructure — arrived from outside. The structural question this article tries to answer arrived earlier, and from inside: what decides whether protest pressure can turn into lasting political change, and what happens when the decisive barriers are all in place at once?

Iran appears twice in the historical record behind this piece: 66 protest movements across 57 countries, from Solidarity to Tiananmen to the Arab Spring. In 1978–79, the revolution succeeded in toppling the Shah partly because the military split and key elites defected; only one of the three barriers was in place. But the democratic goal failed: the theocracy that replaced the monarchy was not the outcome most protesters had demanded. In 2009, the Green Movement faced a fully closed regime, with all three barriers in place, and failed outright. Both times, the structural profile told the story before the ending did.

The current situation fits the most closed profile: the security forces remain cohesive, no meaningful elites have broken away, and repression is lethal. In this record, no movement facing all three barriers has achieved lasting change: not once in nine such cases, across 47 years. Two of those nine had significant international support for protesters: Venezuela and Syria. Both still failed. The record suggests that outside pressure (sanctions, strikes, diplomatic isolation) can amplify, constrain, punish, expose. What it has not done, in any case recorded here, is fracture a security apparatus from outside. And in this record, that fracture is the factor that has mattered most.

This essay begins with Iran, then moves outward. The deeper lesson is broader. Many protest movements create an opening. Far fewer make that opening last.

Three-Part Structure

Read in three parts

Part I

How the moment can mislead

The Event Window

Immediate outcomes register pressure at its peak. They do not yet register endurance.

Protests are often judged too early. A ruler falls. A concession is announced. An election is promised. Crowds remain in the square, and the political atmosphere begins to look transformed. In that first phase, change can feel larger than it later turns out to be.

The record tracks that gap. It looks at two outcomes for each case: the immediate result, and the durable one — what remained three to five years later, after the regime had time to regroup. Of the 56 cases old enough to compare both, 20 got worse. Fourteen moved from immediate success to durable partial success. Six moved from partial success to durable failure.

The Reversal Problem

Most durable setbacks look like erosion. The opening was real. It simply held less than it first seemed to promise.

Most reversals look like attrition. A movement wins enough to make history feel open, then loses part of that gain as the regime regroups, institutions absorb the shock, or the opposition fails to turn the moment into a new settlement. The process is slower than dramatic collapse. It is also much more common.

Figure 1

Most reversals are step-down erosions.

[Figure 1: bar chart of changed cases in the baseline. Success to partial: 14. Partial to failure: 6.]
Early gains are often real. The durable record shows how frequently those gains contract once the regime has time to respond.

That is why immediate outcomes can mislead. They show pressure and visibility. They do not yet show durability. They tell us that the regime had to respond. They do not yet tell us how much of that response will still matter three to five years later.

Part II

What usually decides the outcome

The Barrier Threshold

Three structural barriers stack against protest movements. When all three are present, no movement in the record has achieved durable change.

Once the question becomes whether change lasts, a much sharper structure comes into view. When none of the three barriers is present, no movement has failed: zero out of twenty-two. When one or two are in place, roughly a quarter to a third end in lasting failure (5 of 19 and 5 of 16). When all three hold, every movement has failed: nine out of nine, with no exceptions (95% CI: 66–100%).
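The interval quoted for the nine-out-of-nine result can be checked by hand. What follows is a minimal sketch, assuming the 66–100% figure is a two-sided exact (Clopper-Pearson) interval; the text does not name the method, and the function name is illustrative. At the boundaries the exact interval has a closed form, so no statistics library is needed.

```python
def exact_ci_boundary(failures: int, n: int, alpha: float = 0.05) -> tuple:
    """Two-sided Clopper-Pearson interval for the all-or-nothing cases.

    Assumption: this only handles failures == 0 or failures == n, where the
    interval reduces to a closed form. Other counts need a beta quantile.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if failures == n:
        return ((alpha / 2) ** (1 / n), 1.0)
    if failures == 0:
        return (0.0, 1 - (alpha / 2) ** (1 / n))
    raise ValueError("closed form only covers 0/n and n/n")


all_three = exact_ci_boundary(9, 9)     # every three-barrier case failed
no_barrier = exact_ci_boundary(0, 22)   # no zero-barrier case failed

print(f"9/9 failures:  {all_three[0]:.1%} to {all_three[1]:.0%}")
print(f"0/22 failures: {no_barrier[0]:.0%} to {no_barrier[1]:.1%}")
```

Under that assumption the lower bound for nine of nine lands near 66%, in line with the text. The same arithmetic puts the upper bound for zero of twenty-two near 15%, a reminder that the zero-barrier result is also compatible with a modest underlying failure rate.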

But looking at how the three barriers work together reveals something the simple stacking story does not. The three barriers do not carry equal weight. The loyalty of the security forces is the regime’s load-bearing wall. On its own, it explains more of the outcome than either of the other two barriers (pseudo-R² = 0.244, against 0.151 for repression and 0.118 for elite defections; the full score reaches 0.370), though not by as much as a first reading might imply. Elite defections and repression still matter, but they matter less on their own. When the three barriers are examined separately and in every combination, they stack roughly in proportion: there is no hidden amplification when two or three coincide (interaction test p = 0.35). They add up, and one of them carries the most weight.

Figure 2

Failure rates climb from zero to certainty.

[Figure 2: bar chart of durable failure rate by barrier score. Barrier 0: 0.0% (N = 22, zero failures). Barrier 1: 26.3% (N = 19). Barrier 2: 31.2% (N = 16). Barrier 3: 100.0% (N = 9, no exceptions). The sharpest jump is from two barriers to three. N = 66 cases, 1978–2025.]
Durable failure accelerates as barriers stack. Logistic regression: β = 1.83 (MLE), OR = 6.23, p < 0.001, pseudo-R² = 0.370. Firth-corrected: β = 1.70, OR = 5.45, p < 0.001.
The stacking effect
0 / 22 failures with no barriers in place; 9 / 9 with all three.

When the regime cracks, the movement always wins something. When all barriers hold, it never does.

When security forces fracture, the failure rate drops sharply. When the security forces stay loyal and elites remain unified, a large share of movements fail even without mass killing. Security-force loyalty appears to matter more than either repression or elite defections (OR = 13.9 vs 7.3 and 5.7), even if the sample is too small to say that with complete statistical confidence.

Barrier 0: 0.0%. Zero failures in 22 cases.
Barrier 1: 26.3%. Risk becomes visible.
Barrier 2: 31.2%. Nearly one in three fail.
Barrier 3: 100%. No exceptions. 9 cases.
0% → 100%
Stability test: the result holds even when any single case is removed from the record (coefficient range 1.79–1.98 across all 66 tests). No single case drives the finding.

Technical note on the threshold

Logistic regression on the barrier score: MLE β = 1.83, odds ratio = 6.23, p < 0.001, McFadden pseudo-R² = 0.370. Firth penalized regression (correcting for quasi-separation at barrier score 0 and barrier score 3): β = 1.70, OR = 5.45, p < 0.001. With V-Dem controls (polyarchy, GDP, urbanization): pseudo-R² = 0.427, barrier score still p < 0.001 while no control reaches significance. Temporal split: pre-2000 β = 1.42 (p = 0.053), post-2000 β = 2.43 (p = 0.002). DV sensitivity: when recoded as non-success vs. success, pseudo-R² = 0.182 — the framework predicts failure far more sharply than it predicts success. Wald tests for coefficient equality between the three barrier components cannot reject the null (SFL vs ED: p = 0.97; SFL vs RL: p = 0.47); the “load-bearing wall” claim for SFL is descriptively supported (50pp failure-rate swing vs 39pp for RL and 35pp for ED) but not formally provable at N = 66 with quasi-separation.
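The headline regression can be approximated from the grouped counts alone. This is a sketch under stated assumptions, not the analysis pipeline behind the article: the failure counts per barrier score (0, 5, 5, 9) are inferred from the published rates, the fit is a plain maximum-likelihood logit solved by Newton's method, and all names are illustrative. It does not attempt the Firth correction or the models with controls.

```python
import math

# Grouped counts read off Figure 2: barrier score -> (cases, durable failures).
# The counts 5 and 5 are inferred from the published rates
# (5/19 = 26.3%, 5/16 = 31.2%); they are an assumption, not source data.
GROUPS = {0: (22, 0), 1: (19, 5), 2: (16, 5), 3: (9, 9)}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(groups: dict, iterations: int = 25) -> tuple:
    """Fit P(failure) = sigmoid(a + b * score) by Newton's method (plain MLE)."""
    a = b = 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, (n, y) in groups.items():
            p = sigmoid(a + b * x)
            w = n * p * (1.0 - p)      # binomial variance weight
            g_a += y - n * p           # gradient with respect to the intercept
            g_b += x * (y - n * p)     # gradient with respect to the slope
            h_aa += w
            h_ab += x * w
            h_bb += x * x * w
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def log_likelihood(groups: dict, a: float, b: float) -> float:
    total = 0.0
    for x, (n, y) in groups.items():
        p = sigmoid(a + b * x)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total


a, b = fit_logit(GROUPS)
n_total = sum(n for n, _ in GROUPS.values())
y_total = sum(y for _, y in GROUPS.values())
p_null = y_total / n_total             # intercept-only model
ll_null = y_total * math.log(p_null) + (n_total - y_total) * math.log(1.0 - p_null)
ll_model = log_likelihood(GROUPS, a, b)

pseudo_r2 = 1.0 - ll_model / ll_null   # McFadden pseudo-R squared
aic = 2 * 2 - 2.0 * ll_model           # two fitted parameters
predicted = {x: sigmoid(a + b * x) for x in GROUPS}

print(f"beta = {b:.2f}, odds ratio = {math.exp(b):.2f}")
print(f"pseudo-R2 = {pseudo_r2:.3f}, AIC = {aic:.1f}")
for score, p in predicted.items():
    print(f"barrier score {score}: modelled failure rate {p:.0%}")
```

If the inferred counts are right, the fit should land close to the published MLE figures (β = 1.83, OR = 6.23, pseudo-R² = 0.370). It also shows why a smoothed model estimate at barrier score 3 sits below the observed 100%: the curve has to pass near the mixed results at scores 1 and 2.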

The Closed Regime Bloc

The strongest dividing line runs through the security apparatus. Whether the men with guns hold or fracture determines the outcome more than anything the street can do.

Security force loyalty alone produces a 50-percentage-point swing in failure rates: 60% when forces remain cohesive, 10% when they fracture. By this measure, security-force loyalty is a much stronger indicator of failure (OR = 13.9) than either lethal repression (OR = 7.3) or elite defections (OR = 5.7). It is the strongest single predictor.

The case-by-case pattern is sharper still. When security forces are loyal and elites haven’t defected, the failure rate is 40% even without mass killing — Hong Kong 2014, Russia 2011–12, Thailand 2020, Turkey 2013. The regime does not need to massacre. It needs only to hold together. But when security forces fracture, no movement at that profile has experienced durable failure — though outcomes are often partial rather than full success (Mali 1991, Thailand 1992, Kyrgyzstan 2010, Bangladesh 1990). Once the coercive apparatus cracks, the death toll stops deciding the outcome.

A comparison with the best-known finding in this area reinforces the point. Chenoweth and Stephan (2011) showed that nonviolent campaigns exceeding 3.5% of the national population in active participation almost always succeeded. Across these 66 cases, with independently verified participation estimates, that threshold does not hold. Movements above 3.5% fail at 16.7%; those below fail at 31.5%. If anything, the relationship runs in the opposite direction.

The 3.5% threshold does not predict outcomes here

The size of a protest and the barrier score barely move together. After accounting for movement size, the barrier score effect becomes stronger, not weaker (an increase of 19%). The 3.5% threshold that anchors the Chenoweth finding produces failure rates of 16.7% above versus 31.5% below. Iran’s 1978 revolution mobilized 16% of the population and succeeded, because the military split. Iran’s 2009 Green Movement mobilized roughly 3% and failed, because the security forces held. Poland’s Solidarity enrolled 28% of the population and succeeded, with none of the three barriers in place. Bahrain mobilized 17% and failed, with all three barriers in place. The number matters less than the structure.

This does not refute the Chenoweth thesis. It complicates it in a specific way. Chenoweth and Stephan studied a broader universe of campaigns (323 cases, 1900–2006) that included armed conflicts and movements with different goals, and their 3.5 percent rule may capture something real about that broader set that this narrower record cannot. What this record shows is that among large protest movements against autocratic regimes, the decisive question is not how many people are in the street. It is whether the structure across from them cracks. The barrier score is not just another way of measuring how many people a movement can bring out. The two are largely independent, and the barriers matter in much the same way no matter how large the movement is.

Named individuals still matter. Gen. Rachid Ammar refused to fire in Tunisia, and the regime fell. SCAF told Mubarak to go in Egypt, then took power itself. Zhao Ziyang spoke for the students at Tiananmen, but he stood alone, and the regime survived. But the deeper pattern is institutional, not personal. Whether one general or a hundred soldiers break, the question is the same: does the security apparatus continue to function as a single instrument of the state?

Figure 3

Elite breaks reshape the outcome distribution.

[Figure 3: bar chart of durable outcomes by elite defection. No elite break (ED = 0, 26 cases): 7 successes, 6 partials, 13 failures. Elite break (ED = 1, 40 cases): 21 successes, 13 partials, 6 failures.]
Where the ruling bloc stays closed, failure dominates. Where meaningful breaks appear, durable success becomes the most common outcome.

A regime under pressure can still survive while it acts as a unified political machine. Protest pressure becomes something else once ministers, judges, generals, business allies, or coalition partners begin to calculate that distance from the center is safer than loyalty to it.

Technical note on regime closure

Inter-coder reliability: SFL κ = 0.220, ED κ = 0.476, RL κ = 0.468, Outcome κ = 0.654. All 66 cases were independently re-coded from primary sources by a second coder (an AI research agent, described in the methodology), and every disagreement was reviewed and resolved. Five repression-level values were corrected in a second pass to ensure consistency with the stated coding rules.

The Death Toll Does Not Decide It

Tunisia: 132 killed, success. Myanmar 2021: 884 killed, failure. Repression intensity alone does not predict outcome.

Repression carries its own weight in the story. But what matters is whether the killing splits the regime or binds it more tightly together. Tunisia saw 132 deaths from state action and the regime fell — because Gen. Ammar refused to fire, because Ben Ali’s inner circle fractured. Myanmar 2021 saw 884 killed and failed — because the Tatmadaw held together, because no senior commander broke.

The scatter plot makes the pattern visible. At every level of death toll, you find successes and failures. What separates them is not how many people the regime killed, but whether the killing fractured or consolidated the regime’s internal structure.

Figure 4

Deaths from state action versus outcome.

[Figure 4: scatter plot of deaths from state action (log scale, 1 to 10,000) against outcome (success, partial, failure). Filled dots = elites broke with the regime; open circles = no elite defections. Labeled cases: Syria, Tiananmen, Myanmar 2021, South Africa, Iran 1979, Tunisia. Read the chart vertically: at the same death toll, outcomes diverge. Successes cluster where elites broke; failures cluster where elites held. 44 cases with documented death tolls.]
At every level of state violence, outcomes diverge. The key mediator is whether elites broke with the regime.

Repression works most effectively when it protects a regime that is already cohesive. A fragmented ruling bloc can turn a crackdown into a trigger for further splits. A cohesive ruling bloc can use the same crackdown to buy time, raise fear, and exhaust the movement.

The Post-2000 Hardening

Regimes learned. Over time, the barrier score became a sharper guide to what would happen. Protest is more frequent but less effective.

The temporal pattern is stark. Among movements facing two barriers before 2000, none ended in lasting failure. After 2000, the failure rate at the same barrier score jumped to 71%. The barriers did not change in kind: security force loyalty, elite cohesion, and repression are the same three mechanisms across the full period. What changed is that regimes became better at deploying all three simultaneously. Digital surveillance, smart repression, controlled pluralism, and the strategic study of other regimes’ mistakes produced a hardening effect visible in the data.

A closer look tells a more nuanced story than the before-and-after comparison alone. Before 2000, the pattern is visible but falls just short of conventional statistical confidence (β = 1.42, p = 0.053, N = 34). After 2000, the coefficient is roughly 70% larger and much harder to dismiss as coincidence (β = 2.43, p = 0.002, N = 32). The structural logic holds across both eras, but it bites harder now. Whether this happened because regimes learned, or because the ones that failed to adapt simply fell and disappeared from the record, is a question the evidence raises but cannot settle.

Figure 5

The post-2000 darkening.

[Figure 5: timeline of all 66 cases, 1978–2025, plotted by year (x-axis) and barrier score 0–3 (y-axis), with marker style showing outcome (success, partial, failure). The 1989 wave is mostly low-barrier successes. After 2007, every BS = 3 case is a failure. Post-2000 BS = 2 failure rate: 71%, versus 0% pre-2000.]
Pre-2000, light circles dominate — success at low barrier scores, especially in the 1989 wave. Post-2000, dark squares proliferate. Regimes learned to keep their barriers stacked.

What International Help Can and Cannot Do

Outside pressure can amplify or constrain. Lasting change still depends on what happens inside the state.

This was one of the questions that first pulled me into the Iran case. Outside pressure matters. It still has narrower reach than many people hope when they are watching a movement bleed in real time.

The record offers a limited test of that question. Of the nine cases in which all three barriers were in place, two drew substantial outside support for protesters: Venezuela 2014–17 (US/EU sanctions, recognition of Guaidó), and Syria 2011 (Western and Gulf backing of the opposition). Both still failed. The FANB held in Caracas. The Syrian Arab Army, backed by Russian and Iranian forces, held in Damascus. In neither case did the external pressure fracture the security apparatus.

A caveat is necessary. Two cases out of nine is a thin test, and nine cases is itself a small sample: too few to rule out the possibility that outside support could matter in some such situations. What we have is a historical pattern, not a statistical law.

The broader record provides firmer ground. When we account for outside support on both sides, a striking asymmetry appears. Regime-side external support — Saudi troops in Bahrain, Russian backing of Assad and Lukashenko, Chinese shielding of the Myanmar junta — is strongly associated with failure (p = 0.031) and appears to roughly double the odds that a movement will not succeed. By contrast, outside support for protesters shows no clear effect (p = 0.90). Accounting for international support barely changes the underlying relationship between the barriers and the outcome.

The asymmetry that matters for Iran

External backing for the regime demonstrably increases the odds of protest failure. External backing for the opposition — sanctions, strikes, diplomatic isolation, material support — has not yet shown a measurable effect in this record (p = 0.899). The policy implication is direct: outside intervention can make a regime harder to dislodge, but the historical record contains no precedent for outside intervention fracturing a security apparatus that was already holding. Whether this reflects a genuine structural limit or simply the limits of the historical record is a question the evidence cannot settle.

This does not mean international support is useless. Sanctions can degrade military readiness over time. Diplomatic pressure can shift the calculation for fence-sitting elites. Military strikes can destroy infrastructure the regime depends on. What the data says is that none of these mechanisms have yet been sufficient, on their own, to flip the outcome when all three barriers are in place. The fracture has to come from inside. Outside pressure may create the conditions — but the decisive crack has always been internal.

Part III

What protest can still achieve

A protest does not need to fully transform the regime in order to matter.

A protest can force concessions, split elites later even when it fails to split them now, alter what becomes sayable in public life, expose the terms on which the regime survives, and leave a political memory that shapes what comes next. Partial success belongs in the story because it changes incentives and narrows the room for abuse.

That is one reason durable analysis matters. It distinguishes between lasting democratic change, partial gain, and openings that close almost completely. Those are different outcomes, and they deserve different language.

How to Read the Barrier Score

The barrier score describes a regime under stress. It does not predict the future — it diagnoses the present.

An honest reading of this framework requires acknowledging what it can and cannot do. The barrier score is drawn from the same episode whose outcome it describes. When a regime survives, we observe that its forces stayed loyal and its elites held the line. When it falls, we observe fractures. In a strict sense, the model is partly saying: regimes that held together held together. Cause and effect run both ways here, and no analysis of 66 historical cases can fully sort that out.

What the framework can do is something different: it works as a diagnostic. It tells you what to look at and where pressure might work. Think of the three barriers as the load-bearing walls of the regime’s survival structure. If you know which walls are intact and which are cracked, you know where to push.

When security forces are already fractured — officers refusing orders, units going quiet, soldiers looking the other way — the regime is structurally exposed. Street pressure, international isolation, even economic disruption can widen existing cracks. A regime split into factions is vulnerable to many kinds of pressure, because the factions are already calculating their own survival independently of the center.

When the regime is unified — all three barriers in place — the same tools accomplish much less. Sanctions can degrade capacity over time. Diplomatic isolation can shift calculations at the margin. But the historical record shows no case where outside pressure alone fractured a security apparatus that was already holding. The fracture, when it has come, has come from inside: a commander who refuses an order, a minister who breaks publicly, a coalition partner who calculates that distance from the center is safer than loyalty to it.

This is why some of the most consequential interventions in the record involved cultivating or enabling those internal breaks — not replacing them with external force. When a senior figure inside the security establishment is turned, the entire structural profile of the regime changes overnight. One major defection can turn a fully closed regime into one with only a single barrier still standing. The street does not need to do what an insider can do faster.

The practical implication is not that protest is useless against closed regimes. It is that protest alone, without a strategy aimed at the specific barriers in place, is insufficient. The better question is not how many people are in the street but which walls are weight-bearing and whether any of them can be moved from the inside.

One test of whether this framework captures something real: the patterns from the 44 pre-2010 cases were used to predict the 22 that came afterward, cases the model had never seen. The predictions ranked those later cases accurately (AUC = 0.90), although the model underpredicted failure after 2010. The pattern also held when repression was measured only by the death toll rather than by a broader assessment that includes arrests and military deployment (β = 1.89, p < 0.001). On that evidence, the finding does not hinge on the judgment calls in the repression coding.

The barrier simulator

What does the model predict for a given combination of barriers?

Move the three sliders to build a regime profile and see what the historical record says about its chances of surviving protest pressure. The predicted failure rate is the observed rate from the record — how often movements facing this exact barrier profile have failed. The logistic model estimate below it is a smoothed prediction fitted across all 66 cases. When the two numbers diverge, it usually means the model is hedging against a small sample — at barrier score 3, for instance, the record says 100% failure, but only nine cases exist, so the model pulls the estimate down to 86%. The gap is a measure of how much certainty the data can support. The international support slider adds context but does not change the prediction — in this record, outside support for protesters has not independently affected outcomes (p = 0.899).

The Hard Lesson

The crowd is only one side of the confrontation. The other side is a political system under stress.

The durable endpoint tells a harder story than the moment of protest itself. Openings narrow. Some vanish quickly. Many movements that looked decisive at the time leave behind a smaller legacy than the street had promised. The central divide in the record runs between regimes that crack open and regimes that hold together long enough for the opening to die.

Why do most protests fail? They fail because the crowd is only one side of the confrontation. The other side is a political system under stress, and the decisive question is often whether that system still functions as a coordinated structure. When the security apparatus remains loyal, when elites do not defect, when repression raises the cost of persistence, and when the regime can survive the first shock without internal fracture, durable political change becomes much harder to secure.

Closing line

The drama of protest lives in the street. Its fate is often decided inside the regime.

Methodology

Data and methods

66 protest movements from 1978 to 2025, each reviewed against primary sources, with every judgment traceable to specific evidence. A companion document contains the full methodology, case evidence, and underlying data.

Case selection and universe

Cases were drawn from NAVCO (Chenoweth and Lewis 2013), the ICNC case archive, and the Social Conflict Analysis Database, supplemented by Freedom House annual reports and V-Dem Country Episodes. A case was included if it met three criteria: (1) sustained mobilization of tens of thousands over weeks or longer, (2) primary demand was regime change or fundamental political transformation, and (3) the state response involved deliberate political management. Armed insurgencies, civil wars, and movements in consolidated democracies were excluded. The 66 cases span 57 countries. Outcome distribution: 28 success, 19 partial, 19 failure.

Coding protocol

Security Force Loyalty (SFL): 0 = defected/refused orders, 1 = mixed/split, 2 = fully loyal. The threshold for SFL = 0 requires military units or named commanders publicly refusing to fire or switching sides prior to regime collapse. Individual desertion below ~5% of force strength is coded as SFL = 2.

Elite Defections (ED): 0 = no meaningful defections, 1 = at least one senior official (minister, general, governor, ambassador) publicly broke with the regime while the outcome was contested. Repositioning after the leader departed is excluded.

Repression Level (RL): 0 = minimal (<5 deaths), 1 = moderate (5–49 deaths, mass arrests, tear gas/rubber rounds), 2 = lethal (50+ deaths, live fire into crowds, military deployment). Peak repression during the active protest period is coded.

Barrier Score = (SFL = 2) + (ED = 0) + (RL = 2). Range 0–3. Equal weighting reflects the theoretical prior that the three barriers are approximately additive in their marginal contribution to failure. The interaction analysis (below) confirms: no interaction terms reach significance.
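The scoring rule above translates directly into a few lines of code. This is a minimal sketch of the stated formula; the class and function names are illustrative, and the example profiles are taken from descriptions in the text rather than from the underlying dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseCoding:
    sfl: int  # Security Force Loyalty: 0 defected, 1 mixed/split, 2 fully loyal
    ed: int   # Elite Defections: 0 none, 1 at least one senior defection
    rl: int   # Repression Level: 0 minimal, 1 moderate, 2 lethal


def barrier_score(case: CaseCoding) -> int:
    """Barrier Score = (SFL = 2) + (ED = 0) + (RL = 2), range 0-3."""
    if case.sfl not in (0, 1, 2) or case.ed not in (0, 1) or case.rl not in (0, 1, 2):
        raise ValueError(f"coding outside the protocol ranges: {case}")
    return int(case.sfl == 2) + int(case.ed == 0) + int(case.rl == 2)


# Fully closed profile, as the text describes Iran 2009.
closed = barrier_score(CaseCoding(sfl=2, ed=0, rl=2))
# Loyal forces, no elite defections, no mass killing.
held_without_massacre = barrier_score(CaseCoding(sfl=2, ed=0, rl=1))
# Forces defected, elites broke, moderate repression.
open_profile = barrier_score(CaseCoding(sfl=0, ed=1, rl=1))

print(closed, held_without_massacre, open_profile)
```

Note that a mixed or split security force (SFL = 1) does not count as a barrier under this rule: only full loyalty does.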

Two-stage coding and inter-coder reliability

All 66 cases were independently re-coded by an AI research agent operating under the same protocol, consulting only primary sources (HRW, Amnesty International, OHCHR, ICG, Reuters, AP, academic journals). Wikipedia was prohibited. Cases were stratified: Tier A (43 cases, full deep research), Tier B (13 cases, deep research), Tier C (10 cases, lighter pass). The re-coder identified 73 variable-level disagreements. Each was adjudicated by the primary researcher: 37 corrections accepted, 35 originals kept, 1 coding error caught (Syria 2011 outcome updated to reflect the Assad regime’s 2024 collapse).

Cohen’s κ: SFL = 0.220 (fair), ED = 0.476 (moderate), RL = 0.468 (moderate), Outcome = 0.654 (substantial). The low SFL κ reflects genuine construct ambiguity in cases involving military neutrality, gradual fracture, and post-departure power seizures — motivating the shift to observable indicators described below.
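For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two coders with the agreement expected by chance from their marginal label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below uses invented codes for ten cases purely to show the arithmetic; it is not the article's data, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(coder_a: list, coder_b: list) -> float:
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal shares per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical SFL codes (0, 1, 2) for ten cases; not taken from the dataset.
original = [2, 2, 2, 2, 1, 1, 0, 0, 2, 1]
recoded = [2, 2, 1, 2, 1, 2, 0, 1, 2, 2]

kappa = cohens_kappa(original, recoded)
print(f"kappa = {kappa:.3f}")
```

In this toy example the coders agree on six of ten cases, yet κ is only about 0.32, because much of that agreement would be expected by chance when one label dominates. The same logic explains how a variable with few categories and a skewed distribution can produce a low κ despite frequent raw agreement.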

The use of an AI re-coder is a methodological choice that warrants transparency. AI agents apply decision rules with high consistency without fatigue or anchoring effects, but may exhibit correlated errors if they have internalized patterns from the same secondary literature. The Wikipedia decontamination protocol and source-verification procedures were designed to address this concern.

Death toll verification and observable indicators

22 death toll values were corrected through verification against at least two independent primary sources. Major corrections: China 1989 (10,000 → 1,000, Amnesty International), Belarus 2020 (50 → 4, HRW), Syria 2011 (75 → 3,934, UN/HRW), Iran 1978–79 (64 → 2,500, scholarly consensus), Myanmar 2021 (6,000 → 884, AAPP). Low, high, and best estimates recorded for all 66 cases with source URLs.

Arrest coverage improved from 48% to 91% (60 of 66 cases). Binary indicators extracted for each case: live ammunition use, military deployment, torture in custody, security force cohesion/split/refusal, elite defection type (cabinet-level, military commander, party split), and outcome indicators (leader removed, elections held, democracy persisted 5 years).

Five RL values were corrected in v2 to align with the stated protocol: Bahrain 2011 (1→2, 92 deaths), Iran 2009 (1→2, 72 deaths), Cameroon 1991 (1→2, 100 deaths), Bangladesh 1990 (1→2, 100 deaths), Nepal 1990 (1→2, 50 deaths). Chile 1983–88 (426 deaths, RL = 1) and Poland 1980–89 (100 deaths, RL = 1) were retained: the protocol codes peak episode intensity for long-duration movements, not cumulative totals across years. All Barrier Scores updated accordingly.

19 cases that originally cited Wikipedia were subjected to a decontamination protocol: each coded value was confirmed from a non-Wikipedia primary source. All 19 were confirmed with zero value changes.

Statistical models and interaction analysis

Primary model: Logistic regression on the barrier score: MLE β = 1.83, OR = 6.23, p < 0.001, McFadden pseudo-R² = 0.370, AIC = 53.9. Firth penalized regression (correcting for quasi-separation): β = 1.70, OR = 5.45, p < 0.001. With V-Dem controls (polyarchy, log GDP, urbanization): pseudo-R² = 0.427; barrier score p < 0.001; all controls non-significant (polyarchy p = 0.936, GDP p = 0.334, urban p = 0.657). Temporal split: pre-2000 β = 1.42 (p = 0.053, N = 34), post-2000 β = 2.43 (p = 0.002, N = 32). Only the post-2000 estimate clears the conventional 5% level; the pre-2000 estimate falls just short. The coefficient is roughly 70% larger post-2000.

Omitted variable analysis: The barrier score coefficient is robust across all tested specifications. Adding region dummies (MENA significant at p = 0.025; barrier score β increases to 2.96), a post-2000 indicator (significant at p = 0.012; barrier score β = 2.01), a 1989-wave dummy (p = 0.075; barrier score β = 1.98), log(duration) (non-significant; barrier score β = 1.90), log(population) (p = 0.096; barrier score β = 2.01), or regime type (non-significant; barrier score β = 1.83) does not attenuate the barrier score: in every specification, BS remains significant at p < 0.01 and the coefficient holds or increases rather than shrinks. In the kitchen-sink model (BS + region + post-2000 + 1989-wave + duration), barrier score β = 2.54 (p = 0.002, pseudo-R² = 0.618), while no control reaches significance.

MENA is the most substantively important correlate: it remains significant alongside BS (p = 0.006) even after controlling for regime type (p = 0.004 in a three-variable model; regime type itself is non-significant at p = 0.292). The correlation between MENA and regime type is negligible (r = 0.10), which suggests that MENA captures a regional effect (likely related to petro-state resources, GCC mutual defense dynamics, and stronger authoritarian learning networks) rather than acting as a proxy for autocracy. MENA cases fail at higher rates than non-MENA cases at the same barrier score (barrier score 1: MENA 60%, non-MENA 13%), indicating an additive regional disadvantage. The BS × MENA interaction is non-significant, meaning the barrier mechanism operates the same way inside and outside the region. The post-2000 dummy captures temporal hardening independently of the barrier mechanism.

Interaction analysis: The barrier score was decomposed into three binary components (SFL = 2, ED = 0, RL = 2) and tested with all two-way and three-way interactions. Decomposition into components improves fit (LR test A vs B: p = 0.003), but adding interaction terms does not (two-way: p = 0.353). The barriers stack additively. SFL is the strongest single predictor: pseudo-R² = 0.244 (OR = 13.9), compared with ED at 0.118 (OR = 5.7) and RL at 0.151 (OR = 7.3). SFL alone produces a 50-percentage-point swing in failure rates (60% vs 10%). The RL figures reflect the recoded variable, whose explanatory power increased after recoding.
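The SFL figures can be cross-checked by hand. The sketch below computes an odds ratio and a percentage-point swing from two failure rates. The function names are hypothetical, and the inputs are the rounded rates quoted above, so the result will only approximate the reported OR of 13.9 (the gap is what one would expect if the published rates are rounded, which is an assumption rather than something the source states).

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def odds_ratio_from_rates(p_barrier: float, p_no_barrier: float) -> float:
    """Odds ratio comparing failure odds with the barrier in place
    to failure odds without it."""
    return odds(p_barrier) / odds(p_no_barrier)


# Rounded failure rates from the text: 60% with cohesive security forces,
# 10% without.
sfl_or = odds_ratio_from_rates(0.60, 0.10)
swing_points = (0.60 - 0.10) * 100

print(round(sfl_or, 1))     # 13.5
print(round(swing_points))  # 50
```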

Alternative indices: PCA on binary observables, theory-weighted composites, and individual predictors entered simultaneously all underperform the simple barrier score. The additive, equal-weighted construction captures the covariance structure more efficiently than data-driven or theory-derived alternatives.

Ordered logit: Outcome coded as success/partial/failure. Pseudo-R² = 0.193.

International support: Adding international support for the regime (significant, p = 0.031) and for protesters (non-significant, p = 0.899) as controls leaves the barrier score coefficient stable at 2.14. Among 9 BS = 3 cases, 2 had significant international support for protesters (Venezuela 2017, Syria 2011); both still failed. N = 2 is insufficient for any statistical claim about international support at the highest barrier level.

Leave-one-out sensitivity: Barrier score coefficient ranges 1.79–1.98 across all 66 jackknife samples. All yield p < 0.05. No single case drives the result.

Temporal holdout: Model trained on pre-2010 cases (N = 44) and tested on post-2010 (N = 22). AUC = 0.902, Brier skill score = 0.40. The model ranks cases well out of sample, though it underpredicts failure post-2010 (8/13 false negatives), consistent with the post-2000 hardening pattern.
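A high AUC and a run of false negatives can coexist, because AUC only measures ranking while the Brier score also penalizes miscalibrated probabilities. The sketch below shows both metrics on made-up numbers, not the article's data. The function names are hypothetical, and using the holdout base rate as the Brier reference forecast is an assumption, since the source does not say which reference was used.

```python
def auc(y_true, y_score):
    """Rank-based AUC: share of (failure, non-failure) pairs in which the
    failure received the higher predicted score; ties count as half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared error between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def brier_skill_score(y_true, y_prob, base_rate):
    """1 - Brier(model) / Brier(constant base-rate forecast)."""
    reference = brier(y_true, [base_rate] * len(y_true))
    return 1.0 - brier(y_true, y_prob) / reference


# Toy holdout: 1 = failure. The third failure is ranked fairly high but
# still gets a probability below 0.5, so it would count as a false negative.
y = [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05]

print(auc(y, p))
print(brier_skill_score(y, p, base_rate=sum(y) / len(y)))
```

In the toy example the model orders almost every pair correctly yet still misses one failure at a 0.5 cutoff, which is the same pattern the holdout result describes.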

Permutation test: 50,000 random permutations of the failure label produced zero results as extreme as the observed BS-failure association. With 50,000 draws the smallest p-value the test can resolve is about 1/50,000, so the result is p < 0.00002, confirming the relationship non-parametrically.
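The logic of the permutation test can be sketched in a few lines. The example below is illustrative only: the data are invented, the function names are hypothetical, and the test statistic (difference in mean barrier score between failures and non-failures) is an assumption, since the source does not specify which association measure was permuted. The p-value uses the common add-one correction, so zero exceedances give 1 / (n_perm + 1) rather than exactly zero.

```python
import random


def mean_diff(scores, labels):
    """Mean barrier score among failures minus mean among non-failures."""
    fail = [s for s, y in zip(scores, labels) if y == 1]
    rest = [s for s, y in zip(scores, labels) if y == 0]
    return sum(fail) / len(fail) - sum(rest) / len(rest)


def permutation_p_value(scores, labels, n_perm=2000, seed=0):
    """One-sided permutation p-value with the add-one correction."""
    rng = random.Random(seed)
    observed = mean_diff(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between score and outcome
        if mean_diff(scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def smallest_p(n_perm):
    """Smallest p-value the corrected test can report."""
    return 1 / (n_perm + 1)


# Invented toy data: barrier scores 0-3 and failure labels (1 = failure).
toy_scores = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

print(permutation_p_value(toy_scores, toy_labels))
print(smallest_p(50_000))  # resolution limit for 50,000 permutations
```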

Construct validity: A model using only binary observable indicators (SFL cohesive, SFL split, SFL refused, ED any, live ammo, military) without ordinal coding achieves pseudo-R² = 0.210. The ordinal barrier score (pseudo-R² = 0.370) adds substantial value, but the raw indicators alone carry significant signal, which supports the view that the construct measures something real rather than recoding outcomes.

RL coding sensitivity: 22 of 66 cases have RL codes that reflect composite factors (mass arrests, military deployment, live fire) rather than death count alone. When RL is recoded strictly by the death-count thresholds (<5 = 0, 5–50 = 1, 50+ = 2), the barrier score coefficient is essentially unchanged (β = 1.89 vs 1.83, OR = 6.6 vs 6.2, p < 0.001). BS = 3 drops from 9 to 6 cases but remains 6/6 failures. The finding does not depend on the composite RL coding.
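The strict death-count rule can be written as a small function. This is a sketch under one explicit assumption: the stated thresholds overlap at exactly 50 deaths ("5–50 = 1, 50+ = 2"), and the code below resolves that by treating 50 as RL = 2. The function name is hypothetical.

```python
def rl_from_deaths(deaths: int) -> int:
    """Recode repression lethality strictly from the death count.

    Thresholds from the text: <5 -> 0, 5-50 -> 1, 50+ -> 2.
    Assumption: a count of exactly 50 is coded 2, because the source
    leaves that boundary ambiguous.
    """
    if deaths < 0:
        raise ValueError("death count cannot be negative")
    if deaths < 5:
        return 0
    if deaths < 50:
        return 1
    return 2


# Illustrative counts only, not cases from the dataset.
for count in (0, 4, 5, 49, 50, 1200):
    print(count, rl_from_deaths(count))
```

A sensitivity check of the kind described would apply this rule to every case, rebuild the barrier score, and refit the model.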

Participation size: Peak participation estimates coded for all 66 cases from NAVCO, Chenoweth & Stephan, news archives, and case study literature. Seven participation estimates were corrected after independent verification against primary sources: Bahrain 2011 (300K → 150K; original figure was from a pro-government rally), Turkey 2013 (3.5M → 1M; original was cumulative across 5,000 events, not single-day), Poland 1980 (12M → 10M; scholarly consensus), Hong Kong 2014 (500K → 172K; HKU independent estimate), Eswatini 2021 (50K → 10K; sourced to Amnesty), Yemen 2011 (3M → 1M; verified Sana’a peak), Iran 2009 (3M → 2M; lower bound of TIME 2–3M range). Movement size and barrier score are nearly uncorrelated (r = −0.03), which argues against the concern that BS proxies for mobilization capacity. Controlling for log(participation), barrier score β increases from 1.83 to 2.13 (p < 0.001). Participation does not reach conventional significance either alone (p = 0.068) or conditional on BS (p = 0.068, positive sign). The Chenoweth 3.5% threshold does not separate failure rates in a statistically distinguishable way (16.7% above vs 31.5% below, p = 0.99). The BS × participation interaction is non-significant.

Limitations

Small N: A sample of 66 cases limits multivariate inference. With 19 failure events, the event-per-variable ratio falls below 10 for models with 3+ predictors. We address this by treating the single-predictor barrier score as the primary model and using leave-one-out validation. Quasi-complete separation at BS = 0 (0/22 failures) and BS = 3 (9/9 failures) biases MLE upward; Firth penalized regression is reported alongside MLE as a correction. The 95% Clopper-Pearson CI for the BS = 3 failure rate is [66%, 100%].
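Both numbers in this limitation can be reproduced with basic arithmetic. When every trial is a failure (or none is), the Clopper-Pearson interval has a closed form, so no statistics library is needed. The sketch below covers only those boundary cases; the general case requires beta-distribution quantiles and is not shown. Function names are hypothetical.

```python
def cp_lower_all_events(n: int, alpha: float = 0.05) -> float:
    """Exact lower bound when all n cases are events (x = n).
    The upper bound is 1.0 in this case."""
    return (alpha / 2) ** (1 / n)


def cp_upper_no_events(n: int, alpha: float = 0.05) -> float:
    """Exact upper bound when none of the n cases are events (x = 0).
    The lower bound is 0.0 in this case."""
    return 1 - (alpha / 2) ** (1 / n)


def events_per_variable(events: int, predictors: int) -> float:
    """Rule-of-thumb ratio; values under about 10 signal a thin model."""
    return events / predictors


# BS = 3: 9 failures in 9 cases.
print(round(cp_lower_all_events(9), 2))  # 0.66, matching the reported bound

# BS = 0: 0 failures in 22 cases.
print(cp_upper_no_events(22))

# 19 failure events spread over 3 predictors.
print(round(events_per_variable(19, 3), 2))  # 6.33
```

Nine failures out of nine is still compatible with a true failure rate as low as about two in three, which is why the interval is reported alongside the headline count.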

AI re-coder: Correlated measurement error cannot be fully excluded. If the re-coder’s knowledge derives from the same secondary literature, inter-coder statistics may understate true error. Observable indicators and Wikipedia decontamination partially mitigate this.

Selection bias: Cases were drawn from established datasets and case study literatures rather than random sampling. Prominent or dramatic cases are likely overrepresented. Multiple sampling frames (NAVCO, ICNC, SCAD, Freedom House) reduce dependence on any single prior scholar’s choices.

Death toll uncertainty: Several cases carry wide confidence intervals. For all cases where the lower bound approaches the 50-death threshold, sensitivity analysis confirmed the RL code is robust across the full range.

Contemporaneous coding: SFL, ED, and RL are coded from the same protest episode whose outcome they are used to explain. The barrier score describes the regime’s structural profile during the crisis, not before it. This means the causal arrow is ambiguous: barriers may shape outcomes, or outcomes may shape how we observe and code barriers. The construct validity test (observable indicators alone achieve R² = 0.210) and the temporal holdout (AUC = 0.902 on post-2010 cases trained on pre-2010 data) provide partial reassurance that the pattern is not purely circular, but fully resolving the endogeneity concern would require pre-protest measures of regime cohesion that this dataset does not contain. The framework is best read as a structural diagnostic — describing the regime’s response profile and identifying which barriers hold — rather than as a causal model.


Explore the full dataset behind the analysis below 📊
