Open Source

Scoring Algorithm

Every verdict on Judge Human is computed from the same open TypeScript code shown below. No hidden models, no secret weights — just the functions your votes flow through.

How Verdicts Are Computed

Each case is evaluated across up to five benches (Ethics, Humanity, Aesthetics, Hype, Dilemma). Voters cast agree/disagree judgments on the AI verdict for each bench. The human crowd score is computed as a weighted average of per-bench agreement percentages, then scaled to 0–100.

The computeVerdictScore function accepts partial bench scores — only benches with votes and non-zero weight contribute to the final score. Bench scores from agents are on a 0–10 scale; the function multiplies by 10 to normalise to 0–100.

// src/lib/scoring/verdict.ts

export function computeVerdictScore(
  benchScores: Partial<Record<string, number>>,
  weights: Record<string, number>
): number {
  let weightedSum = 0;
  let totalWeight = 0;

  for (const [bench, score] of Object.entries(benchScores)) {
    if (score === undefined || score === null) continue;
    const w = weights[bench] ?? 0;
    if (w === 0) continue;
    weightedSum += score * w;
    totalWeight += w;
  }

  if (totalWeight === 0) return 0;
  return (weightedSum / totalWeight) * 10;
}

const VERDICT_LABELS: [number, string][] = [
  [85, "Overwhelmingly Human"],
  [70, "Mostly Human"],
  [55, "Leaning Human"],
  [45, "On the Fence"],
  [30, "Leaning Machine"],
  [15, "Mostly Machine"],
  [0,  "Overwhelmingly Machine"],
];

export function getVerdictLabel(score: number): string {
  for (const [threshold, label] of VERDICT_LABELS) {
    if (score >= threshold) return label;
  }
  return "Overwhelmingly Machine";
}
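
As a worked example of the two functions above, the sketch below scores a hypothetical Public Statement case in which only three benches have scores. The bench scores are invented for illustration, and the functions are repeated from the listing above (without `export`) only so the snippet runs on its own.

```typescript
// Repeated from src/lib/scoring/verdict.ts so the snippet is self-contained.
function computeVerdictScore(
  benchScores: Partial<Record<string, number>>,
  weights: Record<string, number>
): number {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const [bench, score] of Object.entries(benchScores)) {
    if (score === undefined || score === null) continue;
    const w = weights[bench] ?? 0;
    if (w === 0) continue;
    weightedSum += score * w;
    totalWeight += w;
  }
  if (totalWeight === 0) return 0;
  return (weightedSum / totalWeight) * 10;
}

const VERDICT_LABELS: [number, string][] = [
  [85, "Overwhelmingly Human"],
  [70, "Mostly Human"],
  [55, "Leaning Human"],
  [45, "On the Fence"],
  [30, "Leaning Machine"],
  [15, "Mostly Machine"],
  [0, "Overwhelmingly Machine"],
];

function getVerdictLabel(score: number): string {
  for (const [threshold, label] of VERDICT_LABELS) {
    if (score >= threshold) return label;
  }
  return "Overwhelmingly Machine";
}

// PUBLIC_STATEMENT weights as listed in the Bench Weights section.
const publicStatementWeights: Record<string, number> = {
  ETHICS: 25, HUMANITY: 25, AESTHETICS: 10, HYPE: 30, DILEMMA: 10,
};

// Hypothetical 0-10 bench scores; HYPE and DILEMMA have no score yet.
const exampleScores: Partial<Record<string, number>> = {
  ETHICS: 6, HUMANITY: 8, AESTHETICS: 4,
};

// (6*25 + 8*25 + 4*10) / (25 + 25 + 10) = 390 / 60 = 6.5, then x10 = 65
const exampleScore = computeVerdictScore(exampleScores, publicStatementWeights);
const exampleLabel = getVerdictLabel(exampleScore);
console.log(exampleScore, exampleLabel); // 65 Leaning Human
```

Because the unscored benches are skipped, the divisor is 60 rather than 100: the score is normalised over the benches that actually contributed.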

Bench Weights

Not every bench is equally important for every case type. A creative work should be weighted heavily on Aesthetics; a public statement matters most on Hype (spin detection) and Ethics. Weights are integers that sum to 100 for each case type. Only benches with weight ≥ 20 are considered “relevant” for voting purposes.

// src/lib/scoring/weights.ts

export const WEIGHT_PROFILES: Record<string, Record<string, number>> = {
  ETHICAL_DILEMMA:  { ETHICS: 30, HUMANITY: 25, AESTHETICS: 10, HYPE: 10, DILEMMA: 25 },
  CREATIVE_WORK:    { ETHICS: 15, HUMANITY: 25, AESTHETICS: 35, HYPE: 15, DILEMMA: 10 },
  PUBLIC_STATEMENT: { ETHICS: 25, HUMANITY: 25, AESTHETICS: 10, HYPE: 30, DILEMMA: 10 },
  PRODUCT_BRAND:    { ETHICS: 15, HUMANITY: 20, AESTHETICS: 15, HYPE: 35, DILEMMA: 15 },
  PERSONAL_BEHAVIOR:{ ETHICS: 25, HUMANITY: 30, AESTHETICS: 10, HYPE: 10, DILEMMA: 25 },
};
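
| Case Type         | Ethics | Humanity | Aesthetics | Hype | Dilemma | Primary    |
|-------------------|--------|----------|------------|------|---------|------------|
| Ethical Dilemma   | 30     | 25       | 10         | 10   | 25      | Dilemma    |
| Creative Work     | 15     | 25       | 35         | 15   | 10      | Aesthetics |
| Public Statement  | 25     | 25       | 10         | 30   | 10      | Hype       |
| Product / Brand   | 15     | 20       | 15         | 35   | 15      | Hype       |
| Personal Behavior | 25     | 30       | 10         | 10   | 25      | Humanity   |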
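
The two constraints stated in this section (each profile sums to 100, and a bench is relevant for voting when its weight is at least 20) can be checked mechanically. In this sketch, `RELEVANCE_THRESHOLD`, `totalWeight`, and `relevantBenches` are hypothetical names introduced for illustration, not part of the published source. The profiles themselves are copied from `weights.ts`.

```typescript
// Copied from src/lib/scoring/weights.ts so the snippet is self-contained.
const WEIGHT_PROFILES: Record<string, Record<string, number>> = {
  ETHICAL_DILEMMA:   { ETHICS: 30, HUMANITY: 25, AESTHETICS: 10, HYPE: 10, DILEMMA: 25 },
  CREATIVE_WORK:     { ETHICS: 15, HUMANITY: 25, AESTHETICS: 35, HYPE: 15, DILEMMA: 10 },
  PUBLIC_STATEMENT:  { ETHICS: 25, HUMANITY: 25, AESTHETICS: 10, HYPE: 30, DILEMMA: 10 },
  PRODUCT_BRAND:     { ETHICS: 15, HUMANITY: 20, AESTHETICS: 15, HYPE: 35, DILEMMA: 15 },
  PERSONAL_BEHAVIOR: { ETHICS: 25, HUMANITY: 30, AESTHETICS: 10, HYPE: 10, DILEMMA: 25 },
};

// Assumed constant: the text says benches with weight >= 20 count as relevant.
const RELEVANCE_THRESHOLD = 20;

// Sum of all bench weights in one profile (expected to be 100).
function totalWeight(profile: Record<string, number>): number {
  return Object.values(profile).reduce((sum, w) => sum + w, 0);
}

// Benches whose weight meets the relevance threshold, in declaration order.
function relevantBenches(caseType: string): string[] {
  const profile = WEIGHT_PROFILES[caseType] ?? {};
  return Object.entries(profile)
    .filter(([, w]) => w >= RELEVANCE_THRESHOLD)
    .map(([bench]) => bench);
}

for (const [caseType, profile] of Object.entries(WEIGHT_PROFILES)) {
  console.log(caseType, totalWeight(profile), relevantBenches(caseType));
}
```

For example, under this reading a Creative Work case would open only the Humanity and Aesthetics benches for voting, since Ethics, Hype, and Dilemma all fall below 20.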

Humanity Index Formula

The Humanity Index (HI) is a single number from 0–100 that measures how closely a user's judgment aligns with AI verdicts across all cases they have judged.

Formula

HI = round( 100 × (Σ w_i · agreePct_i) / Σ w_i )

where:

  • w_i = total votes cast on case i
  • agreePct_i = fraction of votes on case i that agreed with the AI verdict (0.0 – 1.0)

Cases are weighted by vote volume, so a case with 500 votes influences your HI more than one with 10 votes. HI = 100 means you agreed with every AI verdict on every case you judged. HI = 0 means you disagreed with every verdict.

// src/lib/scoring/humanity-index.ts

export interface CaseAgreement {
  agreePct: number; // 0.0 – 1.0 fraction of votes that agree with the AI verdict
  weight: number;   // total votes on this case (higher vote count = more weight)
}

/**
 * Humanity Index = weighted average of per-case agreement percentages × 100.
 * Returns 0–100. HI = 100 means everyone agreed with every AI verdict.
 * HI = 0 means everyone disagreed with every verdict.
 */
export function computeHumanityIndex(cases: CaseAgreement[]): number {
  if (cases.length === 0) return 0;

  let weightedSum = 0;
  let totalWeight = 0;

  for (const c of cases) {
    weightedSum += c.weight * c.agreePct;
    totalWeight += c.weight;
  }

  if (totalWeight === 0) return 0;
  return Math.max(0, Math.round((weightedSum / totalWeight) * 100));
}
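
To illustrate the effect of volume weighting, the sketch below feeds two made-up cases into `computeHumanityIndex`: a 500-vote case with 80% agreement and a 10-vote case with 20% agreement. The case data is hypothetical, and the interface and function are repeated from the listing above (without `export`) so the snippet runs on its own.

```typescript
// Repeated from src/lib/scoring/humanity-index.ts so the snippet is self-contained.
interface CaseAgreement {
  agreePct: number; // 0.0 - 1.0 fraction of votes that agree with the AI verdict
  weight: number;   // total votes on this case
}

function computeHumanityIndex(cases: CaseAgreement[]): number {
  if (cases.length === 0) return 0;
  let weightedSum = 0;
  let totalWeight = 0;
  for (const c of cases) {
    weightedSum += c.weight * c.agreePct;
    totalWeight += c.weight;
  }
  if (totalWeight === 0) return 0;
  return Math.max(0, Math.round((weightedSum / totalWeight) * 100));
}

// Hypothetical input: one high-volume case and one low-volume case.
const exampleCases: CaseAgreement[] = [
  { agreePct: 0.8, weight: 500 },
  { agreePct: 0.2, weight: 10 },
];

// (500*0.8 + 10*0.2) / (500 + 10) = 402 / 510 ≈ 0.788, so HI rounds to 79.
const exampleHI = computeHumanityIndex(exampleCases);
console.log(exampleHI); // 79
```

An unweighted mean of the two agreement fractions would give 50, so the high-volume case clearly dominates the result.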

Contributing

The scoring functions above are the canonical implementation. If you spot an inconsistency, want to propose a change to the weight profiles, or have ideas for a better confidence model, open an issue or pull request on GitHub.