
Evaluation commons

Review habits for public-facing model use.

In civic settings, evaluation means more than a leaderboard ranking. LLMspedia.org emphasizes small, repeatable review practices that help teams catch unsupported claims, missing context, biased framing, inaccessible language, privacy risks, and disclosure failures before model output reaches the public.

Questions every review sheet should ask

Can a reviewer identify which claim depends on which source?

Does the answer mark uncertainty instead of smoothing it away?

Would a non-specialist understand the limits of the system from the wording?

Is there a clear point where a human decision maker must verify, revise, or decline the output?
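As a minimal sketch, a team could store these four questions as a structured checklist so that failed or unanswered items are easy to list before anything is released. The names `ReviewItem`, `build_sheet`, `open_items`, and `ready_to_publish`, and the rule that any open item holds the output back, are illustrative assumptions, not an LLMspedia.org tool or a prescribed workflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReviewItem:
    """One review-sheet question (hypothetical structure for illustration)."""
    question: str
    passed: Optional[bool] = None  # None means the reviewer has not answered yet
    note: str = ""                 # free-text reason, e.g. why the item failed


def build_sheet() -> List[ReviewItem]:
    """Return a fresh sheet containing the four questions listed above."""
    return [
        ReviewItem("Can a reviewer identify which claim depends on which source?"),
        ReviewItem("Does the answer mark uncertainty instead of smoothing it away?"),
        ReviewItem("Would a non-specialist understand the limits of the system from the wording?"),
        ReviewItem("Is there a clear point where a human decision maker must verify, "
                   "revise, or decline the output?"),
    ]


def open_items(sheet: List[ReviewItem]) -> List[ReviewItem]:
    """Items that failed or were left unanswered (assumed to block release)."""
    return [item for item in sheet if item.passed is not True]


def ready_to_publish(sheet: List[ReviewItem]) -> bool:
    """True only when every question has been explicitly answered 'yes'."""
    return not open_items(sheet)


if __name__ == "__main__":
    sheet = build_sheet()
    sheet[0].passed = True
    sheet[1].passed = False
    sheet[1].note = "States an estimate as if it were certain."
    for item in open_items(sheet):
        print("OPEN:", item.question, item.note)
    print("Ready to publish:", ready_to_publish(sheet))
```

Treating an unanswered question the same as a failed one is a deliberate choice in this sketch: a sheet that was never filled in should not look like a clean review.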