We evaluate, score, and certify AI systems for reliability, transparency, and trust.
Aligned with ISO/IEC 23894 • Third-party Independent • SOC2-friendly Process • Verification Badge Included • Re-evaluation Available
Trust-first
We certify for safety, privacy & fairness — not just accuracy.
Audit-ready
Clear evidence and methodology for investors & regulators.
Built to evolve
Re-evaluation pathways as your model and data change.
Our Duty
Technical and ethical testing across five key pillars.
- Evaluate — We perform rigorous tests across reliability, transparency, fairness, privacy, and governance.
- Rate — We issue a weighted 100-point score and a letter grade (AAA–B).
- Certify — You receive an official AIGRADE Trust Report and a verifiable digital badge.
- Re-evaluate — Optional periodic reviews to maintain certification as your AI evolves.
- Support — Clear guidance for continuous improvement toward higher trust.
Evaluation Example — Fintech Risk Model, Production-Ready
We analyzed a credit-risk model for robustness, fairness, and privacy. The goal was to reduce harmful errors and improve explainability.
Scope
- Evaluated decision outputs across ~1,000 scenarios and edge prompts.
- Assessed fairness, privacy, and safety compliance under degraded contexts.
Methods
- Stress & slice testing (bias, privacy, explainability metrics).
- Prompt hardening and moderation robustness evaluation.
- Red-team simulation of bypass attempts and data leakage.
Inputs Reviewed
- Model decision outputs and behavioral edge cases.
- Evidence coverage: ≥85% (high confidence level).
Results
- Hallucinations vs. baseline: −41%
- Explainability score: +19%
- Final grade: A
| Area | Before | After (controls applied) | Controls |
|---|---|---|---|
| Robustness | Frequent drift in edge prompts | Stable under degraded contexts | Prompt hardening, context limits |
| Safety | Moderation bypass via phrasing | Bypass attempts blocked | Dual-layer moderation filters |
| Fairness | Small but consistent group skew | Skew reduced below threshold | Slice testing, calibrated mitigations |
| Privacy | Intermittent PII leakage | No PII in outputs | Pre-output masking, tokenized fields |
| Transparency | Sparse explanations | Human-readable rationales + logs | Rationale templates, traceable outputs |
How We Grade
We compute an overall grade across five pillars: Reliability, Safety, Fairness, Privacy, and Transparency — with weightings calibrated to real-world risk. Each evaluation aggregates test scores and evidence depth. The letter grade reflects the final weighted score and review confidence.
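As an illustration, the weighted roll-up described above can be sketched in a few lines of Python. The pillar weights and grade cut-offs below are hypothetical examples for clarity, not our published calibration:

```python
# Illustrative sketch of a weighted pillar roll-up.
# The weights and grade bands are hypothetical examples,
# not AIGRADE's published calibration.

WEIGHTS = {
    "reliability": 0.25,
    "safety": 0.25,
    "fairness": 0.20,
    "privacy": 0.15,
    "transparency": 0.15,
}

# Grade bands on the 100-point weighted score (illustrative cut-offs).
GRADE_BANDS = [(90, "AAA"), (80, "AA"), (70, "A"), (60, "BBB"), (50, "BB"), (0, "B")]

def overall_grade(pillar_scores: dict) -> tuple:
    """Combine 0-100 pillar scores into a weighted total and letter grade."""
    total = sum(WEIGHTS[p] * pillar_scores[p] for p in WEIGHTS)
    for cutoff, letter in GRADE_BANDS:
        if total >= cutoff:
            return total, letter
    return total, "B"

score, grade = overall_grade({
    "reliability": 88, "safety": 91, "fairness": 84,
    "privacy": 79, "transparency": 75,
})
```

In this sketch, evidence depth and review confidence would adjust the pillar scores before the roll-up; the final letter reflects the weighted total.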
Outcome
- System received an A grade with verifiable badge.
- Public safety report published, with a traceable badge URL.
“The audit made our release board-ready. The badge is now part of our pitch.” — CTO, regulated fintech
FAQ
Common questions
What do you evaluate?
We look at robustness, accuracy drift, privacy, security posture, governance, fairness/bias, explainability, and evidence trails. Everything rolls up into a single letter grade (AAA–B) with pillar scores.
How is the grade calculated?
Each pillar gets a 0–100 score from tests and evidence. We weight the pillars and compute the overall grade. You also receive a verifiable badge link.
What access do you need to our systems?
We use a scoped evidence pack (artifacts, redacted samples, and execution traces). No production keys are required. For sensitive cases, we run in a secure review room or via your VPC.
What deliverables do we receive?
A Trust Report (PDF + JSON), pillar scores, a remediation checklist, and a verification badge URL you can embed in docs or on your site.
Can we be re-evaluated after changes?
Yes. We can re-score after you add controls or ship a new model version, so your badge and evidence stay current.
