# AI Assistant — Model Card (Task #536)

**Version:** 1.2  •  **Date:** 2026-05-11

## Model details

- **Provider:** Google Cloud Vertex AI
- **Model:** Gemini 2.5 Flash (`gemini-2.5-flash`). Officiate360 pins the
  SKU via the `AI_MODEL_NAME` runtime setting (Support Admin →
  AI Configuration → Runtime Settings, env-var fallback `AI_MODEL_NAME`).
  Upgraded from Gemini 2.5 Flash-Lite in Task #647. The data contract,
  prompt set, and gating chain are unchanged; Flash gives noticeably
  better summary quality at a higher cost per call that stays well
  inside the existing per-tenant caps (figures under Performance).
- **Region:** Configured per deployment via `GCP_VERTEX_REGION`
  (default: `australia-southeast1` for AU/NZ tenants).
- **Project:** `GCP_PROJECT_ID`.
- **Required production guards:** `GCP_VERTEX_ZDR_CONFIRMED=true` and
  `GCP_VERTEX_MODEL_ARMOR_CONFIRMED=true` (boot fails closed otherwise).
- **API:** `generateContent` only — no streaming, no function calling, no
  multi-turn chat.
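
The fail-closed boot behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the Officiate360 implementation: the function names `missingGuards` and `assertVertexGuards` are hypothetical, and the sketch assumes a guard counts as confirmed only when its value is exactly the string `true`.

```typescript
// Hypothetical sketch of a fail-closed boot check (names are assumptions).
type Env = Record<string, string | undefined>;

// The two guard variables named in this card.
const REQUIRED_GUARDS = [
  "GCP_VERTEX_ZDR_CONFIRMED",
  "GCP_VERTEX_MODEL_ARMOR_CONFIRMED",
] as const;

// Returns the names of guards that are not set to exactly "true".
function missingGuards(env: Env): string[] {
  return REQUIRED_GUARDS.filter((name) => env[name] !== "true");
}

// Throws when any guard is unconfirmed, so the service cannot start
// in a half-configured state.
function assertVertexGuards(env: Env): void {
  const missing = missingGuards(env);
  if (missing.length > 0) {
    throw new Error(
      `AI boot guard failed: ${missing.join(", ")} must be "true"`
    );
  }
}
```

Passing the environment in as a plain object, rather than reading a global, keeps the check easy to unit-test.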

## Intended use

Officiate360 uses the model **only** to summarise structured
performance-review data into plain-language prose for one of six fixed
prompts:

| Prompt id | Output |
| --- | --- |
| `summarize_90_days` | High-level performance summary over the last 90 days |
| `strengths_weaknesses` | Top strengths and growth areas with evidence |
| `suggest_goals` | Suggested development goals |
| `draft_feedback` | Draft coach feedback message |
| `spot_trends` | Trend direction over the window |
| `peer_comparison` | Peer-comparison framing within the same level cohort |

Per-prompt minimum review and reviewer thresholds are enforced server-side
(see `PROMPTS` in `server/services/ai/ai-prompts.ts` for the live values).
There is **no free-text prompt input.** Users select one of the six.
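
As an illustration of a closed prompt set with server-side thresholds, the sketch below models the six prompt ids as a union type and checks a request against minimum counts. The `PromptThreshold` shape, the `checkThreshold` helper, and any threshold numbers passed to it are assumptions made for this example; the live values are the ones in `PROMPTS`.

```typescript
// The six prompt ids listed in the table above.
const PROMPT_IDS = [
  "summarize_90_days",
  "strengths_weaknesses",
  "suggest_goals",
  "draft_feedback",
  "spot_trends",
  "peer_comparison",
] as const;

type PromptId = (typeof PROMPT_IDS)[number];

// Hypothetical threshold shape; real values live in PROMPTS.
interface PromptThreshold {
  minReviews: number;
  minReviewers: number;
}

type GateResult = { ok: true } | { ok: false; reason: string };

// Rejects anything that is not one of the six fixed ids.
function isPromptId(value: string): value is PromptId {
  return (PROMPT_IDS as readonly string[]).indexOf(value) !== -1;
}

// Compares available data against the minimums and reports the shortfall.
function checkThreshold(
  threshold: PromptThreshold,
  reviews: number,
  reviewers: number
): GateResult {
  if (reviews < threshold.minReviews) {
    return { ok: false, reason: `needs ${threshold.minReviews - reviews} more review(s)` };
  }
  if (reviewers < threshold.minReviewers) {
    return { ok: false, reason: `needs ${threshold.minReviewers - reviewers} more reviewer(s)` };
  }
  return { ok: true };
}
```

Because the id is validated against a fixed list, arbitrary user text never reaches the prompt-selection path.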

## Inputs

What we send to Vertex:

- Aggregated rating data (per-category averages, distributions, counts).
- Referee profile fields with PII tokenized: level, years of experience.
- A fixed system prompt that fences the model's role and forbids
  speculation about identity, demographics, or protected attributes.
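
One way to keep a payload limited to the items above is to build it from an allow-list, copying permitted fields one by one instead of deleting forbidden ones. The sketch below shows that idea only; every type and field name in it is hypothetical, and it does not describe how `ai-pii.ts` works.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface ReviewAggregate {
  category: string;
  average: number;
  count: number;
  distribution: number[]; // number of ratings at each score step
}

// A stored record that also holds fields which must not be sent.
interface StoredRefereeRecord {
  name: string;
  email: string;
  level: string;
  yearsOfExperience: number;
  comments: string[];
  aggregates: ReviewAggregate[];
}

// The only data this sketch allows on the wire.
interface AiInput {
  profile: { level: string; yearsOfExperience: number };
  ratings: ReviewAggregate[];
}

// Copies allow-listed fields explicitly, so a field added to the stored
// record later is excluded by default.
function toAiInput(record: StoredRefereeRecord): AiInput {
  return {
    profile: {
      level: record.level,
      yearsOfExperience: record.yearsOfExperience,
    },
    ratings: record.aggregates.map((a) => ({
      category: a.category,
      average: a.average,
      count: a.count,
      distribution: a.distribution.slice(),
    })),
  };
}
```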

What we **never** send to Vertex:

- **Free-text review comments are excluded.** Comment fields are stored
  encrypted at rest and are not part of the AI input contract under any
  prompt.
- Personally identifying fields (names, emails, phone numbers, addresses)
  are removed or pseudonymized by `ai-pii.ts` before transmission.
- Data for any subject who has not granted opt-in consent
  (`users.ai_processing_consent`), is under 16 (per `users.date_of_birth`),
  or has been paused (`referees.ai_processing_paused`).
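
The three subject-level exclusions in the last bullet can be expressed as a single gate. The following is a sketch under stated assumptions: the `SubjectFlags` shape and the denial codes are hypothetical, age is computed in UTC, and a missing date of birth is treated as a denial (a fail-closed choice this card does not specify).

```typescript
// Hypothetical view of the columns named above.
interface SubjectFlags {
  aiProcessingConsent: boolean; // users.ai_processing_consent
  dateOfBirth: Date | null; // users.date_of_birth
  aiProcessingPaused: boolean; // referees.ai_processing_paused
}

// Whole years between dob and now, using UTC calendar fields.
function ageInYears(dob: Date, now: Date): number {
  let age = now.getUTCFullYear() - dob.getUTCFullYear();
  const beforeBirthday =
    now.getUTCMonth() < dob.getUTCMonth() ||
    (now.getUTCMonth() === dob.getUTCMonth() &&
      now.getUTCDate() < dob.getUTCDate());
  if (beforeBirthday) age -= 1;
  return age;
}

// Returns a denial code, or null when the subject's data may be processed.
function subjectDenialReason(subject: SubjectFlags, now: Date): string | null {
  if (!subject.aiProcessingConsent) return "no_consent";
  // Assumption: an unknown date of birth is denied rather than allowed.
  if (subject.dateOfBirth === null) return "dob_unknown";
  if (ageInYears(subject.dateOfBirth, now) < 16) return "under_16";
  if (subject.aiProcessingPaused) return "paused";
  return null;
}
```

Taking `now` as a parameter keeps the age check deterministic in tests.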

## Outputs

- Free-text prose, capped at 800 output tokens (typically 200–400).
- Always rendered with the advisory disclaimer:
  *"AI-generated summary based on submitted reviews. Always advisory —
  verify before acting on any specific recommendation."*

## Limitations

- **Not a decision-maker.** The output must not be the sole basis for
  promotion, demotion, assignment, or disciplinary action.
- **Hallucination risk.** The model may invent plausible-sounding details;
  coaches must verify against the underlying review data.
- **Cold-start.** The assistant refuses to run when a subject has fewer
  than the per-prompt minimum reviews or reviewers. We surface the
  shortfall in the UI.
- **Coverage bias.** Reviewers with more entries dominate the signal:
  trends and growth areas are weighted by review volume, which favours
  active reviewers.
- **English-language tuned.** Quality degrades for review data captured
  primarily in other languages.
- **No personalization.** The model has no memory of prior runs; identical
  inputs produce nearly identical outputs (responses are cached for 24h,
  so a repeated request within that window returns the same text).
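
The 24h caching mentioned in the last bullet could look roughly like the in-memory sketch below. It illustrates a time-to-live cache in general, not the actual storage layer: the `SummaryCache` class is hypothetical, and how the cache key is derived from prompt id and input is left to the caller.

```typescript
// 24 hours in milliseconds.
const TTL_MS = 24 * 60 * 60 * 1000;

interface CacheEntry {
  text: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache; a real deployment would likely use a
// shared store so all server instances see the same entries.
class SummaryCache {
  private entries = new Map<string, CacheEntry>();

  // Returns the cached text, or undefined when absent or expired.
  get(key: string, nowMs: number): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.text;
  }

  // Stores the text with an expiry 24 hours after nowMs.
  set(key: string, text: string, nowMs: number): void {
    this.entries.set(key, { text, expiresAt: nowMs + TTL_MS });
  }
}
```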

## Performance

Figures refreshed for Gemini 2.5 Flash (Task #647). Flash is slower and
more expensive per call than Flash-Lite, but produces measurably better
summary quality on the closed prompt set; the higher per-call cost stays
inside the existing per-tenant cost caps.

- p50 latency: ~2.4s; p95: ~6.0s (Sydney region).
- Average input: ~1.0k tokens (smaller now that comments are excluded);
  average output: ~340 tokens.
- Average cost per call (Gemini 2.5 Flash at May 2026 pricing): ~$0.0035.
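
To relate the average per-call cost to a spending cap, a rough back-of-the-envelope helper is sketched below. The 5 USD cap in the example is hypothetical and chosen only to show the arithmetic; actual caps are deployment settings.

```typescript
// Average cost per call quoted in this section, in USD.
const AVG_COST_PER_CALL_USD = 0.0035;

// Estimated spend for a number of average-sized calls.
function estimatedSpendUsd(
  calls: number,
  costPerCallUsd: number = AVG_COST_PER_CALL_USD
): number {
  return calls * costPerCallUsd;
}

// How many average-sized calls fit under a cap before it is reached.
function callsWithinBudget(
  capUsd: number,
  costPerCallUsd: number = AVG_COST_PER_CALL_USD
): number {
  return Math.floor(capUsd / costPerCallUsd);
}

// Hypothetical 5 USD daily cap: 5 / 0.0035 is about 1428.6, so 1428 calls.
const callsUnderExampleCap = callsWithinBudget(5);
```

These are averages, so real headroom varies with input and output token counts per call.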

## Safety & ethics

- **Comment exclusion** — free-text review comments are never sent.
- **PII scrubbing** before transmission (`ai-pii.ts`).
- **Subject opt-in consent** required (`users.ai_processing_consent`).
- **Under-16 hard block** at the gating layer (DOB-based).
- **Per-subject pause** available to admins (`referees.ai_processing_paused`).
- **Dual kill switch** — platform env (`AI_ASSISTANT_ENABLED`) plus
  per-tenant flag (`tenants.ai_assistant_enabled`).
- **Cost caps** — daily and monthly, per-user, per-tenant, and platform-wide
  (`AI_TENANT_DAILY_PROMPT_CAP`, `AI_TENANT_MONTHLY_PROMPT_CAP`,
  `AI_USER_DAILY_PROMPT_CAP`, `AI_TENANT_DAILY_COST_USD`,
  `AI_TENANT_MONTHLY_COST_USD`, `AI_PLATFORM_DAILY_COST_USD`,
  `AI_PLATFORM_MONTHLY_COST_USD`).
- **Auto-disable** — when a tenant crosses its monthly $ cap, the per-tenant
  AI flag is automatically flipped off for the remainder of the month and an
  audit event is recorded; an admin must manually re-enable it the following month.
- **Anomaly detector + circuit breaker** to bound blast radius of provider
  issues or abuse.
- **Full audit trail** of every run, denial, consent change, and config
  change (`ai.prompt.run`, `ai.prompt.denied`, `ai.prompt.error`,
  `ai.guardrail.tripped`, `ai.anomaly.detected`, `ai.consent.granted`,
  `ai.consent.revoked`, `ai.subject_consent.change`,
  `ai.subject_paused.change`, `ai.disclaimer.acknowledged`,
  `ai.config.tenant_toggle`, `ai.config.tier_downgrade`,
  `ai.config.change`).
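
The dual kill switch and the monthly auto-disable rule can be sketched together as below. This is an illustrative model only: the `TenantAiState` shape and function names are hypothetical, the cap is treated as crossed when month-to-date cost is at or above it (an assumption), and the audit event is left to the caller because this card does not name which event the auto-disable emits.

```typescript
// Hypothetical per-tenant state for illustration.
interface TenantAiState {
  aiAssistantEnabled: boolean; // tenants.ai_assistant_enabled
  monthToDateCostUsd: number;
  monthlyCostCapUsd: number; // AI_TENANT_MONTHLY_COST_USD
}

// Both switches must be on: the platform flag (AI_ASSISTANT_ENABLED)
// and the per-tenant flag.
function isAssistantEnabled(
  platformEnabled: boolean,
  tenant: TenantAiState
): boolean {
  return platformEnabled && tenant.aiAssistantEnabled;
}

// Flips the tenant flag off once the monthly cost cap is reached.
// Returns a new state object plus a marker the caller can use to
// record an audit event.
function enforceMonthlyCap(tenant: TenantAiState): {
  tenant: TenantAiState;
  autoDisabled: boolean;
} {
  const overCap = tenant.monthToDateCostUsd >= tenant.monthlyCostCapUsd;
  if (tenant.aiAssistantEnabled && overCap) {
    return {
      tenant: { ...tenant, aiAssistantEnabled: false },
      autoDisabled: true,
    };
  }
  return { tenant, autoDisabled: false };
}
```

Note that nothing in this sketch turns the flag back on; re-enabling stays a manual admin action, as described above.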

## Out-of-scope uses

- Generating reviews on behalf of coaches.
- Predicting future performance or assignment fitness.
- Comparing referees against each other outside the bounded
  `peer_comparison` prompt (which is cohort-scoped and aggregated).
- Any output displayed to the referee themselves without coach review.
- Any free-text user prompt.

## Change management

- Pinning the model SKU in `AI_MODEL_NAME` is mandatory; we do not auto-track
  the latest model.
- A model upgrade requires:
  1. PIA addendum.
  2. Privacy policy version bump if behavior or sub-processors change.
  3. Re-publication of this card.

## Contact

- DPO: dpo@officiate360.example
- Engineering: see `docs/ai/GOOGLE_VERTEX_AI_SETUP.md` for operational
  contact and rollback procedure.