---
name: eval-dataset-multilingual-prompts
description: Use when running the multilingual benchmark that tests language detection accuracy, output language matching, Arabic legal terminology quality, and bilingual document formatting across English, Arabic, French, and mixed-language inputs. Key metric is language-match rate ≥ 95%.
license: MIT
metadata:
  id: eval.dataset.multilingual-prompts
  category: eval
  priority: P0
  intent: [__eval__, multilingual, arabic, french, language-detection]
  related: [eval-benchmark-runner, eval-rubric-language-quality-ar, eval-rubric-language-quality-en, eval-regression-detector, eval-dataset-nda-prompts-30]
  source: Louis — HAQQ Legal AI (github.com/sboghossian/mini-claude-for-legal)
  version: "1.0"
---

# Eval Dataset — Multilingual Prompts

## Scope

~50 prompts across English, Arabic (MSA, Levantine, Gulf), French (Lebanese-French and standard), mixed AR/EN, and explicit translation/bilingual-formatting requests. Tests the full multilingual pipeline from language detection through output generation. Correct language handling is a hard requirement for MENA legal practice — a lawyer who writes in Arabic and gets an English response has a broken product experience.

Key metric: **language-match rate ≥ 95%** (the output language matches the input language, unless the user explicitly requests otherwise, as in translation and bilingual requests).

Storage: `eval/datasets/multilingual-prompts.jsonl`
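
The record schema is not specified in this pack. As a minimal sketch, assuming each JSONL line carries hypothetical `id`, `category`, `input_lang`, and `prompt` fields, a loader could validate records like this:

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical schema: these field names are assumptions, not part of the pack.
REQUIRED_FIELDS = ("id", "category", "input_lang", "prompt")


def parse_jsonl(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per non-empty JSONL line, checking the assumed fields."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        yield record
```

With a real file, this would be called as `list(parse_jsonl(fh))` on a handle opened with `encoding="utf-8"`, so that Arabic and French prompts decode correctly.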

## How to use this pack

1. Load into [[eval-benchmark-runner]] with [[eval-rubric-language-quality-ar]] and [[eval-rubric-language-quality-en]] as scoring rubrics.
2. For each response, run automated language detection on the output and compare it to the expected language (the input language, unless the user explicitly requested another).
3. Compute `language_match_rate = correct_language_responses / total_responses`.
4. For Arabic outputs, submit a sample to a human Arabic legal reviewer quarterly.
5. Feed results to [[eval-regression-detector]].
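
Steps 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the runner's actual code: `EvalRecord`, its fields, and the `detect` callable are hypothetical names, and the detector (langdetect, fastText, or anything else) is passed in rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class EvalRecord:
    # All field names here are hypothetical.
    input_lang: str                       # language of the prompt, e.g. "ar", "fr", "en"
    output_text: str                      # the model's response
    requested_lang: Optional[str] = None  # set when the user explicitly asks for another language


def language_match_rate(records: Iterable[EvalRecord],
                        detect: Callable[[str], str]) -> float:
    """Fraction of responses whose detected language equals the expected one."""
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    correct = 0
    for r in records:
        # An explicit request (translation, bilingual draft) overrides the input language.
        expected = r.requested_lang or r.input_lang
        if detect(r.output_text) == expected:
            correct += 1
    return correct / len(records)
```

Passing the detector as an argument keeps the metric testable with a stub and makes it easy to swap detectors when one performs poorly on short or mixed-language outputs.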

## Categories

### Category 1 — Arabic-only inputs (~12 prompts)

Test that Arabic input produces Arabic output with correct legal terminology.

**MSA (Modern Standard Arabic) — formal legal register:**
- "أعدّ لي عقد عمل بموجب قانون العمل الإماراتي." (Draft an employment contract under UAE Labour Law.)
- "ما هي شروط اتفاقية عدم الإفصاح في القانون اللبناني؟" (What are the NDA requirements under Lebanese law?)
- "راجع هذا البند وحدد المخاطر القانونية." (Review this clause and identify legal risks.)

**Levantine Arabic (Lebanese dialect) — client-facing register:**
- "بدي تعاقد عمل للبنان، شو بدك مني؟" (I want an employment contract for Lebanon, what do you need from me?)
- "هالعقد مظبوط؟ شو في غلط فيه؟" (Is this contract correct? What's wrong with it?)

**Gulf Arabic (UAE/KSA dialect):**
- "أبغى أسوي عقد NDA للسعودية." (I want to make an NDA for Saudi Arabia.)
- "وش الفرق بين عقد العمل في الإمارات وفي المملكة؟" (What's the difference between employment contracts in UAE vs KSA?)

**Expected behavior**: Output in Arabic (MSA preferred for legal documents, dialect acceptable for conversational clarifications). Legal terminology must use the established Arabic term rather than a transliteration: مكافأة نهاية الخدمة rather than "gratuity" spelled in Arabic letters, and اتفاقية عدم الإفصاح rather than "NDA" spelled in Arabic letters.

### Category 2 — French-only inputs (~10 prompts)

**Lebanese-French (legal-professional register):**
- "Rédigez un contrat de travail conforme au Code du travail libanais." (Draft an employment contract compliant with the Lebanese Labour Code.)
- "Quelle est la durée maximale de la période d'essai au Liban?" (What is the maximum probation period in Lebanon?)
- "Vérifiez cette clause de confidentialité pour un accord soumis au droit français." (Review this confidentiality clause for a French-law agreement.)

**Standard French (France / EU):**
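- "Rédigez un NDA selon le droit français." (Draft an NDA under French law.)
- "Expliquez les règles RGPD applicables à ce contrat de traitement de données." (Explain the GDPR rules applicable to this data processing agreement.)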

**Expected behavior**: Output in French; legal terms in French (clause de confidentialité, rupture conventionnelle, période d'essai).

### Category 3 — Mixed Arabic-English inputs (~10 prompts)

Common in MENA legal practice: a message that switches languages mid-sentence.

- "Review هذا العقد and tell me what's missing." (English request with Arabic object)
- "أريد NDA لـ DIFC — what are the key clauses?" (Arabic request with English terms)
- "هل الـ force majeure clause مناسبة للعقود الإماراتية؟" (Arabic question with English legal term)

**Expected behavior**: Respond in the dominant language of the prompt (usually Arabic if the grammatical structure is Arabic). Do not mix languages in the response unless the question specifically asks for it.

### Category 4 — Bilingual document requests (~10 prompts)

Explicit requests for side-by-side bilingual documents:
- "Draft an NDA with Arabic on the left and English on the right."
- "أعطني عقد العمل بالعربي والإنجليزي جنب بعض." (Give me the employment contract in Arabic and English side by side.)
- "I need a bilingual lease agreement (AR/EN) for a UAE property — Arabic is the controlling language."

**Expected behavior**: Output formatted in two columns or clearly alternating sections; "controlling language" statement included (Arabic version controls); legal terminology consistent between both versions.
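
A rough sketch of two automated checks for this category, assuming plain-text output. It tests only that both scripts are present and that a controlling-language phrase appears; it does not verify column layout. The function names and the phrase patterns are hypothetical and would need to be extended with the wording the drafts actually use, including Arabic-language variants supplied by a legal reviewer.

```python
import re
from typing import Iterable

ARABIC_LETTER = re.compile(r"[\u0600-\u06FF]")  # basic Arabic block only
LATIN_LETTER = re.compile(r"[A-Za-z]")

# Hypothetical English phrasings of a controlling-language statement.
DEFAULT_CONTROLLING_PATTERNS = (
    re.compile(r"\b(controlling|prevailing|governing)\s+language\b", re.IGNORECASE),
    re.compile(r"\bArabic\s+(version|text)\s+(shall\s+)?(controls?|prevails?)\b",
               re.IGNORECASE),
)


def has_both_scripts(text: str) -> bool:
    """Necessary (not sufficient) condition for an AR/EN bilingual draft."""
    return bool(ARABIC_LETTER.search(text)) and bool(LATIN_LETTER.search(text))


def has_controlling_language_statement(
    text: str,
    patterns: Iterable[re.Pattern] = DEFAULT_CONTROLLING_PATTERNS,
) -> bool:
    """True if any of the supplied patterns matches somewhere in the draft."""
    return any(p.search(text) for p in patterns)
```

A draft that fails `has_both_scripts` cannot be bilingual, so it can be failed without a layout check; a draft that passes still needs the two-column or alternating-section check.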

### Category 5 — Translation requests (~8 prompts)

- "Translate this English NDA clause into Arabic."
- "ترجم هذه الفقرة من العربية إلى الإنجليزية." (Translate this paragraph from Arabic to English.)
- "Translate this NDA governing law clause from French to English."

**Expected behavior**: Accurate legal translation (not machine-literal); terminology matches the target jurisdiction's conventions.

## Scoring dimensions

| Dimension | Method | Target |
|---|---|---|
| Language match rate | Automated language detection on output | ≥ 95% |
| Arabic legal term accuracy | Human rater (sample) | ≥ 4.0/5 |
| French legal term accuracy | LLM judge | ≥ 3.5/5 |
| Bilingual formatting | Rule-based check (two-column/alternating present) | ≥ 90% of bilingual requests |
| Controlling language statement | String match check | 100% of bilingual drafts |
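
The targets in the table can be encoded as a simple gate. In this sketch the metric keys are hypothetical names chosen for illustration, rates are expressed as fractions between 0 and 1, and a metric missing from the measured results is treated as failing.

```python
from typing import Dict, List

# Targets copied from the table above; the key names are assumptions.
TARGETS: Dict[str, float] = {
    "language_match_rate": 0.95,         # fraction of responses
    "arabic_legal_term_accuracy": 4.0,   # mean human rating out of 5
    "french_legal_term_accuracy": 3.5,   # mean LLM-judge rating out of 5
    "bilingual_formatting_rate": 0.90,   # fraction of bilingual requests
    "controlling_language_rate": 1.00,   # fraction of bilingual drafts
}


def failing_dimensions(measured: Dict[str, float]) -> List[str]:
    """Return the dimensions whose measured value is below target or missing."""
    return [
        name
        for name, target in TARGETS.items()
        if measured.get(name, float("-inf")) < target
    ]
```

An empty list means every dimension met its target; any other result names the dimensions to investigate before release.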

## Caveats & currency

- Arabic dialects vary by country; Gulf Arabic and Levantine Arabic are distinct. The product targets legal professionals who primarily write in MSA, but intake messages may arrive in dialect.
- French legal vocabulary in Lebanon differs slightly from Metropolitan French. For example, Lebanese lawyers cite Lebanese codes such as the Code des Obligations et des Contrats, where French lawyers would cite the Code civil.
- Automated language detection tools (langdetect, fastText) struggle with short Arabic inputs and mixed text. Human review of a 10% sample each quarter is necessary.
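
Two small helpers can support the last caveat. The sketch below is an assumption-laden illustration, not part of the pack: `arabic_letter_share` is a cheap script-level cross-check for detector results on short outputs (it counts only the basic Arabic block, U+0600 to U+06FF, and cannot tell English from French), and `review_sample` draws a reproducible sample for the quarterly human review. Both names are hypothetical.

```python
import math
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def arabic_letter_share(text: str) -> float:
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)


def review_sample(record_ids: Iterable[T], fraction: float = 0.10,
                  seed: int = 0) -> List[T]:
    """Pick a seeded random sample (at least one item) for human review."""
    ids = list(record_ids)
    if not ids:
        return []
    k = min(len(ids), max(1, math.ceil(len(ids) * fraction)))
    # A fixed seed makes the quarterly sample reproducible for audits.
    return random.Random(seed).sample(ids, k)
```

A response expected in Arabic but showing a low Arabic letter share is a reasonable candidate to route to human review regardless of what the detector reported.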

## Related skills

- [[eval-benchmark-runner]] — orchestrates this dataset
- [[eval-rubric-language-quality-ar]] — Arabic quality scoring rubric
- [[eval-rubric-language-quality-en]] — English quality scoring rubric
- [[eval-regression-detector]] — tracks language-match rate across deployments
