Streamlining Multilingual LLM Quality Evaluation Across Four Languages

73%

Decrease in Per-File Processing Time

275%

More Files Processed Daily

6,000+

Files Annotated Across 4 Language Combinations

The Client

Who Argos Data partnered with.

A prominent global online retailer with leadership positions across e-commerce, cloud computing, digital content, and AI. The engagement was led by the client's AI division to support specialized LLM evaluation work.

The Challenge

What needed to change.

The client needed 6,000+ files annotated across four language combinations and had no efficient way to maintain quality at that scale. Each language combination involved one source file and four target files generated by various LLM and machine translation engines. A tight deadline and a high accuracy requirement added to the complexity: the work demanded precision, organizational discipline, and strict adherence to the evaluation guidelines throughout.

The Argos Data Solution

What Argos Data built, customized, and deployed.

Argos Data built the MQM Annotator, a tool deployed inside Argos Myriad and configured to perform Multidimensional Quality Metrics (MQM) assessment on both source and target texts. The tool also supported subsequent annotation of the source, aligned closely to the client's predefined categories, subcategories, severities, and style guides.
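As a rough illustration of what adherence to predefined categories, subcategories, and severities can look like in data terms, the sketch below models a single annotation and rejects any label outside the taxonomy. It is not the MQM Annotator's actual data model: the `Annotation` class, its field names, and the taxonomy values are all hypothetical, since the client's real taxonomy is not given here.

```python
from dataclasses import dataclass

# Hypothetical taxonomy for illustration only. The client's actual
# categories, subcategories, and severities are not stated in the case study.
TAXONOMY = {
    "Accuracy": {"Mistranslation", "Omission", "Addition"},
    "Fluency": {"Grammar", "Spelling", "Punctuation"},
    "Style": {"Register", "Inconsistent style"},
}
SEVERITIES = {"Minor", "Major", "Critical"}


@dataclass(frozen=True)
class Annotation:
    """One flagged span in a source or target text (assumed structure)."""

    segment_id: str
    side: str         # "source" or "target"
    start: int        # character offset where the flagged span begins
    end: int          # character offset where the span ends (exclusive)
    category: str
    subcategory: str
    severity: str

    def __post_init__(self):
        # Reject anything outside the predefined taxonomy at creation time.
        if self.side not in {"source", "target"}:
            raise ValueError(f"unknown side: {self.side!r}")
        if self.category not in TAXONOMY:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.subcategory not in TAXONOMY[self.category]:
            raise ValueError(
                f"{self.subcategory!r} is not a subcategory of {self.category!r}"
            )
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")
        if not 0 <= self.start < self.end:
            raise ValueError("span offsets must satisfy 0 <= start < end")


# A valid annotation on a target text.
example = Annotation("seg-1", "target", 0, 5, "Accuracy", "Omission", "Major")
```

Validating at creation time means an out-of-taxonomy label fails immediately rather than surfacing later during review of the deliverable.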

The work centered on eight delivery dimensions: rapid tool development, high-quality annotations, a streamlined process, precise deliverables, scalability, a feedback mechanism, a user-friendly interface, and data security and confidentiality.

Capabilities delivered

Source and target text MQM assessment in a single interface
Embedded adherence to client-defined error taxonomy and severity categorization
Consolidated outputs into a single JSON file per project, formatted to the client's required schema
Automated labeling at scale without sacrificing quality
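The consolidation step in the list above can be sketched as follows. This is a minimal example under stated assumptions, not the client's schema: the `consolidate` function, the `project_id`, `file_count`, `files`, and `annotations` keys, and the sample records are all hypothetical.

```python
import json
from collections import defaultdict


def consolidate(project_id, records):
    """Group flat annotation records by file and return one JSON document.

    The output layout is an assumed shape for illustration; the client's
    required schema is not described in the case study.
    """
    by_file = defaultdict(list)
    for rec in records:
        # Store each record under its file, dropping the redundant "file" key.
        by_file[rec["file"]].append(
            {key: value for key, value in rec.items() if key != "file"}
        )

    payload = {
        "project_id": project_id,
        "file_count": len(by_file),
        "files": [
            {"file": name, "annotations": annotations}
            for name, annotations in sorted(by_file.items())
        ],
    }
    # ensure_ascii=False keeps non-Latin text readable in the output file.
    return json.dumps(payload, ensure_ascii=False, indent=2)


# Hypothetical records from two target files in one project.
sample_records = [
    {"file": "target_b.txt", "segment_id": "s1", "category": "Fluency", "severity": "Minor"},
    {"file": "target_a.txt", "segment_id": "s1", "category": "Accuracy", "severity": "Major"},
    {"file": "target_b.txt", "segment_id": "s2", "category": "Style", "severity": "Minor"},
]
print(consolidate("demo-project", sample_records))
```

Sorting by file name keeps the output deterministic, which makes two exports of the same project easy to compare.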

Results

Measurable outcomes for the client's AI program.

73%

reduction in per-file processing time (15 min → 4 min)

275%

more files processed daily without compromising quality

6,000+

files annotated across four language combinations

2

additional versions of the tool subsequently built

Consolidated deliverables aligned to client-specified JSON schema
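The case study does not say how the 73% and 275% figures were derived. One reading consistent with the 15 min → 4 min change is sketched below, assuming the same total annotation time per day, so that daily throughput scales inversely with per-file time. The function names are hypothetical.

```python
def time_reduction_pct(before_min, after_min):
    """Percentage decrease in per-file processing time."""
    return (before_min - after_min) / before_min * 100


def throughput_gain_pct(before_min, after_min):
    """Percentage increase in files per day, assuming the same total
    working time and throughput inversely proportional to per-file time."""
    return (before_min / after_min - 1) * 100


before, after = 15, 4  # minutes per file, as reported in the results
print(round(time_reduction_pct(before, after)))   # 73
print(round(throughput_gain_pct(before, after)))  # 275
```

Under that assumption the two headline numbers describe the same change: files take 4/15 of the original time, so 15/4 = 3.75 times as many files fit into a day.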

Key Learnings

What this engagement taught us.

Flexibility in Tool Design

Tools designed and adapted to meet specific project requirements unlock reliable scale.

Effective Process Optimization

Simplifying annotation and review procedures delivers measurable time savings.

Balancing Quality and Speed

High-quality output and operational speed are not trade-offs when tooling is designed correctly.

User Training and Adjustment

Comprehensive training ensures smoother transitions and increased accuracy.

Ongoing Enhancement

Consistent feedback and updates significantly improve tool effectiveness over time.

Strategic Value

Why this engagement matters beyond the numbers.

The MQM Annotator's success prompted the client to entrust Argos Data with new, larger projects with more varied requirements, which led to the creation of two additional tool versions. The engagement shows that flexible, well-designed tooling translates directly into operational scale and renewed client trust.

Capabilities Demonstrated

Argos Data services in play.
