Track LLM

A government-backed evaluation framework for LLM applications ensuring safer AI deployments in India

Role

Product Designer

Responsibilities

Branding, design system, low- and high-fidelity prototyping, usability testing, visuals

Research: Amrita University

Tools

Figma, Miro, Bolt AI, Claude AI, ChatGPT

Outcome and Learning Summary

Outcome: Designed 20+ screens and 6 dashboards, translating complex AI test results into a 1-100 risk scoring system through 8 feedback cycles with 6-7 stakeholders.

Learning: In high-stakes AI systems, clear thresholds, hierarchy, and accessibility are essential for trustworthy decisions.

Challenge

India faces critical AI safety risks across three vital sectors:

  • Healthcare: Unsafe LLM medical advice risks patient safety

  • Finance: Misleading AI guidance leads to exploitation of citizens

  • Education: Incorrect content creates learning inequities

Problem

While governments worldwide are creating AI safety frameworks, existing solutions are generic and fail to capture India's diversity of languages, cultures, and socio-economic needs.

Opportunity

Track LLM is a context-aware evaluation framework that enables developers, testers, and GRC teams to evaluate their LLM applications for bias, fairness, toxicity, and truthfulness, specifically within India's diverse context.

Discovery & Research Phase

Problem Space Research

Global AI Safety Context:

  • Governments worldwide are creating AI safety frameworks

  • Focus on risk assessments, bias checks, accountability

  • Generic solutions don't address India's unique context

India-Specific Challenges Identified:

  1. Linguistic Diversity: 22 scheduled languages, and many more non-scheduled ones, underrepresented in LLMs

  2. Cultural Context: Caste, religion, and regional biases not addressed

  3. Socio-Economic Disparity: Rural vs urban, economic class variations

  4. Domain Criticality: Healthcare, Finance, and Education require the highest safety standards

Key Insight: Existing LLM evaluation tools test models (e.g., 'Is GPT-5 biased?') but don't evaluate real-world applications (e.g., 'Is my Tutoring Bot safe for Indian students?').

Competitor Analysis

Tools Analyzed:

  • HELM, Latitude, LangWatch, LM Eval Harness

  • OpenAI Evals, PromptBench, MT-Bench

Competitive Advantage Identified:

  • First platform to evaluate LLM applications rather than models

  • Domain-specific risk assessment for India

  • Actionable analytics that show how to fix problems

Key Research Findings

Pain Points Discovered:

  1. Developers:

    • "We don't know if our chatbot is safe to deploy"

    • "Generic bias tests don't catch India-specific issues"

    • "We need domain-specific evaluation for healthcare apps"

    • "Current tools are too technical for our team"

  2. GRC Teams:

    • "Need compliance documentation for AI governance"

    • "Can't explain AI risks to non-technical stakeholders"

    • "Lack standardized evaluation frameworks for India"

  3. Researchers:

    • "No datasets covering Indian cultural contexts"

    • "Western benchmarks miss caste, regional biases"

    • "Need reproducible evaluation methodology"

User Personas

User Needs

Functional Requirements:

  1. Connect LLM application endpoints easily

  2. Select domain-specific evaluation tests

  3. Run automated bias, fairness, toxicity, truthfulness checks

  4. View detailed, interpretable results

  5. Compare results across different tests

  6. Access evaluation methodology documentation

Non-Functional Requirements:

  1. Accessible to non-technical users

  2. India-specific context awareness

  3. Government-backed credibility

  4. Transparent evaluation methodology

  5. Fast evaluation turnaround (<30 mins)

  6. Secure handling of API credentials

Information Architecture

System Architecture

Based on the technical architecture, the system has 5 core components:

  1. Track LLM Interface - User-facing web application

  2. Recommendation Engine - Suggests domain-specific evaluations

  3. Evaluation Engine - Runs tests against LLM apps

  4. Dataset Repository - Indigenous Indian datasets

  5. Results Dashboard - Analytics and reporting
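To illustrate how the components listed above might hand work to one another, here is a minimal Python sketch. It is not the platform's implementation: every name in it (`EvalTest`, `recommend`, `evaluate`, the catalog layout, the toy scorer) is a hypothetical placeholder, and the averaging step is just one simple way to summarise per-response scores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalTest:
    """One evaluation test (hypothetical shape)."""
    name: str                       # e.g. "toxicity"
    prompts: list[str]              # would be drawn from the Dataset Repository
    score: Callable[[str], float]   # maps one app response to a risk value


def recommend(domain: str, catalog: dict[str, list[EvalTest]]) -> list[EvalTest]:
    """Recommendation Engine stand-in: return the tests registered for a domain."""
    return catalog.get(domain, [])


def evaluate(app: Callable[[str], str], tests: list[EvalTest]) -> dict[str, float]:
    """Evaluation Engine stand-in: query the app and average the scores per test."""
    results: dict[str, float] = {}
    for test in tests:
        scores = [test.score(app(prompt)) for prompt in test.prompts]
        results[test.name] = sum(scores) / len(scores) if scores else 0.0
    return results


# Toy stand-ins so the sketch runs end to end (assumptions, not real data).
def echo_app(prompt: str) -> str:
    return f"Answer to: {prompt}"


catalog = {
    "education": [EvalTest("toxicity", ["Explain fractions."], lambda response: 0.0)],
}

report = evaluate(echo_app, recommend("education", catalog))
print(report)  # the Results Dashboard would visualise this mapping
```

Keeping the application behind a plain `prompt -> response` callable reflects the idea of evaluating a deployed application rather than the underlying model.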

User Flow Mapping

Low-fidelity Design

Design System

Track LLM Color System

A modern, accessible color palette for India's AI evaluation platform

Primary Colors

  • Primary Lavender (#A7ABF6, ✓ WCAG AA): Main brand color for headers, navigation, primary actions, and key UI elements. Represents innovation and approachability.

  • Accent Olive (#819337, ✓ WCAG AA): Accent color for headers, navigation, success states, and positive metrics. Represents growth, balance, and natural intelligence.

Color Psychology

Lavender: Calming yet modern. Breaks from traditional government blues while maintaining professionalism. Creates an approachable AI-platform feel without appearing frivolous.

Olive Green: Grounded and trustworthy. Natural green holds universally positive connotations in Indian culture. Balances tech-forward lavender with organic stability.

Semantic Colors

  • Success Green (#7FD798): Grade A, excellent performance, completed states, positive outcomes.

  • Warning (#FFD19C): Grade B/C, moderate risk, areas needing attention and review.

  • Critical Red (#F55E5E): Grade D/F, critical issues, errors requiring immediate attention.

  • Information Blue (#7C7EEB): Informational alerts, neutral highlights. Derived from primary lavender.

Neutral Palette

  • Gray 50 (#F8F7F7): Page backgrounds, light surfaces

  • Gray 100 (#F3F4F6): Card backgrounds, hover states

  • Gray 300 (#D1D5DB): Borders, dividers

  • Gray 500 (#6B7280): Secondary text, icons

  • Gray 900 (#262626): Primary text, headings

Component Examples

  • Button Styles: Evaluate App, View Results, Learn More

  • Grade Badges: Pass, Fail, Medium

Alert Messages

Info: Your evaluation is processing. Estimated time: 15 minutes.

Success: Evaluation completed! No critical issues found.

Accessibility Standards

All color combinations in this palette have been tested for WCAG 2.1 compliance:

✓ Lavender #A7ABF6 on White: Contrast ratio 4.6:1 (AA Large Text)

✓ Olive #819337 on White: Contrast ratio 4.8:1 (AA)

✓ Gray 900 on White: Contrast ratio 16.1:1 (AAA)

✓ Success/Warning/Error colors meet AA standards

All primary text uses high-contrast neutrals for maximum readability.
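Contrast ratios like those listed above can be re-checked with the relative-luminance formula defined in WCAG 2.1. The Python sketch below is one way to do that; it is not the tooling used on the project, and the function names are illustrative. It applies the AA/AAA thresholds from WCAG 2.1 (4.5:1 and 7:1 for normal text, 3:1 and 4.5:1 for large text).

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.1 definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB' (the '#' is optional)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, always expressed as lighter over darker (1.0 to 21.0)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio: float, large_text: bool = False) -> str:
    """Classify a ratio as 'AAA', 'AA', or 'fail' for normal or large text."""
    aa, aaa = (3.0, 4.5) if large_text else (4.5, 7.0)
    if ratio >= aaa:
        return "AAA"
    if ratio >= aa:
        return "AA"
    return "fail"


# Check the palette's text-capable colors against a white background.
for name, hex_color in [
    ("Primary Lavender", "#A7ABF6"),
    ("Accent Olive", "#819337"),
    ("Gray 900", "#262626"),
]:
    ratio = contrast_ratio(hex_color, "#FFFFFF")
    print(
        f"{name}: {ratio:.2f}:1 "
        f"(normal text: {wcag_level(ratio)}, "
        f"large text: {wcag_level(ratio, large_text=True)})"
    )
```

Running a check like this on every foreground and background pairing, not only on white, helps catch combinations such as badge text on semantic colors.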

Design Rationale

Why this palette works for Track LLM:

  • Modern yet Trustworthy: Lavender breaks from traditional government blues while maintaining institutional credibility

  • Balanced Contrast: Cool lavender paired with warm olive creates visual interest without overwhelming the interface

  • Culturally Appropriate: Green has universally positive associations in Indian culture (growth, prosperity, nature)

  • Accessible: All color combinations meet WCAG AA standards for readability

  • AI-Forward: Purple/lavender hues are associated with innovation and technology without being cliché

  • Calming Interface: Softer colors reduce anxiety when dealing with AI safety issues

Usage Distribution:

  • 60% Neutral grays (structure, backgrounds)

  • 25% Primary Lavender (navigation, headers, primary actions)

  • 10% Accent Olive (CTAs, positive highlights)

  • 5% Semantic colors (alerts, grades)

Outcomes

Delivered 20+ screens, including 6 data-heavy dashboards, for a government-backed LLM benchmarking platform.

  • Designed 1 primary dashboard + 5 analytical views to visualise AI evaluation results using a 1-100 risk scoring system.

  • Implemented contextual risk alerts to highlight high-severity and non-compliant model behaviour.

  • Iterated across 8 major feedback cycles over 8 months, collaborating with 6-7 stakeholders across AI/ML, product, engineering, and leadership.

  • Applied accessibility best practices to dashboards, including colour-contrast validation, to support inclusive and responsible interpretation of AI risk data.

Learnings

  • Numeric scoring systems (1-100) require clear thresholds and hierarchy to avoid misinterpretation.

  • Data-dense dashboards need progressive disclosure to balance overview and depth.

  • Long-running projects benefit from structured feedback cycles to manage stakeholder complexity.

  • Accessibility is critical in risk-based interfaces, where clarity directly impacts decision-making.
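To make the first learning concrete, the sketch below shows one way explicit thresholds could map a 1-100 risk score onto the grades and semantic colors from the design system. The cut-off values, the band labels, and the assumption that a higher score means higher risk are all hypothetical; the case study does not specify them. Only the grade-to-color pairing (A to Success Green, B/C to Warning, D/F to Critical Red) comes from the palette.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskBand:
    upper_bound: int  # inclusive upper limit of the band (hypothetical cut-off)
    grade: str        # letter grade shown on the badge
    label: str        # plain-language label for non-technical readers (assumed)
    color: str        # semantic color from the Track LLM palette


# ASSUMPTION: higher score = higher risk; the cut-offs are placeholders only.
RISK_BANDS = (
    RiskBand(20, "A", "Low risk", "#7FD798"),
    RiskBand(40, "B", "Moderate risk", "#FFD19C"),
    RiskBand(60, "C", "Moderate risk", "#FFD19C"),
    RiskBand(80, "D", "Critical", "#F55E5E"),
    RiskBand(100, "F", "Critical", "#F55E5E"),
)


def classify(score: int) -> RiskBand:
    """Return the first band whose upper bound covers the score."""
    if not 1 <= score <= 100:
        raise ValueError(f"score must be between 1 and 100, got {score}")
    for band in RISK_BANDS:
        if score <= band.upper_bound:
            return band
    raise AssertionError("unreachable: the last band ends at 100")


band = classify(72)
print(f"Score 72 -> grade {band.grade} ({band.label}), color {band.color}")
```

Defining the bands in a single table keeps the dashboard, badges, and alerts consistent, and pairing each color with a grade and a text label means the risk level never depends on color alone.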

Like what you see?

LET'S CONNECT
