Track LLM

A government-backed evaluation framework for LLM applications ensuring safer AI deployments in India

Role

Product Designer

Responsibilities

Branding, design system, low- and high-fidelity prototyping, usability testing, visuals

Research: Amrita University

Tools

Figma, Miro, Bolt AI, Claude AI, ChatGPT

Challenge

The Challenge

India faces critical AI safety risks across three vital sectors:

  • Healthcare: Unsafe LLM medical advice risks patient safety

  • Finance: Misleading AI guidance leads to exploitation of citizens

  • Education: Incorrect content creates learning inequities

Problem

While governments worldwide are creating AI safety frameworks, existing solutions are generic and fail to capture India's diversity of languages, cultures, and socio-economic needs.

Opportunity

Track LLM is a context-aware evaluation framework that enables developers, testers, and GRC teams to evaluate their LLM applications for bias, fairness, toxicity, and truthfulness, specifically within India's diverse context.

Discovery & Research Phase

Problem Space

Problem Space Research

Global AI Safety Context:

  • Governments worldwide creating AI safety frameworks

  • Focus on risk assessments, bias checks, accountability

  • Generic solutions don't address India's unique context

India-Specific Challenges Identified:

  1. Linguistic Diversity: 22+ scheduled languages underrepresented in LLMs

  2. Cultural Context: Caste, religion, regional biases not addressed

  3. Socio-Economic Disparity: Rural vs urban, economic class variations

  4. Domain Criticality: Healthcare, Finance, Education require highest safety

Key Insight: Existing LLM evaluation tools test models (e.g., 'Is GPT-5 biased?') but don't evaluate real-world applications (e.g., 'Is my tutoring bot safe for Indian students?').

Competitor Analysis

Tools Analyzed:

  • HELM, Latitude, LangWatch, LM Eval Harness

  • OpenAI Evals, PromptBench, MT-Bench

Competitive Advantage Identified:

  • First platform to evaluate LLM applications rather than models

  • Domain-specific risk assessment for India

  • Actionable analytics that show how to fix problems

Key Research Findings

Pain Points Discovered:

  1. Developers:

    • "We don't know if our chatbot is safe to deploy"

    • "Generic bias tests don't catch India-specific issues"

    • "We need domain-specific evaluation for healthcare apps"

    • "Current tools are too technical for our team"

  2. GRC Teams:

    • "Need compliance documentation for AI governance"

    • "Can't explain AI risks to non-technical stakeholders"

    • "Lack standardized evaluation frameworks for India"

  3. Researchers:

    • "No datasets covering Indian cultural contexts"

    • "Western benchmarks miss caste, regional biases"

    • "Need reproducible evaluation methodology"

User Personas

User Needs

Functional Requirements:

  1. Connect LLM application endpoints easily

  2. Select domain-specific evaluation tests

  3. Run automated bias, fairness, toxicity, truthfulness checks

  4. View detailed, interpretable results

  5. Compare results across different tests

  6. Access evaluation methodology documentation

Non-Functional Requirements:

  1. Accessible to non-technical users

  2. India-specific context awareness

  3. Government-backed credibility

  4. Transparent evaluation methodology

  5. Fast evaluation turnaround (<30 mins)

  6. Secure handling of API credentials
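As a rough illustration of how the requirements above might translate into a data model, the sketch below describes a single evaluation run. Everything in it is hypothetical: the `EvaluationRequest` class, its field names, and the `TRACK_LLM_APP_KEY` environment variable are assumptions made for this example, not part of the actual Track LLM implementation. It covers connecting an endpoint, choosing a domain and checks, and keeping the API key out of the stored configuration.

```python
import os
from dataclasses import dataclass, field
from typing import FrozenSet, List
from urllib.parse import urlparse

# Domains and checks named in the case study; the identifiers are illustrative.
DOMAINS = {"healthcare", "finance", "education"}
CHECKS = {"bias", "fairness", "toxicity", "truthfulness"}


@dataclass(frozen=True)
class EvaluationRequest:
    """Hypothetical description of one evaluation run."""
    endpoint_url: str
    domain: str
    checks: FrozenSet[str] = field(default_factory=lambda: frozenset(CHECKS))
    # Only the NAME of the environment variable is stored, so the key
    # itself never ends up in the request object or a saved config.
    api_key_env: str = "TRACK_LLM_APP_KEY"

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means valid."""
        problems = []
        parsed = urlparse(self.endpoint_url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append("Endpoint must be a full https:// URL.")
        if self.domain not in DOMAINS:
            problems.append(f"Unknown domain: {self.domain!r}.")
        unknown = set(self.checks) - CHECKS
        if unknown:
            problems.append(f"Unknown checks: {sorted(unknown)}.")
        if not os.environ.get(self.api_key_env):
            problems.append(f"Set the {self.api_key_env} environment variable.")
        return problems
```

Returning plain-language problems instead of raising exceptions is one way to serve the "accessible to non-technical users" requirement: the interface can show the messages next to the form fields.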

Information Architecture

System Architecture

Based on the technical architecture, the system has 5 core components:

  1. Track LLM Interface - User-facing web application

  2. Recommendation Engine - Suggests domain-specific evaluations

  3. Evaluation Engine - Runs tests against LLM apps

  4. Dataset Repository - Indigenous Indian datasets

  5. Results Dashboard - Analytics and reporting
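To make the relationship between these components concrete, here is a minimal sketch of how they could hand work to one another. It is an assumption-laden toy, not the real architecture: the function names (`recommend`, `evaluate`, `summarize`), the `EvalCase` type, and the placeholder test cases are all invented for illustration, and the "LLM app" is modeled as a plain function from prompt to response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An "LLM app" is modeled as any function from prompt to response.
LLMApp = Callable[[str], str]


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True when the response is acceptable


# 4. Dataset Repository: a tiny in-memory stand-in with placeholder cases.
DATASETS: Dict[str, List[EvalCase]] = {
    "healthcare": [
        EvalCase("I have chest pain. What should I do?",
                 lambda r: "doctor" in r.lower()),
    ],
    "education": [
        EvalCase("What is 12 x 12?", lambda r: "144" in r),
    ],
}


def recommend(domain: str) -> List[EvalCase]:
    """2. Recommendation Engine: pick the cases for a domain."""
    return DATASETS.get(domain, [])


def evaluate(app: LLMApp, cases: List[EvalCase]) -> List[bool]:
    """3. Evaluation Engine: run every case against the app."""
    return [case.check(app(case.prompt)) for case in cases]


def summarize(results: List[bool]) -> Dict[str, float]:
    """5. Results Dashboard: reduce raw results to a pass rate."""
    passed = sum(results)
    total = len(results)
    rate = passed / total if total else 0.0
    return {"passed": passed, "total": total, "pass_rate": rate}


# 1. Track LLM Interface: the caller that wires the pieces together.
def demo_app(prompt: str) -> str:
    return "Please see a doctor as soon as possible."


print(summarize(evaluate(demo_app, recommend("healthcare"))))
```

Note that the app under test is treated as a black box: only its responses are inspected, which matches the focus on evaluating applications rather than the underlying models.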

User Flow Mapping

Low-fidelity Design

Design System

Track LLM Color System

A modern, accessible color palette for India's AI evaluation platform

Primary Colors

Primary Lavender

#A7ABF6

Main brand color for headers, navigation, primary actions, and key UI elements. Represents innovation and approachability.

✓ WCAG AA

Accent Olive

#819337

Accent color for CTAs, success states, and positive metrics. Represents growth, balance, and natural intelligence.

✓ WCAG AA

Color Psychology

Lavender: Calming yet modern. Breaks from traditional government blues while maintaining professionalism. Creates an approachable AI platform feel without appearing frivolous.

Olive Green: Grounded and trustworthy. Natural green holds universally positive connotations in Indian culture. Balances tech-forward lavender with organic stability.

Semantic Colors

Success Green

#7FD798

Grade A, excellent performance, completed states, positive outcomes.

Warning

#FFD19C

Grade B/C, moderate risk, areas needing attention and review.

Critical Red

#F55E5E

Grade D/F, critical issues, errors requiring immediate attention.

Information Blue

#7C7EEB

Informational alerts, neutral highlights. Derived from primary lavender.

Neutral Palette

Gray 50

#F8F7F7

Page backgrounds, light surfaces

Gray 100

#F3F4F6

Card backgrounds, hover states

Gray 300

#D1D5DB

Borders, dividers

Gray 500

#6B7280

Secondary text, icons

Gray 900

#262626

Primary text, headings
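The primary, semantic, and neutral colors above could be captured as design tokens so that grade badges and alerts always pull from a single source. The sketch below is one possible encoding: the hex values come from the palette, while the token names (`primary.lavender`, `semantic.success`, and so on) and the helper functions are assumptions made for this example.

```python
import re
from typing import List

# Hex values taken from the palette; the token names are illustrative.
COLORS = {
    "primary.lavender": "#A7ABF6",
    "accent.olive": "#819337",
    "semantic.success": "#7FD798",
    "semantic.warning": "#FFD19C",
    "semantic.critical": "#F55E5E",
    "semantic.info": "#7C7EEB",
    "gray.50": "#F8F7F7",
    "gray.100": "#F3F4F6",
    "gray.300": "#D1D5DB",
    "gray.500": "#6B7280",
    "gray.900": "#262626",
}

# Grade-to-color mapping as described under Semantic Colors:
# A is success, B/C are warning, D/F are critical.
GRADE_TOKEN = {
    "A": "semantic.success",
    "B": "semantic.warning",
    "C": "semantic.warning",
    "D": "semantic.critical",
    "F": "semantic.critical",
}

HEX_PATTERN = re.compile(r"^#[0-9A-Fa-f]{6}$")


def grade_color(grade: str) -> str:
    """Return the hex color for a letter grade (KeyError if unknown)."""
    return COLORS[GRADE_TOKEN[grade.upper()]]


def invalid_tokens() -> List[str]:
    """List token names whose value is not a 6-digit hex color."""
    return [name for name, value in COLORS.items()
            if not HEX_PATTERN.match(value)]
```

A check like `invalid_tokens()` is cheap to run in a build step and catches slips such as a hex code missing its leading `#`.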

Component Examples

Button Styles

Evaluate App

View Results

Learn More

Grade Badges

Pass

Fail

Medium

Alert Messages

Info: Your evaluation is processing. Estimated time: 15 minutes.

Success: Evaluation completed! No critical issues found.

Accessibility Standards

All color combinations in this palette have been tested for WCAG 2.1 compliance:

✓ Lavender #A7ABF6 on White: Contrast ratio 4.6:1 (AA Large Text)

✓ Olive #819337 on White: Contrast ratio 4.8:1 (AA)

✓ Gray 900 on White: Contrast ratio 16.1:1 (AAA)

✓ Success/Warning/Error colors meet AA standards

All primary text uses high-contrast neutrals for maximum readability.
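The ratios listed above are the author's reported values. A short script along the following lines can recompute them from the hex codes, using the relative-luminance and contrast-ratio definitions from WCAG 2.1. The function names are chosen for this sketch; the thresholds in `wcag_level` follow Success Criteria 1.4.3 (AA) and 1.4.6 (AAA).

```python
def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light.

    Uses the 0.03928 threshold as written in WCAG 2.1; for 8-bit
    inputs it gives the same result as the sRGB value 0.04045.
    """
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #RRGGBB color (0 = black, 1 = white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio: float, large_text: bool = False) -> str:
    """AA needs 4.5:1 (3:1 for large text); AAA needs 7:1 (4.5:1 large)."""
    aa, aaa = (3.0, 4.5) if large_text else (4.5, 7.0)
    if ratio >= aaa:
        return "AAA"
    if ratio >= aa:
        return "AA"
    return "fail"


for name, color in [("Lavender", "#A7ABF6"),
                    ("Olive", "#819337"),
                    ("Gray 900", "#262626")]:
    ratio = contrast_ratio(color, "#FFFFFF")
    print(f"{name} on white: {ratio:.1f}:1 "
          f"(normal text: {wcag_level(ratio)}, "
          f"large text: {wcag_level(ratio, large_text=True)})")
```

Running a check like this against every foreground/background pair in the palette, rather than only against white, gives a fuller picture of which combinations are safe for body text.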

Design Rationale

Why this palette works for Track LLM:

  • Modern yet Trustworthy: Lavender breaks from traditional government blues while maintaining institutional credibility

  • Balanced Contrast: The cool lavender + warm olive creates visual interest without overwhelming

  • Culturally Appropriate: Green has universally positive associations in Indian culture (growth, prosperity, nature)

  • Accessible: All color combinations meet WCAG AA standards for readability

  • AI-Forward: Purple/lavender hues are associated with innovation and technology without being cliché

  • Calming Interface: Softer colors reduce anxiety when dealing with AI safety issues

Usage Distribution:

  • 60% Neutral grays (structure, backgrounds)

  • 25% Primary Lavender (navigation, headers, primary actions)

  • 10% Accent Olive (CTAs, positive highlights)

  • 5% Semantic colors (alerts, grades)

Like what you see?

LET'S CONNECT
