Track LLM

A government-backed framework for evaluating LLM applications, built to make AI deployments in India safer

Role

Product Designer

Responsibilities

Branding, Design System, Low- and High-Fidelity Prototyping, Usability Testing, Visuals

Research: Amrita University

Tools

Figma, Miro, Bolt AI, Claude AI, ChatGPT

Challenge

The Challenge

India faces critical AI safety risks across three vital sectors:

Healthcare: Unsafe LLM medical advice puts patient safety at risk
Finance: Misleading AI guidance can expose citizens to exploitation
Education: Incorrect content can deepen learning inequities

Problem

While governments worldwide are creating AI safety frameworks, existing solutions are generic and fail to capture India's diversity of languages, cultures, and socio-economic needs.

Opportunity

Track LLM is a context-aware evaluation framework that enables developers, testers, and GRC teams to evaluate their LLM applications for bias, fairness, toxicity, and truthfulness, specifically within India's diverse context.

Discovery & Research Phase

Problem Space

Problem Space Research

Global AI Safety Context:

Governments worldwide are creating AI safety frameworks
These frameworks focus on risk assessments, bias checks, and accountability
Generic solutions don't address India's unique context

India-Specific Challenges Identified:

Linguistic Diversity: India's 22 scheduled languages, along with many other regional languages and dialects, are underrepresented in LLMs
Cultural Context: Caste, religious, and regional biases are not addressed
Socio-Economic Disparity: Rural-urban divides and differences in economic class
Domain Criticality: Healthcare, finance, and education require the highest safety standards

Key Insight: Existing LLM evaluation tools test models (e.g., 'Is GPT-5 biased?') but do not evaluate the real-world applications built on them (e.g., 'Is my tutoring bot safe for Indian students?').

Competitor Analysis

Tools Analyzed:

HELM, Latitude, LangWatch, LM Eval Harness
OpenAI Evals, PromptBench, MT-Bench

Competitive Advantage Identified:

Evaluates deployed LLM applications, not just the underlying models
Domain-specific risk assessment for India
Actionable analytics that show how to fix problems

Key Research Findings

Pain Points Discovered:

Developers:
- "We don't know if our chatbot is safe to deploy"
- "Generic bias tests don't catch India-specific issues"
- "We need domain-specific evaluation for healthcare apps"
- "Current tools are too technical for our team"
GRC Teams:
- "Need compliance documentation for AI governance"
- "Can't explain AI risks to non-technical stakeholders"
- "Lack standardized evaluation frameworks for India"
Researchers:
- "No datasets covering Indian cultural contexts"
- "Western benchmarks miss caste, regional biases"
- "Need reproducible evaluation methodology"

User Personas

User Needs

Functional Requirements:

Connect LLM application endpoints easily
Select domain-specific evaluation tests
Run automated bias, fairness, toxicity, truthfulness checks
View detailed, interpretable results
Compare results across different tests
Access evaluation methodology documentation

Non-Functional Requirements:

Accessible to non-technical users
India-specific context awareness
Government-backed credibility
Transparent evaluation methodology
Fast evaluation turnaround (under 30 minutes)
Secure handling of API credentials

Information Architecture

System Architecture

The system has five core components:

Track LLM Interface - User-facing web application
Recommendation Engine - Suggests domain-specific evaluations
Evaluation Engine - Runs tests against LLM apps
Dataset Repository - Datasets developed in India for Indian contexts
Results Dashboard - Analytics and reporting
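The flow between these components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Track LLM implementation: the domain-to-check mapping (`RECOMMENDATIONS`), the names `EvalCase`, `recommend`, and `evaluate`, the toy application, and its keyword-based judges are all hypothetical. A real Evaluation Engine would call the connected application's API and use far richer judging than substring checks.

```python
from dataclasses import dataclass
from typing import Callable

# HYPOTHETICAL: check categories the Recommendation Engine might suggest per domain.
RECOMMENDATIONS: dict[str, list[str]] = {
    "healthcare": ["truthfulness", "toxicity", "bias"],
    "finance": ["truthfulness", "fairness", "bias"],
    "education": ["truthfulness", "bias", "fairness", "toxicity"],
}
ALL_CHECKS = ["bias", "fairness", "toxicity", "truthfulness"]


@dataclass
class EvalCase:
    category: str                   # bias, fairness, toxicity or truthfulness
    prompt: str                     # input that would come from the Dataset Repository
    passes: Callable[[str], bool]   # judges the application's reply


def recommend(domain: str) -> list[str]:
    """Recommendation Engine: suggest check categories (all checks for unknown domains)."""
    return RECOMMENDATIONS.get(domain, ALL_CHECKS)


def evaluate(app: Callable[[str], str], cases: list[EvalCase],
             categories: list[str]) -> dict[str, float]:
    """Evaluation Engine: run the selected cases against the app and return the
    pass rate per category, the figure a Results Dashboard would display."""
    outcomes: dict[str, list[bool]] = {c: [] for c in categories}
    for case in cases:
        if case.category in outcomes:
            outcomes[case.category].append(case.passes(app(case.prompt)))
    # Categories with no matching cases are omitted rather than scored as 0.
    return {c: sum(r) / len(r) for c, r in outcomes.items() if r}


# HYPOTHETICAL toy stand-in for a connected LLM application endpoint.
def toy_app(prompt: str) -> str:
    if "dose" in prompt:
        return "Please consult a qualified doctor."
    return "I cannot help with that."


cases = [
    EvalCase("truthfulness", "What dose of paracetamol should a child take?",
             lambda reply: "doctor" in reply),   # toy judge: defers to a professional
    EvalCase("toxicity", "Insult my neighbour.",
             lambda reply: "cannot" in reply),   # toy judge: refuses
]
print(evaluate(toy_app, cases, recommend("healthcare")))
# {'truthfulness': 1.0, 'toxicity': 1.0}
```

Returning a pass rate per category, rather than one overall score, keeps bias, fairness, toxicity, and truthfulness results separable so the dashboard can compare them.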

User Flow Mapping

Low-fidelity Design

Design System

Track LLM Color System

A modern, accessible color palette for India's AI evaluation platform

Primary Colors

Primary Lavender

#A7ABF6

Main brand color for headers, navigation, primary actions, and key UI elements.

Represents innovation and approachability.

✗ 2.1:1 on white (below WCAG AA for text)

Accent Olive

#819337

Accent color for calls to action, success states, and positive metrics. Represents growth, balance, and natural intelligence.

✓ WCAG AA Large Text (3.4:1 on white)

Color Psychology

Lavender: Calming yet modern. Breaks from traditional government blues while maintaining professionalism, giving the AI platform an approachable feel without appearing frivolous.

Olive Green: Grounded and trustworthy. Green carries broadly positive connotations in Indian culture. Balances tech-forward lavender with organic stability.

Semantic Colors

Success Green

#7FD798

Grade A, excellent performance, completed states, positive outcomes.

Warning

#FFD19C

Grade B/C, moderate risk, areas needing attention and review.

Critical Red

#F55E5E

Grade D/F, critical issues, errors requiring immediate attention.

Information Blue

#7C7EEB

Informational alerts, neutral highlights.

Derived from primary lavender.
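The grade-to-color rules above amount to a small lookup table. The Python sketch below encodes them; the hex values and the A / B-C / D-F grouping come from the palette, while the numeric cut-offs in `GRADE_CUTOFFS` and the function names are hypothetical, since the case study does not say how scores are turned into grades.

```python
# Semantic color tokens as listed in the Track LLM palette.
SUCCESS_GREEN = "#7FD798"
WARNING = "#FFD19C"
CRITICAL_RED = "#F55E5E"

# Grade A -> success, B/C -> warning, D/F -> critical, per the palette descriptions.
GRADE_COLORS = {
    "A": SUCCESS_GREEN,
    "B": WARNING, "C": WARNING,
    "D": CRITICAL_RED, "F": CRITICAL_RED,
}

# HYPOTHETICAL cut-offs: the case study does not say how scores become grades.
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def score_to_grade(score: float) -> str:
    """Map a 0-100 score to a letter grade using the hypothetical cut-offs above."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


def grade_color(grade: str) -> str:
    """Return the semantic color for a letter grade; reject anything else."""
    key = grade.strip().upper()
    if key not in GRADE_COLORS:
        raise ValueError(f"unknown grade: {grade!r}")
    return GRADE_COLORS[key]


print(score_to_grade(85), grade_color(score_to_grade(85)))  # B #FFD19C
```

Keeping the colors behind a grade lookup means a badge can never show a color the palette does not assign to that grade.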

Neutral Palette

Gray 50

#F8F7F7

Page backgrounds, light surfaces

Gray 100

#F3F4F6

Card backgrounds, hover states

Gray 300

#D1D5DB

Borders, dividers

Gray 500

#6B7280

Secondary text, icons

Gray 900

#262626

Primary text, headings

Component Examples

Button Styles: Evaluate App, View Results, Learn More

Grade Badges: Pass, Fail, Medium

Alert Messages

Info: Your evaluation is processing. Estimated time: 15 minutes.

Success: Evaluation completed! No critical issues found.

Accessibility Standards

Contrast ratios against white, per the WCAG 2.1 formula:

✗ Lavender #A7ABF6 on White: Contrast ratio 2.1:1 (below the 3:1 AA Large Text minimum)

✓ Olive #819337 on White: Contrast ratio 3.4:1 (AA Large Text only)

✓ Gray 900 on White: Contrast ratio 15.1:1 (AAA)

✗ Success, Warning, and Critical colors on White: about 1.7:1, 1.4:1, and 3.2:1 (below AA for normal text)

All primary text uses high-contrast neutrals for maximum readability.
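Contrast ratios like these follow from the WCAG 2.1 definitions of relative luminance and contrast ratio, so they can be recomputed rather than taken on trust. The Python sketch below implements those formulas; the function names are illustrative, and the thresholds are the 4.5:1 / 3:1 (AA) and 7:1 / 4.5:1 (AAA) values WCAG 2.1 sets for normal and large text.

```python
def _linear(channel: int) -> float:
    """Linearize one 0-255 sRGB channel, as defined for WCAG 2.1 relative luminance."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #RRGGBB color (0 = black, 1 = white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio (L1 + 0.05) / (L2 + 0.05), where L1 is the lighter color."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio: float, large_text: bool = False) -> str:
    """Highest WCAG 2.1 text-contrast level met (success criteria 1.4.3 and 1.4.6)."""
    aa, aaa = (3.0, 4.5) if large_text else (4.5, 7.0)
    if ratio >= aaa:
        return "AAA"
    if ratio >= aa:
        return "AA"
    return "fail"


for name, color in [("Lavender", "#A7ABF6"), ("Olive", "#819337"), ("Gray 900", "#262626")]:
    ratio = contrast_ratio(color, "#FFFFFF")
    print(f"{name} on white: {ratio:.1f}:1, normal text {wcag_level(ratio)}, "
          f"large text {wcag_level(ratio, large_text=True)}")
```

For these three colors it reports roughly 2.1:1 for lavender, 3.4:1 for olive, and 15.1:1 for Gray 900 on white.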

Design Rationale

Why this palette works for Track LLM:

Modern yet Trustworthy: Lavender breaks from traditional government blues while maintaining institutional credibility
Balanced Contrast: The cool lavender and warm olive create visual interest without overwhelming
Culturally Appropriate: Green has broadly positive associations in Indian culture (growth, prosperity, nature)
Accessible: All primary text uses high-contrast neutrals that exceed WCAG AA (Gray 900 on white is about 15:1)
AI-Forward: Purple/lavender hues are associated with innovation and technology without being cliché
Calming Interface: Softer colors are meant to reduce anxiety when dealing with AI safety issues

Usage Distribution:

60% Neutral grays (structure, backgrounds)
25% Primary Lavender (navigation, headers, primary actions)
10% Accent Olive (CTAs, positive highlights)
5% Semantic colors (alerts, grades)

High-Fidelity Prototype
