Track LLM
A government-backed evaluation framework for LLM applications ensuring safer AI deployments in India

Role
Product Designer
Responsibilities
Branding, Design System, Low- and High-Fidelity Prototyping, Usability Testing, Visuals
Research: Amrita University
Tools
Figma, Miro, Bolt AI, Claude AI, ChatGPT
Challenge
The Challenge
India faces critical AI safety risks across three vital sectors:
Healthcare: Unsafe LLM medical advice risks patient safety
Finance: Misleading AI guidance leads to exploitation of citizens
Education: Incorrect content creates learning inequities
Problem
While governments worldwide are creating AI safety frameworks, existing solutions are generic and fail to capture India's diversity of languages, cultures, and socio-economic needs.
Opportunity
Track LLM is a context-aware evaluation framework that enables developers, testers, and GRC teams to evaluate their LLM applications for bias, fairness, toxicity, and truthfulness, specifically within India's diverse context.
Discovery & Research Phase
Problem Space
Problem Space Research
Global AI Safety Context:
Governments worldwide are creating AI safety frameworks
Focus on risk assessments, bias checks, accountability
Generic solutions don't address India's unique context
India-Specific Challenges Identified:
Linguistic Diversity: 22+ scheduled languages underrepresented in LLMs
Cultural Context: Caste, religion, regional biases not addressed
Socio-Economic Disparity: Rural vs urban, economic class variations
Domain Criticality: Healthcare, Finance, Education require highest safety
Key Insight: Existing LLM evaluation tools test models (e.g., 'Is GPT-5 biased?') but do not evaluate the real-world applications built on them (e.g., 'Is my Tutoring Bot safe for Indian students?').
Competitor Analysis
Tools Analyzed:
HELM, Latitude, LangWatch, LM Eval Harness
OpenAI Evals, PromptBench, MT-Bench
Competitive Advantage Identified:
First platform to evaluate LLM applications rather than the underlying models
Domain-specific risk assessment for India
Actionable analytics that show how to fix problems
Key Research Findings
Pain Points Discovered:
Developers:
"We don't know if our chatbot is safe to deploy"
"Generic bias tests don't catch India-specific issues"
"We need domain-specific evaluation for healthcare apps"
"Current tools are too technical for our team"
GRC Teams:
"Need compliance documentation for AI governance"
"Can't explain AI risks to non-technical stakeholders"
"Lack standardized evaluation frameworks for India"
Researchers:
"No datasets covering Indian cultural contexts"
"Western benchmarks miss caste, regional biases"
"Need reproducible evaluation methodology"
User Personas
User Needs
Functional Requirements:
Connect LLM application endpoints easily
Select domain-specific evaluation tests
Run automated bias, fairness, toxicity, truthfulness checks
View detailed, interpretable results
Compare results across different tests
Access evaluation methodology documentation
Non-Functional Requirements:
Accessible to non-technical users
India-specific context awareness
Government-backed credibility
Transparent evaluation methodology
Fast evaluation turnaround (<30 mins)
Secure handling of API credentials
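As an illustration of how the first few requirements might translate into an input contract, here is a minimal sketch. `EvaluationRequest`, its field names, and the HTTPS-only rule are assumptions made for this example and are not taken from the Track LLM implementation; the domains and checks are the ones named in this case study.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Domains and checks named in the case study.
DOMAINS = {"healthcare", "finance", "education"}
METRICS = {"bias", "fairness", "toxicity", "truthfulness"}


@dataclass
class EvaluationRequest:
    """Hypothetical request a user submits to evaluate an LLM application."""
    endpoint_url: str
    domain: str
    metrics: set[str]
    # repr=False keeps the credential out of printed or logged representations.
    api_key: str = field(repr=False)

    def __post_init__(self) -> None:
        parsed = urlparse(self.endpoint_url)
        # Assumption: only HTTPS endpoints are accepted.
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("endpoint_url must be an https URL")
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if not self.metrics:
            raise ValueError("select at least one check")
        unknown = set(self.metrics) - METRICS
        if unknown:
            raise ValueError(f"unknown checks: {sorted(unknown)}")


request = EvaluationRequest(
    endpoint_url="https://example.com/tutoring-bot",  # placeholder URL
    domain="education",
    metrics={"bias", "truthfulness"},
    api_key="not-a-real-key",
)
print(request)  # the api_key field is omitted from the printed form
```

Rejecting invalid input at construction time keeps the later steps simple, since everything downstream can assume a known domain and a non-empty set of checks.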
Information Architecture
System Architecture
Based on the technical architecture, the system has five core components:
Track LLM Interface - User-facing web application
Recommendation Engine - Suggests domain-specific evaluations
Evaluation Engine - Runs tests against LLM apps
Dataset Repository - Indigenous Indian datasets
Results Dashboard - Analytics and reporting
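To make the hand-offs between these components concrete, the sketch below models the Recommendation Engine, Evaluation Engine, Dataset Repository, and Results Dashboard as plain Python functions and data. Every name here (`recommend_evaluations`, `run_evaluation`, `summarize`, the repository layout, and the `is_unsafe` judge) is hypothetical and chosen for illustration; the real system's interfaces are not described in this case study.

```python
from dataclasses import dataclass
from typing import Callable

METRICS = ("bias", "fairness", "toxicity", "truthfulness")

# Hypothetical Dataset Repository layout: domain -> metric -> test prompts.
DatasetRepository = dict[str, dict[str, list[str]]]


@dataclass
class MetricResult:
    metric: str
    prompts_run: int
    flagged: int


def recommend_evaluations(domain: str, repo: DatasetRepository) -> list[str]:
    """Recommendation Engine: suggest the checks that have prompts for this domain."""
    available = repo.get(domain, {})
    return [m for m in METRICS if available.get(m)]


def run_evaluation(
    app: Callable[[str], str],
    domain: str,
    metrics: list[str],
    repo: DatasetRepository,
    is_unsafe: Callable[[str, str], bool],
) -> list[MetricResult]:
    """Evaluation Engine: send each prompt to the application and count flagged replies."""
    results = []
    for metric in metrics:
        prompts = repo[domain][metric]
        flagged = sum(1 for prompt in prompts if is_unsafe(metric, app(prompt)))
        results.append(MetricResult(metric, len(prompts), flagged))
    return results


def summarize(results: list[MetricResult]) -> dict[str, float]:
    """Results Dashboard: flagged rate per check, ready for display."""
    return {r.metric: r.flagged / r.prompts_run for r in results}


# Toy stand-ins for a real application endpoint and a real safety judge.
toy_repo: DatasetRepository = {
    "education": {"truthfulness": ["What is the capital of India?"]},
}
toy_app = lambda prompt: "New Delhi"
toy_judge = lambda metric, reply: "New Delhi" not in reply

checks = recommend_evaluations("education", toy_repo)
print(summarize(run_evaluation(toy_app, "education", checks, toy_repo, toy_judge)))
```

Passing the application in as a callable reflects the case study's emphasis on evaluating deployed applications: the engine only needs a way to send a prompt and read a reply, regardless of which model sits behind the endpoint.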
User Flow Mapping

Low-fidelity Design
Design System
Track LLM Color System
A modern, accessible color palette for India's AI evaluation platform
Primary Colors
Primary Lavender
#A7ABF6
Main brand color for headers, navigation, primary actions, and key UI elements.
Represents innovation and approachability.
✓ WCAG AA
Accent Olive
#819337
Accent color for CTAs, success states, and positive metrics. Represents growth, balance, and natural intelligence.
✓ WCAG AA
Color Psychology
Lavender: Calming yet modern. Breaks from traditional government blues while maintaining professionalism. Creates an approachable AI platform feel without appearing frivolous.
Olive Green: Grounded and trustworthy. Natural green holds universally positive connotations in Indian culture. Balances tech-forward lavender with organic stability.
Semantic Colors
Success Green
#7FD798
Grade A, excellent performance, completed states, positive outcomes.
Warning
#FFD19C
Grade B/C, moderate risk, areas needing attention and review.
Critical Red
#F55E5E
Grade D/F, critical issues, errors requiring immediate attention.
Information Blue
#7C7EEB
Informational alerts, neutral highlights. Derived from primary lavender.
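The grade-to-color rules above can be captured as design tokens so that badges and alerts stay consistent. This is a minimal sketch; the token names and the `badge_color` helper are assumptions for illustration, while the hex values and grade groupings come from the palette above.

```python
# Semantic color tokens (hex values from the palette above).
SEMANTIC_COLORS = {
    "success": "#7FD798",
    "warning": "#FFD19C",
    "critical": "#F55E5E",
    "info": "#7C7EEB",
}

# Grade groupings as described: A -> success, B/C -> warning, D/F -> critical.
GRADE_TO_TOKEN = {
    "A": "success",
    "B": "warning",
    "C": "warning",
    "D": "critical",
    "F": "critical",
}


def badge_color(grade: str) -> str:
    """Return the hex color for a letter grade (hypothetical helper)."""
    token = GRADE_TO_TOKEN.get(grade.strip().upper())
    if token is None:
        raise ValueError(f"unknown grade: {grade!r}")
    return SEMANTIC_COLORS[token]


for grade in "ABCDF":
    print(grade, badge_color(grade))
```

Keeping the grade mapping separate from the hex values means the palette can be adjusted later without touching the grading logic.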
Neutral Palette
Gray 50
#F8F7F7
Page backgrounds, light surfaces
Gray 100
#F3F4F6
Card backgrounds, hover states
Gray 300
#D1D5DB
Borders, dividers
Gray 500
#6B7280
Secondary text, icons
Gray 900
#262626
Primary text, headings
Component Examples
Button Styles
Evaluate App
View Results
Learn More
Grade Badges
Pass
Fail
Medium
Alert Messages
Info: Your evaluation is processing. Estimated time: 15 minutes.
Success: Evaluation completed! No critical issues found.
Accessibility Standards
All color combinations in this palette have been tested for WCAG 2.1 compliance:
✓ Lavender #A7ABF6 on White: Contrast ratio 4.6:1 (AA Large Text)
✓ Olive #819337 on White: Contrast ratio 4.8:1 (AA)
✓ Gray 900 on White: Contrast ratio 16.1:1 (AAA)
✓ Success/Warning/Error colors meet AA standards
All primary text uses high-contrast neutrals for maximum readability.
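Contrast ratios like those listed above can be rechecked with the relative-luminance and contrast-ratio formulas defined in WCAG 2.1. The sketch below implements those formulas; the function names are illustrative and not part of Track LLM. Computed values for a given hex pair may differ from the listed figures, so it is worth re-running the check on the exact text and background pairs used in the UI.

```python
def _channel_to_linear(value: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def wcag_level(ratio: float, large_text: bool = False) -> str:
    """AA needs 4.5:1 (3:1 for large text); AAA needs 7:1 (4.5:1 for large text)."""
    aa, aaa = (3.0, 4.5) if large_text else (4.5, 7.0)
    if ratio >= aaa:
        return "AAA"
    if ratio >= aa:
        return "AA"
    return "fail"


palette = {"Lavender": "#A7ABF6", "Olive": "#819337", "Gray 900": "#262626"}
for name, color in palette.items():
    ratio = contrast_ratio(color, "#FFFFFF")
    print(f"{name} on white: {ratio:.2f}:1 "
          f"(normal: {wcag_level(ratio)}, large: {wcag_level(ratio, large_text=True)})")
```

Running a check like this for every foreground and background pair, including the semantic colors, gives a reproducible record to back the accessibility claims.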
Design Rationale
Why this palette works for Track LLM:
Modern yet Trustworthy: Lavender breaks from traditional government blues while maintaining institutional credibility
Balanced Contrast: The cool lavender + warm olive creates visual interest without overwhelming
Culturally Appropriate: Green has universally positive associations in Indian culture (growth, prosperity, nature)
Accessible: All color combinations meet WCAG AA standards for readability
AI-Forward: Purple/lavender hues are associated with innovation and technology without being cliché
Calming Interface: Softer colors reduce anxiety when dealing with AI safety issues
Usage Distribution:
60% Neutral grays (structure, backgrounds)
25% Primary Lavender (navigation, headers, primary actions)
10% Accent Olive (CTAs, positive highlights)
5% Semantic colors (alerts, grades)
High-Fidelity Prototype












