As a QA practitioner, I have always been deeply interested in understanding systems as a
whole — not just the test surface, but the architecture behind it. That habit of reading
systems end-to-end has naturally extended into an architectural perspective: how services
are composed, where failure domains live, and how design decisions upstream shape what
quality looks like downstream. The projects below reflect both sides of that lens —
how I test systems, and how I think about building them.
System-Wide Analysis
Risk-Based Coverage
Shift-Left & CI/CD
Contract & Integration Testing
Observability-First
AI/GenAI Validation
Regulatory Compliance
Architecture Thinking
AI/ML Odyssey
A documented learning journey from QA Lead to AI/ML Engineer — built in public.
Covers classical ML, deep learning, NLP, and MLOps through a mix of self-written
code and vibe-coded experiments. Each session is logged, every concept noted in
plain language, and every mistake kept. Structured across 8 modules with weekly
journal entries. Currently active: Module 01 — Python for ML (8 exercises + capstone).
Python
PyTorch
Scikit-learn
NLP
MLOps
Vibe Coding
In Progress
MCP Implementation Patterns
Comparative study of four Model Context Protocol (MCP) server architectures demonstrating
how server design — not model choice — determines output quality. Each implementation
exposes the same HR domain to the same model: Flat Tools (unstructured strings, high
hallucination risk), Resource Injection (typed JSON + pre-loaded schema resources, low
hallucination risk), Prompt Templates (server-side chain-of-thought templates, guaranteed
output structure), and Stateful Memory (session store enabling multi-step reasoning with
context carry-over). Includes a benchmark client that scores each pattern on completeness,
format consistency, and token cost. Each pattern lives on its own git branch; the main
branch contains the comparison matrix and decision guide.
Python
MCP
FastMCP
Anthropic API
AI Architecture
Tool Design
Prompt Engineering
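The three scoring axes named in the MCP Implementation Patterns entry (completeness, format consistency, token cost) can be illustrated with a minimal sketch. Everything here is hypothetical: the `REQUIRED_FIELDS` tuple, the `score_response` helper, and the whitespace-based token count are assumptions made for illustration, not the project's actual benchmark code.

```python
import json
from dataclasses import dataclass

# Hypothetical required fields for an HR-domain answer (assumption, not from the project).
REQUIRED_FIELDS = ("employee_id", "department", "leave_balance")


@dataclass
class Score:
    completeness: float  # fraction of required fields present in the response
    format_ok: bool      # True if the response parses as a JSON object
    token_cost: int      # crude whitespace token count, a stand-in for real tokenizer usage


def score_response(raw: str) -> Score:
    """Score one model response on completeness, format consistency, and token cost."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    format_ok = isinstance(parsed, dict)
    fields = parsed if format_ok else {}
    present = sum(1 for name in REQUIRED_FIELDS if name in fields)
    return Score(
        completeness=present / len(REQUIRED_FIELDS),
        format_ok=format_ok,
        token_cost=len(raw.split()),
    )


# A free-text answer (Flat Tools style) versus a typed JSON answer (Resource Injection style).
flat = score_response("Alice works in Finance and has 12 days left")
typed = score_response('{"employee_id": "E1", "department": "Finance", "leave_balance": 12}')
```

Under these assumptions the free-text answer fails the format check and scores zero on completeness, while the typed answer passes both, which is the kind of gap a comparison matrix would surface.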
RAG Implementation
Fully local, containerised Retrieval-Augmented Generation system — no API keys required.
Upload .txt or .pdf documents, ask natural-language questions, and compare RAG-grounded
answers against the same LLM answering from memory alone. FastAPI backend with
paragraph-aware chunking, Qdrant vector store (cosine similarity), and Ollama serving
both the embedding model (Nomic Embed Text) and generation model (Llama 3.2). Streamlit
UI shows retrieved chunks with similarity scores and optional raw prompt view.
Python
FastAPI
Qdrant
Ollama
Streamlit
Llama 3.2
Nomic Embed Text
Docker Compose
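Two ideas from the RAG Implementation entry, paragraph-aware chunking and cosine-similarity ranking, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the `chunk_paragraphs` and `cosine_similarity` helpers and the `max_chars` limit are hypothetical, and the real system delegates embedding to Ollama and similarity search to Qdrant rather than computing either by hand.

```python
import math


def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Split on blank lines, then pack whole paragraphs into chunks of at most max_chars.

    A paragraph is never cut in half; one that exceeds max_chars becomes its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate      # first paragraph of a chunk, or it still fits
        else:
            chunks.append(current)   # close the full chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


example_chunks = chunk_paragraphs("aaa\n\nbbb\n\nccc", max_chars=8)
```

Keeping paragraphs intact means a retrieved chunk tends to carry a complete thought, which makes the similarity scores shown in the UI easier to interpret.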
LLM Eval Toolkit
Modular Python toolkit for evaluating large language models in production.
Covers faithfulness, answer relevance, context precision, and hallucination detection
with RAGAS and DeepEval backends. Designed for CI/CD integration and enterprise
RAG pipeline validation.
Python
RAGAS
DeepEval
LangChain
pytest
GitHub Actions
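To make one of the LLM Eval Toolkit metrics concrete, here is a sketch of context precision in its common rank-weighted form: the mean of precision@k over the ranks at which a relevant chunk appears. It does not call RAGAS or DeepEval. The `context_precision` and `passes_gate` helpers and the 0.7 threshold are hypothetical, and the relevance flags are passed in directly, whereas an evaluation backend would typically obtain them from an LLM judge.

```python
def context_precision(relevance: list[bool]) -> float:
    """Mean precision@k over the ranks that hold a relevant chunk.

    `relevance[i]` says whether the chunk retrieved at rank i + 1 was relevant.
    Returns 0.0 when nothing relevant was retrieved.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision@rank, counted only at relevant ranks
    return total / hits if hits else 0.0


def passes_gate(relevance: list[bool], threshold: float = 0.7) -> bool:
    """Hypothetical CI gate: True when the score meets the threshold."""
    return context_precision(relevance) >= threshold


score = context_precision([True, False, True])
```

In a pytest suite, a check such as `assert passes_gate(flags)` would fail the pipeline whenever retrieval quality drops below the agreed threshold.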
Playwright Enterprise Framework
Production-grade Playwright framework with TypeScript, Page Object Model, BrowserStack
cross-browser matrix, Azure DevOps multi-stage pipeline, and Allure reporting.
Includes custom fixtures, API testing suite, and reusable pipeline templates.
Playwright
TypeScript
BrowserStack
Azure DevOps
Allure
Python API Automation Framework
Production-grade backend API test framework (PyAPIElite) supporting REST, GraphQL,
SOAP, gRPC, and contract testing. Features AI agent output validation via Arize
Phoenix Evals: LLM-as-judge evaluation for hallucination, relevance, QA correctness,
and toxicity across 9 test cases. Ships with Allure reporting, Docker, and Azure Pipelines CI/CD.
Python
pytest
Arize Phoenix
LLM Evals
REST / gRPC
Allure
Docker
Auth Testing Framework
Comprehensive test framework covering the major enterprise authentication and
authorisation mechanisms: LDAP/AD, OAuth 2.0/OIDC, JWT, SAML 2.0, TACACS+,
RADIUS/EAP, MFA/TOTP, and RBAC, plus IDOR vulnerability checks. Includes mocked servers,
security attack vector tests, and 9 Mermaid reference diagrams.
Python
LDAP/AD
OAuth 2.0
JWT
SAML 2.0
RBAC
pytest
Allure
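As an illustration of what an MFA/TOTP test in the Auth Testing Framework has to reason about, the sketch below implements HOTP (RFC 4226) and TOTP (RFC 6238) with the standard library only. The function names and the one-step `window` tolerance are assumptions for illustration, not the framework's own code.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-SHA1 one-time password for a given counter (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password: HOTP over the current time step (RFC 6238)."""
    return hotp(secret, unix_time // step, digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps on either side."""
    counter = unix_time // step
    candidates = range(max(0, counter - window), counter + window + 1)
    # compare_digest avoids leaking match position through timing
    return any(hmac.compare_digest(hotp(secret, c), submitted) for c in candidates)


rfc_secret = b"12345678901234567890"  # shared secret used in the RFC test vectors
```

A negative test worth pairing with this is replay outside the window: a code that was valid several steps ago should be rejected.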
QA System Case Studies
A living reference of how I approach testing real-world systems — banking platforms,
AI/ML pipelines, microservices, and healthcare applications. Each case study covers
system analysis, risk identification, test strategy design, and observability.
Updated as new AI applications are built and shipped.
Test Strategy
Risk Analysis
Systems Thinking
FinTech
AI/ML
Healthcare
Microservices
k6 Performance Testing + Prometheus
Production-grade k6 load testing framework with a full observability stack.
k6 pushes metrics to Prometheus via remote-write in real time; Grafana displays
VU ramp, p50/p95/p99 latency, error rate, and API container CPU/memory from
cAdvisor — all in a pre-provisioned dashboard. API instrumented with prom-client
for per-route duration histograms and Node.js runtime metrics.
k6
Prometheus
Grafana
cAdvisor
Docker Compose
Node.js
prom-client
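The p50/p95/p99 panels in the k6 and Prometheus entry rest on quantiles estimated from cumulative histogram buckets. The sketch below shows that estimation with linear interpolation inside the target bucket, the same general approach Prometheus takes in `histogram_quantile`. The `histogram_quantile` Python helper and the sample buckets are hypothetical and simplified, not the project's dashboard queries.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, int]]) -> float:
    """Estimate the q-quantile from (upper_bound, cumulative_count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    Observations are assumed to be spread evenly within each bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # cannot interpolate into +Inf; use highest finite bound
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return upper_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Hypothetical request-duration buckets in seconds: (le, cumulative count).
sample = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample)
```

Because the estimate is interpolated, its accuracy depends on bucket boundaries, so buckets placed near the latency targets give more trustworthy p95 and p99 readings.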
Architecture Diagrams
Reference architecture diagrams for Contact Centre as a Service across GCP and Azure,
in multiple configurations covering Dialogflow CX + Genesys, Vertex AI Agent Builder,
Azure Communication Services + OpenAI, Teams Direct Routing, hybrid multi-cloud,
and high-availability patterns. Drawn with the architectural perspective that QA work develops.
GCP
Azure
Dialogflow CX
Vertex AI
Genesys Cloud
Azure OpenAI
CCaaS
Architecture
Perf Bottleneck Runbook
A multi-environment performance investigation handbook spanning Linux/eBPF, Kubernetes,
Mobile (Android + iOS), and Database layers. It combines operational runbooks, bpftrace
scripts, and tool decision trees, with the USE, RED, and Four Golden Signals methodologies
built into every investigation phase. Written as a practitioner reference for teams who
need to go from symptom to root cause without guessing at tooling.
eBPF/bpftrace
BCC Toolkit
Perfetto
Instruments
async-profiler
Pixie
Parca
OpenTelemetry
k6
pg_stat_statements
USE Method
RED Method
Flame Graphs
Kubernetes
Prometheus
Grafana
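As a small illustration of the RED method named in the Perf Bottleneck Runbook entry, the sketch below derives Rate, Errors, and Duration from a window of request samples. The `RequestSample` type, the `red_summary` helper, the choice of HTTP 5xx as the error condition, and the nearest-rank p95 are all assumptions for illustration rather than content from the runbook.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    duration_s: float  # wall-clock duration of the request in seconds
    status: int        # HTTP status code


def red_summary(samples: list[RequestSample], window_s: float) -> dict[str, float]:
    """Rate (requests/s), Errors (5xx ratio), Duration (nearest-rank p95) for one window."""
    count = len(samples)
    if count == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_duration_s": 0.0}
    errors = sum(1 for s in samples if s.status >= 500)
    durations = sorted(s.duration_s for s in samples)
    rank = (95 * count + 99) // 100  # integer ceil(0.95 * count), 1-based nearest rank
    return {
        "rate_per_s": count / window_s,
        "error_ratio": errors / count,
        "p95_duration_s": durations[rank - 1],
    }


# 20 hypothetical requests over a 10-second window, two of them server errors.
window = [RequestSample(0.01 * i, 500 if i > 18 else 200) for i in range(1, 21)]
summary = red_summary(window, window_s=10.0)
```

Reading the three numbers together is the point of the method: a rising p95 with a flat error ratio suggests saturation, while a rising error ratio at steady rate points at a failing dependency.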