Senior Staff Software Engineer (Machine Learning Platform)

Tekion · Bangalore HQ

Experienced · Bangalore HQ · Posted 7 May 2026

Tekion is hiring a Senior Staff Software Engineer (Machine Learning Platform) in Bangalore HQ.

Why This Role Matters

This role powers Tekion’s AI‑native, end‑to‑end automotive platform by turning unified dealership data across DMS, CRM, Digital Retail, Service, and Payments into real‑time intelligence. You’ll operationalize a graph‑based contextual ecosystem so agents can retrieve the right context, enforce policy, and personalize experiences that drive measurable dealer outcomes. You’ll also build the resilient control layer - MCP and the LLM Gateway - that enables safe, cost‑efficient, multi‑provider LLM usage. Finally, you’ll define the standards for building, evaluating, deploying,

Responsibilities

Build and run the LLM control plane/gateway: smart routing, rate limits/quotas, failover, and token/cost tracking.

Ship a unified API and SDKs (REST/gRPC) with normalized schemas, structured outputs, caching, and full observability (traces/logs/metrics).

Enforce safety and privacy by default: content filtering, prompt/response validation, and PII redaction.

Enable multi‑model, multi‑vendor use of LLMs with automated canarying and versioning.

Own the agent runtime: tool registry, permissions, function calling, grounding, and retrieval.

Design orchestration patterns (sequential, planner‑executor, streaming) and manage agent state and long‑running workflows.

Enable platform components for training and scoring pipelines for classical ML (e.g., XGBoost, LightGBM, linear and tree‑based models) and deep models; standardize experiment tracking and packaging.

Create components to monitor model and data drift, and to retrain and tune models as needed to maintain accuracy and relevance.

Add human‑in‑the‑loop review and safe‑actioning before agents touch dealer systems.

Evolve the domain graph and entity resolution; build reliable data ingestion pipelines.

Serve real‑time context to agents (profiles, inventory, pricing, appointments, service history) with access controls and lineage.

Power retrieval with hybrid search (graph + vector + keyword) and smart cache/TTL to balance accuracy, latency, and cost.

Run continuous offline/online evaluations of quality, factuality, bias, and safety as a platform sanity check.

Define SLOs for latency (p50/p95), uptime, and cost visibility; enable autoscaling and spend controls.

Maintain a model/agent registry with versioning, approvals, audit trails, and reproducibility; support compliance requirements where needed.

Provide templates/CLIs, sandboxes, and docs so product teams can build and ship fast; mentor engineers and champion MLOps and AI safety best practices.

Preferred qualifications

Platform‑as‑product: obsess over developer experience, paved roads, and clear SLAs.

Systems thinker: observability, fallback, and access control are core, not afterthoughts.

Passionate about AI: you enjoy enabling real‑world LLM and agentic use cases.

Cost‑aware builder: you treat latency and dollars as first‑class metrics and design for graceful degradation.

Vendor‑agnostic thinker: choose the right model/provider per use case; build for portability and resilience.

Documentation and teaching: you make complex systems understandable; you uplevel teams.

About Tekion

See the company's official careers page for full details.