← denis irkhin

FlowLinker — AI in production.

FlowLinker is a sales-conversation intelligence platform: it captures calls (on Teams, on Meet, and via meeting bots), then extracts structured requirements and solution-fit signals from the transcript. I work across the stack, but the part I own end-to-end is the production AI and the observability that makes it trustworthy. Here's how it actually works.

the pipeline

A finished call goes through a multi-stage LLM pipeline, not a single prompt. The transcript is chunked and embedded; relevant context (the product catalogue and prior solution-fit signals) is retrieved from a vector store, RAG-style; successive stages then turn the raw conversation into a structured shape: extracted requirements, objections, and a fit assessment. Splitting the work into stages keeps each prompt small and checkable, and lets a weak stage be improved without destabilising the rest.
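
A minimal sketch of that staged shape, not the production code. Every name here (`CallAnalysis`, `run_pipeline`, the prompt wording) is hypothetical, the model is any callable from prompt to text, and chunking and embedding are assumed to sit behind the `retrieve` callable.

```python
from dataclasses import dataclass, field
from typing import Callable

LLM = Callable[[str], str]              # prompt -> completion (stand-in for a provider client)
Retriever = Callable[[str], list[str]]  # query -> context snippets (hides chunking + embedding)


@dataclass
class CallAnalysis:
    """Structured result that the stages fill in one field at a time."""
    transcript: str
    context: list[str] = field(default_factory=list)
    requirements: list[str] = field(default_factory=list)
    objections: list[str] = field(default_factory=list)
    fit: str = ""


def _lines(completion: str) -> list[str]:
    # Turn a one-item-per-line completion into a clean list, dropping bullets and blanks.
    cleaned = (ln.strip().lstrip("-*• ").strip() for ln in completion.splitlines())
    return [ln for ln in cleaned if ln]


def run_pipeline(transcript: str, llm: LLM, retrieve: Retriever) -> CallAnalysis:
    analysis = CallAnalysis(transcript=transcript)

    # Stage 1: retrieval. Pull catalogue / prior-signal context for this call.
    analysis.context = retrieve(transcript)

    # Stages 2 and 3: one small prompt per job, so each output can be checked alone.
    analysis.requirements = _lines(
        llm(f"List the customer's requirements, one per line.\n\n{transcript}")
    )
    analysis.objections = _lines(
        llm(f"List the customer's objections, one per line.\n\n{transcript}")
    )

    # Stage 4: the fit assessment sees only the structured outputs and the context.
    fit_prompt = (
        "Assess solution fit in one sentence.\n"
        f"Requirements: {analysis.requirements}\n"
        f"Objections: {analysis.objections}\n"
        f"Context: {analysis.context}"
    )
    analysis.fit = llm(fit_prompt).strip()
    return analysis
```

Because each stage is a separate prompt with a typed output, a weak stage can be swapped or re-prompted and tested against fixed inputs without touching the others.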

real-time and batch

The AI runs as three Python / FastAPI services, split by latency budget: a real-time path that surfaces guidance during a live call, a batch path that does the heavier post-call synthesis, and a dedicated vector service for retrieval. The real-time path is the hard one: it has to produce something useful inside the span of a sentence, which constrains chunking, model choice, and how much context each model call can afford.
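
One way to make a latency budget explicit, as a sketch under assumptions: the function name, the `generate` callable, and the 1.5-second default are all illustrative, not values from the real service. The idea is that late guidance is worth less than no guidance, so the real-time path drops a result that misses its budget.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def guidance_within_budget(
    generate: Callable[[str], Awaitable[str]],  # hypothetical async model call
    utterance: str,
    budget_s: float = 1.5,                      # assumed budget, for illustration only
) -> Optional[str]:
    """Return guidance if it arrives inside the budget, otherwise None."""
    try:
        # wait_for cancels the underlying task once the budget is spent.
        return await asyncio.wait_for(generate(utterance), timeout=budget_s)
    except asyncio.TimeoutError:
        # Too late to be useful in a live call: skip it rather than show stale advice.
        return None
```

The batch path has no such cutoff, which is why the heavier synthesis lives there.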

models and failover

Models run on a managed LLM provider with automatic failover to a backup provider when capacity is exhausted, so hitting a token limit degrades to the backup rather than becoming an outage. I own the prompt design for the pipeline stages and the LLM observability (token usage, latency, and output quality, traced end to end), because "it works on my prompt" is not the same as "it works in production under cost and rate limits."
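
The failover idea can be sketched in a few lines. This assumes each provider is wrapped in a plain callable that raises a hypothetical `CapacityExhausted` error on a rate or token limit; the real provider SDKs and their error types are not shown here.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class CapacityExhausted(Exception):
    """Hypothetical error a provider wrapper raises on a rate or token limit."""


@dataclass
class Completion:
    text: str
    provider: str     # which provider actually answered
    latency_s: float  # recorded so failover shows up in the metrics


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> Completion:
    """Try providers in order; move to the next only on capacity errors."""
    last_error: Optional[Exception] = None
    for name, call in providers:
        started = time.perf_counter()
        try:
            text = call(prompt)
        except CapacityExhausted as exc:
            last_error = exc
            continue  # degrade to the next provider instead of failing the request
        return Completion(text, name, time.perf_counter() - started)
    raise RuntimeError("all providers are out of capacity") from last_error
```

Returning the provider name and latency alongside the text is what lets the observability side tell a healthy request from one that quietly ran on the backup.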

observability, from scratch

I built the backend observability the whole system is judged by: application traces, RUM, and database monitoring with log-trace correlation across the Next.js, NestJS, and Python services, all defined as infrastructure-as-code. A request can be followed from the browser, through the API gateway and the queue, into the LLM pipeline and the database, and back — so when something is slow or wrong, the answer to "where?" takes seconds, not an afternoon. This is the differentiator: I can build the AI feature and answer "how do you know it's working?"

multi-tenant isolation

It's a SaaS, so customer data must never cross tenants. Postgres access is tenant-scoped, and I reviewed the vector path specifically — the vector-store retrieval depends on every caller passing the tenant id, so the boundary has to be enforced where it can't be bypassed, not just trusted at the edges. Finding and closing that kind of gap is exactly the work that doesn't show up in a demo.
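
One way to enforce the boundary where it can't be bypassed is to bind the tenant id once, when the retriever is constructed, so no call site can forget it. The sketch below uses a toy in-memory store; `VectorStore`, `TenantScopedRetriever`, and `Doc` are hypothetical names, not the real vector service's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    tenant_id: str
    text: str
    embedding: tuple[float, ...]


def _dot(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """Toy in-memory store; tenant_id is a required keyword, never optional."""

    def __init__(self) -> None:
        self._docs: list[Doc] = []

    def add(self, doc: Doc) -> None:
        self._docs.append(doc)

    def search(self, query: tuple[float, ...], *, tenant_id: str, k: int = 3) -> list[Doc]:
        # Filter by tenant first, then rank, so other tenants' rows are never scored.
        scoped = [d for d in self._docs if d.tenant_id == tenant_id]
        scoped.sort(key=lambda d: _dot(d.embedding, query), reverse=True)
        return scoped[:k]


class TenantScopedRetriever:
    """Application code gets this wrapper, never the raw store."""

    def __init__(self, store: VectorStore, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._store = store
        self._tenant_id = tenant_id

    def search(self, query: tuple[float, ...], k: int = 3) -> list[Doc]:
        # The tenant id comes from the wrapper, not from the caller.
        return self._store.search(query, tenant_id=self._tenant_id, k=k)
```

The design choice is that a missing tenant id fails loudly at construction time instead of silently widening a query.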

shipping discipline

Around the AI: an LLM-based code-review bot on GitHub Actions reviewing every pull request, CI/CD, and Playwright end-to-end tests that run against live meeting bots — testing an AI-over-real-calls product means driving the real calls. I also led the frontend modernisation to Next.js 16 / React 19 / Tailwind v4 (SSR, server actions, a data-access layer), so the surface that shows all this is as current as the backend behind it.

stack

TypeScript · Turborepo + Bun · NestJS · Next.js / React / Tailwind · Python / FastAPI · PostgreSQL + Prisma · Redis-backed queues · managed auth · a vector store · managed LLM providers · from-scratch observability · infrastructure-as-code · Cloudflare · Playwright.