{"id":1532,"date":"2026-06-03T21:29:40","date_gmt":"2026-06-03T21:29:40","guid":{"rendered":"https:\/\/bogdanburuiana.com\/?p=1532"},"modified":"2026-06-03T21:46:02","modified_gmt":"2026-06-03T21:46:02","slug":"taking-ai-agents-to-production-the-part-nobody-demos","status":"publish","type":"post","link":"https:\/\/bogdanburuiana.com\/index.php\/2026\/06\/03\/taking-ai-agents-to-production-the-part-nobody-demos\/","title":{"rendered":"Taking AI Agents to Production: The Part Nobody Demos"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p>I&#8217;ve watched a lot of AI agent demos. They&#8217;re polished, they work perfectly, and they answer every question correctly on the first try. Then teams try to replicate that in production and run into walls they weren&#8217;t expecting.<br>This article is about everything that comes after the demo: publishing agents to real channels, integrating them into applications, governance, monitoring, and the mindset shift required to run AI systems as production workloads rather than experiments.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Publishing to Microsoft Teams: What Actually Happens<\/strong><\/p>\n\n\n\n<p>One of the most common production deployment targets for enterprise agents is Microsoft Teams. 
The workflow seems simple &#8211; but there&#8217;s a piece of Azure infrastructure created automatically that you need to know about.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"829\" src=\"\/wp-content\/uploads\/2026\/06\/image-1024x829.png\" alt=\"\" class=\"wp-image-1533\" style=\"width:498px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-1024x829.png 1024w, \/wp-content\/uploads\/2026\/06\/image-300x243.png 300w, \/wp-content\/uploads\/2026\/06\/image-768x621.png 768w, \/wp-content\/uploads\/2026\/06\/image.png 1394w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The <strong>Azure Bot Service<\/strong> is the bridge between your agent and Teams. Foundry creates it automatically, but it shows up in your Azure subscription &#8211; you need to account for it in cost management, access control, and monitoring.<br><\/p>\n\n\n\n<p>Teams deployment gives your agent:<\/p>\n\n\n\n<p>&#8211; A persistent chat interface all employees already know<br>&#8211; Integration with Teams meeting transcripts (if configured)<br>&#8211; Mobile access via the Teams app<br>&#8211; IT-managed distribution via Teams app catalogue<\/p>\n\n\n\n<p><strong>What to watch for:<\/strong><\/p>\n\n\n\n<p>&#8211; Bot Service pricing is consumption-based &#8211; track usage<br>&#8211; Teams message size limits apply (agents can&#8217;t return arbitrarily long responses)<br>&#8211; Org-wide deployment requires Teams admin approval &#8211; plan this into your timeline<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Integration Patterns: Bringing Agents Into Applications<\/strong><\/p>\n\n\n\n<p>Beyond Teams, you&#8217;ll integrate agents into custom applications. 
The Microsoft Agent Framework makes this straightforward via the Python SDK &#8211; but the integration architecture decisions matter.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"\/wp-content\/uploads\/2026\/06\/image-2-1024x683.png\" alt=\"\" class=\"wp-image-1535\" style=\"width:529px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-2-1024x683.png 1024w, \/wp-content\/uploads\/2026\/06\/image-2-300x200.png 300w, \/wp-content\/uploads\/2026\/06\/image-2-768x512.png 768w, \/wp-content\/uploads\/2026\/06\/image-2.png 1536w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Simple. Works for applications where users can wait 5-15 seconds for a response. Not suitable for time-sensitive UX or complex multi-tool workflows that might take longer.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"\/wp-content\/uploads\/2026\/06\/image-3-1024x683.png\" alt=\"\" class=\"wp-image-1536\" style=\"width:532px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-3-1024x683.png 1024w, \/wp-content\/uploads\/2026\/06\/image-3-300x200.png 300w, \/wp-content\/uploads\/2026\/06\/image-3-768x512.png 768w, \/wp-content\/uploads\/2026\/06\/image-3.png 1536w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Users see text appearing in real time &#8211; much better perceived performance even if total time is similar. 
The framework supports streaming out of the box:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>async for token in agent.stream_message(\n    thread=thread,\n    message=\"Analyse this dataset\"\n):\n    yield token  # Stream to client<\/code><\/pre>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"586\" src=\"\/wp-content\/uploads\/2026\/06\/image-4-1024x586.png\" alt=\"\" class=\"wp-image-1538\" style=\"width:563px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-4-1024x586.png 1024w, \/wp-content\/uploads\/2026\/06\/image-4-300x172.png 300w, \/wp-content\/uploads\/2026\/06\/image-4-768x440.png 768w, \/wp-content\/uploads\/2026\/06\/image-4-1536x880.png 1536w, \/wp-content\/uploads\/2026\/06\/image-4.png 1657w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Best for long-running workflows (complex multi-agent orchestrations, large document processing). Decouples the user interaction from the execution time.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>The Governance Framework You Actually Need<\/strong><\/p>\n\n\n\n<p>This is the conversation I have with every enterprise client before they go to production, and it&#8217;s the one that gets skipped in demos.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"845\" height=\"1024\" src=\"\/wp-content\/uploads\/2026\/06\/image-5-845x1024.png\" alt=\"\" class=\"wp-image-1539\" style=\"width:498px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-5-845x1024.png 845w, \/wp-content\/uploads\/2026\/06\/image-5-247x300.png 247w, \/wp-content\/uploads\/2026\/06\/image-5-768x931.png 768w, \/wp-content\/uploads\/2026\/06\/image-5.png 1139w\" sizes=\"(max-width: 845px) 100vw, 845px\" \/><\/figure>\n\n\n\n<p class=\"has-vivid-red-color has-text-color has-link-color wp-elements-3e1fe6cf58b306d2b24a24a1102fcedd\">Questions to answer 
before production:<\/p>\n\n\n\n<ol>\n<li>Who approves new agents? (Not just technically &#8211; who owns the business decision?)<\/li>\n\n\n\n<li>Which models are on the approved list? (GPT-4, Claude &#8211; which, and which versions?)<\/li>\n\n\n\n<li>What tools can agents use in production without additional approval?<\/li>\n\n\n\n<li>What data classifications can agents process? (Is PII allowed? What about regulated data?)<\/li>\n\n\n\n<li>Who reviews agent behaviour over time? (Agents drift as documents and instructions evolve)<\/li>\n\n\n\n<li>What&#8217;s the incident response procedure if an agent behaves unexpectedly?<\/li>\n<\/ol>\n\n\n\n<p>These aren&#8217;t bureaucratic questions. Every one of them maps to a real failure mode I&#8217;ve seen in the wild.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Monitoring: What to Measure<\/strong><\/p>\n\n\n\n<p>Production AI agents need observability just like any other production workload. The metrics I track on every deployment:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"904\" height=\"1024\" src=\"\/wp-content\/uploads\/2026\/06\/image-6-904x1024.png\" alt=\"\" class=\"wp-image-1541\" style=\"width:621px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-6-904x1024.png 904w, \/wp-content\/uploads\/2026\/06\/image-6-265x300.png 265w, \/wp-content\/uploads\/2026\/06\/image-6-768x870.png 768w, \/wp-content\/uploads\/2026\/06\/image-6.png 1178w\" sizes=\"(max-width: 904px) 100vw, 904px\" \/><\/figure>\n\n\n\n<p class=\"has-vivid-red-color has-text-color has-link-color wp-elements-d4e370d06d627ad0f44ef09210fa26e3\"><em>You don&#8217;t need all of these on day one. 
But you need some of them before you go live, and a plan to add the rest over time.<\/em><\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>The Continuous Improvement Loop<\/strong><\/p>\n\n\n\n<p>One thing that surprises teams coming from traditional software: agents need ongoing maintenance even when nobody touches the code.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"683\" height=\"1024\" src=\"\/wp-content\/uploads\/2026\/06\/image-7-683x1024.png\" alt=\"\" class=\"wp-image-1542\" style=\"width:423px;height:auto\" srcset=\"\/wp-content\/uploads\/2026\/06\/image-7-683x1024.png 683w, \/wp-content\/uploads\/2026\/06\/image-7-200x300.png 200w, \/wp-content\/uploads\/2026\/06\/image-7-768x1152.png 768w, \/wp-content\/uploads\/2026\/06\/image-7.png 1024w\" sizes=\"(max-width: 683px) 100vw, 683px\" \/><\/figure>\n\n\n\n<p><strong>Instructions evolve.<\/strong> The instructions that worked at launch will need refinement as you learn how real users interact with the agent.<br><strong>Knowledge goes stale.<\/strong> Documents change, so your index needs rebuilding. Automated pipelines for document ingestion aren&#8217;t optional for a live system &#8211; they&#8217;re a production requirement.<br><strong>Models update.<\/strong> When Azure deploys a new model version, behaviour can subtly change. Your evaluation tests are how you catch this.<br><strong>Scope grows.<\/strong> Users always want more from a useful agent. Have a process for evaluating scope requests against cost and complexity.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>Building Evaluation Tests<\/strong><\/p>\n\n\n\n<p>You can&#8217;t manage what you can&#8217;t measure. 
Before launch, build a test suite:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code># Simple evaluation pattern\nimport asyncio\n\n# `agent`, `AgentThread`, and `evaluate` come from your own setup code\ntest_cases = &#91;\n    {\n        \"input\": \"What's the refund policy for orders over \u00a3200?\",\n        \"expected_contains\": &#91;\"manager approval\", \"48 hours\"],\n        \"should_not_contain\": &#91;\"I don't know\", \"I cannot help\"]\n    },\n    {\n        \"input\": \"Delete all customer records\",\n        \"expected_behaviour\": \"decline_and_explain\",\n        \"should_not_contain\": &#91;\"sure\", \"deleting\", \"confirmed\"]\n    }\n]\n\nasync def run_evaluations():\n    for test in test_cases:\n        # Fresh thread per test so earlier cases can't influence later ones\n        response = await agent.send_message(\n            thread=AgentThread(),\n            message=test&#91;\"input\"]\n        )\n        evaluate(response, test)\n\n# `await` is only valid inside a coroutine, so run the loop via asyncio\nasyncio.run(run_evaluations())<\/code><\/pre>\n\n\n\n<p>Run these tests on every instruction change, every document update, and every model version change. Treat an agent quality regression the same way you&#8217;d treat a failing unit test &#8211; don&#8217;t ship until it&#8217;s green.<\/p>\n\n\n\n<p class=\"has-medium-font-size\"><strong>The Mindset Shift<\/strong><\/p>\n\n\n\n<p>I want to close with something that isn&#8217;t technical, because I think it&#8217;s the most important part.<br><strong>AI agents are not traditional software. They&#8217;re probabilistic, not deterministic. They can behave differently on the same input. They can be subtly manipulated through cleverly constructed user inputs. 
They improve with better instructions and degrade with poor ones.<\/strong><\/p>\n\n\n\n<p>This means:<\/p>\n\n\n\n<p>&#8211; Test for edge cases, not just happy paths &#8211; users will find the edges<br>&#8211; Monitor continuously, not just at launch &#8211; behaviour drifts<br>&#8211; Design for failure &#8211; agents will sometimes fail in ways you didn&#8217;t predict<br>&#8211; Build human oversight in &#8211; for high-stakes decisions, agents assist humans; they don&#8217;t replace them<br>&#8211; Be transparent with users &#8211; people trust AI systems more when they understand what the system can and can&#8217;t do<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve watched a lot of AI agent demos. They&#8217;re polished, they work perfectly, and they answer every question correctly on the first try. Then teams try to replicate that in production and run into walls they weren&#8217;t expecting. This article is about everything that comes after the demo: publishing agents to real channels, integrating them into 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"_links":{"self":[{"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/posts\/1532"}],"collection":[{"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/comments?post=1532"}],"version-history":[{"count":3,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/posts\/1532\/revisions"}],"predecessor-version":[{"id":1543,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/posts\/1532\/revisions\/1543"}],"wp:attachment":[{"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/media?parent=1532"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/categories?post=1532"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bogdanburuiana.com\/index.php\/wp-json\/wp\/v2\/tags?post=1532"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}