
Data Analysis with a Chat Approach in SAP Analytics Cloud
Introduction
In today’s data-driven enterprises, the ability to interact with complex analytics systems through natural language is rapidly becoming a strategic differentiator. Traditional BI platforms like SAP Analytics Cloud (SAC) excel at delivering rich visualizations and pre-built reports, but they often require users to navigate menus, configure filters, and interpret charts manually. By embedding a conversational interface directly within SAC, organizations can empower business users—analysts, managers, and executives alike—to ask questions in plain English (or any supported language) and receive immediate, contextualized insights without leaving their story page.
For our proof of concept, we’ve built an ASGI-based chat server using Django, Channels, and Daphne, exposing it securely to SAC via an ngrok tunnel configured with Basic Auth. This local development setup allows us to iterate quickly on streaming chat flows against real BW data without deploying to a remote cloud environment. In this series, we’ll demonstrate how to:
- Leverage Django (with Channels) and Daphne to handle high-throughput WebSocket connections and stream LLM responses in real time.
- Prototype a lightweight front-end using plain HTML and JavaScript, served locally over ngrok, to validate the conversational flow.
- Transform that prototype into a fully supported SAC custom widget, so administrators can drop the chat interface directly into any story or dashboard.
- Secure communication between SAC, our Django server (via ngrok with Basic Auth), and the chosen LLM provider using enterprise-grade authentication and token exchange.
- Handle streaming chat scenarios with low latency, graceful error handling, and user-friendly fallbacks.
By the end of this series, you’ll have a robust blueprint for delivering conversational analytics within SAP Analytics Cloud—prototype locally with Django and Daphne, expose it securely via ngrok, and embed it seamlessly into your SAC stories.
1. Setting up a Basic ASGI Server with Django, Channels, and Daphne
To power real-time chat, we need an asynchronous server that speaks WebSocket and can stream data. Django Channels and Daphne provide this out of the box.
1.1 Install Required Packages
pip install django channels channels_redis daphne
- django: the web framework.
- channels: adds ASGI support and routing for WebSocket.
- channels_redis: a production-ready channel layer backed by Redis (for development you can omit it and use the InMemoryChannelLayer).
- daphne: ASGI server that serves your Django project.
1.2 Update Django Settings
In settings.py, enable Channels and configure a channel layer:
INSTALLED_APPS = [
    # ... existing apps ...
    'channels',
    'mychatapp',  # your app containing consumers
]

# Use Redis in production; for PoC you can use InMemoryChannelLayer
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels.layers.InMemoryChannelLayer',
    },
}

# Tell Django to use Channels' ASGI application
ASGI_APPLICATION = 'myproject.asgi.application'
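When you move beyond the PoC, swap the in-memory layer (which only works within a single process) for the Redis-backed one; a minimal sketch, assuming channels_redis is installed and a Redis server is reachable on the default local port:
# settings.py (production sketch)
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {'hosts': [('127.0.0.1', 6379)]},
    },
}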
1.3 Define the WebSocket Consumer (consumer.py)
Create mychatapp/consumer.py:
import json
from channels.generic.websocket import AsyncWebsocketConsumer
from .ai_service import chat  # placeholder for your AI integration

class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        self.room_name = 'global_chat'
        self.room_group = f'chat_{self.room_name}'
        # Join the channel group
        await self.channel_layer.group_add(
            self.room_group,
            self.channel_name
        )
        await self.accept()

    async def disconnect(self, close_code):
        # Leave the channel group
        await self.channel_layer.group_discard(
            self.room_group,
            self.channel_name
        )

    async def receive(self, text_data):
        # Parse incoming JSON message
        text_data_json = json.loads(text_data)
        message = text_data_json.get('message', '')
        # Delegate to AI service, streaming responses via self.send
        await chat(message, self.send)

    # Handler for messages sent to the group (if you broadcast)
    async def chat_message(self, event):
        await self.send(text_data=json.dumps(event))
Key points:
- connect(): called when a WebSocket is opened. We accept and join a channel group for broadcast.
- receive(): handles incoming text messages, invokes chat() to process and stream AI responses.
- chat_message(): optional handler if you broadcast messages to multiple clients (see the sketch below).
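To see how such a broadcast would be triggered, here is a minimal sketch using Channels' group_send, whose "type" field maps to the chat_message handler name (the greeting payload is purely illustrative):
# inside any consumer method: fan a message out to every client in the group;
# each member's chat_message() handler then forwards it to its own socket
await self.channel_layer.group_send(
    self.room_group,
    {'type': 'chat_message', 'message': 'Hello, everyone'},
)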
1.4 Define WebSocket Routing (routing.py)
In mychatapp/routing.py, map the WebSocket path to your consumer:
from django.urls import re_path
from .consumer import ChatConsumer

websocket_urlpatterns = [
    re_path(r'ws/chat/$', ChatConsumer.as_asgi()),
]
Then include this in your project-level asgi.py (next section).
1.5 Configure the ASGI Application (asgi.py)
Create or update myproject/asgi.py:
import os
from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

# Standard Django ASGI application for HTTP
django_asgi_app = get_asgi_application()

# Import routing only after Django is set up, so app imports succeed
import mychatapp.routing

application = ProtocolTypeRouter({
    # Route HTTP requests to the Django ASGI app
    'http': django_asgi_app,
    # Route WebSocket requests to Channels
    'websocket': AuthMiddlewareStack(
        URLRouter(
            mychatapp.routing.websocket_urlpatterns
        )
    ),
})
This tells Daphne how to dispatch HTTP vs. WebSocket traffic.
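With the router in place, you can serve the project with Daphne from the command line; a typical invocation, assuming your project module is named myproject as above:
daphne -b 0.0.0.0 -p 8000 myproject.asgi:application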
1.6 Placeholder AI Service (ai_service.py)
Under mychatapp/ai_service.py, include a minimal stub:
import asyncio
import json

async def chat(message: str, sender):
    """
    Placeholder: echo the input back one character at a time.
    Replace this with actual LLM integration, streaming tokens.
    `sender` is the consumer's send method and expects a text frame.
    """
    for char in message:
        await asyncio.sleep(0.05)
        # send each character as a streamed token (matches the front-end protocol)
        await sender(json.dumps({'type': 'token', 'token': char}))
    # signal end of stream so the UI can finalize the message bubble
    await sender(json.dumps({'type': 'done'}))
This stub simulates streaming behavior; when you integrate your LLM, replace its logic with HTTP or SDK calls that yield partial results.
With these components in place, you can start Daphne and connect a WebSocket client at ws://localhost:8000/ws/chat/ to test streaming functionality, as sketched below. In the next section, we’ll build a simple HTML/JavaScript front-end to validate this chat flow before moving into SAC widget development.
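For a quick smoke test without any front-end, here is a minimal client sketch using the third-party websockets package (pip install websockets); it sends one message and prints streamed frames until the server signals completion:
# test_chat.py -- minimal WebSocket smoke test (assumes `pip install websockets`)
import asyncio
import json

import websockets

async def main():
    async with websockets.connect('ws://localhost:8000/ws/chat/') as ws:
        await ws.send(json.dumps({'message': 'Hello from the test client'}))
        while True:
            frame = json.loads(await ws.recv())
            print(frame)
            if frame.get('type') in ('done', 'error'):
                break

asyncio.run(main())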
2. Developer Frontend Prototype with HTML, Tailwind, and WebSocket
Before embedding into SAC, we build a sandboxed HTML/JS page to validate streaming chat behavior.
2.1 HTML Structure and Styling
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Head content shortened for brevity -->
  <script src="https://cdn.tailwindcss.com"></script>
  <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
  <!-- Highlight.js for code syntax highlighting -->
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/github.min.css">
  <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js"></script>
</head>
<body class="bg-gray-100 flex items-center justify-center h-screen">
  <div class="w-full max-w-md bg-white rounded-lg shadow-lg flex flex-col h-4/5">
    <!-- Header -->
    <header class="p-4 border-b">
      <h1 class="text-xl font-semibold text-gray-700">Chat Room</h1>
    </header>
    <!-- Message list -->
    <div id="messages" class="flex-1 p-4 overflow-y-auto space-y-2"></div>
    <!-- Input area -->
    <div class="p-4 border-t flex">
      <input id="input" class="flex-1 px-3 py-2 border rounded-l-lg" />
      <button id="send" class="px-4 bg-indigo-500 text-white rounded-r-lg disabled:bg-gray-400" disabled>Send</button>
    </div>
  </div>
  <script>
    // Use wss:// when the page itself is served over HTTPS (e.g. through ngrok)
    const scheme = location.protocol === 'https:' ? 'wss' : 'ws';
    const socket = new WebSocket(`${scheme}://${location.host}/ws/chat/`);
    const messagesEl = document.getElementById('messages');
    const inputEl = document.getElementById('input');
    const sendBtn = document.getElementById('send');
    let currentDiv, currentContent = '', typingIndicator;

    // Enable the send button only while the socket is open
    socket.addEventListener('open', () => sendBtn.disabled = false);
    socket.addEventListener('close', () => sendBtn.disabled = true);

    sendBtn.addEventListener('click', () => {
      const msg = inputEl.value.trim();
      if (!msg) return;
      socket.send(JSON.stringify({ message: msg }));
      appendMessage('You: ' + msg, true);
      inputEl.value = '';
    });

    socket.onmessage = ({ data }) => {
      const msg = JSON.parse(data);
      switch (msg.type) {
        case 'status': showStatus(msg.message); break;
        case 'token': appendAiToken(msg.token); break;
        case 'done': finalizeResponse(); break;
        case 'error': appendSystem('Error: ' + msg.message); break;
      }
    };

    function appendMessage(text, outgoing) {
      const div = document.createElement('div');
      div.textContent = text;
      div.className = outgoing ? 'self-end bg-indigo-100' : 'self-start bg-gray-200';
      messagesEl.appendChild(div);
      messagesEl.scrollTop = messagesEl.scrollHeight;
    }
    // The remaining helpers (status, token streaming, markdown rendering) are sketched below.
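    // --- Illustrative sketches of those helpers (assumptions: the server sends
    // the status/token/done protocol above; marked and hljs are loaded via the
    // CDN tags in <head>). Adapt freely -- this is one possible implementation.
    function showStatus(text) {
      // show (or refresh) a transient "AI is typing..." indicator
      if (!typingIndicator) {
        typingIndicator = document.createElement('div');
        typingIndicator.className = 'self-start text-sm text-gray-400';
        messagesEl.appendChild(typingIndicator);
      }
      typingIndicator.textContent = text;
      messagesEl.scrollTop = messagesEl.scrollHeight;
    }
    function appendAiToken(token) {
      // lazily create the AI bubble, then re-render the accumulated markdown
      if (typingIndicator) { typingIndicator.remove(); typingIndicator = null; }
      if (!currentDiv) {
        currentDiv = document.createElement('div');
        currentDiv.className = 'self-start bg-gray-200 rounded p-2';
        messagesEl.appendChild(currentDiv);
        currentContent = '';
      }
      currentContent += token;
      currentDiv.innerHTML = marked.parse(currentContent);
      messagesEl.scrollTop = messagesEl.scrollHeight;
    }
    function finalizeResponse() {
      // highlight code blocks in the finished answer and reset streaming state
      if (currentDiv) currentDiv.querySelectorAll('pre code').forEach(el => hljs.highlightElement(el));
      currentDiv = null;
      currentContent = '';
    }
    function appendSystem(text) {
      const div = document.createElement('div');
      div.className = 'text-center text-sm text-gray-500';
      div.textContent = text;
      messagesEl.appendChild(div);
    }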
</script>
</body>
</html>
Explanation:
- Tailwind & Marked.js: We include Tailwind via CDN for quick styling, and Marked.js plus Highlight.js for rendering and syntax-highlighting Markdown content.
- WebSocket Initialization: new WebSocket(...) connects to /ws/chat/ on our ASGI server.
- Event Listeners (open, close, onmessage): Manage UI state (enable/disable send button, show errors) and route incoming messages by msg.type.
- Message Flow:
- User clicks Send → message is sent as JSON.
- Server streams back partial results with { type: 'token', token: '...' }; UI appends tokens in a markdown-supported bubble.
- A final { type: 'done' } signals end of stream.
2.2 Defining the Corresponding Django View
To serve this HTML prototype within Django (e.g., on /chat/), add a simple view and URL:
# views.py
from django.shortcuts import render

def chat_view(request):
    return render(request, 'chat/index.html')

# urls.py
from django.urls import path
from .views import chat_view

urlpatterns = [
    path('chat/', chat_view, name='chat'),
]
- Place the HTML file at templates/chat/index.html.
- Ensure the TEMPLATES setting in settings.py includes the app’s templates directory (see the sketch below).
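A minimal sketch of that setting; with APP_DIRS enabled, Django discovers each installed app’s templates/ folder automatically:
# settings.py
TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],        # add project-level template directories here if needed
        'APP_DIRS': True,  # picks up mychatapp/templates/chat/index.html
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.request',
            ],
        },
    },
]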
This view delivers our front-end prototype so you can browse to http://localhost:8000/chat/ (or over ngrok) to test the chat UI.
With the prototype running end-to-end, you can validate WebSocket connectivity, UI responsiveness, and streaming behavior before packaging into an SAC widget. In the next section, we’ll convert this prototype into a custom widget for SAP Analytics Cloud.
3. Converting the Frontend into an SAC Custom Widget
SAP Analytics Cloud (SAC) supports a Custom Widget framework that lets you extend the set of out-of-the-box components with your own Web Components—HTML/CSS/JavaScript bundles that integrate seamlessly into the story canvas (help.sap.com). Under the hood, SAC loads each widget in its own shadow DOM, ensuring style isolation and lifecycle hooks like onCustomWidgetAfterUpdate and onCustomWidgetResize for reactivity and sizing (help.sap.com).
To transform our standalone HTML/JS prototype into an SAC widget, we make the following key changes:
- Wrap in a Web Component: Define a <sap-custom-streaming-chat> class extending HTMLElement. We attach a shadow root in the constructor and clone a <template> that contains our markup and scoped styles.
- Lazy-load dependencies: Instead of global <script> tags, we dynamically load Marked.js and Highlight.js inside onCustomWidgetAfterUpdate(), ensuring they’re only fetched once per widget instance.
- Configuration methods: Expose widget properties—setWebSocketUrl(url) and setRestUrl(url)—that SAC will call based on JSON-configured URLs. These methods initialize the socket or REST mode, apply Basic Auth (for ngrok), and wire up send/input handlers.
- Lifecycle hooks: Use onCustomWidgetAfterUpdate() to kick off or reconnect streams whenever widget properties change, and onCustomWidgetResize() to let the flexbox layout naturally adapt to container size.
- Unified messaging logic: Consolidate the send/receive and rendering code into helper methods (_sendMessage(), _appendAiToken(), _renderUserMessage(), _system()) that work identically across both streaming (WebSocket) and non‑streaming (REST) modes.
Below is the resulting widget JavaScript file, ready to be referenced in your widget’s manifest.json alongside a minimal JSON descriptor and optional styling/builder scripts.
(function() {
  // ——— Helpers to load scripts & styles —————————————————————————
  function loadScript(url) {
    return new Promise((res, rej) => {
      const s = document.createElement('script');
      s.src = url;
      s.onload = res;
      s.onerror = rej;
      document.head.appendChild(s);
    });
  }
  function loadStyle(url, shadowRoot) {
    const l = document.createElement('link');
    l.rel = 'stylesheet';
    l.href = url;
    shadowRoot.appendChild(l);
  }

  // ——— Template ———————————————————————————————————————————————
  const template = document.createElement('template');
  template.innerHTML = `
    <style>
      :host { display:block; height:100%; font-family:sans-serif; }
      #container { display:flex; flex-direction:column; width:100%; height:100%; }
      .header { padding:1rem; border-bottom:1px solid #ddd; font-weight:600; }
      #messages {
        flex:1; display:flex; flex-direction:column;
        gap:0.5rem; padding:1rem; overflow-y:auto;
      }
      .input-area { display:flex; border-top:1px solid #ddd; }
      .input-area input {
        flex:1; padding:0.5rem; border:1px solid #ccc;
        border-right:none; border-radius:0.25rem 0 0 0.25rem;
      }
      .input-area button {
        padding:0 .75rem; background:#007cc0; color:#fff;
        border:none; cursor:pointer; border-radius:0 0.25rem 0.25rem 0;
      }
      .input-area button:disabled { background:#999; cursor:default; }
      .message {
        max-width:80%; padding:0.5rem; border-radius:0.5rem;
      }
      .message.you { align-self:flex-end; background:#e0f7ff; color:#005f8a; }
      .message.peer { align-self:flex-start; background:#f3f3f3; color:#333; }
      .system { text-align:center; font-size:0.85rem; color:#666; }
      .markdown-content { font-size:0.9rem; line-height:1.3; }
    </style>
    <div id="container">
      <div class="header">Chat Room</div>
      <div id="messages"></div>
      <div class="input-area">
        <input id="input" type="text" placeholder="Type a message…" />
        <button id="send" disabled>Send</button>
      </div>
    </div>
  `;

  // ——— Element Definition —————————————————————————————————————
  class Main extends HTMLElement {
    constructor() {
      super();
      this.attachShadow({ mode: 'open' })
        .appendChild(template.content.cloneNode(true));
      // container refs
      this.messagesEl = this.shadowRoot.getElementById('messages');
      this.inputEl = this.shadowRoot.getElementById('input');
      this.sendBtn = this.shadowRoot.getElementById('send');
      // streaming state
      this.socket = null;
      this.currentResponseDiv = null;
      this.currentResponseText = '';
      this.typingIndicator = null;
      this._depsLoaded = false;
      this.mode = 'socket';
      this.password = 'cstartup';
    }

    // Called when SAC properties change
    async onCustomWidgetAfterUpdate() {
      // Load marked & highlight.js once per widget instance
      if (!this._depsLoaded) {
        await loadScript('https://cdn.jsdelivr.net/npm/marked/marked.min.js');
        await loadStyle(
          'https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/github.min.css',
          this.shadowRoot
        );
        await loadScript('https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js');
        marked.setOptions({
          gfm: true, breaks: true, headerIds: true,
          mangle: false, sanitize: false,
          smartLists: true, smartypants: true,
          highlight: (code, lang) =>
            lang && hljs.getLanguage(lang)
              ? hljs.highlight(code, { language: lang }).value
              : hljs.highlightAuto(code).value
        });
        this._depsLoaded = true;
      }
      // After dependencies are loaded, reconnect if in socket mode
      // (uses the base URL remembered by setWebSocketUrl)
      if (this.mode === 'socket' && !this.socket && this.webSocketUrl) {
        this._connect(this.webSocketUrl + '/ws/chat/');
      }
    }

    onCustomWidgetResize() {
      // Flex layout auto-resizes
    }

    setRestUrl(url) {
      this.restUrl = url + '/api/generate';
      this.restAuth = 'Basic ' + btoa('caleodev:' + this.password);
      this.mode = 'rest';
      this.sendBtn.disabled = false;
    }

    setWebSocketUrl(url) {
      this.mode = 'socket';
      this.webSocketUrl = url; // remembered for reconnects in onCustomWidgetAfterUpdate
      this._connect(url + '/ws/chat/');
    }

    _connect(url) {
      if (this.socket) this.socket.close();
      // Browsers require a ws:// or wss:// scheme, so normalize http(s) URLs
      this.socket = new WebSocket(url.replace(/^http/, 'ws'));
      this.socket.addEventListener('open', () => {
        this.sendBtn.disabled = false;
        this._system('Connected');
      });
      this.socket.addEventListener('close', () => {
        this.sendBtn.disabled = true;
        this._system('Connection closed');
      });
      this.socket.addEventListener('error', () => {
        this._system('WebSocket error');
      });
      this.socket.addEventListener('message', ({ data }) => {
        const msg = JSON.parse(data);
        switch (msg.type) {
          case 'status': this._showStatus(msg.message); break;
          case 'token': this._appendAiToken(msg.token); break;
          case 'done': this._finalizeAiResponse(); break;
          case 'error': this._system(`Error: ${msg.message}`); break;
        }
      });
    }

    _sendMessage() { /* omitted for brevity */ }
    _appendAiToken() { /* ... */ }
    _finalizeAiResponse() { /* ... */ }
    _renderUserMessage() { /* ... */ }
    _system() { /* ... */ }
  }

  customElements.define('sap-custom-streaming-chat', Main);
})();
4. Exposing the Local ASGI Server via ngrok with Basic Auth
For rapid proof-of-concept development, we use ngrok to tunnel the local Daphne/Channels server to a public URL that SAC can consume. To secure this endpoint, we enable ngrok’s built-in HTTP Basic Authentication, ensuring only authorized users (your SAC widget) can connect.
4.1 Install and Authenticate ngrok
- Download ngrok for your OS from ngrok.com.
- Install by unzipping or placing the binary in your PATH.
- Authenticate your ngrok client with your authtoken (found in your ngrok dashboard):
- ngrok config add-authtoken <YOUR_NGROK_AUTHTOKEN>
4.2 Create an ngrok Configuration File
Instead of passing flags on the command line, define a reusable configuration in ~/.ngrok2/ngrok.yml:
version: "2"
# Define a tunnel named "sac-chat"
tunnels:
sac-chat:
proto: http
addr: 8000 # local Daphne server port
auth: "caleodev:cstartup" # Basic Auth: username: caleodev, password: cstartup
host_header: "rewrite"
# Optional: use a custom subdomain if you have ngrok paid plan
# subdomain: sac-chat-demo
- proto: protocol (http for both HTTP and WebSocket over HTTP).
- addr: local port (where Daphne is listening).
- auth: username:password for Basic Auth on every HTTP/WebSocket request.
- host_header: rewrites the Host header to match your local server if needed.
- subdomain (paid feature): reserve a custom URL like https://sac-chat-demo.ngrok.io.
4.3 Start the Tunnel
ngrok start sac-chat
This outputs something like:
Session Status                online
Account                       Your Name (Plan: Free)
Version                       3.x.x
Region                        United States (us)
Tunnel "sac-chat"             -> http://localhost:8000
Forwarding                    https://abcd1234.ngrok.io -> http://localhost:8000

HTTP Requests
-------------
GET /  200 OK
...
- Your public URL is https://abcd1234.ngrok.io.
- Because of the basic_auth setting, any attempt to open https://abcd1234.ngrok.io requires Basic Auth credentials (caleodev / cstartup).
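A quick way to verify both the tunnel and the auth gate from a terminal (expect 401 without credentials, 200 with them):
curl -i https://abcd1234.ngrok.io/chat/                      # 401 Unauthorized
curl -i -u caleodev:cstartup https://abcd1234.ngrok.io/chat/ # 200 OK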
4.4 Configure the SAC Widget
In your SAC Custom Widget manifest.json, define a property for the WebSocket URL. In the Story Designer, administrators can set this property to your ngrok URL:
"resources": {
"client": { "js": ["widget.js"] }
},
"sap.custom": {
"properties": [
{
"name": "webSocketUrl",
"type": { "type": "String" },
"defaultValue": "https://abcd1234ngrok.io"
}
]
}
Then in the widget’s Main class, SAC will invoke:
// Called by SAC after reading story properties
widget.setWebSocketUrl(widget.properties.webSocketUrl);
Behind the scenes, the widget:
- Extracts the Basic Auth credentials (hard-coded for the PoC, or managed via widget settings).
- Initiates a WebSocket to the tunneled endpoint. Note that the browser WebSocket constructor accepts only a URL and an optional subprotocol list; the three-argument form below, which passes an Authorization header, is the API of Node's ws library and does not work in browsers:
new WebSocket(
  `wss://${host.replace(/^https?:\/\//, '')}/ws/chat/`,
  [],
  {
    headers: { Authorization: 'Basic ' + btoa('caleodev:cstartup') }
  }
);
- In a browser, the credentials must therefore travel another way: embedded in the URL (wss://caleodev:cstartup@.../ws/chat/, with support varying by browser) or as a query parameter or subprotocol that the server validates.
Note: For production, replace Basic Auth with a secure token exchange (OAuth2/OIDC via SAP BTP XSUAA) and point to a stable domain or custom subdomain.
This ngrok setup lets SAC developers and testers connect to your local chat server securely—no cloud deployment needed until your proof of concept is solid. In the next section, we’ll demonstrate end-to-end chat streaming within SAC, including SSO and token propagation for BW data access.
5. Building the AI Service with Together API
The service.py (or ai_service.py) module encapsulates the logic for connecting to an LLM provider—Together via the huggingface_hub—and optionally performing Retrieval-Augmented Generation (RAG) with an index built through llama_index. Below is a breakdown of each part of the file and how it works:
# service.py
from __future__ import annotations

import asyncio, json, os
from typing import Optional

from huggingface_hub import InferenceClient   # provider="together"
from llama_index.core import VectorStoreIndex
from llama_index.llms.together import TogetherLLM

from chat.vector_store import build_index     # your helper

# ───────────────────────── globals ──────────────────────────
_INDEX: Optional[VectorStoreIndex] = None
- Imports:
- asyncio, json, os: standard libraries for async flow, JSON handling, and environment variables.
- InferenceClient from huggingface_hub: used for streaming chat completions via Together’s API.
- VectorStoreIndex and TogetherLLM from llama_index: support RAG workflows if you choose to query a vector index.
- build_index: your custom helper that builds a VectorStoreIndex over your BW/extracted documents.
- Global _INDEX: a singleton for caching the vector index across calls, preventing rebuilds on every request.
def _get_llm() -> TogetherLLM:
    key = os.getenv("TOGETHER_API_KEY")
    # fallback to local secrets.txt if env var missing
    if not key:
        try:
            with open("secrets.txt", "r", encoding="utf-8") as f:
                key = f.readline().strip()
        except FileNotFoundError:
            pass
    if not key:
        raise EnvironmentError("TOGETHER_API_KEY not set")
    return TogetherLLM(api_key=key, model="meta-llama/Llama-3.3-70B-Instruct-Turbo")
- _get_llm():
- Retrieves the Together API key from the TOGETHER_API_KEY environment variable, falling back to reading the first line of secrets.txt for local development.
- Raises an error if no key is found.
- Returns a configured TogetherLLM instance, specifying your chosen model (e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo).
def _get_index():
    global _INDEX
    if _INDEX is None:
        _INDEX = build_index().as_query_engine(
            llm=_get_llm(),
            response_mode="tree_summarize",
        )
    return _INDEX
- _get_index():
- Lazily builds and caches the query engine on first call via your build_index() helper.
- Wraps the index in a query engine that uses _get_llm() for summarization and retrieval.
- Enables semantic searches over your document corpus for RAG (see the usage sketch below).
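As a usage sketch, the cached engine can answer one-off questions directly (the query string here is a made-up example):
# hypothetical one-off RAG query against the cached engine
engine = _get_index()
result = engine.query("Which cost centers exceeded budget last quarter?")
print(result.response)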
def _get_client() -> InferenceClient:
    token = os.getenv("TOGETHER_API_KEY")
    if not token:
        try:
            with open("secrets.txt") as f:
                token = f.readline().strip()
        except FileNotFoundError:
            pass
    if not token:
        raise EnvironmentError("TOGETHER_API_KEY not set")
    return InferenceClient(provider="together", api_key=token)
- _get_client():
- Similar key-retrieval logic to _get_llm(), but returns a low-level InferenceClient for direct streaming via huggingface_hub.
async def chat(message: str, send_func):
    """
    Streaming RAG chat endpoint.
    • message: user query
    • send_func: async callable that takes a JSON string
    """
    # 1) Send typing status to client UI
    await send_func(json.dumps({"type": "status", "message": "AI is typing…"}))

    # 2) (Optional) Build RAG context
    # context = _get_index().query(message).response

    # 3) Construct system + user messages
    system_prompt = (
        "You are an expert level SAP Consultant. Build answers using the provided context.\n"
        "Context:\n" + message + "\n\nUser Query:"
    )
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": message},
    ]

    # 4) Initialize streaming client
    client = _get_client()
    try:
        completion = client.chat.completions.create(
            model="deepseek-ai/DeepSeek-V3",
            messages=messages,
            max_tokens=1000,
            stream=True,
        )
        buffer = []  # collect tokens if you need to post-process
        async for token in _stream_tokens(completion):
            buffer.append(token)
            # 5) Stream each token back to the WebSocket client
            await send_func(json.dumps({"type": "token", "token": token}))
        # 6) Signal completion
        await send_func(json.dumps({"type": "done"}))
    except Exception as e:
        # 7) Error handling
        await send_func(json.dumps({"type": "error", "message": str(e)}))
- chat() coroutine:
- Status: Notifies the front-end that the AI is preparing a response (shows a typing indicator).
- RAG (optional): You can uncomment the _get_index().query(message) call to retrieve relevant context from your index.
- Prompt Construction: Combines a custom system prompt (instructing the model to act as an expert SAP consultant) with the user’s query.
- Streaming Completion: Calls chat.completions.create(..., stream=True) to get an iterable of incremental partial outputs.
- Token Streaming: Iterates over _stream_tokens() to yield each token back to the UI via send_func, with type "token".
- Completion Signal: Sends a final message of type "done" when the stream ends.
- Error Handling: Catches exceptions and streams an "error" message to the UI.
# helper: convert a sync iterator into async tokens
async def _stream_tokens(sync_iter):
    loop = asyncio.get_running_loop()
    it = iter(sync_iter)

    def _next():
        try:
            return next(it)
        except StopIteration:
            return None

    while True:
        chunk = await loop.run_in_executor(None, _next)
        if chunk is None:
            break
        # huggingface_hub chunk format: incremental text lives in delta.content
        if hasattr(chunk.choices[0], "delta") and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
- _stream_tokens():
- Adapts the synchronous iterator returned by huggingface_hub into an async generator.
- Uses run_in_executor to avoid blocking the event loop while waiting for the next chunk.
- Filters only valid content fields from the API’s response.
Final Notes:
- Model Selection: You can swap the model name in _get_llm() or chat.completions.create() with any supported Together API model.
- Security: Never commit your API keys; use environment variables or a secure vault in production.
- RAG Integration: Uncomment and adapt the RAG section to query _get_index() with your own vector store for context-rich answers.
- Extensibility: You can post-process the buffer of tokens to apply filters, masks, or additional formatting before sending to clients.
With this service.py in place, your ChatConsumer can call await chat(message, self.send) and power a full streaming, RAG-enabled chatbot in SAC.
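For clarity, the only change needed in the consumer from Section 1.3 is the import; a sketch, assuming the module is saved as mychatapp/service.py:
# mychatapp/consumer.py
from .service import chat  # replaces the placeholder ai_service stub

# receive() is unchanged and still delegates:
#     await chat(message, self.send)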
6. Summary & Outlook
Throughout this series, we’ve built a proof-of-concept Streaming Analytics Chatbot for SAP Analytics Cloud by combining:
- Django, Channels & Daphne as an ASGI server backbone, handling WebSocket connections and streaming data with high concurrency.
- A lightweight HTML/JavaScript prototype using Tailwind, Marked.js, and Highlight.js to validate the conversational flow before integration.
- An SAC Custom Widget leveraging Web Components and shadow DOM, providing seamless embedding in SAC stories with isolation and lifecycle hooks.
- ngrok with Basic Auth to expose our local server securely, enabling rapid iteration without cloud deployment.
- A streaming AI service powered by the Together API via huggingface_hub, orchestrated through an async chat() coroutine with optional RAG capabilities.
This end-to-end architecture delivers:
- Real-time, low-latency chat between SAC users and LLMs.
- Secure connectivity that respects enterprise auth flows and role-based data masking.
- Modular components that can be extended—whether swapping in different LLM providers, integrating BW queries for live data, or scaling to production on SAP BTP.
Outlook & Next Steps
- Production Hardening: Replace ngrok + Basic Auth with SAP BTP Application Router and XSUAA for OAuth2-based SSO, ensuring enterprise-grade security.
- Data Integration: Embed SAP BW query logic directly in the ASGI layer, dynamically retrieving live data for RAG or narrative analytics.
- Role-Aware Masking: Implement middleware or prompt-layer redaction to enforce row- and column-level authorizations based on SAC user roles.
- Context Management: Introduce session-based context windows, memory, or vector-based retrieval to support multi-turn dialogs.
- Scalability & Observability: Containerize the stack (Django + Daphne + LLM), deploy to Kubernetes (Kyma), and integrate logging/metrics for SLAs and audit trails.
Contact us today to transform your SAP analytics into a conversational powerhouse. Let CALEO Consulting light the path to AI-driven decision-making!