server: introduce API for serving / loading / unloading multiple models (#17470)

* server: add model management and proxy

* fix compile error

* does this fix windows?

* fix windows build

* use subprocess.h, better logging

* add test

* fix windows

* feat: Model/Router server architecture WIP

* more stable

* fix unsafe pointer

* also allow terminating a loading model

* add is_active()

* refactor: Architecture improvements

* tmp apply upstream fix

* address most problems

* address thread safety issue

* address review comment

* add docs (first version)

* address review comment

* feat: Improved UX for model information, modality interactions etc

* chore: update webui build output

* refactor: Use only the message data `model` property for displaying model used info

* chore: update webui build output

* add --models-dir param

* feat: New Model Selection UX WIP

* chore: update webui build output

* feat: Add auto-mic setting

* feat: Attachments UX improvements

* implement LRU

* remove default model path

* better --models-dir

* add env for args

* address review comments

* fix compile

* refactor: Chat Form Submit component

* add endpoint docs

* Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2

Co-authored-by: Aleksander <aleksander.grygier@gmail.com>

* feat: Add copy to clipboard to model name in model info dialog

* feat: Model unavailable UI state for model selector

* feat: Chat Form Actions UI logic improvements

* feat: Auto-select model from last assistant response

* chore: update webui build output

* expose args and exit_code in API

* add note

* support extra_args on loading model

* allow reusing args if auto_load

* typo docs

* oai-compat /models endpoint

* cleaner

* address review comments

* feat: Use `model` property for displaying the `repo/model-name` naming format

* refactor: Attachments data

* chore: update webui build output

* refactor: Enum imports

* feat: Improve Model Selector responsiveness

* chore: update webui build output

* refactor: Cleanup

* refactor: Cleanup

* refactor: Formatters

* chore: update webui build output

* refactor: Copy To Clipboard Icon component

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

* refactor: UI badges

* chore: update webui build output

* refactor: Cleanup

* refactor: Cleanup

* chore: update webui build output

* add --models-allow-extra-args for security

* nits

* add stdin_file

* fix merge

* fix: Retrieve lost setting after resolving merge conflict

* refactor: DatabaseStore -> DatabaseService

* refactor: Database, Conversations & Chat services + stores architecture improvements (WIP)

* refactor: Remove redundant settings

* refactor: Multi-model business logic WIP

* chore: update webui build output

* feat: Switching models logic for ChatForm or when regenerating messages + modality detection logic

* chore: update webui build output

* fix: Add `untrack` inside chat processing info data logic to prevent infinite effect

* fix: Regenerate

* feat: Remove redundant settings + rearrange

* fix: Audio attachments

* refactor: Icons

* chore: update webui build output

* feat: Model management and selection features WIP

* chore: update webui build output

* refactor: Improve server properties management

* refactor: Icons

* chore: update webui build output

* feat: Improve model loading/unloading status updates

* chore: update webui build output

* refactor: Improve API header management via utility functions

* remove support for extra args

* set hf_repo/docker_repo as model alias when possible

* refactor: Remove ConversationsService

* refactor: Chat requests abort handling

* refactor: Server store

* tmp webui build

* refactor: Model modality handling

* chore: update webui build output

* refactor: Processing state reactivity

* fix: UI

* refactor: Services/Stores syntax + logic improvements

Refactors components to access stores directly instead of using exported getter functions.

This change centralizes store access and logic, simplifying component code and improving maintainability by reducing the number of exported functions and promoting direct store interaction.

Removes exported getter functions from `chat.svelte.ts`, `conversations.svelte.ts`, `models.svelte.ts` and `settings.svelte.ts`.

* refactor: Architecture cleanup

* feat: Improve statistic badges

* feat: Condition available models based on modality + better model loading strategy & UX

* docs: Architecture documentation

* feat: Update logic for PDF as Image

* add TODO for http client

* refactor: Enhance model info and attachment handling

* chore: update webui build output

* refactor: Components naming

* chore: update webui build output

* refactor: Cleanup

* refactor: DRY `getAttachmentDisplayItems` function + fix UI

* chore: update webui build output

* fix: Modality detection improvement for text-based PDF attachments

* refactor: Cleanup

* docs: Add info comment

* refactor: Cleanup

* re

* refactor: Cleanup

* refactor: Cleanup

* feat: Attachment logic & UI improvements

* refactor: Constants

* feat: Improve UI sidebar background color

* chore: update webui build output

* refactor: Utils imports + move types to `app.d.ts`

* test: Fix Storybook mocks

* chore: update webui build output

* test: Update Chat Form UI tests

* refactor: Tooltip Provider from core layout

* refactor: Tests to separate location

* decouple server_models from server_routes

* test: Move demo test to tests/server

* refactor: Remove redundant method

* chore: update webui build output

* also route anthropic endpoints

* fix duplicated arg

* fix invalid ptr to shutdown_handler

* server : minor

* rm unused fn

* add ?autoload=true|false query param

* refactor: Remove redundant code

* docs: Update README documentation + architecture & data flow diagrams

* fix: Disable autoload on calling server props for the model

* chore: update webui build output

* fix ubuntu build

* fix: Model status reactivity

* fix: Modality detection for MODEL mode

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Author: Xuan-Son Nguyen
Date: 2025-12-01 19:41:04 +01:00
Committed by: GitHub
Parent: 7733409734
Commit: ec18edfcba
178 changed files with 11643 additions and 4356 deletions
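
Before the diff itself, a minimal sketch of driving the new model-management API over HTTP, assuming the endpoint shapes exercised by ModelsService and PropsService in the diffs below (POST ./models/load, POST ./models/unload, GET ./v1/models, and the ?autoload=false query parameter on ./props). The OpenAI-style { data: [{ id }] } response shape and the polling-free flow are assumptions:

// Hedged sketch of the ROUTER-mode model management flow. Endpoint paths
// come from the ModelsService/PropsService diffs below; response shapes
// are assumptions.
async function demoModelManagement(baseUrl: string, apiKey?: string): Promise<void> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    ...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {})
  };

  // List models (OpenAI-compatible; works in both MODEL and ROUTER mode)
  const models = await (await fetch(`${baseUrl}/v1/models`, { headers })).json();
  const modelId: string = models.data[0].id; // assumed { data: [{ id }] } shape

  // Load the model (ROUTER mode); an optional extra_args: string[] field is
  // also accepted per the ModelsService diff
  await fetch(`${baseUrl}/models/load`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ model: modelId })
  });

  // Query server properties without triggering an automatic load
  await fetch(`${baseUrl}/props?autoload=false`, { headers });

  // Unload it again
  await fetch(`${baseUrl}/models/unload`, {
    method: 'POST',
    headers,
    body: JSON.stringify({ model: modelId })
  });
}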
@@ -1,55 +1,42 @@
import { config } from '$lib/stores/settings.svelte';
import { selectedModelName } from '$lib/stores/models.svelte';
import { slotsService } from './slots';
import type {
ApiChatCompletionRequest,
ApiChatCompletionResponse,
ApiChatCompletionStreamChunk,
ApiChatCompletionToolCall,
ApiChatCompletionToolCallDelta,
ApiChatMessageData
} from '$lib/types/api';
import type {
DatabaseMessage,
DatabaseMessageExtra,
DatabaseMessageExtraAudioFile,
DatabaseMessageExtraImageFile,
DatabaseMessageExtraLegacyContext,
DatabaseMessageExtraPdfFile,
DatabaseMessageExtraTextFile
} from '$lib/types/database';
import type { ChatMessagePromptProgress, ChatMessageTimings } from '$lib/types/chat';
import type { SettingsChatServiceOptions } from '$lib/types/settings';
import { getJsonHeaders } from '$lib/utils';
import { AttachmentType } from '$lib/enums';
/**
* ChatService - Low-level API communication layer for llama.cpp server interactions
* ChatService - Low-level API communication layer for Chat Completions
*
* This service handles direct communication with the llama.cpp server's chat completion API.
* **Terminology - Chat vs Conversation:**
* - **Chat**: The active interaction space with the Chat Completions API. This service
* handles the real-time communication with the AI backend - sending messages, receiving
* streaming responses, and managing request lifecycles. "Chat" is ephemeral and runtime-focused.
* - **Conversation**: The persistent database entity storing all messages and metadata.
* Managed by ConversationsService/Store, conversations persist across sessions.
*
* This service handles direct communication with the llama-server's Chat Completions API.
* It provides the network layer abstraction for AI model interactions while remaining
* stateless and focused purely on API communication.
*
* **Architecture & Relationship with ChatStore:**
* **Architecture & Relationships:**
* - **ChatService** (this class): Stateless API communication layer
* - Handles HTTP requests/responses with llama.cpp server
* - Handles HTTP requests/responses with the llama-server
* - Manages streaming and non-streaming response parsing
* - Provides request abortion capabilities
* - Provides per-conversation request abortion capabilities
* - Converts database messages to API format
* - Handles error translation for server responses
*
* - **ChatStore**: Stateful orchestration and UI state management
* - Uses ChatService for all AI model communication
* - Manages conversation state, message history, and UI reactivity
* - Coordinates with DatabaseStore for persistence
* - Handles complex workflows like branching and regeneration
* - **chatStore**: Uses ChatService for all AI model communication
* - **conversationsStore**: Provides message context for API requests
*
* **Key Responsibilities:**
* - Message format conversion (DatabaseMessage → API format)
* - Streaming response handling with real-time callbacks
* - Reasoning content extraction and processing
* - File attachment processing (images, PDFs, audio, text)
* - Request lifecycle management (abort, cleanup)
* - Request lifecycle management (abort via AbortSignal)
*/
export class ChatService {
private abortControllers: Map<string, AbortController> = new Map();
// ─────────────────────────────────────────────────────────────────────────────
// Messaging
// ─────────────────────────────────────────────────────────────────────────────
/**
* Sends a chat completion request to the llama.cpp server.
@@ -61,10 +48,11 @@ export class ChatService {
* @returns {Promise<string | void>} that resolves to the complete response string (non-streaming) or void (streaming)
* @throws {Error} if the request fails or is aborted
*/
async sendMessage(
static async sendMessage(
messages: ApiChatMessageData[] | (DatabaseMessage & { extra?: DatabaseMessageExtra[] })[],
options: SettingsChatServiceOptions = {},
conversationId?: string
conversationId?: string,
signal?: AbortSignal
): Promise<string | void> {
const {
stream,
@@ -74,7 +62,7 @@ export class ChatService {
onReasoningChunk,
onToolCallChunk,
onModel,
onFirstValidChunk,
onTimings,
// Generation parameters
temperature,
max_tokens,
@@ -99,25 +87,17 @@ export class ChatService {
// Other parameters
samplers,
custom,
timings_per_token
timings_per_token,
// Config options
systemMessage,
disableReasoningFormat
} = options;
const currentConfig = config();
const requestId = conversationId || 'default';
if (this.abortControllers.has(requestId)) {
this.abortControllers.get(requestId)?.abort();
}
const abortController = new AbortController();
this.abortControllers.set(requestId, abortController);
const normalizedMessages: ApiChatMessageData[] = messages
.map((msg) => {
if ('id' in msg && 'convId' in msg && 'timestamp' in msg) {
const dbMsg = msg as DatabaseMessage & { extra?: DatabaseMessageExtra[] };
return ChatService.convertMessageToChatServiceData(dbMsg);
return ChatService.convertDbMessageToApiChatMessageData(dbMsg);
} else {
return msg as ApiChatMessageData;
}
@@ -132,7 +112,7 @@ export class ChatService {
return true;
});
const processedMessages = this.injectSystemMessage(normalizedMessages);
const processedMessages = ChatService.injectSystemMessage(normalizedMessages, systemMessage);
const requestBody: ApiChatCompletionRequest = {
messages: processedMessages.map((msg: ApiChatMessageData) => ({
@@ -142,14 +122,12 @@ export class ChatService {
stream
};
const modelSelectorEnabled = Boolean(currentConfig.modelSelectorEnabled);
const activeModel = modelSelectorEnabled ? selectedModelName() : null;
if (modelSelectorEnabled && activeModel) {
requestBody.model = activeModel;
// Include model in request if provided (required in ROUTER mode)
if (options.model) {
requestBody.model = options.model;
}
requestBody.reasoning_format = currentConfig.disableReasoningFormat ? 'none' : 'auto';
requestBody.reasoning_format = disableReasoningFormat ? 'none' : 'auto';
if (temperature !== undefined) requestBody.temperature = temperature;
if (max_tokens !== undefined) {
@@ -194,20 +172,15 @@ export class ChatService {
}
try {
const apiKey = currentConfig.apiKey?.toString().trim();
const response = await fetch(`./v1/chat/completions`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {})
},
headers: getJsonHeaders(),
body: JSON.stringify(requestBody),
signal: abortController.signal
signal
});
if (!response.ok) {
const error = await this.parseErrorResponse(response);
const error = await ChatService.parseErrorResponse(response);
if (onError) {
onError(error);
}
@@ -215,7 +188,7 @@ export class ChatService {
}
if (stream) {
await this.handleStreamResponse(
await ChatService.handleStreamResponse(
response,
onChunk,
onComplete,
@@ -223,13 +196,13 @@ export class ChatService {
onReasoningChunk,
onToolCallChunk,
onModel,
onFirstValidChunk,
onTimings,
conversationId,
abortController.signal
signal
);
return;
} else {
return this.handleNonStreamResponse(
return ChatService.handleNonStreamResponse(
response,
onComplete,
onError,
@@ -269,11 +242,13 @@ export class ChatService {
onError(userFriendlyError);
}
throw userFriendlyError;
} finally {
this.abortControllers.delete(requestId);
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Streaming
// ─────────────────────────────────────────────────────────────────────────────
/**
* Handles streaming response from the chat completion API
* @param response - The Response object from the fetch request
@@ -285,7 +260,7 @@ export class ChatService {
* @returns {Promise<void>} Promise that resolves when streaming is complete
* @throws {Error} if the stream cannot be read or parsed
*/
private async handleStreamResponse(
private static async handleStreamResponse(
response: Response,
onChunk?: (chunk: string) => void,
onComplete?: (
@@ -298,7 +273,7 @@ export class ChatService {
onReasoningChunk?: (chunk: string) => void,
onToolCallChunk?: (chunk: string) => void,
onModel?: (model: string) => void,
onFirstValidChunk?: () => void,
onTimings?: (timings: ChatMessageTimings, promptProgress?: ChatMessagePromptProgress) => void,
conversationId?: string,
abortSignal?: AbortSignal
): Promise<void> {
@@ -315,7 +290,6 @@ export class ChatService {
let lastTimings: ChatMessageTimings | undefined;
let streamFinished = false;
let modelEmitted = false;
let firstValidChunkEmitted = false;
let toolCallIndexOffset = 0;
let hasOpenToolCallBatch = false;
@@ -333,7 +307,7 @@ export class ChatService {
return;
}
aggregatedToolCalls = this.mergeToolCallDeltas(
aggregatedToolCalls = ChatService.mergeToolCallDeltas(
aggregatedToolCalls,
toolCalls,
toolCallIndexOffset
@@ -382,29 +356,20 @@ export class ChatService {
try {
const parsed: ApiChatCompletionStreamChunk = JSON.parse(data);
if (!firstValidChunkEmitted && parsed.object === 'chat.completion.chunk') {
firstValidChunkEmitted = true;
if (!abortSignal?.aborted) {
onFirstValidChunk?.();
}
}
const content = parsed.choices[0]?.delta?.content;
const reasoningContent = parsed.choices[0]?.delta?.reasoning_content;
const toolCalls = parsed.choices[0]?.delta?.tool_calls;
const timings = parsed.timings;
const promptProgress = parsed.prompt_progress;
const chunkModel = this.extractModelName(parsed);
const chunkModel = ChatService.extractModelName(parsed);
if (chunkModel && !modelEmitted) {
modelEmitted = true;
onModel?.(chunkModel);
}
if (timings || promptProgress) {
this.updateProcessingState(timings, promptProgress, conversationId);
ChatService.notifyTimings(timings, promptProgress, onTimings);
if (timings) {
lastTimings = timings;
}
@@ -462,7 +427,91 @@ export class ChatService {
}
}
private mergeToolCallDeltas(
/**
* Handles non-streaming response from the chat completion API.
* Parses the JSON response and extracts the generated content.
*
* @param response - The fetch Response object containing the JSON data
* @param onComplete - Optional callback invoked when response is successfully parsed
* @param onError - Optional callback invoked if an error occurs during parsing
* @returns {Promise<string>} Promise that resolves to the generated content string
* @throws {Error} if the response cannot be parsed or is malformed
*/
private static async handleNonStreamResponse(
response: Response,
onComplete?: (
response: string,
reasoningContent?: string,
timings?: ChatMessageTimings,
toolCalls?: string
) => void,
onError?: (error: Error) => void,
onToolCallChunk?: (chunk: string) => void,
onModel?: (model: string) => void
): Promise<string> {
try {
const responseText = await response.text();
if (!responseText.trim()) {
const noResponseError = new Error('No response received from server. Please try again.');
throw noResponseError;
}
const data: ApiChatCompletionResponse = JSON.parse(responseText);
const responseModel = ChatService.extractModelName(data);
if (responseModel) {
onModel?.(responseModel);
}
const content = data.choices[0]?.message?.content || '';
const reasoningContent = data.choices[0]?.message?.reasoning_content;
const toolCalls = data.choices[0]?.message?.tool_calls;
if (reasoningContent) {
console.log('Full reasoning content:', reasoningContent);
}
let serializedToolCalls: string | undefined;
if (toolCalls && toolCalls.length > 0) {
const mergedToolCalls = ChatService.mergeToolCallDeltas([], toolCalls);
if (mergedToolCalls.length > 0) {
serializedToolCalls = JSON.stringify(mergedToolCalls);
if (serializedToolCalls) {
onToolCallChunk?.(serializedToolCalls);
}
}
}
if (!content.trim() && !serializedToolCalls) {
const noResponseError = new Error('No response received from server. Please try again.');
throw noResponseError;
}
onComplete?.(content, reasoningContent, undefined, serializedToolCalls);
return content;
} catch (error) {
const err = error instanceof Error ? error : new Error('Parse error');
onError?.(err);
throw err;
}
}
/**
* Merges tool call deltas into an existing array of tool calls.
* Handles both existing and new tool calls, updating existing ones and adding new ones.
*
* @param existing - The existing array of tool calls to merge into
* @param deltas - The array of tool call deltas to merge
* @param indexOffset - Optional offset to apply to the index of new tool calls
* @returns {ApiChatCompletionToolCall[]} The merged array of tool calls
*/
private static mergeToolCallDeltas(
existing: ApiChatCompletionToolCall[],
deltas: ApiChatCompletionToolCallDelta[],
indexOffset = 0
@@ -510,80 +559,9 @@ export class ChatService {
return result;
}
/**
* Handles non-streaming response from the chat completion API.
* Parses the JSON response and extracts the generated content.
*
* @param response - The fetch Response object containing the JSON data
* @param onComplete - Optional callback invoked when response is successfully parsed
* @param onError - Optional callback invoked if an error occurs during parsing
* @returns {Promise<string>} Promise that resolves to the generated content string
* @throws {Error} if the response cannot be parsed or is malformed
*/
private async handleNonStreamResponse(
response: Response,
onComplete?: (
response: string,
reasoningContent?: string,
timings?: ChatMessageTimings,
toolCalls?: string
) => void,
onError?: (error: Error) => void,
onToolCallChunk?: (chunk: string) => void,
onModel?: (model: string) => void
): Promise<string> {
try {
const responseText = await response.text();
if (!responseText.trim()) {
const noResponseError = new Error('No response received from server. Please try again.');
throw noResponseError;
}
const data: ApiChatCompletionResponse = JSON.parse(responseText);
const responseModel = this.extractModelName(data);
if (responseModel) {
onModel?.(responseModel);
}
const content = data.choices[0]?.message?.content || '';
const reasoningContent = data.choices[0]?.message?.reasoning_content;
const toolCalls = data.choices[0]?.message?.tool_calls;
if (reasoningContent) {
console.log('Full reasoning content:', reasoningContent);
}
let serializedToolCalls: string | undefined;
if (toolCalls && toolCalls.length > 0) {
const mergedToolCalls = this.mergeToolCallDeltas([], toolCalls);
if (mergedToolCalls.length > 0) {
serializedToolCalls = JSON.stringify(mergedToolCalls);
if (serializedToolCalls) {
onToolCallChunk?.(serializedToolCalls);
}
}
}
if (!content.trim() && !serializedToolCalls) {
const noResponseError = new Error('No response received from server. Please try again.');
throw noResponseError;
}
onComplete?.(content, reasoningContent, undefined, serializedToolCalls);
return content;
} catch (error) {
const err = error instanceof Error ? error : new Error('Parse error');
onError?.(err);
throw err;
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Conversion
// ─────────────────────────────────────────────────────────────────────────────
/**
* Converts a database message with attachments to API chat message format.
@@ -597,7 +575,7 @@ export class ChatService {
* @returns {ApiChatMessageData} object formatted for the chat completion API
* @static
*/
static convertMessageToChatServiceData(
static convertDbMessageToApiChatMessageData(
message: DatabaseMessage & { extra?: DatabaseMessageExtra[] }
): ApiChatMessageData {
if (!message.extra || message.extra.length === 0) {
@@ -618,7 +596,7 @@ export class ChatService {
const imageFiles = message.extra.filter(
(extra: DatabaseMessageExtra): extra is DatabaseMessageExtraImageFile =>
extra.type === 'imageFile'
extra.type === AttachmentType.IMAGE
);
for (const image of imageFiles) {
@@ -630,7 +608,7 @@ export class ChatService {
const textFiles = message.extra.filter(
(extra: DatabaseMessageExtra): extra is DatabaseMessageExtraTextFile =>
extra.type === 'textFile'
extra.type === AttachmentType.TEXT
);
for (const textFile of textFiles) {
@@ -643,7 +621,7 @@ export class ChatService {
// Handle legacy 'context' type from old webui (pasted content)
const legacyContextFiles = message.extra.filter(
(extra: DatabaseMessageExtra): extra is DatabaseMessageExtraLegacyContext =>
extra.type === 'context'
extra.type === AttachmentType.LEGACY_CONTEXT
);
for (const legacyContextFile of legacyContextFiles) {
@@ -655,7 +633,7 @@ export class ChatService {
const audioFiles = message.extra.filter(
(extra: DatabaseMessageExtra): extra is DatabaseMessageExtraAudioFile =>
extra.type === 'audioFile'
extra.type === AttachmentType.AUDIO
);
for (const audio of audioFiles) {
@@ -670,7 +648,7 @@ export class ChatService {
const pdfFiles = message.extra.filter(
(extra: DatabaseMessageExtra): extra is DatabaseMessageExtraPdfFile =>
extra.type === 'pdfFile'
extra.type === AttachmentType.PDF
);
for (const pdfFile of pdfFiles) {
@@ -695,19 +673,17 @@ export class ChatService {
};
}
// ─────────────────────────────────────────────────────────────────────────────
// Utilities
// ─────────────────────────────────────────────────────────────────────────────
/**
* Get server properties - static method for API compatibility
* Get server properties - static method for API compatibility (to be refactored)
*/
static async getServerProps(): Promise<ApiLlamaCppServerProps> {
try {
const currentConfig = config();
const apiKey = currentConfig.apiKey?.toString().trim();
const response = await fetch(`./props`, {
headers: {
'Content-Type': 'application/json',
...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {})
}
headers: getJsonHeaders()
});
if (!response.ok) {
@@ -723,49 +699,51 @@ export class ChatService {
}
/**
* Aborts any ongoing chat completion request.
* Cancels the current request and cleans up the abort controller.
*
* @public
* Get model information from /models endpoint (to be refactored)
*/
public abort(conversationId?: string): void {
if (conversationId) {
const abortController = this.abortControllers.get(conversationId);
if (abortController) {
abortController.abort();
this.abortControllers.delete(conversationId);
static async getModels(): Promise<ApiModelListResponse> {
try {
const response = await fetch(`./models`, {
headers: getJsonHeaders()
});
if (!response.ok) {
throw new Error(`Failed to fetch models: ${response.status} ${response.statusText}`);
}
} else {
for (const controller of this.abortControllers.values()) {
controller.abort();
}
this.abortControllers.clear();
const data = await response.json();
return data;
} catch (error) {
console.error('Error fetching models:', error);
throw error;
}
}
/**
* Injects a system message at the beginning of the conversation if configured in settings.
* Checks for existing system messages to avoid duplication and retrieves the system message
* from the current configuration settings.
* Injects a system message at the beginning of the conversation if provided.
* Checks for existing system messages to avoid duplication.
*
* @param messages - Array of chat messages to process
* @returns Array of messages with system message injected at the beginning if configured
* @param systemMessage - Optional system message to inject
* @returns Array of messages with system message injected at the beginning if provided
* @private
*/
private injectSystemMessage(messages: ApiChatMessageData[]): ApiChatMessageData[] {
const currentConfig = config();
const systemMessage = currentConfig.systemMessage?.toString().trim();
private static injectSystemMessage(
messages: ApiChatMessageData[],
systemMessage?: string
): ApiChatMessageData[] {
const trimmedSystemMessage = systemMessage?.trim();
if (!systemMessage) {
if (!trimmedSystemMessage) {
return messages;
}
if (messages.length > 0 && messages[0].role === 'system') {
if (messages[0].content !== systemMessage) {
if (messages[0].content !== trimmedSystemMessage) {
const updatedMessages = [...messages];
updatedMessages[0] = {
role: 'system',
content: systemMessage
content: trimmedSystemMessage
};
return updatedMessages;
}
@@ -775,7 +753,7 @@ export class ChatService {
const systemMsg: ApiChatMessageData = {
role: 'system',
content: systemMessage
content: trimmedSystemMessage
};
return [systemMsg, ...messages];
@@ -786,7 +764,7 @@ export class ChatService {
* @param response - HTTP response object
* @returns Promise<Error> - Parsed error with context info if available
*/
private async parseErrorResponse(response: Response): Promise<Error> {
private static async parseErrorResponse(response: Response): Promise<Error> {
try {
const errorText = await response.text();
const errorData: ApiErrorResponse = JSON.parse(errorText);
@@ -803,7 +781,18 @@ export class ChatService {
}
}
private extractModelName(data: unknown): string | undefined {
/**
* Extracts model name from Chat Completions API response data.
* Handles various response formats including streaming chunks and final responses.
*
* WORKAROUND: In single model mode, llama-server returns a default/incorrect model name
* in the response. We override it with the actual model name from serverStore.
*
* @param data - Raw response data from the Chat Completions API
* @returns Model name string if found, undefined otherwise
* @private
*/
private static extractModelName(data: unknown): string | undefined {
const asRecord = (value: unknown): Record<string, unknown> | undefined => {
return typeof value === 'object' && value !== null
? (value as Record<string, unknown>)
@@ -836,31 +825,22 @@ export class ChatService {
return undefined;
}
private updateProcessingState(
timings?: ChatMessageTimings,
promptProgress?: ChatMessagePromptProgress,
conversationId?: string
/**
* Calls the onTimings callback with timing data from streaming response.
*
* @param timings - Timing information from the Chat Completions API response
* @param promptProgress - Prompt processing progress data
* @param onTimingsCallback - Callback function to invoke with timing data
* @private
*/
private static notifyTimings(
timings: ChatMessageTimings | undefined,
promptProgress: ChatMessagePromptProgress | undefined,
onTimingsCallback:
| ((timings: ChatMessageTimings, promptProgress?: ChatMessagePromptProgress) => void)
| undefined
): void {
const tokensPerSecond =
timings?.predicted_ms && timings?.predicted_n
? (timings.predicted_n / timings.predicted_ms) * 1000
: 0;
slotsService
.updateFromTimingData(
{
prompt_n: timings?.prompt_n || 0,
predicted_n: timings?.predicted_n || 0,
predicted_per_second: tokensPerSecond,
cache_n: timings?.cache_n || 0,
prompt_progress: promptProgress
},
conversationId
)
.catch((error) => {
console.warn('Failed to update processing state:', error);
});
if (!timings || !onTimingsCallback) return;
onTimingsCallback(timings, promptProgress);
}
}
export const chatService = new ChatService();
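
With the file's changes in view: sendMessage is now static, and per-request abort handling moves to the caller via an AbortSignal. A hedged caller-side sketch, assuming the option names shown above; the UI helpers, model id, and conversation id are hypothetical:

// Sketch: caller-side use of the static ChatService.sendMessage with the
// new signal parameter. appendToUi/showRate are hypothetical stand-ins for
// real store/UI wiring.
import { ChatService } from '$lib/services';

const appendToUi = (chunk: string): void => console.log(chunk);
const showRate = (tps: number): void => console.log(`${tps.toFixed(1)} tok/s`);

const controller = new AbortController();

await ChatService.sendMessage(
  [{ role: 'user', content: 'Hello!' }],
  {
    stream: true,
    model: 'my-model',                         // required in ROUTER mode (hypothetical id)
    systemMessage: 'You are a helpful assistant.',
    disableReasoningFormat: false,
    onChunk: (chunk) => appendToUi(chunk),
    onTimings: (timings) => {
      // same tokens/sec derivation the removed updateProcessingState used
      const tps =
        timings.predicted_ms && timings.predicted_n
          ? (timings.predicted_n / timings.predicted_ms) * 1000
          : 0;
      showRate(tps);
    }
  },
  'conversation-123',
  controller.signal
);

// The caller (e.g. chatStore) cancels via controller.abort() - there is no
// per-request AbortController map inside ChatService anymore.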
@@ -0,0 +1,357 @@
import Dexie, { type EntityTable } from 'dexie';
import { v4 as uuid } from 'uuid';
import { findDescendantMessages } from '$lib/utils';
class LlamacppDatabase extends Dexie {
conversations!: EntityTable<DatabaseConversation, string>;
messages!: EntityTable<DatabaseMessage, string>;
constructor() {
super('LlamacppWebui');
this.version(1).stores({
conversations: 'id, lastModified, currNode, name',
messages: 'id, convId, type, role, timestamp, parent, children'
});
}
}
const db = new LlamacppDatabase();
/**
* DatabaseService - Stateless IndexedDB communication layer
*
* **Terminology - Chat vs Conversation:**
* - **Chat**: The active interaction space with the Chat Completions API (ephemeral, runtime).
* - **Conversation**: The persistent database entity storing all messages and metadata.
* This service handles raw database operations for conversations - the lowest layer
* in the persistence stack.
*
* This service provides a stateless data access layer built on IndexedDB using Dexie ORM.
* It handles all low-level storage operations for conversations and messages with support
* for complex branching and message threading. All methods are static - no instance state.
*
* **Architecture & Relationships (bottom to top):**
* - **DatabaseService** (this class): Stateless IndexedDB operations
* - Lowest layer - direct Dexie/IndexedDB communication
* - Pure CRUD operations without business logic
* - Handles branching tree structure (parent-child relationships)
* - Provides transaction safety for multi-table operations
*
* - **ConversationsService**: Stateless business logic layer
* - Uses DatabaseService for all persistence operations
* - Adds import/export, navigation, and higher-level operations
*
* - **conversationsStore**: Reactive state management for conversations
* - Uses ConversationsService for database operations
* - Manages conversation list, active conversation, and messages in memory
*
* - **chatStore**: Active AI interaction management
* - Uses conversationsStore for conversation context
* - Directly uses DatabaseService for message CRUD during streaming
*
* **Key Features:**
* - **Conversation CRUD**: Create, read, update, delete conversations
* - **Message CRUD**: Add, update, delete messages with branching support
* - **Branch Operations**: Create branches, find descendants, cascade deletions
* - **Transaction Safety**: Atomic operations for data consistency
*
* **Database Schema:**
* - `conversations`: id, lastModified, currNode, name
* - `messages`: id, convId, type, role, timestamp, parent, children
*
* **Branching Model:**
* Messages form a tree structure where each message can have multiple children,
* enabling conversation branching and alternative response paths. The conversation's
* `currNode` tracks the currently active branch endpoint.
*/
export class DatabaseService {
// ─────────────────────────────────────────────────────────────────────────────
// Conversations
// ─────────────────────────────────────────────────────────────────────────────
/**
* Creates a new conversation.
*
* @param name - Name of the conversation
* @returns The created conversation
*/
static async createConversation(name: string): Promise<DatabaseConversation> {
const conversation: DatabaseConversation = {
id: uuid(),
name,
lastModified: Date.now(),
currNode: ''
};
await db.conversations.add(conversation);
return conversation;
}
// ─────────────────────────────────────────────────────────────────────────────
// Messages
// ─────────────────────────────────────────────────────────────────────────────
/**
* Creates a new message branch by adding a message and updating parent/child relationships.
* Also updates the conversation's currNode to point to the new message.
*
* @param message - Message to add (without id)
* @param parentId - Parent message ID to attach to
* @returns The created message
*/
static async createMessageBranch(
message: Omit<DatabaseMessage, 'id'>,
parentId: string | null
): Promise<DatabaseMessage> {
return await db.transaction('rw', [db.conversations, db.messages], async () => {
// Handle null parent (root message case)
if (parentId !== null) {
const parentMessage = await db.messages.get(parentId);
if (!parentMessage) {
throw new Error(`Parent message ${parentId} not found`);
}
}
const newMessage: DatabaseMessage = {
...message,
id: uuid(),
parent: parentId,
toolCalls: message.toolCalls ?? '',
children: []
};
await db.messages.add(newMessage);
// Update parent's children array if parent exists
if (parentId !== null) {
const parentMessage = await db.messages.get(parentId);
if (parentMessage) {
await db.messages.update(parentId, {
children: [...parentMessage.children, newMessage.id]
});
}
}
await this.updateConversation(message.convId, {
currNode: newMessage.id
});
return newMessage;
});
}
/**
* Creates a root message for a new conversation.
* Root messages are not displayed but serve as the tree root for branching.
*
* @param convId - Conversation ID
* @returns The created root message
*/
static async createRootMessage(convId: string): Promise<string> {
const rootMessage: DatabaseMessage = {
id: uuid(),
convId,
type: 'root',
timestamp: Date.now(),
role: 'system',
content: '',
parent: null,
thinking: '',
toolCalls: '',
children: []
};
await db.messages.add(rootMessage);
return rootMessage.id;
}
/**
* Deletes a conversation and all its messages.
*
* @param id - Conversation ID
*/
static async deleteConversation(id: string): Promise<void> {
await db.transaction('rw', [db.conversations, db.messages], async () => {
await db.conversations.delete(id);
await db.messages.where('convId').equals(id).delete();
});
}
/**
* Deletes a message and removes it from its parent's children array.
*
* @param messageId - ID of the message to delete
*/
static async deleteMessage(messageId: string): Promise<void> {
await db.transaction('rw', db.messages, async () => {
const message = await db.messages.get(messageId);
if (!message) return;
// Remove this message from its parent's children array
if (message.parent) {
const parent = await db.messages.get(message.parent);
if (parent) {
parent.children = parent.children.filter((childId: string) => childId !== messageId);
await db.messages.put(parent);
}
}
// Delete the message
await db.messages.delete(messageId);
});
}
/**
* Deletes a message and all its descendant messages (cascading deletion).
* This removes the entire branch starting from the specified message.
*
* @param conversationId - ID of the conversation containing the message
* @param messageId - ID of the root message to delete (along with all descendants)
* @returns Array of all deleted message IDs
*/
static async deleteMessageCascading(
conversationId: string,
messageId: string
): Promise<string[]> {
return await db.transaction('rw', db.messages, async () => {
// Get all messages in the conversation to find descendants
const allMessages = await db.messages.where('convId').equals(conversationId).toArray();
// Find all descendant messages
const descendants = findDescendantMessages(allMessages, messageId);
const allToDelete = [messageId, ...descendants];
// Get the message to delete for parent cleanup
const message = await db.messages.get(messageId);
if (message && message.parent) {
const parent = await db.messages.get(message.parent);
if (parent) {
parent.children = parent.children.filter((childId: string) => childId !== messageId);
await db.messages.put(parent);
}
}
// Delete all messages in the branch
await db.messages.bulkDelete(allToDelete);
return allToDelete;
});
}
/**
* Gets all conversations, sorted by last modified time (newest first).
*
* @returns Array of conversations
*/
static async getAllConversations(): Promise<DatabaseConversation[]> {
return await db.conversations.orderBy('lastModified').reverse().toArray();
}
/**
* Gets a conversation by ID.
*
* @param id - Conversation ID
* @returns The conversation if found, otherwise undefined
*/
static async getConversation(id: string): Promise<DatabaseConversation | undefined> {
return await db.conversations.get(id);
}
/**
* Gets all messages in a conversation, sorted by timestamp (oldest first).
*
* @param convId - Conversation ID
* @returns Array of messages in the conversation
*/
static async getConversationMessages(convId: string): Promise<DatabaseMessage[]> {
return await db.messages.where('convId').equals(convId).sortBy('timestamp');
}
/**
* Updates a conversation.
*
* @param id - Conversation ID
* @param updates - Partial updates to apply
* @returns Promise that resolves when the conversation is updated
*/
static async updateConversation(
id: string,
updates: Partial<Omit<DatabaseConversation, 'id'>>
): Promise<void> {
await db.conversations.update(id, {
...updates,
lastModified: Date.now()
});
}
// ─────────────────────────────────────────────────────────────────────────────
// Navigation
// ─────────────────────────────────────────────────────────────────────────────
/**
* Updates the conversation's current node (active branch).
* This determines which conversation path is currently being viewed.
*
* @param convId - Conversation ID
* @param nodeId - Message ID to set as current node
*/
static async updateCurrentNode(convId: string, nodeId: string): Promise<void> {
await this.updateConversation(convId, {
currNode: nodeId
});
}
/**
* Updates a message.
*
* @param id - Message ID
* @param updates - Partial updates to apply
* @returns Promise that resolves when the message is updated
*/
static async updateMessage(
id: string,
updates: Partial<Omit<DatabaseMessage, 'id'>>
): Promise<void> {
await db.messages.update(id, updates);
}
// ─────────────────────────────────────────────────────────────────────────────
// Import
// ─────────────────────────────────────────────────────────────────────────────
/**
* Imports multiple conversations and their messages.
* Skips conversations that already exist.
*
* @param data - Array of { conv, messages } objects
*/
static async importConversations(
data: { conv: DatabaseConversation; messages: DatabaseMessage[] }[]
): Promise<{ imported: number; skipped: number }> {
let importedCount = 0;
let skippedCount = 0;
return await db.transaction('rw', [db.conversations, db.messages], async () => {
for (const item of data) {
const { conv, messages } = item;
const existing = await db.conversations.get(conv.id);
if (existing) {
console.warn(`Conversation "${conv.name}" already exists, skipping...`);
skippedCount++;
continue;
}
await db.conversations.add(conv);
for (const msg of messages) {
await db.messages.put(msg);
}
importedCount++;
}
return { imported: importedCount, skipped: skippedCount };
});
}
}
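
To make the branching model concrete, a hedged sketch against the static methods above. Field names mirror createRootMessage; the type: 'message' value is an assumption:

// Sketch: one conversation, a hidden root, one user message, and two
// alternative assistant replies - i.e. a branch point as described in the
// doc comment above.
import { DatabaseService } from '$lib/services';

const conv = await DatabaseService.createConversation('Branching demo');
const rootId = await DatabaseService.createRootMessage(conv.id);

const makeMsg = (role: 'user' | 'assistant', content: string) => ({
  convId: conv.id,
  type: 'message',     // assumed type value for regular messages
  timestamp: Date.now(),
  role,
  content,
  parent: null,        // createMessageBranch overwrites this with parentId
  thinking: '',
  toolCalls: '',
  children: []
});

const userMsg = await DatabaseService.createMessageBranch(makeMsg('user', 'Hello!'), rootId);
const replyA = await DatabaseService.createMessageBranch(makeMsg('assistant', 'Hi!'), userMsg.id);
const replyB = await DatabaseService.createMessageBranch(makeMsg('assistant', 'Hello there!'), userMsg.id);

// userMsg.children now holds [replyA.id, replyB.id] and the conversation's
// currNode points at replyB; switch the active branch via updateCurrentNode:
await DatabaseService.updateCurrentNode(conv.id, replyA.id);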
@@ -1,2 +1,5 @@
export { chatService } from './chat';
export { slotsService } from './slots';
export { ChatService } from './chat';
export { DatabaseService } from './database';
export { ModelsService } from './models';
export { PropsService } from './props';
export { ParameterSyncService } from './parameter-sync';
@@ -1,16 +1,34 @@
import { base } from '$app/paths';
import { config } from '$lib/stores/settings.svelte';
import type { ApiModelListResponse } from '$lib/types/api';
import { ServerModelStatus } from '$lib/enums';
import { getJsonHeaders } from '$lib/utils';
/**
* ModelsService - Stateless service for model management API communication
*
* This service handles communication with model-related endpoints:
* - `/v1/models` - OpenAI-compatible model list (MODEL + ROUTER mode)
* - `/models` - Router-specific model management (ROUTER mode only)
*
* **Responsibilities:**
* - List available models
* - Load/unload models (ROUTER mode)
* - Check model status (ROUTER mode)
*
* **Used by:**
* - modelsStore: Primary consumer for model state management
*/
export class ModelsService {
static async list(): Promise<ApiModelListResponse> {
const currentConfig = config();
const apiKey = currentConfig.apiKey?.toString().trim();
// ─────────────────────────────────────────────────────────────────────────────
// Listing
// ─────────────────────────────────────────────────────────────────────────────
/**
* Fetch list of models from OpenAI-compatible endpoint
* Works in both MODEL and ROUTER modes
*/
static async list(): Promise<ApiModelListResponse> {
const response = await fetch(`${base}/v1/models`, {
headers: {
...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {})
}
headers: getJsonHeaders()
});
if (!response.ok) {
@@ -19,4 +37,88 @@ export class ModelsService {
return response.json() as Promise<ApiModelListResponse>;
}
/**
* Fetch list of all models with detailed metadata (ROUTER mode)
* Returns models with load status, paths, and other metadata
*/
static async listRouter(): Promise<ApiRouterModelsListResponse> {
const response = await fetch(`${base}/models`, {
headers: getJsonHeaders()
});
if (!response.ok) {
throw new Error(`Failed to fetch router models list (status ${response.status})`);
}
return response.json() as Promise<ApiRouterModelsListResponse>;
}
// ─────────────────────────────────────────────────────────────────────────────
// Load/Unload
// ─────────────────────────────────────────────────────────────────────────────
/**
* Load a model (ROUTER mode)
* POST /models/load
* @param modelId - Model identifier to load
* @param extraArgs - Optional additional arguments to pass to the model instance
*/
static async load(modelId: string, extraArgs?: string[]): Promise<ApiRouterModelsLoadResponse> {
const payload: { model: string; extra_args?: string[] } = { model: modelId };
if (extraArgs && extraArgs.length > 0) {
payload.extra_args = extraArgs;
}
const response = await fetch(`${base}/models/load`, {
method: 'POST',
headers: getJsonHeaders(),
body: JSON.stringify(payload)
});
if (!response.ok) {
const errorData = await response.json().catch(() => ({}));
throw new Error(errorData.error || `Failed to load model (status ${response.status})`);
}
return response.json() as Promise<ApiRouterModelsLoadResponse>;
}
/**
* Unload a model (ROUTER mode)
* POST /models/unload
* @param modelId - Model identifier to unload
*/
static async unload(modelId: string): Promise<ApiRouterModelsUnloadResponse> {
const response = await fetch(`${base}/models/unload`, {
method: 'POST',
headers: getJsonHeaders(),
body: JSON.stringify({ model: modelId })
});
if (!response.ok) {
const errorData = await response.json().catch(() => ({}));
throw new Error(errorData.error || `Failed to unload model (status ${response.status})`);
}
return response.json() as Promise<ApiRouterModelsUnloadResponse>;
}
// ─────────────────────────────────────────────────────────────────────────────
// Status
// ─────────────────────────────────────────────────────────────────────────────
/**
* Check if a model is loaded based on its metadata
*/
static isModelLoaded(model: ApiModelDataEntry): boolean {
return model.status.value === ServerModelStatus.LOADED;
}
/**
* Check if a model is currently loading
*/
static isModelLoading(model: ApiModelDataEntry): boolean {
return model.status.value === ServerModelStatus.LOADING;
}
}
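
A short usage sketch for the load/unload and status helpers above; the { data } shape of listRouter's response, the entry's id field, and the polling interval are assumptions:

// Sketch: load a model, poll until the router reports it as LOADED, then unload.
import { ModelsService } from '$lib/services';

async function loadAndWait(modelId: string): Promise<void> {
  await ModelsService.load(modelId);
  for (;;) {
    const { data } = await ModelsService.listRouter();        // assumed shape
    const entry = data.find((m) => m.id === modelId);         // assumed id field
    if (entry && ModelsService.isModelLoaded(entry)) return;
    await new Promise((resolve) => setTimeout(resolve, 500)); // assumed interval
  }
}

await loadAndWait('my-model');
await ModelsService.unload('my-model');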
@@ -1,6 +1,5 @@
import { describe, it, expect } from 'vitest';
import { ParameterSyncService } from './parameter-sync';
import type { ApiLlamaCppServerProps } from '$lib/types/api';
describe('ParameterSyncService', () => {
describe('roundFloatingPoint', () => {
@@ -12,8 +12,7 @@
* - Provide sync utilities for settings store integration
*/
import type { ApiLlamaCppServerProps } from '$lib/types/api';
import { normalizeFloatingPoint } from '$lib/utils/precision';
import { normalizeFloatingPoint } from '$lib/utils';
export type ParameterSource = 'default' | 'custom';
export type ParameterValue = string | number | boolean;
@@ -60,6 +59,10 @@ export const SYNCABLE_PARAMETERS: SyncableParameter[] = [
];
export class ParameterSyncService {
// ─────────────────────────────────────────────────────────────────────────────
// Extraction
// ─────────────────────────────────────────────────────────────────────────────
/**
* Round floating-point numbers to avoid JavaScript precision issues
*/
@@ -95,6 +98,10 @@ export class ParameterSyncService {
return extracted;
}
// ─────────────────────────────────────────────────────────────────────────────
// Merging
// ─────────────────────────────────────────────────────────────────────────────
/**
* Merge server defaults with current user settings
* Returns updated settings that respect user overrides while using server defaults
@@ -116,6 +123,10 @@ export class ParameterSyncService {
return merged;
}
// ─────────────────────────────────────────────────────────────────────────────
// Info
// ─────────────────────────────────────────────────────────────────────────────
/**
* Get parameter information including source and values
*/
@@ -172,6 +183,10 @@ export class ParameterSyncService {
}
}
// ─────────────────────────────────────────────────────────────────────────────
// Diff
// ─────────────────────────────────────────────────────────────────────────────
/**
* Create a diff between current settings and server defaults
*/
@@ -0,0 +1,77 @@
import { getAuthHeaders } from '$lib/utils';
/**
* PropsService - Server properties management
*
* This service handles communication with the /props endpoint to retrieve
* server configuration, model information, and capabilities.
*
* **Responsibilities:**
* - Fetch server properties from /props endpoint
* - Handle API authentication
* - Parse and validate server response
*
* **Used by:**
* - serverStore: Primary consumer for server state management
*/
export class PropsService {
// ─────────────────────────────────────────────────────────────────────────────
// Fetching
// ─────────────────────────────────────────────────────────────────────────────
/**
* Fetches server properties from the /props endpoint
*
* @param autoload - If false, prevents automatic model loading (default: false)
* @returns {Promise<ApiLlamaCppServerProps>} Server properties
* @throws {Error} If the request fails or returns invalid data
*/
static async fetch(autoload = false): Promise<ApiLlamaCppServerProps> {
const url = new URL('./props', window.location.href);
if (!autoload) {
url.searchParams.set('autoload', 'false');
}
const response = await fetch(url.toString(), {
headers: getAuthHeaders()
});
if (!response.ok) {
throw new Error(
`Failed to fetch server properties: ${response.status} ${response.statusText}`
);
}
const data = await response.json();
return data as ApiLlamaCppServerProps;
}
/**
* Fetches server properties for a specific model (ROUTER mode)
*
* @param modelId - The model ID to fetch properties for
* @param autoload - If false, prevents automatic model loading (default: false)
* @returns {Promise<ApiLlamaCppServerProps>} Server properties for the model
* @throws {Error} If the request fails or returns invalid data
*/
static async fetchForModel(modelId: string, autoload = false): Promise<ApiLlamaCppServerProps> {
const url = new URL('./props', window.location.href);
url.searchParams.set('model', modelId);
if (!autoload) {
url.searchParams.set('autoload', 'false');
}
const response = await fetch(url.toString(), {
headers: getAuthHeaders()
});
if (!response.ok) {
throw new Error(
`Failed to fetch model properties: ${response.status} ${response.statusText}`
);
}
const data = await response.json();
return data as ApiLlamaCppServerProps;
}
}
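
And the corresponding PropsService usage, a brief sketch (the model id is hypothetical and the response fields beyond what the diff shows are assumptions):

// Sketch: fetch server props without triggering a load, then props for a
// specific model in ROUTER mode.
import { PropsService } from '$lib/services';

const serverProps = await PropsService.fetch();               // appends ?autoload=false by default
const modelProps = await PropsService.fetchForModel('my-model');

// e.g. inspect the returned properties before enabling UI features
console.log(serverProps, modelProps);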
@@ -1,322 +0,0 @@
import { config } from '$lib/stores/settings.svelte';
/**
* SlotsService - Real-time processing state monitoring and token rate calculation
*
* This service provides real-time information about generation progress, token rates,
* and context usage based on timing data from ChatService streaming responses.
* It manages streaming session tracking and provides accurate processing state updates.
*
* **Architecture & Relationships:**
* - **SlotsService** (this class): Processing state monitoring
* - Receives timing data from ChatService streaming responses
* - Calculates token generation rates and context usage
* - Manages streaming session lifecycle
* - Provides real-time updates to UI components
*
* - **ChatService**: Provides timing data from `/chat/completions` streaming
* - **UI Components**: Subscribe to processing state for progress indicators
*
* **Key Features:**
* - **Real-time Monitoring**: Live processing state during generation
* - **Token Rate Calculation**: Accurate tokens/second from timing data
* - **Context Tracking**: Current context usage and remaining capacity
* - **Streaming Lifecycle**: Start/stop tracking for streaming sessions
* - **Timing Data Processing**: Converts streaming timing data to structured state
* - **Error Handling**: Graceful handling when timing data is unavailable
*
* **Processing States:**
* - `idle`: No active processing
* - `generating`: Actively generating tokens
*
* **Token Rate Calculation:**
* Uses timing data from `/chat/completions` streaming response for accurate
* real-time token generation rate measurement.
*/
export class SlotsService {
private callbacks: Set<(state: ApiProcessingState | null) => void> = new Set();
private isStreamingActive: boolean = false;
private lastKnownState: ApiProcessingState | null = null;
private conversationStates: Map<string, ApiProcessingState | null> = new Map();
private activeConversationId: string | null = null;
/**
* Start streaming session tracking
*/
startStreaming(): void {
this.isStreamingActive = true;
}
/**
* Stop streaming session tracking
*/
stopStreaming(): void {
this.isStreamingActive = false;
}
/**
* Clear the current processing state
* Used when switching to a conversation without timing data
*/
clearState(): void {
this.lastKnownState = null;
for (const callback of this.callbacks) {
try {
callback(null);
} catch (error) {
console.error('Error in clearState callback:', error);
}
}
}
/**
* Check if currently in a streaming session
*/
isStreaming(): boolean {
return this.isStreamingActive;
}
/**
* Set the active conversation for statistics display
*/
setActiveConversation(conversationId: string | null): void {
this.activeConversationId = conversationId;
this.notifyCallbacks();
}
/**
* Update processing state for a specific conversation
*/
updateConversationState(conversationId: string, state: ApiProcessingState | null): void {
this.conversationStates.set(conversationId, state);
if (conversationId === this.activeConversationId) {
this.lastKnownState = state;
this.notifyCallbacks();
}
}
/**
* Get processing state for a specific conversation
*/
getConversationState(conversationId: string): ApiProcessingState | null {
return this.conversationStates.get(conversationId) || null;
}
/**
* Clear state for a specific conversation
*/
clearConversationState(conversationId: string): void {
this.conversationStates.delete(conversationId);
if (conversationId === this.activeConversationId) {
this.lastKnownState = null;
this.notifyCallbacks();
}
}
/**
* Notify all callbacks with current state
*/
private notifyCallbacks(): void {
const currentState = this.activeConversationId
? this.conversationStates.get(this.activeConversationId) || null
: this.lastKnownState;
for (const callback of this.callbacks) {
try {
callback(currentState);
} catch (error) {
console.error('Error in slots service callback:', error);
}
}
}
/**
* @deprecated Polling is no longer used - timing data comes from ChatService streaming response
* This method logs a warning if called to help identify outdated usage
*/
fetchAndNotify(): void {
console.warn(
'SlotsService.fetchAndNotify() is deprecated - use timing data from ChatService instead'
);
}
subscribe(callback: (state: ApiProcessingState | null) => void): () => void {
this.callbacks.add(callback);
if (this.lastKnownState) {
callback(this.lastKnownState);
}
return () => {
this.callbacks.delete(callback);
};
}
/**
* Updates processing state with timing data from ChatService streaming response
*/
async updateFromTimingData(
timingData: {
prompt_n: number;
predicted_n: number;
predicted_per_second: number;
cache_n: number;
prompt_progress?: ChatMessagePromptProgress;
},
conversationId?: string
): Promise<void> {
const processingState = await this.parseCompletionTimingData(timingData);
if (processingState === null) {
console.warn('Failed to parse timing data - skipping update');
return;
}
if (conversationId) {
this.updateConversationState(conversationId, processingState);
} else {
this.lastKnownState = processingState;
this.notifyCallbacks();
}
}
/**
* Gets context total from last known slots data or fetches from server
*/
private async getContextTotal(): Promise<number | null> {
if (this.lastKnownState && this.lastKnownState.contextTotal > 0) {
return this.lastKnownState.contextTotal;
}
try {
const currentConfig = config();
const apiKey = currentConfig.apiKey?.toString().trim();
const response = await fetch(`./slots`, {
headers: {
...(apiKey ? { Authorization: `Bearer ${apiKey}` } : {})
}
});
if (response.ok) {
const slotsData = await response.json();
if (Array.isArray(slotsData) && slotsData.length > 0) {
const slot = slotsData[0];
if (slot.n_ctx && slot.n_ctx > 0) {
return slot.n_ctx;
}
}
}
} catch (error) {
console.warn('Failed to fetch context total from /slots:', error);
}
return 4096;
}
private async parseCompletionTimingData(
timingData: Record<string, unknown>
): Promise<ApiProcessingState | null> {
const promptTokens = (timingData.prompt_n as number) || 0;
const predictedTokens = (timingData.predicted_n as number) || 0;
const tokensPerSecond = (timingData.predicted_per_second as number) || 0;
const cacheTokens = (timingData.cache_n as number) || 0;
const promptProgress = timingData.prompt_progress as
| {
total: number;
cache: number;
processed: number;
time_ms: number;
}
| undefined;
const contextTotal = await this.getContextTotal();
if (contextTotal === null) {
console.warn('No context total available - cannot calculate processing state');
return null;
}
const currentConfig = config();
const outputTokensMax = currentConfig.max_tokens || -1;
const contextUsed = promptTokens + cacheTokens + predictedTokens;
const outputTokensUsed = predictedTokens;
const progressPercent = promptProgress
? Math.round((promptProgress.processed / promptProgress.total) * 100)
: undefined;
return {
status: predictedTokens > 0 ? 'generating' : promptProgress ? 'preparing' : 'idle',
tokensDecoded: predictedTokens,
tokensRemaining: outputTokensMax - predictedTokens,
contextUsed,
contextTotal,
outputTokensUsed,
outputTokensMax,
hasNextToken: predictedTokens > 0,
tokensPerSecond,
temperature: currentConfig.temperature ?? 0.8,
topP: currentConfig.top_p ?? 0.95,
speculative: false,
progressPercent,
promptTokens,
cacheTokens
};
}
/**
* Get current processing state
* Returns the last known state from timing data, or null if no data available
* If activeConversationId is set, returns state for that conversation
*/
async getCurrentState(): Promise<ApiProcessingState | null> {
if (this.activeConversationId) {
const conversationState = this.conversationStates.get(this.activeConversationId);
if (conversationState) {
return conversationState;
}
}
if (this.lastKnownState) {
return this.lastKnownState;
}
try {
const { chatStore } = await import('$lib/stores/chat.svelte');
const messages = chatStore.activeMessages;
for (let i = messages.length - 1; i >= 0; i--) {
const message = messages[i];
if (message.role === 'assistant' && message.timings) {
const restoredState = await this.parseCompletionTimingData({
prompt_n: message.timings.prompt_n || 0,
predicted_n: message.timings.predicted_n || 0,
predicted_per_second:
message.timings.predicted_n && message.timings.predicted_ms
? (message.timings.predicted_n / message.timings.predicted_ms) * 1000
: 0,
cache_n: message.timings.cache_n || 0
});
if (restoredState) {
this.lastKnownState = restoredState;
return restoredState;
}
}
}
} catch (error) {
console.warn('Failed to restore timing data from messages:', error);
}
return null;
}
}
export const slotsService = new SlotsService();