server : add Anthropic Messages API support (#17570)

* server : add Anthropic Messages API support
* remove -@pytest.mark.slow from tool calling/jinja tests
* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py
* server : removed redundant n field logic in anthropic_params_from_json
* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure basic test always go first
* clean up
* clean up api key check, add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-28 12:57:04 +01:00
parent ff55414c42
commit ddf9f94389
11 changed files with 1553 additions and 70 deletions
@@ -294,6 +294,9 @@ json oaicompat_chat_params_parse(
    const oaicompat_parser_options & opt,
    std::vector<raw_buffer> & out_files);

+// convert Anthropic Messages API format to OpenAI Chat Completions API format
+json convert_anthropic_to_oai(const json & body);
+
 // TODO: move it to server-task.cpp
 json format_embeddings_response_oaicompat(const json & request, const json & embeddings, bool use_base64 = false);

@@ -320,7 +323,10 @@ std::string tokens_to_output_formatted_string(const llama_context * ctx, const l

 // format server-sent event (SSE), return the formatted string to send
 // note: if data is a json array, it will be sent as multiple events, one per item
-std::string format_sse(const json & data);
+std::string format_oai_sse(const json & data);
+
+// format Anthropic-style SSE with event types
+std::string format_anthropic_sse(const json & data);

 bool is_valid_utf8(const std::string & str);
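
Note on the two SSE helpers: the hunk above only shows declarations, so the following is a minimal sketch of the framing difference they imply, not the code from this commit. It assumes the usual wire formats: OpenAI-style streams send only a `data:` field per event, while Anthropic-style streams put an `event: <type>` line before `data:`. The names `sse_event`, `sketch_format_oai_sse` and `sketch_format_anthropic_sse` are hypothetical, and payloads are passed as pre-serialized strings rather than the `json` type the real functions take.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one streamed event (not a type from the commit).
struct sse_event {
    std::string event; // event type, e.g. "message_start"; empty means no event line
    std::string data;  // already-serialized JSON payload, assumed to contain no raw newlines
};

// OpenAI-style framing (assumed): one "data: ..." field per payload,
// each event terminated by a blank line.
std::string sketch_format_oai_sse(const std::vector<std::string> & payloads) {
    std::string out;
    for (const auto & p : payloads) {
        out += "data: " + p + "\n\n";
    }
    return out;
}

// Anthropic-style framing (assumed): an "event: <type>" line precedes the
// "data: ..." field, and each event is again terminated by a blank line.
std::string sketch_format_anthropic_sse(const std::vector<sse_event> & events) {
    std::string out;
    for (const auto & ev : events) {
        if (!ev.event.empty()) {
            out += "event: " + ev.event + "\n";
        }
        out += "data: " + ev.data + "\n\n";
    }
    return out;
}
```

Passing a vector with several entries mirrors the comment on `format_oai_sse` that a JSON array is sent as multiple events, one per item. How the real helpers extract the event type from the `json` argument is not visible in this hunk.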