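Notes on using MCP with tool_calling in Hermes, llama3, and similar models

I have been looking into building my own home-grown LLM environment on a local LLM, and figured it would be handy if the local LLM could use MCP. These are notes on what I did.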

tool calling

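Local LLMs have a feature called tool calling: the model emits a structured request to run a function, and the caller runs it and hands the result back.

Example with a raw prompt

As one example, tool calling with Hermes looks like this (Qwen and others work the same way).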

system prompt

You are a function calling AI model. You are provided with function signatures within <tools> </tools> XML tags. You may call one or more functions to assist with the user query. If available tools are not relevant in assisting with user query, just respond in natural conversational language. Don't make assumptions about what values to plug into functions. After calling & executing the functions, you will be provided with function results within <tool_response> </tool_response> XML tags.
<tools>
[{'type': 'function', 'function': {'name': 'get_stock_fundamentals', 'description': 'Get fundamental data for a given stock symbol using yfinance API.', 'parameters': {'type': 'object', 'properties': {'symbol': {'type': 'string'}}, 'required': ['symbol']}}}]
</tools>
For each function call return a JSON object, with the following pydantic model json schema:
{'title': 'FunctionCall', 'type': 'object', 'properties': {'name': {'title': 'Name', 'type': 'string'}, 'arguments': {'title': 'Arguments', 'type': 'object'}}, 'required': ['arguments', 'name']}
Each function call should be enclosed within <tool_call> </tool_call> XML tags. You must use <scratch_pad> </scratch_pad> XML tags to record your reasoning and planning before you call the functions as follows.
Example:
<scratch_pad>
Goal: <state task assigned by user>
Actions:
<if tool calls need to be generated:>
- {result_var_name1} = functions.{function_name1}({param1}={value1},...)
- {result_var_name2, result_var_name3} = ...
<if no tool call needs to be generated:> None
Observation: <set observation 'None' with tool calls; plan final tools results summary when provided>
Reflection: <evaluate query-tool relevance and required parameters when tools called; analyze overall task status when observations made>
</scratch_pad>
<tool_call>
{'name': <function-name>, 'arguments': <args-dict>}
</tool_call>

With this system prompt in place, a user turn like the one below makes the model answer with a tool call:

<|im_start|>user
Fetch the stock fundamentals data for Tesla (TSLA)<|im_end|>
<|im_start|>assistant
<tool_call>
{'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}
</tool_call><|im_end|>

The call comes back as JSON like this, so you pick it up and run the tool yourself.

<|im_start|>tool
<tool_response>
{"name": "get_stock_fundamentals", "content": {'symbol': 'TSLA', 'company_name': 'Tesla, Inc.', 'sector': 'Consumer Cyclical', 'industry': 'Auto Manufacturers', 'market_cap': 611384164352, 'pe_ratio': 49.604652, 'pb_ratio': 9.762013, 'dividend_yield': None, 'eps': 4.3, 'beta': 2.427, '52_week_high': 299.29, '52_week_low': 152.37}}
</tool_response>
<|im_end|>

Returning the tool's result in a tool turn like this lets the LLM interpret the content and produce its final answer. That is the whole mechanism.
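The round trip described above can be sketched roughly as follows. This is only a minimal sketch: `parse_tool_calls` and `format_tool_response` are hypothetical helper names, not part of any library, and the fallback to `ast.literal_eval` is there because the sample output uses Python-style single quotes, which strict JSON parsing rejects.

```python
import ast
import json
import re

# Matches the body of every <tool_call> ... </tool_call> block in the model output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return the list of {'name': ..., 'arguments': ...} dicts found in text."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            # The model may emit a Python-style dict literal instead of strict JSON.
            calls.append(ast.literal_eval(body))
    return calls


def format_tool_response(name, content):
    """Wrap a tool result in the <tool_response> block used in the tool turn."""
    payload = json.dumps({"name": name, "content": content})
    return f"<tool_response>\n{payload}\n</tool_response>"


assistant_output = """<tool_call>
{'arguments': {'symbol': 'TSLA'}, 'name': 'get_stock_fundamentals'}
</tool_call><|im_end|>"""

calls = parse_tool_calls(assistant_output)
# The content here is placeholder data, not a real yfinance result.
reply = format_tool_response(calls[0]["name"], {"symbol": "TSLA", "eps": 4.3})
print(calls)
print(reply)
```

The string in `reply` is what would go between `<|im_start|>tool` and `<|im_end|>` on the next turn.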

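Example using a chat template with transformers

Writing all of that by hand is tedious, but in most cases a chat template takes care of it: you pass the tool list as JSON and it builds the prompt for you.

For example, with Hugging Face transformers: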

huggingface.co

# A simple function that takes no arguments
current_time = {
  "type": "function", 
  "function": {
    "name": "current_time",
    "description": "Get the current local time as a string.",
    "parameters": {
      'type': 'object',
      'properties': {}
    }
  }
}

# A more complete function that takes two numerical arguments
multiply = {
  'type': 'function',
  'function': {
    'name': 'multiply',
    'description': 'A function that multiplies two numbers', 
    'parameters': {
      'type': 'object', 
      'properties': {
        'a': {
          'type': 'number',
          'description': 'The first number to multiply'
        }, 
        'b': {
          'type': 'number', 'description': 'The second number to multiply'
        }
      }, 
      'required': ['a', 'b']
    }
  }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools = [current_time, multiply]
)

Passing the names and properties to apply_chat_template like this generates the kind of prompt shown earlier.
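Writing these dicts by hand gets repetitive, so here is a minimal sketch of deriving one from a plain Python function. `function_to_tool` is a hypothetical helper written for this note, not a transformers API, and it only understands a handful of simple annotations.

```python
import inspect

# Small mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_tool(func):
    """Build a tool definition dict from a function's signature and docstring."""
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" (an assumption).
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    parameters = {"type": "object", "properties": properties}
    if required:
        parameters["required"] = required
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": parameters,
        },
    }


def multiply(a: float, b: float) -> float:
    """A function that multiplies two numbers"""
    return a * b


tool = function_to_tool(multiply)
print(tool)
```

The resulting dict has the same shape as the hand-written `multiply` definition above, minus the per-argument descriptions.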

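For supported models, this page in the vLLM docs is a useful reference.

For an individual model, looking at its chat_template.json tells you whether tools can be used.

Using the OpenAI API

The OpenAI-compatible APIs offered by Ollama, llama.cpp, and so on accept JSON in the same format.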

platform.openai.com

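With the OpenAI API itself, you pass the list as tools to client.responses.create. Note that the Responses API uses a flattened layout: name, description, and parameters sit at the top level rather than under a "function" key.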

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and country e.g. Bogotá, Colombia"
            }
        },
        "required": [
            "location"
        ],
        "additionalProperties": False
    }
}]

response = client.responses.create(
    model="gpt-4.1",
    input=[{"role": "user", "content": "What is the weather like in Paris today?"}],
    tools=tools
)

print(response.output)
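The `tools` list above is in the flattened Responses API layout, while Chat Completions endpoints document the nested layout with a `function` key (the same shape as the transformers example). A minimal sketch of converting between the two, where `to_chat_completions_tool` is a hypothetical helper name:

```python
def to_chat_completions_tool(tool):
    """Convert a flattened Responses-style function tool to the nested layout."""
    if "function" in tool:
        return tool  # already nested, nothing to do
    # Everything except "type" moves under the "function" key.
    body = {key: value for key, value in tool.items() if key != "type"}
    return {"type": "function", "function": body}


flat_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

nested_tool = to_chat_completions_tool(flat_tool)
print(nested_tool)
```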

I can't use the OpenAI API myself, so for reference here is what I got doing much the same thing through the API that Ollama provides.

# Ollama does not implement client.responses, so client.chat.completions is used instead
stream = client.chat.completions.create(
    messages=messages,
    model='rinna-qwq-q4',
    tools=tools
)

The response comes back in the following form. With client.responses the result looks a little different, but the OpenAI documentation should make that clear.

ChatCompletion(id='chatcmpl-243', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_n36j4m54', function=Function(arguments='{"url":"https://docs.vllm.ai/en/stable/features/tool_calling.html#llama-models-llama3-json"}', name='fetch'), type='function', index=0)]))], created=1745092883, model='rinna-qwq-q4k:latest', object='chat.completion', service_tier=None, system_fingerprint='fp_ollama', usage=CompletionUsage(completion_tokens=329, prompt_tokens=3097, total_tokens=3426, completion_tokens_details=None, prompt_tokens_details=None))
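To act on a response like this, you read `tool_calls` off the assistant message, decode the `arguments` string, run the tool, and send the result back as a `tool` message. Below is a minimal sketch of that step. `run_tool_calls` and the `fetch` handler are hypothetical, and a `SimpleNamespace` stands in for `response.choices[0].message` so the snippet runs without a server.

```python
import json
from types import SimpleNamespace


def run_tool_calls(message, registry):
    """Execute each tool call on an assistant message and build the tool replies."""
    replies = []
    for call in message.tool_calls or []:
        # The arguments arrive as a JSON-encoded string, not a dict.
        arguments = json.loads(call.function.arguments)
        handler = registry[call.function.name]
        result = handler(**arguments)
        replies.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result, ensure_ascii=False),
        })
    return replies


def fetch(url):
    """Hypothetical handler; a real one would call the MCP server instead."""
    return {"url": url, "status": "not actually fetched"}


# Stand-in for response.choices[0].message from the output above.
message = SimpleNamespace(
    content="",
    tool_calls=[SimpleNamespace(
        id="call_n36j4m54",
        type="function",
        function=SimpleNamespace(name="fetch", arguments='{"url": "https://example.com"}'),
    )],
)

replies = run_tool_calls(message, {"fetch": fetch})
print(replies)
```

The assistant message and then these replies would be appended to `messages` before calling `client.chat.completions.create` again.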

MCP

With MCP you don't have to implement the tool functionality yourself, which makes things easier.

MCP Client

github.com

The official documentation is around here.

The official client sample is around here.

github.com

This may also be a useful reference.

zenn.dev

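Connecting MCP to tool_calling

Before initializing the model, fetch the tool list from the MCP servers and pass it as tools; the model then recognizes the tools and uses them.

Example of fetching the list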

server_config = config.load_config("servers_config.json")
for name, srv_config in server_config["mcpServers"].items():
    mcp_server = Server(name, srv_config)
    mcp_servers.append(mcp_server)
    mcp_servers_dict[name] = mcp_server
for server in mcp_servers:
    try:
        await server.initialize()
    except Exception as e:
        logging.error(f"Failed to initialize server: {e}")
        for server in mcp_servers:
            await server.cleanup()
        return
all_tools = []
for server in mcp_servers:
    tools = await server.list_tools()
    all_tools.extend(tools)

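# Method of the tool wrapper class (it relies on self.name, self.description, self.input_schema)
def format_for_llm(self) -> dict:
    """Format tool information for LLM.

    Returns:
        A dict describing the tool in the function-calling tools format.
    """
    output = {
        "type": "function",
        "function": {
            "name": self.name,
            "description": self.description,
            "parameters": {"type": "object", "properties": {}},
        },
    }
    parameters = output["function"]["parameters"]
    for key, value in self.input_schema.get("properties", {}).items():
        # Not every tool declares a type or description for each property,
        # so fall back to defaults instead of raising KeyError.
        parameters["properties"][key] = {
            "type": value.get("type", "string"),
            "description": value.get("description", ""),
        }

    # "required" belongs inside the parameters schema, next to "properties".
    if self.input_schema.get("required"):
        parameters["required"] = self.input_schema["required"]
    return output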

With this in place, when the LLM wants to use an MCP tool it emits the request as JSON, so you pick that up, run the tool, and things work nicely.
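Picking up the request means finding which MCP server owns the requested tool and forwarding the call to it. Here is a minimal sketch under some assumptions: `dispatch_tool_call` is a hypothetical helper, each server object is assumed to expose `list_tools()` as in the snippet above plus an `execute_tool(name, arguments)` coroutine, and `FakeServer` is only a stand-in so the example runs without real servers.

```python
import asyncio


async def dispatch_tool_call(servers, name, arguments):
    """Forward a tool call to the first server that lists a tool with this name."""
    for server in servers:
        tools = await server.list_tools()
        if any(tool.name == name for tool in tools):
            # execute_tool is an assumed method on the server wrapper.
            return await server.execute_tool(name, arguments)
    raise LookupError(f"No MCP server provides a tool named {name!r}")


class FakeTool:
    def __init__(self, name):
        self.name = name


class FakeServer:
    """Stand-in for the Server wrapper; it echoes the call instead of running it."""

    def __init__(self, tool_names):
        self._tools = [FakeTool(tool_name) for tool_name in tool_names]

    async def list_tools(self):
        return self._tools

    async def execute_tool(self, name, arguments):
        return {"tool": name, "arguments": arguments}


servers = [FakeServer(["fetch"]), FakeServer(["read_file"])]
result = asyncio.run(dispatch_tool_call(servers, "read_file", {"path": "notes.txt"}))
print(result)
```

Listing tools on every call keeps the sketch short; caching a name-to-server mapping once after initialization would avoid the repeated round trips.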

Other notes

Apparently Spring makes this sort of thing easy.
- ik.am
- How gemma3 manages this is a mystery, since its chat_template does not accept tools.