== Programs ==

=== UI ===
* jan.ai
* Open WebUI

==== CLI ====
* [https://github.com/nazdridoy/ngpt ngpt]
* [https://github.com/adhikasp/mcp-client-cli mcp-client-cli]
* [https://github.com/f/mcptools mcptools] - for inspecting MCP servers.

{| class="wikitable"
! Client Name !! Description !! Key Features !! Implementation !! URL
|-
| oterm || A text-based terminal client for Ollama with MCP tools, prompts, and sampling. || MCP tools, prompts, and sampling; Streamable HTTP and WebSocket transports. || TUI (terminal UI) || [https://github.com/ggozad/oterm GitHub]
|-
| ollama-mcp-client || Python-based client for integrating local Ollama models with MCP servers. || Seamless MCP integration, Git operations support, tool discovery. || Python CLI || [https://github.com/mihirrd/ollama-mcp-client GitHub]
|-
| mcp-client-for-ollama || TUI client for interacting with MCP servers using Ollama, offering interactivity. || Multi-server support, streaming responses, fuzzy autocomplete. || TUI (terminal UI) || [https://github.com/jonigl/mcp-client-for-ollama GitHub]
|-
| mcp-cli || General-purpose CLI for interacting with MCP servers, supporting Ollama. || Multiple providers, modular chat, context-aware completions. || Command-line || [https://mcpmarket.com/server/mcp-cli Source]
|-
| mcp-client-ollama || Python-based CLI for connecting Ollama to MCP servers, focusing on tool execution. || stdio and SSE transports, JSON configuration, multiple server support. || Python CLI || [https://mcpmarket.com/server/mcp-client-ollama Source]
|}

==== GUI ====

=== taskset ===
Not sure if this is common knowledge, but here is some advice for anyone with limited VRAM who is offloading to RAM: setting the number of threads is not enough; you can get extra speed by manually setting core affinity.

For context: I have a 13600K, which has 6 P-cores (plus 8 E-cores). The usual advice is to set --threads to the P-core count, and from some testing, running koboldcpp with --threads 6 was indeed the best option with that argument alone. But when I looked at which cores were actually used, I found E-cores being scheduled some of the time. So the next step was to pin the process to P-cores only. Each P-core has two hardware threads; on this CPU, logical CPUs 0-11 are the P-cores and CPUs 12-19 are the E-cores. Running koboldcpp with one thread per P-core:

 taskset -c 0,2,4,6,8,10 python koboldcpp.py [args]

raised my generation speed on command-r from ~2.3 T/s to 2.67 T/s. Pretty good. But what if I use the P-cores fully, I thought. So I set --threads 12 with

 taskset -c 0,1,2,3,4,5,6,7,8,9,10,11 python koboldcpp.py [args]

and got a generation at 3.09 T/s. That's a whopping ~33% increase over my initial setup. Hope this is helpful.
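The P-core list doesn't have to be hard-coded per machine. Below is a minimal sketch of discovering it automatically, assuming a hybrid Intel CPU on a Linux kernel recent enough to expose the cpu_core/cpu_atom split in sysfs; the koboldcpp.py invocation and the [args] placeholder are carried over from above, not part of any fixed interface.

<pre>
#!/usr/bin/env bash
# Sketch: pin koboldcpp to P-cores only on a hybrid Intel CPU.
# Assumption: the kernel exposes the hybrid topology in sysfs:
#   /sys/devices/cpu_core/cpus -> e.g. "0-11"  (P-cores, two threads each)
#   /sys/devices/cpu_atom/cpus -> e.g. "12-19" (E-cores)
PCORES=$(cat /sys/devices/cpu_core/cpus)

# nproc respects CPU affinity, so running it under taskset counts the
# logical CPUs in the pin set, keeping --threads consistent with the mask.
NTHREADS=$(taskset -c "$PCORES" nproc)

echo "Pinning to CPUs $PCORES with $NTHREADS threads"
taskset -c "$PCORES" python koboldcpp.py --threads "$NTHREADS" [args]
</pre>

If those sysfs nodes are absent, lscpu --extended lists per-CPU maximum frequencies, which distinguishes P-cores from E-cores on these parts. While a generation is running, taskset -p <pid> prints the affinity mask actually in effect, which is how stray E-core usage like the above can be confirmed or ruled out.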