== New Shit ==
* [https://github.com/ggml-org/llama.cpp/tree/master/tools/server#using-multiple-models llama.cpp using multiple models (llama.cpp router)]
* "[https://old.reddit.com/r/LocalLLaMA/comments/1owskm6/windows_llamacpp_is_20_faster/ The bf16 support is a big difference. It will eventually end up on your machine next year]"
* [https://www.youtube.com/@stanfordonline/videos Stanford Online] - [https://www.youtube.com/watch?v=Ub3GoFaUcds Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 1 - Transformer]
* [https://www.onyx.app/ Onyx] (https://github.com/onyx-dot-app/onyx) - a feature-rich, self-hostable chat UI (another pseudo-open project)
* https://github.com/microsoft/ai-agents-for-beginners?tab=readme-ov-file
* https://github.com/dwmkerr/terminal-ai
* https://medevel.com/15-free-open-source-ai-terminal-assistants/?utm_source=chatgpt.com
* [https://old.reddit.com/r/LocalLLaMA/comments/1nt2c38/llamacpp_moe_models_find_best_ncpumoe_value/ MoE benchmark: llama.cpp - find the best --n-cpu-moe value]
* [https://www.librechat.ai/ LibreChat] (https://github.com/danny-avila/LibreChat) - possible open-webui replacement; see the [https://www.librechat.ai/docs/features/code_interpreter code interpreter docs]
* [https://rentry.org/geechan geechan - SillyTavern GLM system prompts]
* https://github.com/github/spec-kit
* https://www.youtube.com/watch?v=em3vIT9aUsg
* https://joeyagreco.medium.com/reverse-engineering-the-hottest-new-game-5362cfe7c452
* https://blog.plasticlabs.ai/blog/YouSim%3B-Explore-The-Multiverse-of-Identity?utm_source=chatgpt.com
* https://worldsim.nousresearch.com/console
* [https://www.reddit.com/r/LocalLLaMA/comments/1n6od0s/%E6%AE%8B%E5%BF%83_zanshin_navigate_through_media_by_speaker/ 残心 / Zanshin - navigate through media by speaker]
* [https://www.youtube.com/watch?v=Jaj_SQsF-BI LLM from Scratch Tutorial – Code & Train Qwen 3]
* [https://github.com/Qetesh/miniflux-ai miniflux-ai]
* [https://www.reddit.com/r/SillyTavernAI/comments/1msz8ao/glm_45_preset/ GLM 4.5 preset for SillyTavern]
* [https://github.com/simstudioai/sim simstudioai/sim] - [https://www.youtube.com/watch?v=mFKyiyGPu1I Install Sim Locally with Ollama: AI Agent Workflow Builder]
* [https://huggingface.co/fofr/sdxl-emoji sdxl-emoji]
* [https://github.com/gabber-dev/gabber gabber]
* [https://www.youtube.com/watch?v=JbHKMibTb5Q RamaLama] - Ollama alternative
* [https://github.com/GewoonJaap/qwen-code-cli-wrapper qwen-code-cli-wrapper]
* Docling, DOTS OCR, and Ollama OCR
* https://agents.md/
* [https://github.com/ComfyUI-Workflow/awesome-comfyui awesome-comfyui]
* https://github.com/sakalond/StableGen - generate textures for Blender
* [https://www.reddit.com/r/comfyui/comments/1mvqeyw/what_upscaler_is_the_best_now/ What upscaler is the best now?]
* [https://www.reddit.com/r/LocalLLaMA/comments/1mi7bem/new_llamacpp_options_make_moe_offloading_trivial/ New llama.cpp options make MoE offloading trivial]: "You can use <code>-ngl 49</code> and just pass <code>--n-cpu-moe 20</code>. Also add <code>-fa</code> and <code>-ctk q8_0 -ctv q8_0</code>."
* <code>PIP_BREAK_SYSTEM_PACKAGES=1 comfy install</code>
* https://overpass-turbo.eu/
* [https://www.youtube.com/watch?v=mi2KmpV3Wvg Qwen-3 Coder CLI Forgets Everything. I Gave It a Perfect Memory.]
* https://docs.vllm.ai/en/latest/getting_started/installation/cpu.html#related-runtime-environment-variables
* https://modal.com/blog/fast-cheap-batch-transcription
* [https://www.youtube.com/watch?v=EUG65dIY-2k Make your AI Agents 10x Smarter with GraphRAG (n8n)]
* https://huggingface.co/rednote-hilab/dots.ocr
* [https://www.youtube.com/watch?v=yNPwsKa52zs YOLOE: Next Gen Computer Vision - Zero Training Required!]
* [https://github.com/campfirein/cipher Cipher] - an open-source memory layer designed specifically for coding agents
* https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/ - K/V context quantisation in Ollama
* [https://www.youtube.com/watch?v=g21royNJ4fw Local LightRAG: A GraphRAG Alternative but Fully Local with Ollama]
* [https://www.youtube.com/watch?v=oetP9uksUwM Graph RAG Evolved: PathRAG (Relational Reasoning Paths)]
* [https://uigenoutput.tesslate.com/uigen-x-4b-0729 UIGEN-X-4B-0729]
* [https://www.youtube.com/watch?v=p7yRLIj9IyQ The Only Embedding Model You Need for RAG]
* https://ollama.com/library/smallthinker - can be used as a draft model for QwQ-32B, giving a 70% speed-up
* sqrt(params * active) - a rule of thumb for the number of parameters a dense model would need to be roughly equivalent to a given MoE model
* [https://huggingface.co/wushuang98/Direct3D-S2 Direct3D‑S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention]
* [https://www.youtube.com/watch?v=PxcOIINgiaA Make RAG 100x Better with Real-Time Knowledge Graphs]
* [https://huggingface.co/rednote-hilab/dots.llm1.inst dots1]
* [https://github.com/lmg-anon/mikupad mikupad]
* [https://github.com/RVC-Boss/GPT-SoVITS/wiki/GPT%E2%80%90SoVITS%E2%80%90features-(%E5%90%84%E7%89%88%E6%9C%AC%E7%89%B9%E6%80%A7) GPT-SoVITS features (by version)]
* [https://www.reddit.com/r/StableDiffusion/comments/1je3b9m/are_there_any_free_working_voice_cloning_ais/ Are there any free working voice cloning AIs?]
* [https://github.com/zylon-ai/private-gpt privategpt] - [https://www.reddit.com/r/LocalLLaMA/comments/1ktm248/what_model_should_i_choose/mtunwvz/ "privategpt, IMHO, is the best for RAG if you need the source. It not only lists the PDF used for the answer but also the page, and is quite precise. So for study and search in a library it is the best I know."]
* [https://diffusers-flux-quant.hf.space/ FLUX Model Quantization Challenge]
* [https://www.reddit.com/r/LocalLLaMA/comments/1ki7tg7/dont_offload_gguf_layers_offload_tensors_200_gen/mrdqlfr/ Don't offload GGUF layers, offload tensors - 200% gen speed]
* [https://www.youtube.com/watch?v=ZoyPqXvnnZ8 I Built the Ultimate RAG MCP Server for AI Coding (Better than Context7)]
* [https://www.youtube.com/watch?v=Dgo6dyPMv_Q NEW FramePack F1 Model - Much Better Results - Bonus How to Install Sage]
* [https://huggingface.co/DavidAU/Qwen3-30B-A6B-16-Extreme Qwen3-30B-A6B-16-Extreme]
* https://docs.google.com/document/d/12ATcyjCEKh8T-MPDZ-VMiQ1XMa9FUvvk2QazrsKoiR8/edit?tab=t.0
* [https://www.youtube.com/watch?v=LMH62T_XCF4 This AI Model has me excited about the future of Local LLMs | Qwen3-30B-A3B]
* https://www.reddit.com/r/LocalLLaMA/comments/1ev8n2s/exclude_top_choices_xtc_a_sampler_that_boosts/ - Exclude Top Choices (XTC) sampler
* https://blog.runpod.io/upscaling-videos-using/
* [https://github.com/LostRuins/koboldcpp/wiki koboldcpp wiki]
* [https://rentry.org/MixtralForRetards Mixtral for Retards]
* Example koboldcpp invocation:
 koboldcpp.exe 13B-HyperMantis.ggmlv3.q4_K_M.bin --debug --usecublas --usemlock --contextsize 8192 --blasbatchsize 512 --psutil_set_threads --threads 6 --blasthreads 10 --gpulayers 5 --highpriority --stream --unbantokens --smartcontext
*: If you're running 1.35 and a superHOT model, also add <code>--linearrope</code>, which should make them perform better.
* RP recommended models: "superhot, airoboros, wizard-vicuna, guanaco, chronos are a few commonly discussed models off the top of my head. For me, it's superhot or guanaco (one or the other, not the merge though)"
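The sqrt(params * active) rule of thumb above can be sketched in a few lines of Python. The model numbers below (Qwen3-30B-A3B: 30B total, 3B active) are just an illustration of the arithmetic, not a benchmark claim:

```python
import math

def dense_equivalent(total_params: float, active_params: float) -> float:
    """Rule-of-thumb dense-equivalent size of a MoE model: sqrt(total * active)."""
    return math.sqrt(total_params * active_params)

# e.g. a 30B-total / 3B-active MoE behaves very roughly like a ~9.5B dense model
print(round(dense_equivalent(30e9, 3e9) / 1e9, 2))  # → 9.49
```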
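The MoE offloading flags quoted from the r/LocalLLaMA thread can be put together into a full invocation. A minimal sketch, assuming llama.cpp's <code>llama-server</code> binary and a hypothetical GGUF filename; the exact <code>--n-cpu-moe</code> value is model- and VRAM-dependent, so tune it per the linked benchmark thread:

```shell
# Offload all layers to GPU (-ngl 49), then keep the expert (MoE) tensors
# of the first 20 layers on CPU (--n-cpu-moe 20); enable flash attention
# and q8_0 K/V cache quantisation to cut VRAM use further.
./llama-server \
  -m Qwen3-30B-A3B-Q4_K_M.gguf \
  -ngl 49 --n-cpu-moe 20 \
  -fa -ctk q8_0 -ctv q8_0
```

Raising <code>--n-cpu-moe</code> trades speed for VRAM: more expert tensors stay on the CPU, so a larger model fits on a smaller GPU.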
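For the K/V context quantisation link above, the smcleod.net post describes enabling it in Ollama via environment variables. A sketch under that assumption (flash attention must be on for quantised K/V cache to take effect):

```shell
# Enable flash attention, then pick a K/V cache quantisation level.
# f16 is the default; q8_0 roughly halves K/V cache memory, q4_0 goes further
# at some quality cost.
export OLLAMA_FLASH_ATTENTION=1
export OLLAMA_KV_CACHE_TYPE=q8_0
ollama serve
```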