1  Model Architecture

To self-host a model, you first load its weights onto a GPU. How much memory that takes is set by the model’s architecture. Kimi K2.6 uses a DeepSeek V3-inspired Mixture-of-Experts (MoE) architecture with the following parameters:

Want to derive these numbers from scratch? Read Kipply’s legendary parameter counting post.
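As a rough illustration of that kind of counting, here is a minimal sketch in Python. It assumes plain multi-head attention and two-matrix feed-forward layers, and it ignores biases, norms, and embeddings, so it will not match a DeepSeek V3-style model exactly. The function names and the toy sizes are hypothetical and are not Kimi K2.6's real configuration.

```python
def dense_block_params(d_model: int, d_ff: int) -> int:
    """Weights in one standard transformer block (no biases, no norms)."""
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # up and down projections
    return attention + feed_forward


def moe_block_params(d_model: int, d_expert: int, n_experts: int) -> int:
    """Weights in one MoE block: the feed-forward layer is replaced by
    n_experts smaller feed-forward networks plus a linear router."""
    attention = 4 * d_model * d_model
    router = d_model * n_experts                  # one score per expert
    experts = n_experts * 2 * d_model * d_expert  # every expert is stored
    return attention + router + experts


def weight_memory_gib(n_params: int, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Toy sizes, chosen only to make the arithmetic easy to follow.
    dense = dense_block_params(d_model=1024, d_ff=4096)
    moe = moe_block_params(d_model=1024, d_expert=512, n_experts=8)
    print(f"dense block: {dense:,} params")
    print(f"MoE block:   {moe:,} params")
    print(f"1Gi params at 2 bytes each: {weight_memory_gib(2**30, 2):.1f} GiB")
```

With `d_ff = 4 * d_model`, the dense block comes out to `12 * d_model**2`, the familiar rule of thumb. The MoE count shows why memory is driven by the total number of experts: every expert has to be resident on the GPU even though only a few are active per token.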

If you want some background on the transformer architecture, read The Illustrated Transformer. If you are too tired to read another blog post, try LLM Visualizer, the best LLM visualization I have ever seen.