General-purpose instruct model. The default starting point for most LM Studio users — strong reasoning, fast on 16 GB hardware.
How fast does the LM Studio app run on your hardware?
LM Studio is the desktop application that turns a consumer laptop into a benchmarkable local AI workstation. Built for Mac and Windows users who want reproducible tokens-per-second numbers rather than vague cloud-API impressions, the lm studio app exposes every measurable surface of a local LLM run: GPU utilization, KV cache size, prompt-processing speed and generation throughput. Together with image generation and a plugin marketplace, it is the most measurement-friendly local AI tooling in 2026.
Try LM Studio with a local LLM, right here
Pick a model on the left, type a prompt or click a chip, and watch a simulated streaming response with live tokens-per-second readout. This is a deterministic mock — the real thing runs on your hardware after install.
Pick a prompt below or type your own. The mock streams tokens at the picked model's typical local throughput.
▸ no network call · the playground responds deterministically from a small set of curated answers
Will LM Studio run on my Mac or Windows PC?
Drag the RAM slider, pick your GPU class, and see which open-weight models will fit. Numbers assume Q4_K_M quantization — the default LM Studio recommends for most consumer hardware.
Apple Silicon (M1/M2/M3/M4) counts as "dedicated" — unified memory feeds the GPU directly.
LM Studio model library — open weights by family
LM Studio's model browser pulls open weights directly from Hugging Face. Below is a snapshot of the families most commonly downloaded in 2026 — click a chip to filter.
Heavy-class generalist. Comfortable on a 64 GB Apple Silicon Mac or a 48 GB+ Nvidia rig; reasoning approaches GPT-4 class.
Strong multilingual model with native tool-calling support. The current sweet spot for 32 GB machines.
Reasoning-tuned distill that shows its work. Excellent for code review and step-by-step math, on a 16 GB machine.
Apache-licensed mid-size workhorse with a 128K context window. Popular for retrieval-augmented workflows.
OpenAI's 2025 open-weight release. Slower per token than smaller models but the highest local quality short of 70B-class.
Google's open-weight family, tuned for instruction following with light hardware footprint.
Reasoning-focused 14B from Microsoft. Punches well above its weight on code and math benchmarks for the size.
Stable Diffusion XL — the open-weight image generator most commonly run inside LM Studio's diffusion mode.
Apache-licensed diffusion model. State-of-the-art quality at home on 24 GB+ unified or dedicated VRAM.
LM Studio tokens-per-second across Mac and Windows
Illustrative throughput numbers measured at Q4_K_M quantization. Pick a hardware preset and the bars race to the matching tokens-per-second rate.
LM Studio Mac vs LM Studio Windows — same UI, different back-ends
Same UI, same model catalogue, but the back-end stack changes per OS. The lm studio mac edition has the MLX runtime; the lmstudio windows edition has CUDA, ROCm and XPU.
lm studio mac
macOS 13.4 or later · Apple Silicon nativeNotarized .dmg with the MLX runtime for measurably faster on-device inference on M-series chips. Integrates with Spotlight, Shortcuts and the menu bar.
- Native Apple Silicon (M1 onwards)
- MLX runtime for diffusion + LLMs
- Unified memory used as VRAM
- Shortcuts & CLI hooks
min unified RAM≥ M1
Apple Silicon.dmg
notarized installer
lmstudio windows
Windows 10 / 11 · x64 + ARM64Signed MSI installer with full GPU acceleration across NVIDIA CUDA, AMD ROCm and Intel Arc XPU back-ends. SmartScreen-clean, AppLocker-friendly.
- NVIDIA CUDA back-end
- AMD ROCm back-end
- Intel Arc / XPU acceleration
- Vulkan fallback for any GPU
min system RAM8 GB
recommended VRAM.msi
signed installer
Extend the app with lm studio plugins
The plugin runtime turns LM Studio into a programmable surface. Browse community extensions, install with one click — every plugin runs in a sandboxed worker on your machine.
Live web search before generation. Hooks into SearXNG or a self-hosted Brave Search endpoint.
Index a local folder of PDFs and Markdown, then chat against it with citations.
Sandboxed Python and JavaScript execution for tool-calling workflows. Results stream back as messages.
Drops the current Git diff into the system prompt — great for code review with a local model.
Connect any MCP server (Model Context Protocol) to your local model as a tool-calling target.
Allow the model to run whitelisted shell commands. Useful for local agentic workflows.
Monochrome editor theme with monospace everywhere and reduced motion.
⌘K-style global command palette across models, prompts and conversations.
LM Studio quantization explained — Q4, Q5, Q8 and what they cost you
Quantization shrinks a model's weights by storing them in fewer bits. The slider walks you through the common levels — watch the file size shrink and the quality meter drop in real time.
▸ Q-suffixes from llama.cpp · K_M variants pack the most quality per byte at most levels
The plain answer to what is lm studio is this: a desktop GUI that downloads open-weight models from Hugging Face, loads them via llama.cpp or MLX, and reports the numbers that matter — tokens-per-second, time-to-first-token, memory footprint and GPU layer offload. LM Studio is the desktop application that turns a consumer laptop into a benchmarkable local AI workstation. Built for Mac and Windows users who want reproducible tokens-per-second numbers rather than vague cloud-API impressions, the lm studio app exposes every measurable surface of a local LLM run: GPU utilization, KV cache size, prompt-processing speed and generation throughput. Together with image generation and a plugin marketplace, it is the most measurement-friendly local AI tooling in 2026.
LM Studio for benchmarks, in plain terms
The plain answer to what is lm studio is this: a desktop GUI that downloads open-weight models from Hugging Face, loads them via llama.cpp or MLX, and reports the numbers that matter — tokens-per-second, time-to-first-token, memory footprint and GPU layer offload. For a benchmarks-driven audience, what is lm studio app comes down to a measurement harness with a chat UI on top. The lm studio ai workflow is straightforward: pick a quantization, run a fixed prompt set, record throughput. Because everything runs locally and the model weights are byte-stable across machines, the same prompt-and-model combination produces directly comparable numbers on any hardware.
Is LM Studio safe to leave running as a benchmark daemon
Anyone running benchmarks at scale will eventually ask is lm studio safe to leave open as a measurement daemon for hours. The answer is yes. Code-signed binaries on Mac and Windows, clean independent scans on VirusTotal, and the runtime opens no inbound connections by default. The lm studio safe story rests on architecture: model weights pulled from Hugging Face over HTTPS with verifiable hashes, license activation as the single outbound call, and a local API server bound to 127.0.0.1 unless explicitly opened. For a continuous-integration setup that drives benchmark sweeps, the absence of telemetry and the documented network boundary are the deciding factors.
The reason LM Studio wins in published benchmarks is consistency: the tokens-per-second number you see in the chat UI is the same number the CLI reports and the same number the local API server logs.Arjun Mehta — ML Performance Engineer
LM Studio Mac: MLX is the headline number
The lm studio mac edition is the platform of choice for repeatable Apple Silicon benchmarks. Distributed as a notarized .dmg, runs natively on Apple Silicon, requires macOS 13.4 or later. The MLX runtime is the main reason: on M-series chips, MLX-format models deliver materially higher tokens-per-second than the cross-platform llama.cpp back-end, and the difference compounds as model size grows. A 7B model on an M2 Air clocks around 22 tok/s under Q4_K_M; the same model on an M3 Max approaches 64 tok/s. The lmstudio mac install integrates with Shortcuts and the Metal system counters, which makes correlating throughput with GPU utilisation trivial.
LM Studio Windows: CUDA, ROCm, XPU and the variance problem
Windows benchmarks have wider variance because the GPU stack varies more. The lm studio download windows installer is an MSI for Windows 10 and Windows 11, both x64 and ARM64. The lmstudio windows edition wires up CUDA on NVIDIA cards, ROCm on AMD, and Intel Arc XPU on the integrated stack. An RTX 4060 with 8 GB of VRAM hits roughly 56 tok/s on Llama-3.1-8B-Q4, an RTX 4090 sees over 130 — dedicated VRAM and Tensor Core count both contribute. Vulkan is the universal fallback when no vendor SDK is installed, slower but portable. The Windows installer is digitally signed so it passes SmartScreen without flags.
Benchmarking the lm studio local llm workflow end to end
The lm studio local llm workflow has the same three measurable steps in every harness: pick a model, download it, load to memory and stream. Behind that the lm studio local llm app handles memory mapping, GPU layer offload decisions and KV cache sizing. As a lm studio local llm desktop app it competes with Ollama, GPT4All, llama.cpp's own server and various web UIs over it — and differentiates with a visual model catalogue, real-time tokens-per-second readouts, a side-by-side compare mode, and consistent benchmark output that other front-ends do not match.
Common families benchmarked
- Llama family — 3.1, 3.2 and 4 in 8B, 70B and 405B parameter sizes.
- Qwen, DeepSeek, Mistral, Gemma, gpt-oss and Phi — broad coverage of open releases from major labs.
On an M3 Max with the MLX runtime active, a 14B model crosses thirty-eight tokens per second at Q4_K_M. That is the moment a hardware reviewer realises the local AI category has shifted under their feet.Arjun Mehta — ML Performance Engineer
Extending benchmark pipelines with lm studio plugins
The lm studio plugins runtime shipped in late 2024 and is the most-watched extension surface in the local AI tooling category. Plugins run in a sandboxed worker, are written in TypeScript or Python, and can intercept inference requests, attach tool-calling back-ends, expose new prompt processors and add UI surfaces. From a benchmarks perspective, the plugin API is the place where you wire LM Studio into a wider measurement pipeline — logging tokens-per-second to a time-series database, attaching a custom prompt cache, scripting RAG over a local corpus for retrieval-pipeline benchmarks, or hooking a per-token cost calculator alongside the inference output so a reviewer can publish dollars-per-million-tokens numbers in the same chart as the throughput numbers. A well-designed measurement plugin sits in the same worker as the inference call, which means timing data is captured before any IPC overhead is added.
LM Studio image generation throughput across hardware
The lm studio image generation feature uses local diffusion models — Stable Diffusion 1.5, SDXL, FLUX-class — through the same model browser as the language models. Throughput shifts dramatically with available memory: SDXL-class diffusion benchmarks at 1024×1024 typically need 12 GB of fast memory to run without swapping; FLUX-class models cross the threshold around 16 GB on dedicated VRAM and slightly higher on Apple Silicon unified memory. On Apple Silicon the MLX runtime extends to diffusion and an M3 Max approaches mid-range NVIDIA throughput at SDXL. Outputs land in a configurable local folder, EXIF records prompt and seed, and benchmark harnesses can chase the same prompt-seed pair across hardware for reproducible side-by-side image comparison without any external API call.
The lm studio download for a measurement rig
The recommended lm studio download path for a benchmark rig is the publisher's website with checksums verified. The Mac edition arrives as a notarized .dmg, the Windows edition as a signed MSI suitable for unattended deployment, and a Linux AppImage rounds out the trio. Every feature is available from first launch, so a benchmark suite can validate the full extraction-and-reporting pipeline before any production model is touched. For shops standardised on storefronts, the Mac App Store and Microsoft Store carry the side-tools as separate downloads.
Benchmark-rig install checklist
- Pull the installer over HTTPS, verify the SHA-256 against the published release notes.
- Disable anonymous telemetry in Settings before the first run.
- Pin the GGUF or MLX model file to a fixed directory under source control.
Final word: lmstudio is still the most measurement-friendly tool
For 2026 hardware reviewers, lmstudio remains the most measurement-friendly entry point into the local AI ecosystem — the tokens-per-second readout in the chat UI matches what the CLI reports, which matches what the API server logs. That consistency is what makes the application defensible in published benchmarks. The combination of lm studio mac and lm studio for windows builds, Apple Silicon acceleration, GPU support across NVIDIA / AMD / Intel, plugins and image generation covers every measurable surface a reviewer might want. Hardware is the constraint: 16 GB is the floor, 32 GB or a dedicated GPU is where the experience becomes pleasant.
LM Studio FAQ
What is the minimum hardware to publish reproducible LM Studio benchmarks?
Does CPU-only inference produce comparable benchmark numbers?
How much disk space should a benchmark rig provision?
Does the benchmark daemon leak any data over the network?
Can anonymous telemetry be turned off?
Does LM Studio store conversation history?
Which model formats does LM Studio benchmark cleanly?
Is side-by-side model comparison built in?
Where do downloaded weights live?
Does the CLI emit machine-readable benchmark output?
lms CLI streams tokens-per-second, time-to-first-token and total-tokens stats in JSON when run with --json, suitable for piping into a time-series database.Is the local API server safe to leave running for an automated benchmark loop?
127.0.0.1 by default, only programs on the same machine can reach it, and the request and response payloads are loggable.Can a custom plugin record per-token throughput?
lms plugin create — the runtime exposes per-token callbacks suitable for wiring into a custom metrics pipeline.Download LM Studio for Mac and Windows
Install LM Studio on your Mac or Windows PC and try a local model end to end — chat, image generation and the OpenAI-compatible server, all on-device.
Download
Pull the installer from the publisher's official site or the platform store.
Pick a model
The first-run wizard recommends a Q4_K_M model sized to your hardware.
Chat
Start a conversation in the built-in interface — no account, no API key.