Hi there.
People keep asking me: “why an Olares One and not a Mac Studio / a beefy GPU rig / a Threadripper workstation / cloud?” So today, the unfiltered story of how I landed on it. Not a spec-sheet comparison, just the reasoning of a working dad who wanted to run local LLMs seriously without burning his weekends.
The context (because it matters)
Two lovely daughters at home, a full-time job, and a real desire to run LLMs locally to actually understand what works and what doesn’t. That gives me a very concrete brief:
- No time to tinker. I code enough at work; I don’t want to spend my evenings hand-compiling llama.cpp with exotic flags. I want it to boot and just work.
- A machine that fits in the living room. If it takes up a cubic meter and sounds like a hairdryer, my wife will kick it out within a week — and she’ll be right.
- Serious GPU horsepower. Not a tuned Pi, not a cute mini PC — a real GPU with enough VRAM to run a Qwen3 27B without breaking a sweat.
- Secure by default. I don’t want to open my home network just to play with hastily-exposed HTTP endpoints. Tight by default, please.
With that brief, several options were on the table. Here’s how each one fared.
Mac Studio — I really wanted to
I’m an Apple guy, no secret. The current Mac Studio, with its M3 Ultra chip and up to 512 GB of unified memory, looks gorgeous on paper for running massive models locally. And the Mac Studio’s compact form factor would’ve been perfect for the living room.
The problem is inference speed. Unified memory works miracles for model size, but on a 27B Q4 you’re still looking at roughly half the throughput of an equivalent NVIDIA GPU. That gap shows in interactive use — it’s the difference between “good enough to play” and “good enough to use as a coding assistant all day”.
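To make that gap concrete: single-stream decode is essentially memory-bandwidth-bound, since every generated token streams all the active weights through the chip. Below is the back-of-envelope model I use. The bandwidth figures are ballpark public specs and the efficiency fractions are illustrative guesses, not measurements:

```python
def decode_tok_s(bandwidth_gb_s: float, model_gb: float, efficiency: float) -> float:
    """Bandwidth-bound estimate of single-stream decode speed (tokens/s).

    Each generated token reads every active weight once, so the hard
    ceiling is bandwidth / model size; `efficiency` is the fraction of
    that ceiling a real stack reaches (kernels, KV-cache reads, overhead).
    """
    return efficiency * bandwidth_gb_s / model_gb

MODEL_GB = 15  # a 27B model at Q4 is roughly 15 GB of weights

# Ballpark public bandwidth specs; the efficiency fractions are guesses,
# there to illustrate why realized speeds diverge despite similar ceilings.
print(f"Unified memory (~819 GB/s): ~{decode_tok_s(819, MODEL_GB, 0.5):.0f} tok/s")
print(f"Discrete NVIDIA (~896 GB/s): ~{decode_tok_s(896, MODEL_GB, 0.8):.0f} tok/s")
```

The raw bandwidth ceilings are surprisingly close; what differs is how much of that ceiling each stack actually delivers, and that is where the roughly 2x interactive gap comes from.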
Plus the price tag: a well-loaded M3 Ultra is easily double the price of an NVIDIA mini-PC, for lower inference speed on the model sizes I care about. Wrong trade-off for what I wanted.
Verdict: amazing for 70B+ models that can’t fit in VRAM elsewhere, not for my daily coding/agent use.
Framework Desktop / AMD Ryzen AI Max+ 395 — same story
Friends of mine went with a Framework Desktop running the Ryzen AI Max+ 395 with 128 GB of unified memory. On paper the combo is very tempting: an AI-tuned APU, a compact form factor, and 128 GB is roomy enough to load big models at home. In practice, when we compare inference speeds on the same models, their machine doesn’t reach what I get out of the 5090M on a 27B Q4. Same conclusion as the Mac Studio: unified memory wins on model size, not on throughput.
Verdict: great if you mostly run 70B+ models that don’t fit anywhere else, below a dedicated NVIDIA GPU for my daily code-assist use.
Custom GPU PC — I’ve done my time
I’ve been building PCs since high school. I love it. But in 2026, with two young kids, picking a case, sizing the PSU, managing thermals, installing Linux clean, configuring CUDA, NVIDIA Container Toolkit, drivers, K8s to orchestrate my containers… that’s at least a lost weekend.
And then you maintain it. When a driver breaks, you debug. When a container doesn’t start, you dig into logs. It’s fun for five minutes, painful when you just want to try a new model on a Sunday morning and find yourself fixing a driver/CUDA version mismatch.
Verdict: technically the best perf/dollar, economically the worst use of my free time.
Threadripper workstation / tower server — no
Too big, too loud, too visible. See: “must fit in the living room”. Ruled out on non-technical grounds, but those grounds matter when you share a home.
Cloud GPU — long term, it stings
I use the cloud for one-off things (RunPod, Vast, Together). But running code-assist and agents that hit an LLM all day, easily at €1-2/h, adds up fast. Plus network latency, API dependence, and the “what if the provider changes prices or terms tomorrow” question.
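To put numbers on “adds up fast”, here’s the break-even arithmetic I ran. Every input is my own assumption (a mid-range on-demand rate and my usage pattern), not a provider quote:

```python
# Cloud-vs-local break-even, back of the envelope. All inputs are
# assumptions: a mid-range on-demand GPU rate and my own usage pattern.
HOURLY_RATE_EUR = 1.5   # mid-point of the 1-2 EUR/h range above
HOURS_PER_DAY = 8       # code-assist + agents running all workday
DAYS_PER_MONTH = 22     # working days

monthly_cloud = HOURLY_RATE_EUR * HOURS_PER_DAY * DAYS_PER_MONTH
print(f"Cloud: ~{monthly_cloud:.0f} EUR/month")  # ~264 EUR/month

# Hypothetical local price for comparison (the Olares One's sticker
# price treated as ~3600 EUR; electricity ignored for simplicity).
LOCAL_PRICE_EUR = 3600
print(f"Break-even: ~{LOCAL_PRICE_EUR / monthly_cloud:.1f} months")  # ~13.6
```

Past that break-even point, every month of daily use is money the cloud would have kept taking.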
Verdict: great for bursts, not a replacement for daily use.
Why Olares One won
I stumbled on Olares almost by accident. I was looking for a mini GPU machine with an OS-as-a-service when I found their project. And it clicked.
Mini-PC form factor. RTX 5090M 24 GB GDDR7, Core Ultra 9 275HX, 96 GB DDR5. All of that in a box about the size of a Mac mini. Wife-approved at first glance. It’s not pure design-object material, but it’s compact, understated, and it fits in.
Turnkey OS. OlaresOS is their in-house Linux under the hood, but you use it like a normal desktop OS. Built-in app store (Olares Market), everything containerized cleanly, and you install an app like you would on a Mac. No docker-compose to write by hand, no YAML for Kubernetes; all of that is handled for you.
Secure by default. Everything goes through a proxy with auth, no ports accidentally exposed to the Internet. You hit your services via a URL that resolves locally through an encrypted tunnel. No more “is my llama.cpp open to the planet” anxiety.
Generous config. 96 GB of DDR5 is comfortable for running multiple containers in parallel. The 24 GB of VRAM is the constraint (which is why I revisit DFlash and friends in dedicated posts), but you can fit a Qwen3 27B Q4 + drafter + agent harness without too much pain.
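For the curious, here’s roughly how that 24 GB budget breaks down for me. The sizes are approximations from my own loads (they shift with context length and quantization), not official figures:

```python
# Rough 24 GB VRAM budget for my daily setup. Sizes are approximations
# from my own loads, not official figures; the KV cache grows with context.
BUDGET_GB = 24
allocation_gb = {
    "27B main model @ Q4":    15.0,  # ~0.55 bytes/param once quantized
    "small drafter @ Q4":      1.5,  # speculative-decoding draft model
    "KV cache (long context)": 4.0,  # scales with context length
    "CUDA/runtime overhead":   1.5,
}

used = sum(allocation_gb.values())
for name, gb in allocation_gb.items():
    print(f"{name:26s} {gb:5.1f} GB")
print(f"{'total':26s} {used:5.1f} GB of {BUDGET_GB} GB "
      f"({BUDGET_GB - used:.1f} GB headroom)")
```

A couple of GB of headroom is workable day to day, but it explains why the more exotic setups (see the DFlash note below) don’t fit.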
It’s kind of the Apple of AI. There, I said it. I’m pro-Apple — I like integrated ecosystems where hardware and software are designed together to just work. Olares does that for local AI. That’s probably what convinced me most: I don’t want to spend my time integrating the stack myself.
The verdict after a few months
I’ve had it for a few months now. Honestly? It does the job. I code on it daily via Claude Code + a local Qwen3 as backup, I test vLLM configs, I tune llama.cpp, I publish reproducible numbers. The machine hasn’t flinched, the OS updates cleanly, the app store keeps surfacing new apps.
Full honesty though: my original brief said “no time to tinker”. In theory. In practice, my tech passion takes over the moment an ArXiv paper drops — I run unstable llama.cpp forks, I compile exotic kernels, I sometimes break my config on a Sunday evening. Except that’s exactly where the turnkey baseline pays off: when a test goes sideways, I roll back to a clean state in two clicks. The managed base from Olares cleanly isolates my experiments from the rest of the box. So yes, I tinker — but on a foundation I didn’t have to build myself.
And full honesty all the way through: I end up spending way more time on this than I imagined when I picked the machine. Except it’s time I choose to spend, not time forced on me for maintenance. That’s a meaningful difference — and it’s precisely why I don’t regret the call.
What I sometimes miss: 32 GB of VRAM. The 24 GB blocks me on some exotic paths (DFlash, for one, just won’t fit). But that’s the trade-off for the mobile form factor.
If you want one
Good news: the Olares team gave me a personal referral link (thanks to my community contributions). If you were going to buy one anyway, going through this link gets you $400 off ($3,599 instead of $3,999) and pays me $200 per sale. Active until late June 2026.
I’m mentioning this out of transparency — and yes, incidentally, it helps keep the blog alive (hosting, domain, and frankly the time I spend writing here). And to answer the obvious question: yes, I’d recommend it even without the affiliate link. The recommendation doesn’t change, just the option to help the blog along the way.
TL;DR
- Mac Studio: too slow on inference for my daily code/agent use.
- Framework Desktop (AMD AI Max+ 395, 128 GB): same as the Mac — unified memory wins on size, not on speed.
- Custom GPU PC: best perf/dollar but worst use of my free time.
- Tower workstation: too big, wife veto.
- Cloud GPU: occasional yes, daily no (cost + dependency).
- Olares One: best compromise of compact form factor, turnkey OS, serious GPU and security by default.
It’s a very personal choice, tied to my context. If you have a basement with good airflow and like tinkering, build your own GPU tower — it’s great. If you only run 70B+ models and speed isn’t a priority, get a Mac Studio. But for a developer who wants a living-room AI lab that just works, the Olares One ticked every box for me.
See you next time!