Hi there.
You just got an Olares One (or you’re thinking about it), and you might wonder why I keep posting “88 t/s”, “184 t/s”, “80 t/s with DFlash” numbers on this blog. Well, all those configs live in a personal Olares Market you can add to your device in 30 seconds. Today, here’s how, plus a tour of the 28 apps inside.
What this market is
The official Olares Market is beclab/apps — the default catalog baked into your device, full of generic apps for every Olares model.
Except the One has an unusual GPU: an RTX 5090 Laptop with 24 GB of GDDR7, sm_120 consumer Blackwell. And every generic app leaves a huge amount of throughput on the table, because by default it targets Ampere or Ada.
So I built mine: orales-one-market. It's a fully valid Olares market source (same API, same protocol as the official one), but every app is hand-tuned for the 5090M: Hadamard rotation (TurboQuant), q4_0 KV cache to double the context, native sm_120, NO_VMM when HAMi gets in the way, vLLM with speculative decoding wired up. In short, everything I publish on the blog runs in production on my device through these apps.
How to add it
Three steps, really:
- Open Olares Market on your device
- Go to Settings → Add Source
- Paste this URL:
https://orales-one-market.aamsellem.workers.dev
That’s it. The market syncs every 5 minutes, and the apps appear in the store next to the official catalog. Install whichever you want; they run in your isolated Olares Kubernetes cluster. Done.
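If you want to sanity-check the source before pasting it in, a quick probe of the root URL is enough; this only checks reachability over HTTPS, since the market API paths themselves aren't documented here:

```shell
# Probe the market source root. Only checks that the host answers;
# the actual market API endpoints are an assumption left unprobed.
MARKET_URL="https://orales-one-market.aamsellem.workers.dev"
STATUS=$(curl -fsS --max-time 5 -o /dev/null -w "%{http_code}" "$MARKET_URL" 2>/dev/null || echo "unreachable")
echo "market source: $STATUS"
```

A `200` (or any HTTP code at all) means the Worker is up and the URL is safe to paste into Settings → Add Source.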
The 28 apps at a glance
LLM inference via llama.cpp (text)
All on b8667 + TurboQuant (Hadamard rotation) + q4_0 KV cache:
- llamacppqwen35a3bone — Qwen3.5 35B-A3B UD-Q4_K_XL, 129 t/s, 64K context, thinking mode
- llamacppqwen36a3bone — Qwen3.6 35B-A3B, next-gen Qwen3.5
- llamacppqwen36dense27bone — Qwen3.6 27B dense NVFP4
- llamacppqwen35iq4one — Qwen3.5 35B-A3B IQ4_XS, compact build
- llamacppnemotron30a3bone — Nemotron 3 Nano 30B-A3B, 184 t/s, 128K, Mamba-2 hybrid
- llamacppglm47flash — GLM-4.7-Flash 30B-A3B
- gemma426ba4bone — Gemma 4 26B-A4B (MoE + vision), 119 t/s, 64K, LMArena 1441
- gemma4e2bone — Gemma 4 E2B 2.3B (ultra-fast, voice pipeline)
- cascade230a3bone — Nemotron Cascade 2 30B-A3B (math/code specialist)
- qwen3coder30a3bone — Qwen3 Coder 30B-A3B (coding agent)
- devstralsmallone — Devstral Small 2507 (coding agent)
- nemotron3nano4bone — Nemotron 3 Nano 4B Q8 (lightweight edge AI)
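All the llama.cpp apps above share the same KV-cache trick, so here is a hedged sketch of the flag pattern involved. The model path, port, and context size are illustrative placeholders, not the exact per-app config from the market manifests:

```shell
# Illustrative only: the llama-server flag shape these apps share.
# q4_0 KV cache roughly halves KV memory, which is what doubles the
# usable context. (Quantizing the V cache typically also requires
# flash attention to be enabled in llama.cpp.)
MODEL="/models/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"   # hypothetical path
ARGS="-m $MODEL -ngl 99 -c 65536 --cache-type-k q4_0 --cache-type-v q4_0"
echo "llama-server $ARGS --host 0.0.0.0 --port 8080"
```

The point of shipping these as market apps is precisely that you never have to rediscover this flag combination yourself.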
Vision via llama.cpp
- qwen35a3bvisionone — Qwen3.5 35B-A3B + mmproj, 131 t/s, 32K, multimodal
- qwen35iq4visionone — Qwen3.5 35B-A3B Vision IQ4_XS
- qwen36a3bvisionone — Qwen3.6 35B-A3B Vision, next-gen
DFlash speculative decoding
- dflashqwen36one — Qwen3.6 27B via DFlash (spiritbuun fork, 80 t/s on sm_120)
- lucedflashqwen36one — Qwen3.6 27B via Lucebox DFlash (custom Blackwell kernels)
LLM inference via vLLM
- vllmqwen3527bone — Qwen3.5 27B NVFP4 + speculative decoding
- vllmqwen36dense27bone — Qwen3.6 27B dense NVFP4-MTP
- vllmqwen36turbo27bone — Qwen3.6 27B + Sandermage Genesis + TurboQuant K8V4 (my Turbo build, 88 t/s)
- vllmgemma4e4bone — Gemma 4 E4B (Vision + Audio) via vLLM
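For the vLLM apps, speculative decoding is wired up at serve time. A minimal sketch of the command shape, assuming a recent vLLM that accepts a JSON `--speculative-config`; the model name and the draft settings below are illustrative assumptions, not the production flags of these apps:

```shell
# Hedged sketch of a vLLM launch with speculative decoding.
# Model name, memory fraction, and the speculative config are all
# placeholder assumptions for illustration.
SPEC='{"method": "ngram", "num_speculative_tokens": 4, "prompt_lookup_max": 4}'
echo vllm serve Qwen/Qwen3.5-27B \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --speculative-config "$SPEC"
```

Speculative decoding is where numbers like 88 t/s come from: the draft proposes several tokens per step and the main model only verifies them.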
EXL3 / TabbyAPI
- exl3qwen35a3bone — Qwen3.5 35B-A3B EXL3 4bpw via TabbyAPI + ExLlamaV3
Voice / Audio (ASR + TTS)
- vllmvoxtral3bone — Voxtral Mini 3B ASR, 2.7× faster than Whisper, 3.2% WER
- vllmvoxtralrt4bone — Voxtral Realtime 4B streaming, WebSocket, 480 ms latency
- vllmvoxtraltts4bone — Voxtral 4B TTS, 20 voices, 9 languages, 70 ms latency
- qwen3ttstone — Qwen3-TTS 1.7B, 9 voices, zero-shot voice clone
- omnivoiceone — OmniVoice TTS, 646 languages, voice cloning + voice design
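Once installed, the ASR apps can be called like any OpenAI-compatible transcription endpoint (an assumption based on how vLLM usually serves audio models; host, port, and model name below are placeholders):

```shell
# Hedged example: an OpenAI-compatible transcription request.
# Host, port, model name, and the audio file are all assumptions.
HOST="http://localhost:8000"
echo curl -s "$HOST/v1/audio/transcriptions" \
  -F model="voxtral-mini-3b" \
  -F file=@sample.wav
```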
Creative / Music
- acestepxlone — ACE-Step 1.5 XL, AI music generation (4B DiT, Turbo+SFT)
Why a fork instead of pushing everything upstream
Because the official market targets every Olares model (mini, pro, One), configs there are necessarily generic. On the One, that leaves 30-50% of the throughput on the table compared to sm_120-specific tuning. So I keep my market in parallel for those who want the maximum, and I keep contributing upstream to beclab/apps when it makes sense (I'm currently the only external contributor on that repo).
TL;DR
URL to add in Olares Market → Settings → Add Source:
https://orales-one-market.aamsellem.workers.dev
28 ready-to-install apps, hand-tuned for the Olares One's 5090M. The market syncs every 5 minutes and is updated whenever I tune a new config. If it saves you a weekend of benchmarking, mission accomplished.
See you next time!
Disclosure — If you don’t have an Olares One yet and what you see here makes you want one, ordering through this referral link gets you $400 off ($3,599 instead of $3,999) and pays me $200. I’m mentioning this out of transparency — and yes, incidentally, it helps keep the blog alive (hosting, domain, and the time I spend writing here). Link valid until late June 2026.