Hi there.
You just got an Olares One (or you’re thinking about it), and you might wonder why I keep posting “88 t/s”, “184 t/s”, “80 t/s with DFlash” numbers on this blog. Well, all those configs live in a personal Olares Market you can add to your device in 30 seconds. Today, here’s how, plus a tour of the 28 apps inside.
What this market is
The official Olares Market is beclab/apps — the default catalog baked into your device, full of generic apps for every Olares model.
Except the One has an unusual GPU: an RTX 5090 Laptop with 24 GB of GDDR7, sm_120 consumer Blackwell. And every generic app leaves a huge amount of throughput on the table, because by default it targets Ampere or Ada.
So I built mine: orales-one-market. It's a fully valid Olares market source (same API, same protocol as the official one), but every app is hand-tuned for the 5090M: Hadamard rotation (TurboQuant), q4_0 KV cache to double the context, native sm_120, NO_VMM when HAMi gets in the way, vLLM with speculative decoding wired up. In short, everything I publish on the blog runs in production on my device through these apps.
How to add it
Three steps, really:
- Open Olares Market on your device
- Go to Settings → Add Source
- Paste this URL:
https://orales-one-market.aamsellem.workers.dev
That’s it. The market syncs every 5 minutes, and the apps appear in the store next to the official catalog. Install whichever you want; they run in your isolated Olares Kubernetes cluster. Done.
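If you want to sanity-check the source before pasting it in, a quick probe of the root URL is enough; this only checks reachability over HTTPS, since the market API paths themselves aren't documented here:

```shell
# Probe the market source root. Only checks that the host answers;
# the actual market API endpoints are an assumption left unprobed.
MARKET_URL="https://orales-one-market.aamsellem.workers.dev"
STATUS=$(curl -fsS --max-time 5 -o /dev/null -w "%{http_code}" "$MARKET_URL" 2>/dev/null || echo "unreachable")
echo "market source: $STATUS"
```

A `200` (or any HTTP code at all) means the Worker is up and the URL is safe to paste into Settings → Add Source.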
The 28 apps at a glance
LLM inference via llama.cpp (text)
All on b8667 + TurboQuant (Hadamard rotation) + q4_0 KV cache:
- llamacppqwen35a3bone — Qwen3.5 35B-A3B UD-Q4_K_XL, 129 t/s, 64K context, thinking mode
- llamacppqwen36a3bone — Qwen3.6 35B-A3B, next-gen Qwen3.5
- llamacppqwen36dense27bone — Qwen3.6 27B dense NVFP4
- llamacppqwen35iq4one — Qwen3.5 35B-A3B IQ4_XS, compact build
- llamacppnemotron30a3bone — Nemotron 3 Nano 30B-A3B, 184 t/s, 128K, Mamba-2 hybrid
- llamacppglm47flash — GLM-4.7-Flash 30B-A3B
- gemma426ba4bone — Gemma 4 26B-A4B (MoE + vision), 119 t/s, 64K, LMArena 1441
- gemma4e2bone — Gemma 4 E2B 2.3B (ultra-fast, voice pipeline)
- cascade230a3bone — Nemotron Cascade 2 30B-A3B (math/code specialist)
- qwen3coder30a3bone — Qwen3 Coder 30B-A3B (coding agent)
- devstralsmallone — Devstral Small 2507 (coding agent)
- nemotron3nano4bone — Nemotron 3 Nano 4B Q8 (lightweight edge AI)
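All the llama.cpp apps above share the same KV-cache trick, so here is a hedged sketch of the flag pattern involved. The model path, port, and context size are illustrative placeholders, not the exact per-app config from the market manifests:

```shell
# Illustrative only: the llama-server flag shape these apps share.
# q4_0 KV cache roughly halves KV memory, which is what doubles the
# usable context. (Quantizing the V cache typically also requires
# flash attention to be enabled in llama.cpp.)
MODEL="/models/Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf"   # hypothetical path
ARGS="-m $MODEL -ngl 99 -c 65536 --cache-type-k q4_0 --cache-type-v q4_0"
echo "llama-server $ARGS --host 0.0.0.0 --port 8080"
```

The point of shipping these as market apps is precisely that you never have to rediscover this flag combination yourself.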
Vision via llama.cpp
- qwen35a3bvisionone — Qwen3.5 35B-A3B + mmproj, 131 t/s, 32K, multimodal
- qwen35iq4visionone — Qwen3.5 35B-A3B Vision IQ4_XS
- qwen36a3bvisionone — Qwen3.6 35B-A3B Vision, next-gen
DFlash speculative decoding
- dflashqwen36one — Qwen3.6 27B via DFlash (spiritbuun fork, 80 t/s on sm_120)
- lucedflashqwen36one — Qwen3.6 27B via Lucebox DFlash (custom Blackwell kernels)
LLM inference via vLLM
- vllmqwen3527bone — Qwen3.5 27B NVFP4 + speculative decoding
- vllmqwen36dense27bone — Qwen3.6 27B dense NVFP4-MTP
- vllmqwen36turbo27bone — Qwen3.6 27B + Sandermage Genesis + TurboQuant K8V4 (my Turbo build, 88 t/s)
- vllmgemma4e4bone — Gemma 4 E4B (Vision + Audio) via vLLM
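For the vLLM apps, speculative decoding is wired up at serve time. A minimal sketch of the command shape, assuming a recent vLLM that accepts a JSON `--speculative-config`; the model name and the draft settings below are illustrative assumptions, not the production flags of these apps:

```shell
# Hedged sketch of a vLLM launch with speculative decoding.
# Model name, memory fraction, and the speculative config are all
# placeholder assumptions for illustration.
SPEC='{"method": "ngram", "num_speculative_tokens": 4, "prompt_lookup_max": 4}'
echo vllm serve Qwen/Qwen3.5-27B \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.92 \
  --speculative-config "$SPEC"
```

Speculative decoding is where numbers like 88 t/s come from: the draft proposes several tokens per step and the main model only verifies them.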
EXL3 / TabbyAPI
- exl3qwen35a3bone — Qwen3.5 35B-A3B EXL3 4bpw via TabbyAPI + ExLlamaV3
Voice / Audio (ASR + TTS)
- vllmvoxtral3bone — Voxtral Mini 3B ASR, 2.7× faster than Whisper, 3.2% WER
- vllmvoxtralrt4bone — Voxtral Realtime 4B streaming, WebSocket, 480 ms latency
- vllmvoxtraltts4bone — Voxtral 4B TTS, 20 voices, 9 languages, 70 ms latency
- qwen3ttstone — Qwen3-TTS 1.7B, 9 voices, zero-shot voice clone
- omnivoiceone — OmniVoice TTS, 646 languages, voice cloning + voice design
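Once installed, the ASR apps can be called like any OpenAI-compatible transcription endpoint (an assumption based on how vLLM usually serves audio models; host, port, and model name below are placeholders):

```shell
# Hedged example: an OpenAI-compatible transcription request.
# Host, port, model name, and the audio file are all assumptions.
HOST="http://localhost:8000"
echo curl -s "$HOST/v1/audio/transcriptions" \
  -F model="voxtral-mini-3b" \
  -F file=@sample.wav
```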
Creative / Music
- acestepxlone — ACE-Step 1.5 XL, AI music generation (4B DiT, Turbo+SFT)
Why a fork instead of pushing everything upstream
Because the official market targets every Olares model (mini, pro, One), configs there are necessarily generic. On the One, that leaves 30-50% of the throughput on the table compared to sm_120-specific tuning. So I keep my market in parallel for those who want the maximum, and I keep contributing upstream to beclab/apps when it makes sense (I'm currently the only external contributor on that repo).
TL;DR
URL to add in Olares Market → Settings → Add Source:
https://orales-one-market.aamsellem.workers.dev
28 ready-to-install apps, hand-tuned for the Olares One's 5090M. The market syncs every 5 minutes and is updated whenever I tune a new config. If it saves you a weekend of benchmarking, mission accomplished.
See you next time!
Disclosure — If you don’t have an Olares One yet and what you see here makes you want one, ordering through this referral link gets you $400 off ($3,599 instead of $3,999) and pays me $200. I’m mentioning this out of transparency — and yes, incidentally, it helps keep the blog alive (hosting, domain, and the time I spend writing here). Link valid until late June 2026.