
Building an AI Smart Display: Hardware Decisions and Trade-offs

How we chose the RK3566 SoC, 5-inch IPS display, and 4GB RAM for the Jinn HoloBox — and the trade-offs behind every component decision.

The Jinn HoloBox is built on a Rockchip RK3566 quad-core Cortex-A55 SoC with a Mali-G52 GPU, 4GB LPDDR4 RAM, and a 5-inch 720x1280 IPS display connected over DSI. We chose these components after months of prototyping because they hit the intersection of cost, thermal performance, and AI capability that a $299 consumer device demands. Here is how we arrived at each decision and what we gave up along the way.

Why does the SoC matter so much in a smart display?

In a smartphone or laptop, you can compensate for a weak processor with more RAM or a faster SSD. In a smart display with a fixed BOM, the SoC dictates almost everything: what OS you can run, how responsive the UI feels, whether on-device AI is feasible, and how much heat you generate in a fanless enclosure.

We evaluated three SoC families seriously:

| Feature | RK3566 | RK3588 | Amlogic S905X4 |
| --- | --- | --- | --- |
| CPU | 4x Cortex-A55 @ 1.8 GHz | 4x A76 + 4x A55 @ 2.4 GHz | 4x Cortex-A55 @ 2.0 GHz |
| GPU | Mali-G52 2EE | Mali-G610 MP4 | Mali-G31 MP2 |
| NPU | 0.8 TOPS | 6 TOPS | None |
| RAM support | LPDDR4/4X up to 8 GB | LPDDR4X/5 up to 32 GB | DDR3/4 up to 4 GB |
| Approx. SoC cost (qty 1K) | ~$8-12 | ~$35-45 | ~$6-9 |
| Mainline Linux support | Good (Armbian) | Maturing | Limited |

The RK3588 is the obvious performance king, but at roughly 3-4x the SoC cost it would have pushed our retail price well past $449. The Amlogic S905X4 is cheaper, but its Mali-G31 GPU struggles with compositing a 720p WebGL UI, and it lacks an NPU entirely. The RK3566 sits in the middle: enough GPU headroom for our Three.js-based avatar renderer, a 0.8 TOPS NPU for future on-device inference, and mature mainline Linux support through Armbian.

How did we choose the display?

The display decision came down to three constraints: physical size for a countertop device, resolution for readable text, and interface type for driver simplicity.

We tested 3.5-inch, 5-inch, and 7-inch panels. The 3.5-inch felt cramped for anything beyond a clock face. The 7-inch required a larger enclosure that looked out of place on a kitchen counter. The 5-inch panel hit the sweet spot — large enough to show a conversational UI with an avatar, small enough to fit next to a coffee maker.

According to Mordor Intelligence's 2025 smart display market report, the 5-10-inch category accounted for 52.1% of smart display revenue, confirming that this size range dominates consumer adoption.

Resolution: 720x1280 vs 1080x1920

We prototyped with both. The 1080p panel looked sharper in side-by-side comparisons, but the difference at arm's length (typical smart display viewing distance of 2-4 feet) was negligible. The real cost was GPU load: driving 1080p at 30 fps required roughly 2.25x the pixel fill rate of 720p, which pushed the Mali-G52 harder and increased power draw by about 15% in our thermal testing. At 720p, we maintain a comfortable 30+ fps with headroom for the WebGL avatar renderer.
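The 2.25x figure follows directly from the pixel counts of the two panels. A minimal sketch of the arithmetic, assuming one full-screen pass per frame with no overdraw (the helper name is illustrative):

```python
# Pixel-count comparison for the two candidate portrait panels.
PANELS = {
    "720p": (720, 1280),
    "1080p": (1080, 1920),
}


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixels to fill each second, assuming one full-screen pass per frame."""
    return width * height * fps


fill_720 = pixels_per_second(*PANELS["720p"], fps=30)
fill_1080 = pixels_per_second(*PANELS["1080p"], fps=30)
ratio = fill_1080 / fill_720

print(f"720p:  {fill_720:,} px/s")    # 27,648,000 px/s
print(f"1080p: {fill_1080:,} px/s")   # 62,208,000 px/s
print(f"ratio: {ratio:.2f}x")         # 2.25x
```

Real fill-rate demand is higher than this because a composited WebGL scene touches many pixels more than once, but the ratio between the two resolutions stays the same.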

The DSI (Display Serial Interface) connection was non-negotiable. HDMI would have added a connector, a level shifter, and cable routing complexity. DSI gives us a direct digital link from the SoC to the panel with lower EMI and simpler board layout.

What about the 4 GB RAM decision?

Our software stack — Node.js runtime worker, Go gateway, Next.js web UI rendered in Chromium — is not lightweight. In our profiling on a 2 GB prototype:

Chromium consumed 400-600 MB rendering the avatar page
Node.js runtime worker used 80-150 MB depending on context window size
Go gateway used 20-40 MB
System + kernel needed ~200 MB

That left almost no headroom on 2 GB. Context switches slowed to a crawl under memory pressure, and the OOM killer occasionally terminated the Node.js worker mid-conversation.

With 4 GB, we have roughly 2.5 GB of working headroom after the base stack loads. That matters for future features: local embedding models, larger plugin sets, or caching conversation history. Our LPDDR4 also runs at a 3200 MT/s data rate, which helps keep the GPU fed during avatar rendering.
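To make the budget concrete, the profiled ranges can be totaled in a few lines. This is a rough sketch: the dictionary keys and helper names are illustrative, and it counts only the four itemized consumers. Anything not itemized (GPU buffers, page cache, transient allocation peaks) sits outside it, which is presumably why the headroom seen in practice is tighter than this sum alone suggests.

```python
# Profiled ranges from the 2 GB prototype, in MB: (low, high).
PROFILED_MB = {
    "chromium": (400, 600),
    "node_worker": (80, 150),
    "go_gateway": (20, 40),
    "system_kernel": (200, 200),
}


def total_range(profile):
    """Sum the low and high ends of every itemized consumer."""
    low = sum(lo for lo, _ in profile.values())
    high = sum(hi for _, hi in profile.values())
    return low, high


def headroom_mb(ram_gb, profile):
    """(worst case, best case) MB left after the itemized consumers only."""
    low, high = total_range(profile)
    ram_mb = ram_gb * 1024
    return ram_mb - high, ram_mb - low


low, high = total_range(PROFILED_MB)
print(f"itemized stack: {low}-{high} MB")
for ram_gb in (2, 4):
    worst, best = headroom_mb(ram_gb, PROFILED_MB)
    print(f"{ram_gb} GB: {worst}-{best} MB left before unlisted consumers")
```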

How do we handle thermals without a fan?

The RK3566 has a TDP of approximately 3-5 W under sustained load, which is manageable in a fanless design, but only with careful thermal planning. We use a die-cast aluminum heat spreader bonded to the SoC with a thermal pad. The enclosure has passive convection slots on the back.

During our 72-hour stress test (continuous conversation + avatar rendering + wake word detection), the SoC junction temperature stabilized at 68 °C. The SoC's thermal throttle point is 85 °C, so we have a 17 °C margin. In a 35 °C ambient environment (a hot kitchen), that margin shrinks to about 10 °C, which is tight but acceptable.
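On a Linux-based device, the same margin can be tracked at runtime from the kernel's thermal sysfs interface, which reports temperatures in millidegrees Celsius. The sketch below is illustrative rather than the HoloBox firmware: it assumes `thermal_zone0` is the SoC sensor, which varies by board, and the function names are hypothetical.

```python
THROTTLE_C = 85.0  # throttle point quoted in the text
# Assumption: zone 0 is the SoC sensor; check the zone's "type" file on real hardware.
ZONE_PATH = "/sys/class/thermal/thermal_zone0/temp"


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading such as '68000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def throttle_margin(temp_c: float, throttle_c: float = THROTTLE_C) -> float:
    """Degrees of headroom left before throttling starts."""
    return throttle_c - temp_c


def read_margin(zone_path: str = ZONE_PATH):
    """Return the current margin in degrees C, or None if the sensor is unreadable."""
    try:
        with open(zone_path) as f:
            return throttle_margin(parse_millidegrees(f.read()))
    except (OSError, ValueError):
        return None


print(f"margin at 68 C: {throttle_margin(68.0):.1f} C")  # 17.0 C
print(f"live margin: {read_margin()}")  # None when no sensor is exposed
```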

What we gave up

A fan would have let us run the CPU at sustained 1.8 GHz under all conditions. Without it, we occasionally see brief thermal throttling to 1.6 GHz during extended conversations in warm rooms. The real-world impact is an extra 50-100 ms of latency on LLM response processing — not perceptible to most users, but it is there.

Why did we choose eMMC over an SD card?

Early prototypes used microSD cards for storage. They were convenient for development but terrible for reliability. According to a 2024 study by Bunnie Huang on SD card failure modes, consumer microSD cards in always-on embedded devices show a 5-15% annual failure rate due to write amplification on flash cells.

We switched to 16 GB onboard eMMC. It is soldered to the board (no loose connections), has a built-in wear-leveling controller, and supports command queuing for faster random I/O. The trade-off is that storage is not user-replaceable, but for a consumer appliance that is actually a feature — it eliminates a common failure mode.

What about audio hardware?

Voice is the primary input for the HoloBox, so microphone quality matters more than in a typical display. We use a dual-MEMS microphone array with PDM (Pulse Density Modulation) input. PDM is natively supported by the RK3566's audio subsystem, which means no external ADC chip is needed.

The dual-mic setup enables basic acoustic echo cancellation (AEC) — critical because the HoloBox has a built-in speaker. Without AEC, the wake word engine would trigger on the device's own audio output. We process AEC in software using speexdsp, which adds roughly 2% CPU load on one core.
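To illustrate what an echo canceller does, here is a minimal sketch using a normalized LMS (NLMS) adaptive filter. It is a stand-in for illustration, not speexdsp's implementation. The idea is the same, though: the canceller uses the signal sent to the speaker as a reference, learns how that signal shows up at the microphone, and subtracts the estimate. The echo path, tap count, and step size are made-up values, and the simulated microphone hears only echo.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the speaker echo from the mic signal."""
    weights = [0.0] * taps   # running estimate of the echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        predicted = sum(w * h for w, h in zip(weights, history))
        error = d - predicted                       # echo-cancelled sample
        norm = sum(h * h for h in history) + eps    # input power normalization
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        out.append(error)
    return out


def energy(signal):
    return sum(s * s for s in signal)


rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # audio sent to the speaker
echo_path = [0.6, 0.3, -0.1]                          # hypothetical room response
mic = [
    sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(ref))
]

clean = nlms_echo_cancel(mic, ref)
echo_energy = energy(mic[-500:])
residual_energy = energy(clean[-500:])
print(f"echo energy (last 500 samples):     {echo_energy:.3f}")
print(f"residual energy (last 500 samples): {residual_energy:.3e}")
```

In a real room the near-end talker and background noise are mixed into the microphone signal, so production cancellers add double-talk handling and longer filters on top of this core loop.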

Speaker considerations

We use a 1.5 W cavity speaker driven by a Class-D amplifier. It is not audiophile quality; it is optimized for speech clarity in the 300 Hz to 3.4 kHz vocal range. We deliberately rolled off bass response below 200 Hz to avoid cabinet resonance in the small enclosure. For users who want better audio, we expose a 3.5 mm line-out jack.

Why PDM over I2S for microphones?

We evaluated both PDM (Pulse Density Modulation) and I2S (Inter-IC Sound) microphone interfaces. I2S microphones are more common in consumer electronics and provide a cleaner digital signal. However, the RK3566's PDM controller supports direct connection to PDM MEMS microphones without an external codec chip — eliminating one component from the BOM and simplifying the PCB layout.

The trade-off: PDM requires more CPU cycles for decimation filtering (converting the 1-bit oversampled stream to usable PCM audio). On the Cortex-A55, this costs approximately 1-2% of one core — acceptable given our CPU budget.
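For readers unfamiliar with decimation, the sketch below shows the principle. A first-order sigma-delta modulator stands in for the microphone, and a boxcar average over each block of bits recovers PCM samples. This is a deliberate simplification: production decimators typically use multi-stage CIC and FIR filters, and the oversampling ratio of 64 and all function names here are assumptions for illustration.

```python
import math


def pdm_encode(samples, oversample=64):
    """First-order sigma-delta modulator: PCM in [-1, 1] to a stream of 0/1 bits."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for s in samples:
        for _ in range(oversample):       # hold each PCM sample for one block
            integrator += s - feedback
            bit = 1 if integrator >= 0 else 0
            feedback = 1.0 if bit else -1.0
            bits.append(bit)
    return bits


def pdm_decimate(bits, oversample=64):
    """Boxcar low-pass and downsample: the density of ones in a block is the sample."""
    pcm = []
    for i in range(0, len(bits) - oversample + 1, oversample):
        ones = sum(bits[i:i + oversample])
        pcm.append(2.0 * ones / oversample - 1.0)   # map density 0..1 to -1..1
    return pcm


tone = [0.8 * math.sin(2 * math.pi * n / 32) for n in range(64)]
recovered = pdm_decimate(pdm_encode(tone))
worst_error = max(abs(a - b) for a, b in zip(tone, recovered))
print(f"recovered {len(recovered)} samples, worst error {worst_error:.3f}")
```

The per-sample work is a sum over every incoming bit, which is why decimation done in software shows up as a steady CPU cost.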

How does the BOM add up?

We do not publish exact BOM costs, but here is a rough breakdown by category for context:

| Component | Approximate % of BOM |
| --- | --- |
| SoC + RAM + eMMC | ~35% |
| Display panel + touch digitizer | ~25% |
| PCB + passives + connectors | ~15% |
| Enclosure + thermal | ~12% |
| Audio (mics + speaker + amp) | ~8% |
| Power supply + regulation | ~5% |

The display and SoC together account for roughly 60% of hardware cost. This is typical for smart displays — Counterpoint Research's 2025 BOM analysis of the smart display segment found that display and processor consistently represent 55-65% of total component cost.
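As a sanity check on the breakdown, the approximate shares can be totaled directly. The numbers are the rounded percentages from the table, and the variable names are illustrative.

```python
# Approximate BOM shares from the table, in percent.
BOM_SHARE_PCT = {
    "SoC + RAM + eMMC": 35,
    "Display panel + touch digitizer": 25,
    "PCB + passives + connectors": 15,
    "Enclosure + thermal": 12,
    "Audio (mics + speaker + amp)": 8,
    "Power supply + regulation": 5,
}

total = sum(BOM_SHARE_PCT.values())
display_plus_soc = (
    BOM_SHARE_PCT["SoC + RAM + eMMC"]
    + BOM_SHARE_PCT["Display panel + touch digitizer"]
)

print(f"total: {total}%")                       # 100%
print(f"display + SoC: {display_plus_soc}%")    # 60%
```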

What would we change in v2?

Hindsight is valuable. If we were starting the hardware design today:

LPDDR4X instead of LPDDR4: The X variant offers 10-15% lower power consumption at the same bandwidth. We specced LPDDR4 because our initial supplier had better lead times, but LPDDR4X availability has improved.
USB-C for power: Our v1 uses a barrel jack for reliability, but USB-C would simplify the accessory ecosystem.
A third microphone: Moving from 2 to 3 MEMS mics would enable beamforming in addition to AEC, improving wake word detection in noisy kitchens.

Key takeaways

1. The RK3566 hits a practical sweet spot for AI smart displays: enough compute for a WebGL UI and on-device wake word, with a 0.8 TOPS NPU for future inference, all at a price point that supports a $299 device.
2. A 5-inch 720p IPS display balances readability, GPU load, and physical footprint for a countertop form factor.
3. 4 GB RAM is the minimum for a Chromium + Node.js + Go stack; 2 GB causes OOM kills under real workloads.
4. Fanless thermal design works for the RK3566's 3-5 W TDP but requires careful heat spreader engineering and accepts occasional throttling in hot environments.
5. PDM microphones eliminate the need for an external audio codec, simplifying the BOM, but require CPU-side decimation filtering, a worthwhile trade-off on the Cortex-A55.
6. Every BOM decision is a trade-off chain: cheaper SoC means less GPU headroom, which means lower resolution, which means the display can be cheaper too. The trick is finding the chain that delivers the best user experience at the target price.
