Note · Store Tech

The register is down: field triage for POS terminals

~7 min read · POS · networking · field support

It's a few minutes before open and a store calls: the register won't come up, and the line is already forming. After five years supporting registers, networks and peripherals across an 867-store footprint, I have a fixed way of thinking about the next fifteen minutes. This is it — the actual ladder, and the reasoning behind each rung.

Rule zero: keep the store selling

The instinct is to start debugging. Resist it. The first move is always: can this store take money right now? Shift the line to another lane, bring up a backup terminal, switch to the offline/failover flow if the estate has one. A store that can sell buys you calm, and calm is when diagnosis goes well. Triage and root cause are different jobs — at 8:59am you're doing triage.

The ladder: cheapest, most-likely first

Debugging in the field is the discipline of changing one variable at a time, ordered by how cheap the check is and how often it's the culprit:

  • Power. Not "is it plugged in" but: is the outlet live, is the power brick's LED on, does the terminal show any sign of life at all? (A dead or flaky outlet is often what sits behind a maintenance-log entry like "the register keeps dying at lane 3.") Dead outlet, tripped breaker, and failed PSU between them explain a shocking share of "the register is broken."
  • Peripherals. Unplug everything — scanner, printer, cash drawer, pin pad — and boot bare. A shorted USB device or a receipt printer wedged in an error state can hang a terminal at boot. If bare boot works, reconnect one device at a time until it breaks again. That device is your answer, not the register.
  • Network. Can the terminal see the switch (link lights)? Can it reach the store's gateway? Can the other lanes reach the gateway too? A register that boots but can't sign in is usually a network story, not a POS story.
  • Software / boot. Only now: stuck update, corrupted boot media, an application that starts and crashes. This rung is last because it's the most expensive to check and, despite its reputation, the least frequent cause at the counter.

The order matters more than the list. Every rung you skip "because it's obviously software" costs you double when you climb back down.
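The ladder can be read as an ordered list of checks that stops at the first failure. What follows is a minimal sketch of that idea, not a real diagnostic tool: the Rung type, the climb function, and the observations dictionary are all hypothetical names, and each check stands in for a person looking at an LED or a cable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rung:
    name: str
    check: Callable[[], bool]  # returns True when this rung looks healthy


def climb(ladder: list[Rung]) -> Optional[str]:
    """Run the checks in order and return the name of the first failing rung."""
    for rung in ladder:
        if not rung.check():
            return rung.name  # stop here: one variable at a time
    return None  # every rung passed, so the ladder did not find the fault


# Hypothetical observations phoned in from the store (illustrative only).
observations = {
    "power": True,
    "peripherals": True,
    "network": False,
    "software": True,
}

# Order is the point: cheapest, most-likely checks first.
ladder = [
    Rung("power", lambda: observations["power"]),
    Rung("peripherals", lambda: observations["peripherals"]),
    Rung("network", lambda: observations["network"]),
    Rung("software", lambda: observations["software"]),
]

print(climb(ladder))  # prints "network"
```

Because climb returns at the first failure, the expensive software rung is never reached unless everything cheaper has already passed.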

Blast radius: one lane, or the whole store?

Before touching the terminal, ask what else is broken. One register down points at that lane: its hardware, its cable drop, its port. Every register down points at shared plumbing — the switch, the router, the circuit, or an upstream outage — and no amount of rebooting lane 2 will fix it. Registers fine but payments failing points at connectivity to the processor, where a cellular failover (the MiFi units many stores carry) either saved the day or silently didn't. Thirty seconds of scoping saves thirty minutes of debugging the wrong box.
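The scoping question reduces to a small decision table. Here is a rough sketch under stated assumptions: the function name, the input shape, and the returned labels are invented for illustration, and treating a partial outage as a set of per-lane faults is my assumption rather than a rule from the field.

```python
def scope_outage(lanes_up: dict[str, bool], payments_ok: bool) -> str:
    """Classify an outage by how much of the store it touches.

    lanes_up maps a lane name to True if that register is up.
    The returned labels are made up for this sketch.
    """
    if not lanes_up:
        return "unknown"  # nothing reported yet, so nothing to scope
    down = [lane for lane, up in lanes_up.items() if not up]
    if len(down) == len(lanes_up):
        # Every register down: look at shared plumbing, not at any one lane.
        return "shared-infrastructure"
    if down:
        # Assumption: a partial outage is treated as per-lane faults.
        return "lane-local: " + ", ".join(sorted(down))
    if not payments_ok:
        # Registers fine, payments failing: the path to the processor.
        return "processor-connectivity"
    return "healthy"


print(scope_outage({"lane 1": True, "lane 2": False, "lane 3": True}, True))
print(scope_outage({"lane 1": False, "lane 2": False}, True))
print(scope_outage({"lane 1": True, "lane 2": True}, False))
```

Note one limit of the sketch: in a store with a single register, "one lane down" and "every lane down" are the same observation, so the label alone cannot separate them.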

Know when to stop

Field triage has a clock on it. If the ladder hasn't found it and a swap unit is available, swap the hardware and move on — mean-time-to-selling beats root cause while customers are waiting. The dead unit goes to the bench where it can be taken apart honestly, without a queue watching. Knowing when you've crossed from "diagnosing" into "tinkering while the store bleeds" is the actual skill; the checklist just makes it visible.

Write it down, every time

The ticket isn't paperwork; it's the dataset. Individual failures look random; logged failures have shapes. The lane that eats power supplies (bad outlet). The store where every outage is connectivity (aging circuit, failover worth auditing). The printer model that wedges after a specific firmware update. One line per incident (symptom, rung that found it, fix) is what turns a fleet of one-off emergencies into a maintenance plan. It's also how you notice you've stopped having a certain kind of emergency, which is the quietest form of winning.
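To make "logged failures have shapes" concrete, here is a small sketch that groups one-line incident records and surfaces the repeats. The Incident fields mirror the symptom / rung / fix line described above, plus store and lane, which I am assuming a ticket would carry. The records themselves are made up for illustration, and repeat_offenders is a hypothetical helper.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    store: str
    lane: str
    symptom: str
    rung: str  # which rung of the ladder found the fault
    fix: str


def repeat_offenders(incidents, threshold=2):
    """Return ((store, lane, rung), count) groups seen at least `threshold` times."""
    counts = Counter((i.store, i.lane, i.rung) for i in incidents)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


# Made-up tickets for illustration only.
incidents = [
    Incident("store-A", "lane 3", "no power", "power", "replaced PSU"),
    Incident("store-A", "lane 3", "dies mid-shift", "power", "replaced PSU"),
    Incident("store-B", "lane 1", "cannot sign in", "network", "reseated cable"),
    Incident("store-A", "lane 3", "no power", "power", "moved to new outlet"),
]

for key, n in repeat_offenders(incidents):
    print(key, n)  # prints: ('store-A', 'lane 3', 'power') 3
```

Three power faults on the same lane are the "lane that eats power supplies" pattern: no single ticket shows it, but the grouped count does.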

Why this shaped how I write software

Everything I build carries this experience. It's why my from-scratch POS prints receipts through a failure-tolerant path, why Packet Analyzer treats malformed input as normal weather, and why I reach for bounded queues and graceful fallbacks before clever features. Production isn't where code runs — it's where code meets a Tuesday morning rush with a line out the door. Software written by someone who has stood at that counter is just built differently.