The pitch you hear from a lot of AI surveillance vendors is some version of: "Our detection runs in under a second, so the response is instantaneous." This is technically true and operationally meaningless. Fast detection does not matter if the detection is wrong half the time, because the response that follows is either (a) ignored after the third false alarm of the shift, or (b) escalated, which means law enforcement (LE) eventually stops responding to your address.
The relevant question is not how fast the AI fires. The relevant question is: how long does it take from a real threat appearing on camera to a real human action affecting the threat? On our security operations center (SOC) floor, that number is 60 seconds. Here is how that number is built and why it is the right one.
The detection problem
Pure-AI computer vision is good at distinguishing humans from non-humans. It is less good at distinguishing a person who lives next door, walking their dog along your fence line at 6am, from a person staging an approach to your back gate at 3am. The first is a legitimate non-event. The second is a threat. Both look like humans to the model. Both fire.
Some vendors try to solve this with more sensors and analytics layers: geofencing, behavioral analytics, intent classifiers. These all help. None of them brings the false alarm rate below the low 20s (percent) in field conditions. That is still enough false alarms per shift to train both operators and customers to stop responding.
The human-in-the-loop window
The fix is to put a trained operator between the AI firing and the customer being notified. The operator sees what the AI saw, plus the live camera feed, plus the site context (the customer's whitelist of expected vehicles, the prior 48 hours of activity at that camera, the time-of-day pattern for that site). The operator makes a call in 15 to 30 seconds. If the event is benign, the customer never sees it. If the event is real, the operator escalates immediately.
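To make "site context" concrete, here is a minimal sketch of the review packet an operator might see alongside the live feed. Everything in it is a hypothetical illustration, not our production system: the names `Detection`, `SiteContext`, `ReviewPacket`, and `build_review_packet`, and the default 06:00 to 20:00 "usual hours" window, are all assumptions. Note that the code only gathers context; the benign-or-real decision stays with the operator.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set


@dataclass
class Detection:
    """One AI detection event (hypothetical schema)."""
    camera_id: str
    timestamp: datetime
    label: str                   # e.g. "person" or "vehicle"
    plate: Optional[str] = None  # set only if a license plate was read


@dataclass
class SiteContext:
    """Per-site context the operator reviews (hypothetical schema)."""
    whitelist_plates: Set[str]
    history: List[Detection] = field(default_factory=list)
    usual_active_hours: range = range(6, 20)  # assumed typical activity window


@dataclass
class ReviewPacket:
    """What the operator sees next to the live feed; it carries no verdict."""
    detection: Detection
    whitelisted_vehicle: bool
    prior_48h_at_camera: int
    within_usual_hours: bool


def build_review_packet(det: Detection, site: SiteContext) -> ReviewPacket:
    # Count earlier detections on the same camera in the preceding 48 hours.
    cutoff = det.timestamp - timedelta(hours=48)
    prior = [
        d for d in site.history
        if d.camera_id == det.camera_id and cutoff <= d.timestamp < det.timestamp
    ]
    return ReviewPacket(
        detection=det,
        whitelisted_vehicle=det.plate is not None and det.plate in site.whitelist_plates,
        prior_48h_at_camera=len(prior),
        within_usual_hours=det.timestamp.hour in site.usual_active_hours,
    )
```

For example, a person detected on `cam3` at 3am, with one earlier vehicle detection on that camera the previous morning, yields a packet with `prior_48h_at_camera == 1` and `within_usual_hours == False`. The operator then weighs that against what the live feed actually shows.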
The 60-second number is the budget for the full chain: AI detection (1-2 seconds) plus operator review and decision (15-30 seconds) plus audio talk-down or LE dispatch initiation (10-20 seconds) plus a buffer for network and routing. End to end: 60 seconds from event to direct action.
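The budget arithmetic is easy to check. This sketch simply sums the stage ranges quoted above. The names `STAGES` and `budget` are illustrative, and the network/routing buffer is treated as whatever is left over rather than a measured value.

```python
# Stage ranges in seconds, taken from the paragraph above.
STAGES = {
    "ai_detection": (1, 2),
    "operator_review": (15, 30),
    "talkdown_or_dispatch": (10, 20),
}
TARGET_S = 60


def budget(stages, target):
    """Return (best case, worst case, buffer left at worst case) in seconds."""
    best = sum(lo for lo, _ in stages.values())
    worst = sum(hi for _, hi in stages.values())
    return best, worst, target - worst


best, worst, buffer = budget(STAGES, TARGET_S)
print(f"best case {best}s, worst case {worst}s, buffer at worst case {buffer}s")
# best case 26s, worst case 52s, buffer at worst case 8s
```

Even with every stage at the slow end of its range, the chain finishes at 52 seconds, which leaves 8 seconds for network and routing inside the 60-second target.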
Why not faster? Two reasons. First, the operator review is the layer that drives the false alarm rate from 30%+ to under 5%. Removing it to save 20 seconds gives back all the false alarm fatigue. Second, the 60-second number is well inside the operational window for almost every threat type. A copper thief needs 90 seconds minimum to start producing damage. A trespasser staging a perimeter approach takes 2-4 minutes to actually breach. Sixty seconds lands on the right side of all of those numbers.
Why not slower? Past 90 seconds, the deterrent value of audio talk-down drops sharply. The intruder has committed to the act. Recovery becomes possible but prevention becomes harder.
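The floor and ceiling in the two paragraphs above can be expressed as a quick margin check. This is a sketch using only the figures from the text. The function names are hypothetical, and using 120 seconds for a perimeter breach takes the conservative low end of the 2-4 minute estimate.

```python
from typing import Dict

# Seconds before each threat type produces damage or a breach (from the text).
THREAT_WINDOWS_S = {
    "copper_theft": 90,       # "90 seconds minimum to start producing damage"
    "perimeter_breach": 120,  # low end of the 2-4 minute staging estimate
}
# Past this point, audio talk-down loses most of its deterrent value.
TALKDOWN_CEILING_S = 90


def margins(response_s: float) -> Dict[str, float]:
    """Seconds of margin per threat type; a negative value means the response is too late."""
    return {threat: window - response_s for threat, window in THREAT_WINDOWS_S.items()}


def within_deterrence(response_s: float) -> bool:
    """True if talk-down arrives before the intruder has likely committed to the act."""
    return response_s <= TALKDOWN_CEILING_S


print(margins(60), within_deterrence(60))
# {'copper_theft': 30, 'perimeter_breach': 60} True
```

A 60-second response keeps at least 30 seconds of margin on the tightest window. A 120-second response would already be past both the copper-theft window and the talk-down ceiling.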
What this costs to operate
A real SOC is not cheap. The Pikesville floor runs 24/7/365 with US-based operators only. Each operator monitors a defined number of camera feeds within a workload curve that has been tuned over years to keep median response inside the 30-second window. Training is roughly six weeks for an entry-level operator, longer for shift leads. We do not offshore. Every operator on every shift is in Maryland.
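A workload curve tuned to "median response inside 30 seconds" implies a simple per-shift check. The sketch below is a hypothetical illustration: the function name and sample timings are invented, and it does not model how feeds are actually assigned to operators. It uses the median rather than the mean, so a single slow review does not mask an otherwise healthy shift.

```python
from statistics import median
from typing import Sequence

TARGET_MEDIAN_S = 30  # the 30-second review window named in the text


def shift_within_target(review_times_s: Sequence[float],
                        target: float = TARGET_MEDIAN_S) -> bool:
    """True if the shift's median operator review time is inside the target window."""
    if not review_times_s:
        raise ValueError("no reviews recorded for this shift")
    return median(review_times_s) <= target


# Hypothetical shift: one slow review (41s) does not move the median (22s).
print(shift_within_target([12, 18, 22, 25, 41]))  # True
```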
The cost difference versus pure-AI alerting is real. A pure-AI service can charge $40 to $80 per month per camera because there is no operator labor in the loop. A human-verified service costs more because someone has to actually look at each alert. We do not apologize for this. The economics of false alarm fatigue are not theoretical: on average, a pure-AI customer cancels LE response on their address inside 90 days. After that, the system is functionally a recording device.
What you should ask a vendor
If you are evaluating an AI surveillance service, three questions clarify quickly:
- Is there a human reviewing events before they reach me? If no, ask what the false alarm rate is. The honest answer is 30%+.
- What is your median time from event detection to direct action on the threat? Action means talk-down, LE dispatch, or customer notification — not the AI firing internally. If the answer is under 30 seconds, ask whether there is a human in the loop. Probably not.
- Where are your operators based? If the answer is offshore, your audio talk-down has an accent problem and your LE coordination has a jurisdictional problem. Neither is fatal but both matter.
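If you are comparing several vendors, the three questions can be turned into a simple red-flag checklist. This is a hypothetical sketch. `VendorAnswers` and `red_flags` are invented names, and the 30% and 30-second thresholds come straight from the questions above, not from any industry standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VendorAnswers:
    human_review: bool       # Q1: does a human review events before they reach you?
    median_action_s: float   # Q2: median seconds from detection to talk-down, dispatch, or notification
    onshore_operators: bool  # Q3: are operators based in your country (not offshore)?


def red_flags(a: VendorAnswers) -> List[str]:
    """List the follow-up questions each answer should trigger."""
    flags = []
    if not a.human_review:
        flags.append("no human review: ask for the false alarm rate (expect 30%+)")
    if a.median_action_s < 30:
        flags.append("sub-30s median action: ask whether a human is actually in the loop")
    if not a.onshore_operators:
        flags.append("offshore operators: check talk-down clarity and LE coordination")
    return flags


print(red_flags(VendorAnswers(human_review=True, median_action_s=60, onshore_operators=True)))
# []
```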
The missing layer
AI is a tool. It is a fast tool. It is not, by itself, a security service. The missing layer in pure-AI systems is the operator who knows that the figure on camera 3 at 3am is the same delivery driver who comes in at 4am every Tuesday — because the operator was on shift last Tuesday and the Tuesday before that.
The 60-second SOC is what that knowledge looks like in production. It is slower than the AI alone. It is also the
