Imagine if you could settle (or rekindle) domestic arguments by asking your smart speaker when the room last got cleaned or whether the bins have already been taken out. Or, for a healthier use case, what if you could ask your speaker to keep count of reps as you do squats and bench presses? Or switch it into full-on ‘personal trainer’ mode, barking orders to pedal faster as you spin cycles on a dusty old exercise bike (who needs a Peloton!). And what if the speaker was smart enough to know you’re eating dinner and took care of slipping on some mood music?
Imagine if all those activity-tracking smarts were on tap without any connected cameras in your home. Another bit of fascinating research from Carnegie Mellon University’s Future Interfaces Group opens up these possibilities, demonstrating a novel approach to activity tracking that does not rely on cameras as the sensing tool. Installing connected cameras inside your home is a horrible privacy risk, which is why the CMU researchers set about investigating the potential of using millimeter wave (mmWave) Doppler radar to detect different types of human activity.
The challenge they needed to overcome is that while mmWave offers a “signal richness approaching that of microphones and cameras”, as they put it, data sets for training AI models to recognize different human activities from RF noise are not readily available (in the way that visual data for training other types of AI models is). Not to be deterred, they set about synthesizing Doppler data to feed a human activity-tracking model, devising a software pipeline for training privacy-preserving AI models. The results can be seen in this video, where the model correctly identifies several activities, including cycling, clapping, waving, and squats, purely from its ability to interpret the mmWave signal the movements generate, and having been trained purely on public video data.
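To give a feel for what “synthesizing Doppler data” from motion can involve, here is a minimal sketch. It is not the researchers’ pipeline, which the article does not detail. It assumes that 3D positions of body points have already been estimated from consecutive video frames, and it turns their motion relative to a virtual sensor into radial velocities, a coarse velocity histogram, and the standard two-way radar Doppler shift. The function names, the bin layout, and the 60 GHz carrier are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def radial_velocity(p_prev, p_next, sensor, dt):
    """Speed of a point along the line to the sensor, in m/s.

    Positive means the point moved closer to the sensor during dt seconds.
    """
    return (math.dist(p_prev, sensor) - math.dist(p_next, sensor)) / dt


def doppler_shift_hz(v_radial, carrier_hz):
    """Two-way Doppler shift seen by a radar for a given radial velocity."""
    return 2.0 * v_radial * carrier_hz / C


def velocity_histogram(frame_a, frame_b, sensor, dt, v_max=2.0, bins=8):
    """Count tracked points per radial-velocity bin over [-v_max, v_max).

    frame_a and frame_b hold the same body points at two consecutive times.
    Points moving faster than v_max are ignored in this simplified sketch.
    """
    counts = [0] * bins
    width = 2.0 * v_max / bins
    for p_prev, p_next in zip(frame_a, frame_b):
        v = radial_velocity(p_prev, p_next, sensor, dt)
        if -v_max <= v < v_max:
            counts[int((v + v_max) / width)] += 1
    return counts


if __name__ == "__main__":
    sensor = (0.0, 0.0, 0.0)                         # hypothetical sensor position
    frame_a = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]     # made-up body points, in metres
    frame_b = [(0.0, 0.0, 1.88), (1.0, 0.0, 2.0)]    # same points 0.1 s later
    print(velocity_histogram(frame_a, frame_b, sensor, dt=0.1))
    print(doppler_shift_hz(1.0, carrier_hz=60e9))    # assumed 60 GHz carrier
```

A sequence of such histograms over time gives a rough, spectrogram-like picture of movement. A real radar signal also depends on reflectivity, noise, and sensor characteristics, none of which this sketch models.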
“We show how this cross-domain translation can be successful through a series of experimental results,” they write. “Overall, our approach is an important stepping stone towards significantly reducing the burden of training such human sensing systems and could help bootstrap uses in human-computer interaction.” Researcher Chris Harrison confirms the mmWave Doppler radar-based sensing doesn’t work for “very subtle stuff” (like spotting different facial expressions). But he says it’s sensitive enough to detect less vigorous activity, like eating or reading a book.
A need for line-of-sight between the subject and the sensing hardware also limits the motion detection ability of Doppler radar. (Aka: “It can’t reach around corners yet.” Which, for those concerned about future robots’ powers of human detection, will surely sound slightly reassuring.) Detection does require special sensing hardware, of course. But things are already moving on that front: Google has been dipping its toe in via Project Soli, adding a radar sensor to the Pixel 4, for example. Google’s Nest Hub also integrates the same radar sensors to track sleep quality.
“One of the reasons we haven’t seen more adoption of radar sensors in phones is a lack of compelling use cases (sort of a chicken and egg problem),” Harrison tells TechCrunch. “Our research into radar-based activity detection helps to open more applications (e.g., smarter Siris, which know when you are eating, making dinner, cleaning, or working out, etc.).” Asked whether he sees greater potential in mobile or fixed applications, Harrison reckons there are interesting use cases for both. “I see use cases in both mobile and nonmobile,” he says. “Returning to the Nest Hub… the sensor is already in the room, so why not use that to bootstrap more advanced functionality in a Google smart speaker (like rep counting your exercises)?
“There are a bunch of radar sensors already used in buildings to detect occupancy (but now they can detect the last time the room was cleaned, for example).” “Overall, the cost of these sensors is going to drop to a few dollars very soon (some on eBay are already around $1), so you can include them in everything,” he adds. “And as Google is showing with a product that goes in your bedroom, the threat of a ‘surveillance society’ is much less worrisome than with camera sensors.” Startups like VergeSense already use sensor hardware and computer vision technology to power real-time analytics of indoor space and activity for the B2B market (such as measuring office occupancy).
But even with local processing of low-resolution image data, there could still be a perception of privacy risk around using vision sensors, certainly in consumer environments. Radar offers an alternative to visual surveillance that could be a better fit for connected consumer devices that carry privacy risks, such as ‘smart mirrors’. “If it is processed locally, would you put a camera in your bedroom? Bathroom? Maybe I’m prudish, but I wouldn’t personally,” says Harrison. He also points to earlier research that underlines the value of incorporating more types of sensing hardware: “The more sensors, the longer tail of interesting applications you can support. Cameras can’t capture everything, nor do they work in the dark.”
“Cameras are pretty cheap these days, so it is hard to compete there, even if radar is cheaper. I do believe the strongest advantage is privacy preservation,” he adds. Of course, having any sensing hardware, visual or otherwise, raises potential privacy issues. For example, a sensor that tells you when a child’s bedroom is occupied may be good or bad, depending on who has access to the data. (Do you want your smart speaker to know when you’re having sex?) And all sorts of human activity can generate sensitive information, depending on what’s happening. While radar-based tracking may be less invasive than other sensors, that doesn’t mean there are no potential privacy concerns.
It depends on where and how the sensing hardware is being used. That said, it is hard to dispute that the data radar generates is likely to be less sensitive than comparable visual data were it to be exposed via a breach. “Any sensor should naturally raise the question of privacy; it is a spectrum rather than a yes/no question,” agrees Harrison. “Radar sensors are usually rich in detail but highly anonymizing, unlike cameras. If your Doppler radar data leaked online, it’d be hard to be embarrassed about it. No one would recognize you. If cameras from inside your house leaked online, well….”
As for the cost of synthesizing the training data: “It isn’t turnkey, but there are many large video corpora to pull from (including YouTube-8M),” he says. “It is orders of magnitude faster to download video data and create synthetic radar data than having to recruit people to come into your lab to capture motion data. One is inherently 1 hour spent for 1 hour of quality data. In contrast, you can download hundreds of hours of footage easily from many excellently curated video databases these days. For every hour of video, it takes us about 2 hours to process, but that is just on one desktop machine we have here in the lab. The key is that you can parallelize this, using Amazon AWS or equivalent, and process 100 videos at once, so the throughput can be extremely high.”
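The throughput figures quoted above lend themselves to a quick back-of-the-envelope calculation. The sketch below uses the roughly 2 hours of processing per hour of video and the 100 parallel jobs from the quote. The 500-hour corpus, the function name, and the assumption of perfect parallel scaling (with download time and cloud cost ignored) are purely illustrative.

```python
def wall_clock_hours(video_hours: float,
                     hours_per_video_hour: float = 2.0,
                     workers: int = 1) -> float:
    """Idealized wall-clock time to turn video into synthetic radar data.

    Assumes the work splits evenly across workers with no overhead,
    which real cloud jobs will only approximate.
    """
    if video_hours < 0 or hours_per_video_hour <= 0 or workers < 1:
        raise ValueError("need video_hours >= 0, a positive ratio, workers >= 1")
    return video_hours * hours_per_video_hour / workers


if __name__ == "__main__":
    corpus_hours = 500.0  # hypothetical corpus size, not a figure from the research
    print("one desktop machine:", wall_clock_hours(corpus_hours), "hours")
    print("100 parallel workers:", wall_clock_hours(corpus_hours, workers=100), "hours")
    # Capturing the same amount of data in a lab costs at least one hour per hour.
    print("in-lab capture:", corpus_hours, "hours at minimum")
```

Under these assumptions a single machine is slower than capturing the same hours in a lab, and the advantage comes entirely from running many jobs at once without having to recruit participants.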
And while the RF signal does reflect, and does so to different degrees off other surfaces (aka “multi-path interference”), Harrison says the signal reflected by the user “is by far the dominant signal”. So they didn’t need to model other reflections to get their demo model working. (Though he notes that could be done to further hone capabilities “by extracting big surfaces like walls/ceiling/floor/furniture with computer vision and adding that into the synthesis stage”.) “The [Doppler] signal is actually very high level and abstract, and so it’s not particularly hard to process in real-time (much less ‘pixels’ than a camera),” he adds. “Embedded car processors use radar data for things like collision braking and blind spot monitoring, and those are low-end CPUs (no deep learning or anything).”
The research is being presented at the ACM CHI conference alongside another Group project — Pose-on-the-Go — which uses smartphone sensors to approximate the user’s full-body pose without needing wearable sensors. CMU researchers from the Group have also previously demonstrated a method for indoor ‘smart home’ sensing on the cheap (also without the need for cameras), as well as — last year — showing how smartphone cameras could be used to give an on-device AI assistant more contextual savvy.
In recent years, they’ve also investigated using laser vibrometry and electromagnetic noise to give smart devices better environmental awareness and contextual functionality. Other interesting research out of the Group includes using conductive spray paint to turn anything into a touchscreen, as well as various methods to extend the interactive potential of wearables, such as using lasers to project virtual buttons onto the arm of a device user or incorporating another wearable (a ring) into the mix. The future of human-computer interaction looks certain to be much more contextually savvy, even if current-gen ‘smart’ devices can still stumble on the basics and seem more than a little dumb.