Deep learning system could someday serve as a social coach
It’s a fact of nature that a single conversation can be interpreted in very different ways. For people with anxiety or conditions such as Asperger’s, this can make social situations extremely stressful. But what if there was a more objective way to measure and understand our interactions?
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute of Medical Engineering and Science (IMES) say that they’ve gotten closer to a potential solution: an artificially intelligent, wearable system that can predict if a conversation is happy, sad, or neutral based on a person’s speech patterns and vitals.
“Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious,” says graduate student Tuka Alhanai, who co-authored a related paper with PhD candidate Mohammad Ghassemi that they will present at next week’s Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco. “Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket.”
As a participant tells a story, the system can analyze audio, text transcriptions, and physiological signals to determine the overall tone of the story with 83 percent accuracy. Using deep-learning techniques, the system can also provide a “sentiment score” for specific five-second intervals within a conversation.
“As far as we know, this is the first experiment that collects both physical data and speech data in a passive but robust way, even while subjects are having natural, unstructured interactions,” says Ghassemi. “Our results show that it’s possible to classify the emotional tone of conversations in real-time.”
The researchers say that the system’s performance would be further improved by having multiple people in a conversation use it on their smartwatches, creating more data to be analyzed by their algorithms. The team is keen to point out that they developed the system with privacy strongly in mind: The algorithm runs locally on a user’s device as a way of protecting personal information. (Alhanai says that a consumer version would obviously need clear protocols for getting consent from the people involved in the conversations.)
How it works
Many emotion-detection studies show participants “happy” and “sad” videos, or ask them to artificially act out specific emotive states. But in an effort to elicit more organic emotions, the team instead asked subjects to tell a happy or sad story of their own choosing.
Subjects wore a Samsung Simband, a research device that captures high-resolution physiological waveforms to measure features such as movement, heart rate, blood pressure, blood flow, and skin temperature. The system also captured audio data and text transcripts to analyze the speaker’s tone, pitch, energy, and vocabulary.
“The team’s usage of consumer market devices for collecting physiological data and speech data shows how close we are to having such tools in everyday devices,” says Björn Schuller, professor and chair of Complex and Intelligent Systems at the University of Passau in Germany, who was not involved in the research. “Technology could soon feel much more emotionally intelligent, or even ‘emotional’ itself.”