A team of researchers has developed an eavesdropping attack for Android smartphones that can, to varying degrees, identify a caller’s gender and identity and even recognize private speech. The side-channel attack, named EarSpy, explores a new route for eavesdropping: capturing the motion sensor readings caused by reverberations from the ear speaker of a mobile device.
EarSpy is an academic project by researchers from five American universities: Texas A&M University, Temple University, New Jersey Institute of Technology, University of Dayton, and Rutgers University. This type of attack had already been explored with smartphone loudspeakers, but ear speakers were considered too weak to generate enough vibration to make such a side-channel attack practical.
However, newer smartphones use more powerful stereo speakers than models from a few years ago, which deliver considerably better sound quality and stronger vibrations. Likewise, modern devices have more sensitive motion sensors and gyroscopes that can capture even minute resonances from the speakers.
The spectrogram below provides evidence of this advancement, showing that the dual ear speakers of a 2019 OnePlus 7T produce substantially more data than the ear speaker of a 2016 OnePlus 3T. In their tests, the researchers used a OnePlus 7T and a OnePlus 9 together with several sets of pre-recorded audio that was played only through the ear speakers of the two devices.
The team also recorded accelerometer data during a simulated call using the third-party app “Physics Toolbox Sensor Suite” and then fed it to MATLAB for analysis and feature extraction. A machine learning (ML) algorithm was trained on publicly available datasets to recognize gender, caller identity, and speech content.
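The feature extraction step described above can be illustrated with a minimal sketch. It assumes Python with NumPy rather than the MATLAB tooling the researchers used, and the function names, the synthetic test signal, and the five features chosen here are hypothetical illustrations, not the feature set from the EarSpy paper.

```python
import numpy as np

def extract_features(window, sample_rate_hz):
    """Return a small time- and frequency-domain feature vector for one
    window of single-axis accelerometer samples (illustrative feature set)."""
    window = np.asarray(window, dtype=float)
    centered = window - window.mean()  # remove the constant gravity offset

    # Time-domain features
    rms = np.sqrt(np.mean(centered ** 2))
    peak_to_peak = centered.max() - centered.min()
    signs = np.signbit(centered)
    zero_crossings = np.count_nonzero(signs[1:] != signs[:-1])

    # Frequency-domain features from the magnitude spectrum
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(centered.size, d=1.0 / sample_rate_hz)
    dominant_freq = freqs[np.argmax(spectrum)]
    total = spectrum.sum()
    spectral_centroid = np.sum(freqs * spectrum) / total if total > 0 else 0.0

    return np.array([rms, peak_to_peak, zero_crossings,
                     dominant_freq, spectral_centroid])

def synthetic_window(freq_hz, sample_rate_hz=500, duration_s=1.0, seed=0):
    """Hypothetical stand-in for recorded data: gravity plus a faint
    speaker-induced vibration and a little sensor noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate_hz * duration_s)) / sample_rate_hz
    vibration = 0.02 * np.sin(2 * np.pi * freq_hz * t)
    noise = 0.001 * rng.standard_normal(t.size)
    return 9.81 + vibration + noise

features = extract_features(synthetic_window(120.0), sample_rate_hz=500)
print(features)
```

In a pipeline of this kind, one such vector per window would be passed to a classical classifier trained on labeled recordings.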
The test results varied by dataset and device, but they were generally promising for eavesdropping via the ear speaker. On the OnePlus 7T, caller gender identification ranged from 77.7% to 98.7%, caller identity classification ranged from 63.0% to 91.2%, and speech recognition ranged from 51.8% to 56.4%.
“We evaluate the time and frequency domain features with classical ML algorithms, which show the highest 56.42% accuracy,” the researchers explain in the EarSpy technical paper. “As there are ten different classes here, the accuracy still exhibits five times greater accuracy than a random guess, which implies that vibration due to the ear speaker induced a reasonable amount of distinguishable impact on accelerometer data.”
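The comparison with a random guess is simple arithmetic, sketched here under the assumption that the ten classes are balanced, so that uniform guessing is right one time in ten. The function names are hypothetical.

```python
def chance_accuracy(num_classes):
    """Expected accuracy of uniform random guessing over balanced classes."""
    return 1.0 / num_classes

def improvement_over_chance(accuracy, num_classes):
    """How many times better than random guessing a classifier performs."""
    return accuracy / chance_accuracy(num_classes)

# Figures quoted from the paper: 56.42% accuracy over ten classes.
ratio = improvement_over_chance(0.5642, 10)
print(f"{ratio:.2f}x chance")  # prints "5.64x chance"
```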
On the OnePlus 9, gender identification peaked at 88.7%, caller identification dropped to an average of 73.6%, and speech recognition ranged from 33.3% to 41.6%. By comparison, when the researchers tested a similar attack in 2020 using the loudspeaker and the “Spearphone” app they had developed, caller gender and identity accuracy reached 99% and speech recognition accuracy reached 80%.
One factor that can reduce the effectiveness of the EarSpy attack is the volume users choose for their ear speakers. A lower volume is easier on the ear and may also prevent eavesdropping via this side channel. The layout of the hardware components and the tightness of the assembly also affect how speaker reverberation spreads through the device. Finally, user movement and vibrations from the environment reduce the accuracy of the derived speech data.
Android 13 restricts the collection of sensor data at sampling rates above 200 Hz without permission. This prevents speech recognition at the default sampling rate (400 Hz to 500 Hz), but if the attack is carried out at 200 Hz, accuracy drops by only about 10%. The researchers advise phone manufacturers to keep sound pressure stable during calls and to place the motion sensors where internally generated vibrations do not affect them or, at the very least, have the smallest possible impact.
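One general reason a lower sampling rate costs information is the Nyquist limit: a sensor sampled at a given rate can only represent vibrations up to half that rate, and anything faster is folded down to a lower apparent frequency. The sketch below illustrates this with NumPy; it is not taken from the EarSpy paper, and the 180 Hz test tone and the function name are hypothetical.

```python
import numpy as np

def apparent_frequency(tone_hz, sample_rate_hz, duration_s=1.0):
    """Sample a pure tone and return the frequency of the strongest FFT bin."""
    n = int(sample_rate_hz * duration_s)
    t = np.arange(n) / sample_rate_hz
    samples = np.sin(2 * np.pi * tone_hz * t)
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return freqs[np.argmax(spectrum)]

# Hypothetical 180 Hz vibration component.
# At 500 Hz the Nyquist limit is 250 Hz, so the tone is captured as-is.
print(apparent_frequency(180.0, 500))  # 180.0
# At 200 Hz the limit is 100 Hz, so the tone aliases to 200 - 180 = 20 Hz.
print(apparent_frequency(180.0, 200))  # 20.0
```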