The Multimedia Analytics Module in the INSPEC2T Platform

VICOMTECH-IK4, January 2017

As part of its role in INSPEC2T, Vicomtech-IK4 has integrated the re-identification of persons from images, tested it under controlled real-world conditions, and is currently adding further video and audio processing functionalities.

The person recognition feature of the INSPEC2T platform automatically notifies police operators when the same person is identified in images from different incident reports. This feature is useful for the re-identification of missing persons or suspected criminals. Currently, the multimedia analytics engine, which forms part of the INSPEC2T platform, uses robust face detection models and deep neural networks to recognise recurring faces, i.e. persons. Figure 1 presents photos, submitted within two distinct incident reports, in which the persons’ faces were automatically detected (see the regions highlighted by yellow rectangles). The corresponding correlation report (see Figure 2), generated automatically, states that a person has been re-identified and lists the corresponding incident reports. Figures 1 and 2 show screenshots of the Secure Portal, the portal built by Aditess Ltd for dedicated access by police operators.

Figure 1: Automatic face detection in uploaded images (two cases are presented). The detected faces are highlighted with a yellow rectangle.

Figure 2: Correlation report relating two incident reports based on the recognition of persons. The images are those shown in Figure 1.
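The correlation step described above can be illustrated with a minimal sketch. It assumes that each detected face has already been turned into a numeric embedding vector by a neural network, and that two faces count as the same person when the cosine similarity of their embeddings exceeds a threshold. The article does not state how the INSPEC2T engine compares faces, so the function names, the report identifiers, the toy three-dimensional vectors and the threshold of 0.8 are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def correlate_reports(face_embeddings, threshold=0.8):
    """Return (report_a, report_b, best_similarity) for every pair of reports
    that share at least one face above the (assumed) similarity threshold.

    face_embeddings maps a report id to the list of face embeddings found in it.
    """
    correlations = []
    for (id_a, faces_a), (id_b, faces_b) in combinations(sorted(face_embeddings.items()), 2):
        # Compare every face of one report with every face of the other.
        best = max(
            (cosine_similarity(fa, fb) for fa in faces_a for fb in faces_b),
            default=0.0,
        )
        if best >= threshold:
            correlations.append((id_a, id_b, round(best, 3)))
    return correlations


# Hypothetical data: toy 3-dimensional embeddings instead of real network outputs.
reports = {
    "report-001": [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]],
    "report-002": [[0.88, 0.12, 0.02]],
    "report-003": [[0.0, 0.0, 1.0]],
}

for report_a, report_b, score in correlate_reports(reports):
    print(f"Person re-identified in {report_a} and {report_b} (similarity {score})")
```

With this toy data only the first two reports are correlated, which mirrors the kind of notification shown in Figure 2. A real system would tune the threshold on labelled data, since it trades missed matches against false alerts.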

When videos are submitted to the multimedia analytics module for person detection, only a few representative images of each person visible in the video are extracted for further analysis in support of future re-identification tasks. The multimedia analytics module always respects personal data, since no personally identifiable information is exchanged. Figure 3 depicts a video recording in which the camera moves from one actor to the other and back. The main representative images of the actors, as considered in the subsequent analysis tasks, are shown below the video scene (in grayscale).

Figure 3: Face detection in video (the video scene is depicted at the top); only a representative set of faces is extracted (shown below).
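One plausible way to reduce the many face detections of a video to a few representative images per person is sketched below. It is an illustration under stated assumptions, not the method used in INSPEC2T: detections are grouped greedily by embedding similarity, and the best-scoring detections of each group are kept. The `quality` score, the threshold of 0.8, the limit of two images per person and the toy two-dimensional embeddings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_representatives(detections, same_person_threshold=0.8, per_person=2):
    """Group face detections by person and keep the best few of each group.

    Each detection is a dict with 'frame', 'embedding' and 'quality' keys.
    Grouping is greedy: a detection joins the first group whose first
    member (the anchor) is similar enough, otherwise it starts a new group.
    """
    groups = []
    for det in detections:
        for group in groups:
            anchor = group[0]
            if cosine_similarity(det["embedding"], anchor["embedding"]) >= same_person_threshold:
                group.append(det)
                break
        else:
            # No existing group matched: treat this as a new person.
            groups.append([det])
    # Keep the highest-quality detections of each person.
    return [
        sorted(group, key=lambda d: d["quality"], reverse=True)[:per_person]
        for group in groups
    ]


# Hypothetical detections: the camera shows actor A, then actor B, then A again.
detections = [
    {"frame": 0, "embedding": [1.0, 0.0], "quality": 0.6},
    {"frame": 10, "embedding": [0.98, 0.05], "quality": 0.9},
    {"frame": 20, "embedding": [0.0, 1.0], "quality": 0.7},
    {"frame": 30, "embedding": [0.05, 0.99], "quality": 0.8},
    {"frame": 40, "embedding": [0.97, 0.1], "quality": 0.5},
]

representatives = select_representatives(detections)
for person_index, group in enumerate(representatives):
    print(person_index, [d["frame"] for d in group])
```

In this toy run the detection at frame 40 is assigned back to the first actor, so the camera returning to a person does not create a duplicate identity. Greedy grouping depends on the order of detections; a production system would more likely use a proper clustering or tracking method.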

Entering the last development phase, with the aim of having the entire platform ready for in-field tests starting in mid-April 2017, Vicomtech-IK4 has begun the final integration steps of the audio analysis tools. These tools will provide police operators with automatic audio transcription and will allow the INSPEC2T platform to raise automatic alerts based on keyword search and on the detection of acoustic security events such as sirens, screaming or breaking glass (in cooperation with Aditess Ltd). The technology processes any audio material uploaded by the user (including the audio track of videos) and will allow automatic analysis in cases where the user describes the incident and the situation via an audio recording (see illustrative images in Figure 4).

Figure 4: Audio analysis tools being included in the INSPEC2T platform: automatic text transcription, keyword search and acoustic event detection (from top to bottom).
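The alerting logic described above can be sketched as follows, assuming that a transcript and a list of detected acoustic event labels have already been produced by upstream speech recognition and event detection components. The function names, the watch-list keywords and the event labels are hypothetical and only illustrate how keyword hits and acoustic events might be combined into alerts.

```python
import re


def keyword_hits(transcript, keywords):
    """Return the watch-list keywords that occur as whole words in the transcript."""
    hits = []
    for keyword in keywords:
        # Whole-word, case-insensitive match; re.escape keeps keywords literal.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


def raise_alerts(transcript, acoustic_events, keywords, alert_events):
    """Combine keyword hits and security-relevant acoustic events into alert messages."""
    alerts = [f"keyword: {keyword}" for keyword in keyword_hits(transcript, keywords)]
    alerts += [f"acoustic event: {event}" for event in acoustic_events if event in alert_events]
    return alerts


# Hypothetical outputs of the transcription and acoustic event detection steps.
transcript = "Someone smashed the window and I heard a gunshot near the station."
acoustic_events = ["speech", "broken glass"]

alerts = raise_alerts(
    transcript,
    acoustic_events,
    keywords=["gunshot", "knife"],
    alert_events={"siren", "screaming", "broken glass"},
)
print(alerts)
```

Whole-word matching avoids false hits on substrings, but it would miss inflected forms such as plurals; stemming or fuzzy matching would be a natural refinement.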