IMC, March 2018
When an incident occurs, Law Enforcement Agencies (LEAs) have to process a vast amount of information and data. To enable a faster reaction, one of the INSPEC2T project objectives is to correlate the information that citizens submit in their crime reports. This information is geographical and textual, and may also include multimedia (images, video or sound). Part of IMC’s effort as Technical Partner to the INSPEC2T project is to apply semantic analysis techniques and tools to the textual information in order to correlate reports. A correlation could connect multiple reports that share the same incident description, the same suspect description, or similar characteristics in both. This allows Secure Portal operators to quickly identify submitted reports and either group them in an Incident file or connect them to older Incidents, providing additional information on the suspect or the incident.
HOW SEMANTIC ANALYSIS TECHNIQUES WORK
Starting from simple keyword and pattern matching, a report that contains “tall man, dark hair, tattoos” could be related to one that contains “man with tattoos and dark hair”. Furthermore, considering location and time (specifically, proximity in location and time) alongside the description itself can point to a potential correlation between these reports. Since each person writes differently, the descriptions above can also be compared by a “per same word” distance, that is, how many words match out of the total words of the description. This leads to a more abstract correlation method that does not need to identify each word in the description, and it allows levels of similarity to be established in order to identify possible correlations.
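As a rough illustration of the “per same word” comparison combined with proximity in location and time, here is a minimal Python sketch. It is not the INSPEC2T implementation: the Report fields, the function names and all thresholds (0.5 overlap, 1 km, 60 minutes) are assumptions chosen for the example, and "total words" is interpreted here as the distinct words of both descriptions taken together.

```python
import math
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Report:
    """Hypothetical report record; field names are illustrative only."""
    text: str
    lat: float
    lon: float
    time: datetime


def tokenize(text):
    """Lowercase a description and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(desc_a, desc_b):
    """Matching words divided by the distinct words of both descriptions."""
    a, b = tokenize(desc_a), tokenize(desc_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def distance_km(a, b):
    """Great-circle (haversine) distance between two reports in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def may_correlate(a, b, min_overlap=0.5, max_km=1.0, max_minutes=60):
    """Flag two reports as a possible match (all thresholds are assumptions)."""
    minutes = abs((a.time - b.time).total_seconds()) / 60
    return (
        word_overlap(a.text, b.text) >= min_overlap
        and distance_km(a, b) <= max_km
        and minutes <= max_minutes
    )
```

With the two example descriptions, 4 of the 7 distinct words match, so two such reports submitted close together in place and time would be flagged as a possible correlation. Different overlap thresholds could serve as the levels of similarity mentioned above.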
The figure below illustrates the two different processes.
THE DEVELOPMENT OF THE CASE BASED REASONING (CBR) METHODOLOGY
During system development and testing, a higher level of textual analysis was reached by using Natural Language Processing (NLP). NLP helps us identify not only text similarity but also the keywords of a message. In its simplest form, NLP identifies the most important or meaningful keywords (or sets of keywords) in the phrases. The basic NLP components are the lexicon and the ontologies. Both require considerable effort to create and populate in order to enlarge the pool of identified keywords and improve accuracy. Since NLP is an intermediate step, each change, optimization or correction had to be applied to all the different ontologies and languages. This continuous parallel work led to diverging processing for the same procedure. This can be mitigated by relying on a smaller portion of NLP analysis.
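To make the role of the lexicon and the ontology more concrete, the following Python sketch shows a deliberately simplistic keyword extractor. The LEXICONS table, its entries, the category names and the extract_keywords function are all hypothetical and exist only for illustration; a real lexicon and ontology would be far larger.

```python
import re

# Hypothetical per-language lexicon: surface word -> (canonical keyword, ontology category).
# Every supported language needs its own table, which is where the maintenance effort grows.
LEXICONS = {
    "en": {
        "man": ("man", "person"),
        "guy": ("man", "person"),
        "tall": ("tall", "build"),
        "tattoo": ("tattoo", "mark"),
        "tattoos": ("tattoo", "mark"),
        "dark": ("dark", "colour"),
        "hair": ("hair", "body_part"),
    },
}


def extract_keywords(text, lang="en"):
    """Return the set of (keyword, category) pairs found in the text."""
    lexicon = LEXICONS[lang]
    words = re.findall(r"[a-z]+", text.lower())
    return {lexicon[w] for w in words if w in lexicon}
```

Because different surface words map to the same canonical keyword, “tall guy with tattoos” and “tall man, tattoo” yield the same keyword set even though their wording differs.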
Finally, Case Based Reasoning (CBR) is a continuous method for reading and correlating reports that builds on many of the above tools to extract information from text. The process can be described by the sequence illustrated below, which consists of the Retrieve, Reuse, Revise and Retain of data, followed by a correction phase. In the correction phase, the successful results are stored for reuse and, in parallel, the archived (already processed) ones are revised through the new rules.
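The Retrieve, Reuse, Revise and Retain sequence can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's mechanism: the CaseBase class, the word-overlap similarity, the 0.5 threshold and the incident identifiers are invented for the example, and the re-processing of archived cases through new rules is left out.

```python
import re


def overlap(desc_a, desc_b):
    """Matching words divided by the distinct words of both descriptions."""
    a = set(re.findall(r"[a-z0-9]+", desc_a.lower()))
    b = set(re.findall(r"[a-z0-9]+", desc_b.lower()))
    return len(a & b) / len(a | b) if a and b else 0.0


class CaseBase:
    """Hypothetical case base holding (description, incident_id) pairs."""

    def __init__(self, threshold=0.5):
        self.cases = []
        self.threshold = threshold  # assumed similarity cut-off

    def retrieve(self, description):
        """Retrieve: the most similar stored case, or None if none is close enough."""
        best = max(self.cases, key=lambda c: overlap(c[0], description), default=None)
        if best is not None and overlap(best[0], description) >= self.threshold:
            return best
        return None

    def reuse(self, case):
        """Reuse: propose the incident of the retrieved case as a hint."""
        return case[1] if case is not None else None

    def revise(self, proposed, operator_choice=None):
        """Revise: the operator's decision overrides the proposal when given."""
        return operator_choice if operator_choice is not None else proposed

    def retain(self, description, incident_id):
        """Retain: store the confirmed result so later reports can match it."""
        self.cases.append((description, incident_id))

    def process(self, description, operator_choice=None):
        """Run one full cycle for a new report and return the final incident."""
        proposed = self.reuse(self.retrieve(description))
        final = self.revise(proposed, operator_choice)
        if final is not None:
            self.retain(description, final)
        return final
```

In this sketch a new report with no close match and no operator decision stays unassigned, while each confirmed assignment enlarges the case base and so improves later hints.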
Disrupting factors have been identified in the use of such techniques for INSPEC2T report correlation. One is the presence of different languages in the text descriptions of the reports, which requires considerable effort to identify the language and to maintain NLP mechanisms with per-language dictionaries and ontologies. A second is processing speed, since reports submitted in real time require real-time results if operators are to act faster.
INSPEC2T’s pilot testing showed that the intelligence provided by the CBR mechanism can potentially help the operator cope with information overload (sorting and searching among hundreds of reports) by offering hints about possible report matches. This supports smarter decisions when grouping reports under an incident, with advantages in both reaction speed and the effectiveness of community policing.
IMC’s blog focuses on and illustrates some technical aspects of the INSPEC2T solution, drawing on IMC’s technical expertise. Please note that the INSPEC2T consortium will ensure that privacy rights and users’ other fundamental rights are respected at all times. We will provide a full analysis of the INSPEC2T legal and ethical requirements in a future blog entry.