
Multimodal Risk Assessment and Symptom Tracking (MRAST)

The diagnosis of cancer and its subsequent treatment can have a large impact on patients’ psychological well-being. Along with the physical after-effects of treatment, cancer survivors often continue to grapple with anxiety and depression. Moreover, multiple studies have shown that symptoms extracted from conversation can greatly improve the accuracy of identifying diseases and tracking their progression (Picone et al., 2020). Furthermore, self-reporting of symptoms can introduce significant bias into the reported experiences.

In the PERSIST project, the MRAST framework was developed by the University of Maribor (UM) to extract, analyse and ‘annotate’ the risks expressed by patients in their diary recordings. In clinics, the main assessment tools still rely on traditional methods, including interviews and specifically designed questionnaires (Kroenke et al., 2001). During interviews, patients display numerous verbal and non-verbal expressions about themselves. Noticing and understanding those expressions depends mainly on the experience and skills of the staff. Automatic depression recognition tools should therefore be integrated into clinical workflows to overcome these issues. The importance of developing objective, reliable and effective assessment tools as a complement to psychometric questionnaires for depression recognition has been stated before (Cuthbert and Insel, 2013). Thus, the contribution of a multimodal approach to observing the signs of depression and to the early detection of mental disorders should be considered in the clinical workflow.

The MRAST framework comprises five main steps:

  1. Retrieving the non-annotated patient diary recordings from the OHC platform.
  2. Extracting the text (transcription) and audio (speech) from the videos via ASR.
  3. Sending the text, audio and video components to the Multimodal Depression Analysis Framework (developed by UM) to detect possible signs of depression in the patient.
  4. Sending the text to the SYMP Chatbot to extract the risk factors (symptoms). The SYMP Chatbot returns a list of possible risks (symptoms) sorted by probability.
  5. Storing the risk factors and depression results together as a FHIR Composition resource on the OHC FHIR server.
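
As an illustration of step 5, the following minimal Python sketch bundles a depression result and a ranked list of risk factors into a dictionary shaped like a FHIR Composition resource. The function name, the patient identifier, the type text, the section titles and the example probabilities are all hypothetical; the actual profile used on the OHC FHIR server is not described here.

```python
from datetime import datetime, timezone

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_composition(patient_id, depression_label, risks):
    """Bundle one diary's results into a minimal FHIR Composition (as a dict).

    `risks` is a list of (symptom, probability) pairs. All texts and titles
    below are illustrative placeholders, not the project's actual profile.
    """
    # Step 4 returns risks sorted by probability; re-sort defensively here.
    ranked = sorted(risks, key=lambda pair: pair[1], reverse=True)
    risk_items = "".join(f"<li>{name}: {prob:.2f}</li>" for name, prob in ranked)
    return {
        "resourceType": "Composition",
        "status": "final",
        "type": {"text": "MRAST diary assessment"},  # hypothetical type
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(),
        "author": [{"display": "MRAST"}],  # hypothetical author entry
        "title": "Diary recording assessment",
        "section": [
            {
                "title": "Depression analysis",
                "text": {
                    "status": "generated",
                    "div": f'<div xmlns="{XHTML_NS}">{depression_label}</div>',
                },
            },
            {
                "title": "Risk factors",
                "text": {
                    "status": "generated",
                    "div": f'<div xmlns="{XHTML_NS}"><ul>{risk_items}</ul></div>',
                },
            },
        ],
    }


composition = build_composition(
    "example-1",
    "no signs of depression detected",
    [("fatigue", 0.41), ("nausea", 0.77), ("insomnia", 0.58)],
)
```

Keeping both results in one resource means a clinician's view of a diary entry can be retrieved with a single read from the server.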

The automatic speech recognition (ASR) system SPREAD is built on an end-to-end deep neural model based on Connectionist Temporal Classification (CTC), similar to the DeepSpeech model. The model is called end-to-end because it needs only speech samples and their corresponding transcripts, without any additional information. This approach allows the model to find an alignment between audio and text. Speech recognition was achieved for all PERSIST languages (EN, SI, ES, RU, LV, FR).
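
To illustrate how a CTC-based model turns frame-level outputs into text, the sketch below implements the standard greedy (best-path) CTC decoding rule: pick the most likely symbol per audio frame, merge consecutive repeats, then drop the blank symbol. It is a generic textbook illustration in Python, not SPREAD's actual decoder; the toy alphabet and the one-hot frame probabilities are assumptions made for the example.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Index of the most probable symbol in each frame.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for label in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


alphabet = ["_", "h", "e", "l", "o"]  # index 0 is the CTC blank (toy alphabet)


def one_hot(symbol):
    """Toy frame distribution that puts all probability on one symbol."""
    return [1.0 if s == symbol else 0.0 for s in alphabet]


# Twelve frames; the blank between the two runs of "l" keeps the double letter.
frames = [one_hot(s) for s in "hh_e_ll_lloo"]
transcript = ctc_greedy_decode(frames, alphabet)
print(transcript)  # hello
```

The blank symbol is what lets the model output the same letter twice in a row, which is why no frame-by-frame alignment has to be supplied during training.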

Multimodal depression analysis takes text, audio and video as input and classifies the patient diary videos as depressed or not depressed using different artificial intelligence algorithms (SVM, RF, LSTM WOG, LSTM WG). In MRAST, a word2vec model was used to extract the text features, the COVAREP library to extract audio features, and the OpenFace library to extract facial features from video.
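
How the three modalities are combined is not detailed here. One common option, shown in the Python sketch below purely as an assumption, is decision-level (late) fusion: each modality's classifier yields a depression probability, and a weighted average is compared with a threshold. The function name, the example scores, the equal weights and the 0.5 threshold are all hypothetical and do not describe the MRAST models.

```python
def fuse_scores(scores, weights=None, threshold=0.5):
    """Late fusion: weighted average of per-modality depression probabilities.

    `scores` maps a modality name to a probability in [0, 1].
    Returns the fused probability and the resulting binary decision.
    """
    if weights is None:
        weights = {modality: 1.0 for modality in scores}  # equal weighting
    total_weight = sum(weights[m] for m in scores)
    fused = sum(scores[m] * weights[m] for m in scores) / total_weight
    return fused, fused >= threshold


# Hypothetical outputs of three single-modality classifiers for one diary video.
fused, depressed = fuse_scores({"text": 0.70, "audio": 0.40, "video": 0.55})
```

With late fusion a diary entry can still be scored when one modality is missing, for example when the face is not visible in the video, by passing only the available scores.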


MRAST framework

The Symptoma Chatbot is an interactive conversational agent that handles free-text input for unsupervised patient history taking and information collection. Standard healthcare admission questionnaires are highly structured: they can be redesigned to be more sensitive, but they cannot adapt to anything unforeseen and therefore miss many potential hints in the anamnesis. Symptoma’s chatbot adapts each question to the patient’s input. For example, if you enter “headache”, it automatically asks about “fever”, as fever is the symptom that most commonly occurs with headaches. If you enter “undercooked eggs”, it asks about “nausea”, “vomiting” and “abdominal pain”. It thus adapts dynamically to the patient’s input, delivering a patient-centric conversation and capturing any data points related to the patient’s condition. For PERSIST, SYM extended the database and localized it into the PERSIST languages, with a focus on oncological concepts, quality-of-life subjects and lay terminology.
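
The adaptive questioning described above can be pictured with a small co-occurrence lookup. The Python sketch below is only a toy illustration of the idea: the table, its weights and the function name are invented for the example and say nothing about how Symptoma's chatbot or database is actually built.

```python
# Hypothetical co-occurrence weights: term entered -> symptoms worth asking about.
CO_OCCURRENCE = {
    "headache": {"fever": 0.6, "nausea": 0.3},
    "undercooked eggs": {"nausea": 0.7, "vomiting": 0.6, "abdominal pain": 0.5},
}


def next_questions(mentioned, limit=3):
    """Return up to `limit` symptoms to ask about, strongest association first.

    Symptoms the patient has already mentioned are never asked again.
    """
    scores = {}
    for term in mentioned:
        for symptom, weight in CO_OCCURRENCE.get(term, {}).items():
            if symptom not in mentioned:
                scores[symptom] = max(scores.get(symptom, 0.0), weight)
    # Sort by descending weight, then alphabetically for stable output.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:limit]


print(next_questions(["headache"], limit=1))  # ['fever']
print(next_questions(["undercooked eggs"]))   # ['nausea', 'vomiting', 'abdominal pain']
```

Because the next question is computed from everything mentioned so far, the conversation can follow inputs that a fixed questionnaire would not have anticipated.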

The detailed scientific results of the MRAST framework were recently published by the UM team (Arioz et al., 2022). With this study, we provided an objective assessment of depression and an identification of risk factors to support clinical decisions. The results for the PERSIST patient videos will be published in the final report of the PERSIST project.

Dr. Izidor Mlakar and Dr. Umut Arioz

The University of Maribor (UM), Slovenia


  • Arioz, U., Smrke, U., Plohl, N., & Mlakar, I. (2022). Scoping Review on the Multimodal Classification of Depression and Experimental Study on Existing Multimodal Models. Diagnostics, 12, 2683.
  • Cuthbert, B. N., & Insel, T. R. (2013). Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Medicine, 11, 126.
  • Kroenke, K., Spitzer, R. L., & Williams, J. B. (2001). The PHQ-9: validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613.
  • Picone, M., Inoue, S., DeFelice, C., Naujokas, M. F., Sinrod, J., Cruz, V. A., … & Wassman, E. R. (2020). Social listening as a rapid approach to collecting and analyzing COVID-19 symptoms and disease natural histories reported by large numbers of individuals. Population Health Management, 23(5), 350-360.