Similarity-based Data Search and Exploration on the Semantic Web: Applied on Biomedical and Health Data
Reference number: NCSR_00695.
Duration: 2 years (Nov. 2015 - Aug. 2017).
Funding: CNRS-L (National Council for Scientific Research), Lebanon.
Amount of grant: 16 million LBP.
Team: members of LAU’s ECE and CS departments, and colleagues from the University of Pau and Adour Countries (UPPA), France, the University of Sao Paulo (USP), Brazil, the Antonine University (UA), Lebanon, and NOBATEK R&D, France.
The goal of our project is to develop efficient techniques for similarity-based query evaluation and pattern discovery, applied to biomedical and health-related clinical data. In brief, we aim to i) investigate the integration of Semantic Web (SW)-based formats (XML/RDF-based) to describe biomedical and health-related data, ii) investigate the use of knowledge bases (thesauri and/or ontologies) to handle the semantics of bio-information, iii) introduce similarity measures that consider both data structure (organization) and semantics (meaning), for approximate data search and pattern discovery, iv) elaborate intuitive methods for data search and querying by non-expert users, and v) implement an online prototype system to test and evaluate each of our prospective contributions. From an academic perspective, this project is intended to allow graduate and PhD students, as well as researchers, to actively collaborate and exchange novel ideas, so as to further consolidate their respective scientific backgrounds and expand their research scope.
We can categorize our research achievements into eight main tasks: 1) reviewing semantic data representation formats and data semantization techniques that can be used to represent biomedical data, 2) transforming syntactic data into semantically meaningful information using dedicated knowledge bases, 3) introducing a semantic-aware multimedia data representation model, 4) facilitating data manipulation for non-expert users, 5) designing semantic-aware database indexing techniques, 6) developing semantic-aware data querying and ranking functions, 7) developing user-friendly tools that integrate ranked search, pattern matching, and pattern discovery functions, targeting various e-health applications including health assessment, nutrition recommendation, and polarity and affect analysis for use in clinical therapy, and 8) implementing and testing all algorithms, and providing online implementations to be evaluated by third-party non-expert users.
We are conducting additional tests to evaluate the scalability and adaptability of our solutions when handling different multimedia objects (e.g., medical images, electrocardiograms, time series, and text-based diagnoses) of different sizes and properties. We are also investigating crowdsourcing as an auxiliary metadata source, adding another dimension to our medical and health-related data for user-sensitive event detection and identification. In the future, we plan to implement inference and recommendation functionality to infer future medical events from current ones (e.g., predicting that a patient or a population is prone to cardiovascular events such as strokes, based on current health records and daily diets). We also aim to study the semantic relationships among medical events (e.g., disjointness, intersection, inclusion) based on data from different sources (e.g., clinics, hospitals, pharmacies), combined with domain-specific health or biomedical ontologies, to create an open, linked, collective medical knowledge base.
We are pursuing the above research perspectives (among others) in a recently launched international research project titled "Mining, Indexing, and Visualizing Big Data in Clinical Decision Support Systems", led by partners from the University of Sao Paulo, Brazil, in collaboration with colleagues from the USA, France, Germany, and our team in Lebanon, and funded by the Research Support Foundation of the State of Sao Paulo.