From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer

dc.contributor.author Karim, Rezaul
dc.contributor.author Comet, Lina Molinas
dc.contributor.author Shajalal, Md
dc.contributor.author De Perthuis, Paola
dc.contributor.author Rebholz-Schuhmann, Dietrich
dc.contributor.author Decker, Stefan
dc.date.accessioned 2023-12-26T18:47:14Z
dc.date.available 2023-12-26T18:47:14Z
dc.date.issued 2024-01-03
dc.identifier.isbn 978-0-9981331-7-1
dc.identifier.other dc455b3e-4a96-4967-aa7c-3a26c199b58b
dc.identifier.uri https://hdl.handle.net/10125/107055
dc.language.iso eng
dc.relation.ispartof Proceedings of the 57th Hawaii International Conference on System Sciences
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject The Technical, Socio-Economic, and Ethical Aspects of AI
dc.subject bioinformatics
dc.subject biomarker discovery
dc.subject cancer diagnosis
dc.subject explainable ai
dc.subject knowledge graphs
dc.subject machine learning
dc.subject nlp
dc.subject ontology
dc.title From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer
dc.type Conference Paper
dc.type.dcmi Text
dcterms.abstract Domain experts often rely on up-to-date knowledge to understand and communicate specific biological processes, which helps them design strategies for prevention and therapeutic decision-making. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., text, imaging, omics, and clinical data) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about cancer, drugs, genes, proteins, and their mechanisms are spread across structured (e.g., knowledge bases (KBs)) and unstructured (e.g., scientific articles) sources. A large-scale knowledge graph (KG) can be constructed by integrating these data and then extracting facts about semantically interrelated entities and relations. Such KGs not only allow exploration and question answering (QA) but also allow domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for non-domain users due to a lack of understanding of the underlying data assets and semantic technologies. In this paper, we develop a domain KG to support cancer-specific biomarker discovery and interactive QA. For this, a domain ontology called OncoNet Ontology (ONO) is developed to enable semantic reasoning for the validation of gene-disease relations. The KG is then enriched by harmonizing the ONO, metadata, controlled vocabularies, and biomedical concepts from scientific articles, using BioBERT- and SciBERT-based information extractors. Further, because the biomedical domain evolves and new findings often replace old ones, an AI system that does not incorporate up-to-date findings is likely to exhibit concept drift when providing diagnosis and treatment recommendations. Therefore, we fine-tuned the KG using large language models (LLMs) based on more recent articles and KBs.
dcterms.extent 11 pages
prism.startingpage 5576
Files
Original bundle
Name: 0547.pdf
Size: 2.15 MB
Format: Adobe Portable Document Format