From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer

dc.contributor.author: Karim, Rezaul
dc.contributor.author: Comet, Lina Molinas
dc.contributor.author: Shajalal, Md
dc.contributor.author: De Perthuis, Paola
dc.contributor.author: Rebholz-Schuhmann, Dietrich
dc.contributor.author: Decker, Stefan
dc.date.accessioned: 2023-12-26T18:47:14Z
dc.date.available: 2023-12-26T18:47:14Z
dc.date.issued: 2024-01-03
dc.identifier.doi: 10.24251/HICSS.2024.670
dc.identifier.isbn: 978-0-9981331-7-1
dc.identifier.other: dc455b3e-4a96-4967-aa7c-3a26c199b58b
dc.identifier.uri: https://hdl.handle.net/10125/107055
dc.language.iso: eng
dc.relation.ispartof: Proceedings of the 57th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: The Technical, Socio-Economic, and Ethical Aspects of AI
dc.subject: bioinformatics
dc.subject: biomarker discovery
dc.subject: cancer diagnosis
dc.subject: explainable ai
dc.subject: knowledge graphs
dc.subject: machine learning
dc.subject: nlp
dc.subject: ontology
dc.title: From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer
dc.type: Conference Paper
dc.type.dcmi: Text
dcterms.abstract: Domain experts often rely on up-to-date knowledge to understand and communicate specific biological processes, which helps them design prevention strategies and make therapeutic decisions. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., text, imaging, omics, and clinical data) to provide diagnosis and treatment recommendations for cancer. Data and knowledge about cancers, drugs, genes, proteins, and their mechanisms are spread across structured sources, such as knowledge bases (KBs), and unstructured sources, such as scientific articles. A large-scale knowledge graph (KG) can be constructed by integrating these data and extracting facts about semantically interrelated entities and relations. Such KGs not only support exploration and question answering (QA) but also allow domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for users without domain expertise, who often lack an understanding of the underlying data assets and semantic technologies. In this paper, we develop a domain KG to support cancer-specific biomarker discovery and interactive QA. To this end, we develop a domain ontology, the OncoNet Ontology (ONO), that enables semantic reasoning for validating gene-disease relations. The KG is then enriched by harmonizing ONO, metadata, controlled vocabularies, and biomedical concepts extracted from scientific articles with BioBERT- and SciBERT-based information extractors. Furthermore, because the biomedical domain keeps evolving and new findings often replace old ones, an AI system that does not incorporate up-to-date findings is likely to exhibit concept drift when providing diagnosis and treatment recommendations. We therefore fine-tuned the KG using large language models (LLMs) based on more recent articles and KBs.
dcterms.extent: 11 pages
prism.startingpage: 5576
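
The abstract says that ONO enables semantic reasoning to validate gene-disease relations, but it does not give the ontology's classes or rules. As a rough illustration only, the following sketch assumes the simplest kind of such check: a closed-world domain/range type constraint over a small subclass hierarchy, applied to candidate triples such as those an information extractor might emit. Every class name, the predicates `associatedWith` and `targets`, and the entities `GeneA`, `CancerX`, and `DrugZ` are hypothetical and not taken from ONO. Note that this rejection-style check differs from RDFS/OWL semantics, where domain and range axioms infer types rather than reject triples.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical class hierarchy (not ONO): class -> direct parent (None marks the root).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "Entity": None,
    "Gene": "Entity",
    "Oncogene": "Gene",
    "Disease": "Entity",
    "Cancer": "Disease",
    "Drug": "Entity",
}

# Hypothetical predicate signatures: predicate -> (domain class, range class).
SIGNATURES: Dict[str, Tuple[str, str]] = {
    "associatedWith": ("Gene", "Disease"),
    "targets": ("Drug", "Gene"),
}


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def is_subclass(cls: str, ancestor: str) -> bool:
    """Return True if cls is ancestor or inherits from it transitively."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)  # unknown classes end the walk
    return False


def validate(triple: Triple, types: Dict[str, str]) -> bool:
    """Accept a triple only if its predicate is known and both endpoint types fit its signature."""
    signature = SIGNATURES.get(triple.predicate)
    if signature is None:
        return False
    domain, range_ = signature
    subject_type = types.get(triple.subject)
    object_type = types.get(triple.obj)
    if subject_type is None or object_type is None:
        return False  # untyped entities are rejected under this closed-world check
    return is_subclass(subject_type, domain) and is_subclass(object_type, range_)


# Hypothetical entity typing and candidate triples, e.g. from a text extractor.
types = {"GeneA": "Oncogene", "CancerX": "Cancer", "DrugZ": "Drug"}
candidates = [
    Triple("GeneA", "associatedWith", "CancerX"),  # kept: Oncogene is a Gene, Cancer is a Disease
    Triple("CancerX", "associatedWith", "GeneA"),  # rejected: subject is not a Gene
    Triple("DrugZ", "targets", "GeneA"),           # kept: Drug targets an Oncogene (a Gene)
]
accepted = [t for t in candidates if validate(t, types)]
print(len(accepted))  # → 2
```

Plain dictionaries keep the sketch dependency-free; a real pipeline would more likely state such constraints in the ontology itself (for example as OWL axioms or SHACL shapes) and delegate the check to a reasoner or validator.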

Files

Original bundle
Name: 0547.pdf
Size: 2.15 MB
Format: Adobe Portable Document Format