The iLM and iGN models

Overview

ingenum collects real-time clinical livestock data through its WorkMate platform and uses the ingenum Language Model (iLM) to build a graph dataset from it. The ingenum Graph Network (iGN) encodes this graph, capturing the relationships between clinical observations. Contrastive pretraining aligns each clinical observation with its textual description, which enables zero-shot retrieval of relevant information and supports intervention suggestions. The pretrained representations are then fine-tuned for specific downstream tasks, such as diagnosis and treatment outcome prediction, and the models are refined continuously using expert feedback so that they remain relevant and effective in veterinary epidemiology. ForeSight uses both the iLM and the iGN to discover patterns in the clinical observation data and lets users investigate these patterns interactively.

Detailed Methodology

Dataset Compilation

Through WorkMate, ingenum has access to a live data stream of clinical observations, diagnoses, and treatments. Our integrations with PMS platforms and laboratories also provide definitive pathology results. The ingenum Language Model (iLM), combined with feedback from our expert veterinary epidemiologists, automatically constructs a graph dataset. Nodes represent individual clinical observations in livestock and record symptoms, diagnoses, treatments, and outcomes. Edges represent the relationships and interactions between observations, such as temporal sequences and causal links.

Each node is accompanied by a detailed textual description that covers the clinical findings, diagnostic reasoning, treatment justification, and observed outcomes, giving a comprehensive picture of each case.
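
As an illustration only, the node and edge structure described above could be sketched as follows. The class names, field names, relation labels, and example records are hypothetical and were invented for this sketch; they do not describe the production schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """One node: a single clinical observation with its textual description."""
    node_id: str
    kind: str          # assumed categories: "symptom", "diagnosis", "treatment", "outcome"
    description: str   # free-text description attached to the node


@dataclass
class ClinicalGraph:
    """Directed graph with typed edges between clinical observations."""
    nodes: dict = field(default_factory=dict)   # node_id -> Observation
    edges: list = field(default_factory=list)   # (source_id, target_id, relation)

    def add_observation(self, obs: Observation) -> None:
        self.nodes[obs.node_id] = obs

    def add_edge(self, source: str, target: str, relation: str) -> None:
        # Reject edges that reference unknown nodes so the graph stays consistent.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target, relation))

    def neighbours(self, node_id: str, relation: Optional[str] = None) -> list:
        """Targets of outgoing edges, optionally filtered by relation type."""
        return [t for s, t, r in self.edges
                if s == node_id and (relation is None or r == relation)]


# Invented example records, for illustration only.
graph = ClinicalGraph()
graph.add_observation(Observation("obs-1", "symptom", "Cow off feed, reduced milk yield."))
graph.add_observation(Observation("obs-2", "treatment", "Oral propylene glycol given."))
graph.add_observation(Observation("obs-3", "outcome", "Appetite recovered."))
graph.add_edge("obs-1", "obs-2", "temporal")
graph.add_edge("obs-2", "obs-3", "causal")
```

Storing the relation type on each edge keeps temporal sequences and causal links in one graph while still allowing them to be queried separately, for example with `graph.neighbours("obs-2", "causal")`.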

Graph Neural Network Implementation

The ingenum Graph Network (iGN) is a graph neural network (GNN) that encodes the graph-based clinical observation data into a latent vector space. This encoding captures the dependencies and relationships among the observations. In parallel, the textual description associated with each observation is processed by the iLM text encoder, which is continually optimised for the language of animal clinical observations through the ongoing operation of the WorkMate application, so that the semantic content of clinical narratives is captured accurately.
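
The iGN architecture is not detailed here, so the following is only a generic sketch of the message-passing idea that graph neural networks share: each node updates its vector using the vectors of the nodes that point to it. The function name, the `self_weight` parameter, and the toy features are assumptions. A real GNN layer would also apply learned weight matrices and a non-linearity, which are omitted to keep the example small.

```python
def message_passing_layer(features, edges, self_weight=0.5):
    """One round of mean-aggregation message passing.

    features: dict mapping node id -> list of floats
    edges:    list of (source, target) pairs; messages flow source -> target
    """
    # Collect the feature vectors arriving at each node.
    incoming = {node: [] for node in features}
    for source, target in edges:
        incoming[target].append(features[source])

    updated = {}
    for node, own in features.items():
        messages = incoming[node]
        if messages:
            # Element-wise mean of all incoming neighbour vectors.
            mean = [sum(column) / len(messages) for column in zip(*messages)]
        else:
            # A node with no incoming edges keeps its own features.
            mean = own
        # Blend the node's own vector with the aggregated neighbourhood.
        updated[node] = [self_weight * o + (1.0 - self_weight) * m
                         for o, m in zip(own, mean)]
    return updated


# Toy graph: nodes "a" and "b" both point to node "c".
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "c"), ("b", "c")]
hidden = message_passing_layer(features, edges)
```

After one round, node `c` carries information from both of its neighbours. Stacking several rounds lets information travel along longer chains of observations.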

Contrastive Pretraining Approach

During the contrastive pretraining phase, we align the graph representation of each clinical observation with its associated textual description. Each clinical observation and its corresponding description form a positive pair, and non-matching observation-text pairs serve as negatives. Training minimises the distance between the embeddings of positive pairs and maximises it for negative pairs, using a contrastive loss function such as EBM-NCE or InfoNCE.
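
To make the objective concrete, here is a minimal sketch of an InfoNCE loss over a small batch of observation-text pairs. It assumes a symmetric formulation (graph-to-text and text-to-graph terms averaged), cosine similarity, and a temperature of 0.1. These choices and the function names are illustrative assumptions, not a description of the production training code.

```python
import math


def normalise(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(graph_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE; entry i of each list forms a positive pair."""
    g = [normalise(v) for v in graph_embs]
    t = [normalise(v) for v in text_embs]
    n = len(g)
    # logits[i][j]: similarity of observation i and description j, scaled.
    logits = [[sum(a * b for a, b in zip(g[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_on_diagonal(rows):
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the maximum for numerical stability
            log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_sum - row[i]  # the positive pair sits on the diagonal
        return total / len(rows)

    graph_to_text = cross_entropy_on_diagonal(logits)
    text_to_graph = cross_entropy_on_diagonal([list(col) for col in zip(*logits)])
    return 0.5 * (graph_to_text + text_to_graph)


# Matching pairs should score a much lower loss than mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is low when each observation is most similar to its own description and high when it is more similar to another one in the batch, which is the alignment pressure described above.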

Zero-shot Retrieval and Editing Capabilities

Following pretraining, the model supports zero-shot retrieval: given a clinical observation, it retrieves the most relevant textual descriptions, and vice versa, without task-specific training. It can also suggest modifications or interventions for a specific clinical scenario, guided by textual prompts that express distinct clinical objectives or constraints.
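
Because both encoders map into the same vector space, retrieval can be reduced to a nearest-neighbour search. The sketch below ranks candidate descriptions by cosine similarity to an observation embedding. The function names, identifiers, and three-dimensional vectors are invented for illustration, and a production system would likely use an approximate nearest-neighbour index rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, candidates, k=2):
    """Return the k candidates most similar to the query.

    candidates: dict mapping an identifier -> embedding vector
    """
    scored = [(cid, cosine(query_embedding, emb)) for cid, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented text embeddings standing in for encoder output.
text_index = {
    "desc-ketosis": [0.9, 0.1, 0.0],
    "desc-mastitis": [0.1, 0.9, 0.1],
    "desc-lameness": [0.0, 0.2, 0.9],
}
observation_embedding = [0.8, 0.2, 0.1]  # invented graph-side embedding
top = retrieve(observation_embedding, text_index, k=2)
```

The same function works in the other direction: pass a text embedding as the query and a dictionary of observation embeddings as the candidates.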

Downstream Task Adaptation

We fine-tune the pretrained model for a variety of downstream tasks, including predicting treatment outcomes and making diagnoses from clinical observations. Fine-tuning uses a task-specific dataset and builds on the representations learned during pretraining to improve task-specific performance.
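
One common way to reuse pretrained representations is to keep the encoder fixed and train a small task head on top of its embeddings. The sketch below trains a logistic-regression head for a binary treatment-outcome label. This is an assumed, simplified strategy; the embeddings, labels, learning rate, and function names are all invented for illustration, and full fine-tuning would also update the encoder weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, embedding):
    """Probability of the positive class for one embedding."""
    return sigmoid(sum(w * x for w, x in zip(weights, embedding)) + bias)


def train_head(embeddings, labels, learning_rate=0.5, epochs=200):
    """Fit a logistic-regression head with plain stochastic gradient descent."""
    weights = [0.0] * len(embeddings[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            error = predict(weights, bias, x) - y  # gradient of log loss w.r.t. the logit
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Invented frozen embeddings and outcome labels (1 = recovered, 0 = not recovered).
embeddings = [[1.0, 0.0], [0.9, 0.2], [0.1, 1.0], [0.0, 0.8]]
labels = [1, 1, 0, 0]
weights, bias = train_head(embeddings, labels)
unseen_case = predict(weights, bias, [0.95, 0.1])
```

Training only the head needs far less labelled data than training an encoder from scratch, which is the practical benefit of building on the pretrained representations.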

Rigorous Evaluation and Continuous Refinement

We evaluate the technology using metrics tailored to assess the relevance and coherence of the generated textual descriptions and the accuracy of predictions on downstream tasks. Informed by these evaluations and by feedback from domain experts, we continuously refine the technology so that it remains relevant and effective in the clinical setting.
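
The specific metrics are not listed here. As an assumed example of what such measurements can look like, the sketch below computes prediction accuracy for a downstream task and mean recall@k for retrieval. Both are standard metrics; the function names and sample data are invented, and measuring the coherence of generated text would need separate methods that are not shown.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_recall_at_k(rankings, k):
    """Share of queries whose relevant item appears in the top k results.

    rankings: list of (ranked_ids, relevant_id) tuples, one per query
    """
    hits = sum(1 for ranked_ids, relevant_id in rankings
               if relevant_id in ranked_ids[:k])
    return hits / len(rankings)


# Invented diagnosis predictions and reference labels.
predicted = ["ketosis", "mastitis", "lameness", "ketosis"]
actual = ["ketosis", "mastitis", "ketosis", "ketosis"]
diagnosis_accuracy = accuracy(predicted, actual)

# Invented retrieval results: each query has exactly one relevant description.
rankings = [
    (["d1", "d2", "d3"], "d1"),
    (["d2", "d3", "d1"], "d1"),
]
recall_at_1 = mean_recall_at_k(rankings, k=1)
recall_at_3 = mean_recall_at_k(rankings, k=3)
```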