Knowledge-backed AI with Monarch: A match made in heaven

Monarch Initiative
Aug 25, 2023


Have you been dying to interrogate carefully curated biomedical knowledge using LLMs? We just made it possible! In this post, we show you how you can access the wealth of information captured in the Monarch Initiative Knowledge Graph, using our ChatGPT integration via the plugins feature from OpenAI.

What does AI know?

Artificial Intelligence (AI) systems, particularly Large Language Models (LLMs) like OpenAI’s ChatGPT, have introduced new and exciting ways to interact with information. Trained to predict missing parts of text from vast collections of books, articles, social media, code, and more, these models learn patterns in language, ranging from basic grammatical structures to specific answers for AP exam questions.

Unfortunately, while LLMs can and do memorize bits of their training data, they are not all-knowing. LLMs encode a variety of facts deep in the layers of their neural networks, but information on many topics is too uncommon to be reliably stored. Making matters worse, AI models’ propensity to hallucinate answers they don’t know is cause for serious concern, as is the potential for memorization of incorrect information. “Beware of false knowledge; it is more dangerous than ignorance.” (Like most sources, ChatGPT attributes this quote to George Bernard Shaw, though this is, ironically, not entirely accurate.)

How can we teach AI models to give us the truth, the whole truth, and nothing but the truth? We aren’t there yet, but progress is being made. Recent research on hallucination indicates the problem is deeper than superficial mis-memorization of facts, and work on convincing LLMs to report a humble “I don’t know” is ongoing.

Despite these difficulties, LLMs excel at summarizing, integrating, and explaining information provided to them during a chat. Given hundreds of lines of computer code, they can explain the logic. Given a passage of literature, they can identify the metaphors. Given a news article, they can summarize the contents. Rather than trying to build a model that memorizes everything, we can instead help the AI locate and extract quality information upon request. This technique even has a name: grounding.
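To make grounding concrete, here is a minimal sketch in Python: retrieved facts are placed in the prompt ahead of the question, and the model is asked to answer only from them. The function name, the prompt wording, and the example fact are our own illustrative assumptions and are not taken from the Monarch plugin.

```python
def build_grounded_prompt(question: str, facts: list[str]) -> str:
    """Prepend retrieved facts to a question so the model answers from them."""
    if facts:
        context = "\n".join(f"- {fact}" for fact in facts)
    else:
        # Nothing was retrieved: say so explicitly rather than leave a blank.
        context = "(no supporting facts were retrieved)"
    return (
        "Answer using only the facts listed below. "
        "If they are insufficient, say \"I don't know\".\n\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical fact standing in for the result of a knowledge graph lookup.
prompt = build_grounded_prompt(
    "What are some symptoms of cystic fibrosis?",
    ["Cystic fibrosis (MONDO:0009061) is associated with polyps."],
)
print(prompt)
```

The instruction to admit ignorance is only a request; as noted above, getting models to reliably say "I don't know" is still an open problem.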

Knowledge graphs, which are vast, curated databases filled with interconnected facts, make an ideal partner for AI. Knowledge graphs that provide API access, like Monarch, further provide the search and access features needed — it’s a match made in heaven!

Monarch on ChatGPT

To illustrate how LLM-based AI can be connected to curated knowledge, we developed a Monarch integration with ChatGPT using OpenAI’s plugins feature. Plugins allow ChatGPT (specifically, OpenAI’s state-of-the-art GPT-4 model) to search for data via external APIs, retrieve and summarize the results, and make followup requests if necessary.

Example ChatGPT search for Cystic Fibrosis (screenshot)

For example, above we asked ChatGPT to list up to 5 symptoms of Cystic Fibrosis (CF). The answer includes both disclaimer text and links that direct back to the relevant Monarch pages. These are explicit features of the plugin, specified in the initialization text provided to the model.

Although the linkage between CF and these symptoms is provided by Monarch, the layperson summary information (such as polyps being soft, painless, noncancerous growths) is added by the AI. LLMs are famed for their multilingual abilities, and translation between scientific and common terms is no exception. Features like these make powerful resources like Monarch significantly more user-friendly. A search for “CF,” for example, will be expanded to “Cystic Fibrosis” based on the question context, and ChatGPT doesn’t hesitate to correct misspelled rare disease names.

In the interaction above, ChatGPT made two back-to-back calls to the Monarch Initiative: first a keyword-based search to find the disease identifier for Cystic Fibrosis (MONDO:0009061), and second to retrieve associated phenotypes. This multi-call planning behavior is one of LLMs’ many surprising properties — unlike the links and disclaimer text, it is not part of the plugin or the API. We can even ask the model to describe its plan first, which has been shown to improve logical reasoning when needed.
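The two-step pattern described above can be sketched as follows. This is a toy stand-in that uses in-memory dictionaries rather than network calls: only the identifier MONDO:0009061 comes from the chat, while the function names and the placeholder phenotype labels are assumptions for illustration and do not reflect the real Monarch API.

```python
from typing import Optional

# Toy data standing in for a knowledge graph API (placeholder phenotypes).
SEARCH_INDEX = {"cystic fibrosis": "MONDO:0009061"}
PHENOTYPES = {
    "MONDO:0009061": ["phenotype A", "phenotype B", "phenotype C"],
}


def search_disease(term: str) -> Optional[str]:
    """Step 1: keyword search that maps a disease name to an identifier."""
    return SEARCH_INDEX.get(term.strip().lower())


def get_phenotypes(disease_id: str, limit: int = 5) -> list[str]:
    """Step 2: fetch up to `limit` phenotypes associated with an identifier."""
    return PHENOTYPES.get(disease_id, [])[:limit]


def symptoms_for(term: str, limit: int = 5) -> list[str]:
    """Chain the two calls: resolve the name first, then look up phenotypes."""
    disease_id = search_disease(term)
    if disease_id is None:
        return []  # nothing to look up if the search found no match
    return get_phenotypes(disease_id, limit)


print(symptoms_for("Cystic Fibrosis", limit=2))
```

In the plugin setting nobody writes `symptoms_for`: the model itself decides that the second call needs the identifier returned by the first.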

We asked ChatGPT what genes are associated with EDS and what its plan was (screenshot)

Knowledge graphs (KGs) are designed to store information in a highly structured and organized fashion, allowing for sophisticated analysis of real-world data, like Long-COVID symptoms. Most of human knowledge is stored in free text, of course, not in knowledge graphs, but LLMs can help here too. First, they can effectively extract, structure, and organize information from text, as elegant work by Monarch’s own Caufield et al. demonstrates. Second, as mentioned previously, they can summarize text in natural language. In the example below, the response from the Monarch API includes a linked publication, and ChatGPT uses another plugin, WebPilot, to retrieve the linked PubMed page content for summarization.

Example of summary of publication information by ChatGPT

With over 50,000 linked publications, Monarch and AI can together provide a uniquely capable interface for scientific inquiry.

Challenges and Future Directions

So, can we finally get the truth, the whole truth, and nothing but the truth from our AI assistants? We still have a ways to go. The first challenge is to find an effective strategy for providing an AI with the most relevant information to work with. As anyone who’s tried to find something on the internet knows, a quality search engine can make all the difference. Fortunately, LLMs provide new avenues for information retrieval by producing embeddings; these deeply-encoded numeric representations turn out to be similar for text with similar content. Recent work suggests a hybrid keyword + embeddings approach may be best.
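One way a hybrid keyword + embeddings ranking could work is sketched below. The 2-D vectors are hand-made stand-ins for real embeddings (which would come from a model and have many more dimensions), and the equal weighting of the two scores is an arbitrary assumption; the work referenced above may combine them differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of query words that also appear in the text."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def hybrid_score(query, query_vec, text, text_vec, alpha=0.5):
    """Weighted mix of keyword overlap and embedding similarity."""
    keyword = keyword_overlap(query, text)
    semantic = cosine(query_vec, text_vec)
    return alpha * keyword + (1 - alpha) * semantic


# Made-up documents with made-up embeddings, for illustration only.
docs = [
    ("stock market report", [0.1, 0.9]),
    ("inherited disorder affecting mucus", [0.8, 0.2]),
    ("cystic fibrosis lung disease", [0.9, 0.1]),
]
query, query_vec = "cystic fibrosis", [1.0, 0.0]

ranked = sorted(
    docs,
    key=lambda d: hybrid_score(query, query_vec, d[0], d[1]),
    reverse=True,
)
for text, _ in ranked:
    print(text)
```

The second document shares no words with the query, so a keyword search alone would miss it; its embedding similarity still places it above the unrelated one.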

Identifying the best resources to feed back to the model is especially important due to LLMs’ limited context size — the number of tokens (words or word parts) the model can process at one time. GPT-4 can handle up to 32,000 tokens, roughly equivalent to 24,000 words. That’s quite a lot (this post contains about 1,700 tokens, and models with larger contexts are increasingly available), but results from API calls use more tokens than normal text due to formatting. The single result for the list of 5 symptoms above used 1,373 tokens, for example.
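The arithmetic behind these numbers is simple enough to check directly. The sketch below uses only the figures quoted above; the helper name and the idea of reserving tokens for the rest of the conversation are our own illustrative additions.

```python
CONTEXT_TOKENS = 32_000  # GPT-4 context size quoted above
WORDS_PER_TOKEN = 0.75   # implied by 32,000 tokens being about 24,000 words
RESULT_TOKENS = 1_373    # tokens used by the single 5-symptom API result
POST_TOKENS = 1_700      # approximate size of this post


def results_that_fit(context: int, per_result: int, reserved: int = 0) -> int:
    """How many API results of a given size fit in the remaining context."""
    return max(0, (context - reserved) // per_result)


print(int(CONTEXT_TOKENS * WORDS_PER_TOKEN))            # approximate words
print(results_that_fit(CONTEXT_TOKENS, RESULT_TOKENS))  # results, empty chat
print(results_that_fit(CONTEXT_TOKENS, RESULT_TOKENS, reserved=POST_TOKENS))
```

At that size, roughly two dozen results would fill the entire context, before counting the plugin's own instructions or the conversation so far.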

While OpenAI’s plugins are designed to work with any well-documented API, we found that designing a small LLM-focused API was both easier for the model to use (the API specification itself uses tokens), and allowed us to pack information into fewer tokens. Because we can only provide a subset of results to the model, and some entities have hundreds or thousands of associations, questions like “what genes do disease X and disease Y have in common” will require utilizing the linkage information stored in Monarch on the API side, and these queries are not yet supported. Who knows — in the future LLMs may be able to generate sophisticated knowledge graph queries themselves.
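To illustrate why a "genes in common" question belongs on the API side, here is a hypothetical sketch. The identifiers, gene names, and function are placeholders and do not describe a planned Monarch feature. The point is that the full association sets are intersected where the data lives, and only the small answer is sent to the model.

```python
# Placeholder associations; real entities may have hundreds or thousands.
ASSOCIATIONS = {
    "DISEASE:X": {"GENE1", "GENE2", "GENE3"},
    "DISEASE:Y": {"GENE2", "GENE3", "GENE4"},
}


def shared_genes(disease_a: str, disease_b: str, limit: int = 10) -> list[str]:
    """Intersect the full gene sets, then return at most `limit` names."""
    genes_a = ASSOCIATIONS.get(disease_a, set())
    genes_b = ASSOCIATIONS.get(disease_b, set())
    return sorted(genes_a & genes_b)[:limit]


print(shared_genes("DISEASE:X", "DISEASE:Y"))
```

If the model instead received only a truncated subset of each list, an intersection computed in the chat could silently miss shared genes.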

Next, although it is easy to verify the correctness of a few example queries, understanding the real utility of knowledge-backed AI and making measured progress requires careful evaluation. Benchmark datasets are typically used for quantifying LLM quality, and we are investigating recently developed benchmarks such as GeneTuring and SciQA. Finally, this proof of concept uses OpenAI’s plugins feature, only available to ChatGPT Plus paid subscribers, limiting accessibility and reproducibility in academic settings. We are exploring other options, including OpenAI’s less restrictive function-calling models, and open-source alternatives like Llama 2 and Gorilla, the latter of which is also trained for API calling (but against specific, non-Monarch APIs).

Disclaimer: this blog post was human-written, with review and suggestions provided by GPT-4.

The demonstration chat including API calls and results may be accessed here. The Monarch Plugin code is available on GitHub.

Contributors: Shawn T. O’Neil, Kevin Schaper, Glass Elsarboukh, Nomi Harris, Monica Munoz-Torres, Justin Reese, J. Harry Caufield, Melissa Haendel, Peter Robinson, Chris Mungall


