ProteinEngine: Empower LLM with domain knowledge for protein engineering

Yiqing Shen, Outongyi Lv, Houying Zhu, Yu Guang Wang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding contribution; peer-review

Abstract

Large language models (LLMs) have garnered considerable attention for their proficiency in tackling intricate tasks, particularly through their capacity for zero-shot and in-context learning. However, their utility has been largely restricted to general tasks because they lack domain-specific knowledge. This limitation is especially pronounced in protein engineering, where tasks such as protein function prediction, protein evolution analysis, and protein design require a level of specialized expertise that existing LLMs cannot provide. In response to this challenge, we introduce ProteinEngine, a human-centered platform that amplifies the capabilities of LLMs in protein engineering by seamlessly integrating a comprehensive range of relevant tools, packages, and software via API calls. Uniquely, ProteinEngine assigns three distinct roles to LLMs, facilitating efficient task delegation, specialized task resolution, and effective communication of results. This design is highly extensible and allows new algorithms, models, and features to be incorporated smoothly in future development. Extensive user studies, involving participants from both the AI and protein engineering communities across academia and industry, consistently found that ProteinEngine improves the reliability and precision of deep learning in protein engineering tasks. These findings highlight the potential of ProteinEngine to bridge currently disconnected tools in future protein engineering research.
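The three-role design described in the abstract can be illustrated with a minimal sketch. This is not the ProteinEngine implementation. The registry class, the function names (`delegate`, `resolve`, `report`, `run`) and the toy tools are all hypothetical. Each LLM role is replaced by a plain function, with keyword matching standing in for the LLM call that would choose tools. The sketch is only meant to show why separating delegation, resolution and reporting makes it easy to plug in new tools.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool type: each "tool" stands in for an external package or API
# and maps a protein sequence to a result string.
Tool = Callable[[str], str]


@dataclass
class ToolRegistry:
    """Hypothetical registry: new tools are added here without changing the roles."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, name: str, tool: Tool) -> None:
        self.tools[name] = tool


def delegate(request: str, registry: ToolRegistry) -> List[str]:
    """Role 1 (task delegation): pick the tools a request needs.

    Assumption: a simple keyword match stands in for the LLM call.
    """
    return [name for name in registry.tools if name in request.lower()]


def resolve(task: str, sequence: str, registry: ToolRegistry) -> str:
    """Role 2 (specialized task resolution): run one delegated tool."""
    return registry.tools[task](sequence)


def report(results: Dict[str, str]) -> str:
    """Role 3 (communication of results): summarize tool outputs for the user."""
    if not results:
        return "No matching tool was found for this request."
    return "\n".join(f"{task}: {out}" for task, out in results.items())


def run(request: str, sequence: str, registry: ToolRegistry) -> str:
    """Chain the three roles: delegate, resolve each task, then report."""
    tasks = delegate(request, registry)
    results = {task: resolve(task, sequence, registry) for task in tasks}
    return report(results)


if __name__ == "__main__":
    registry = ToolRegistry()
    # Toy stand-ins for real tools; they compute trivial sequence statistics.
    registry.register("length", lambda seq: str(len(seq)))
    registry.register("composition", lambda seq: f"{seq.count('A')} alanines")
    print(run("Report the length of this protein", "MAAGK", registry))  # prints "length: 5"
```

Because each role only talks to the registry, supporting a new tool in this sketch is a single `register` call. This mirrors, in simplified form, the extensibility the abstract attributes to the design.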
Original language: English
Title of host publication: Artificial intelligence in medicine
Subtitle of host publication: 22nd International Conference, AIME 2024, Salt Lake City, UT, USA, July 9–12, 2024, Proceedings, Part I
Editors: Joseph Finkelstein, Robert Moskovitch, Enea Parimbelli
Place of publication: Cham
Publisher: Springer
Pages: 373-383
Number of pages: 11
ISBN (Electronic): 9783031665356
ISBN (Print): 9783031665349
Publication status: Published - 2024
Event: 22nd International Artificial Intelligence in Medicine Conference, AIME 2024 - Salt Lake City, United States
Duration: 9 Jul 2024 to 12 Jul 2024

Publication series

Name: Lecture Notes in Artificial Intelligence
Publisher: Springer
Volume: 14844
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 22nd International Artificial Intelligence in Medicine Conference, AIME 2024
Country/Territory: United States
City: Salt Lake City
Period: 9/07/24 to 12/07/24

Keywords

  • Deep Learning
  • Large Language Model
  • Protein Design
  • AI for Protein Design
