Projects

Automated extraction of food knowledge from literature

Understanding the chemical composition of food is essential for achieving precise nutrition based on diet. However, much food-chemical information is scattered across a vast scientific literature, which makes manual extraction cumbersome. In this project, we aim to automate knowledge extraction from the scientific literature using large language models (LLMs). Ultimately, we strive to construct a comprehensive, quality-controlled food knowledge graph that supports real-life applications such as diet recommendations, drug discovery, and knowledge synthesis. (Website available at https://foodatlas.ai/.)


Representative publications:

Youn, Jason, Fangzhou Li, Gabriel Simmons, Shanghyeon Kim, and Ilias Tagkopoulos. “FoodAtlas: Automated Knowledge Extraction of Food and Chemicals from Literature.” Computers in Biology and Medicine (2024).

Youn, Jason, Fangzhou Li, and Ilias Tagkopoulos. “Semi-Automated Construction of Food Composition Knowledge Base.” 2nd AAAI Workshop on AI for Agriculture and Food Systems (2023).


Trainees involved:

Jason Youn (Ph.D. Candidate), Fangzhou Li (Ph.D. Candidate), Arielle Soomi Yoo (Ph.D. Student), Keer Ni (Ph.D. Student), Michael Gunning (Ph.D. Student) and Kaichi Xie (M.S.)


Funding:

USDA-NIFA AI Institute for Next Generation Food Systems (AIFS).

E. coli stress response to antiseptics and disinfectants

Resistance to antibiotics is a serious global health threat. While antiseptics and disinfectants are fundamental to controlling the number of microorganisms in hospitals and food industries, their indiscriminate use and poor regulation could be selecting for bacteria with decreased susceptibility to antimicrobials. Our laboratory aims to study the E. coli response after exposure to ten widely used antiseptics and disinfectants, describing the pathways and potential mechanisms of tolerance and cross-resistance to antimicrobials. We evolve E. coli in various antiseptics and disinfectants and quantify the trajectories, genetic events and fitness effects of the resulting mutations.


Representative Publications:

Merchel Piovesan Pereira, Beatriz, Xiaokang Wang, and Ilias Tagkopoulos. “Short- and long-term transcriptomic responses of Escherichia coli to biocides: a systems analysis.” Applied and Environmental Microbiology (2020).

Merchel Piovesan Pereira, Beatriz, and Ilias Tagkopoulos. “Benzalkonium chlorides: uses, regulatory status, and microbial resistance.” Applied and Environmental Microbiology (2019).


Trainees involved:

Beatriz Pereira (Ph.D. Candidate), Muhammad Adil Salim (Ph.D. Candidate), Xiaokang Wang (Ph.D.), Navneet Rai (Postdoc)

Predicting gene expression in Salmonella enterica and E. coli

Salmonella causes many different types of illnesses, such as typhoid fever and diarrhea. About 16 million typhoid fever cases and 600,000 deaths caused by Salmonella enterica are reported annually worldwide, and about 1.4 million salmonellosis cases are reported in the United States each year. Our laboratory has compiled an omics database for Salmonella, in addition to the E. coli omics database built in an earlier project. Because these two bacteria are so similar, our goal is to leverage the omics data of one organism to aid the other when constructing models that predict gene expression.


Trainees involved:

Cheng-En Tan (Ph.D. Candidate), Trevor Chan (Ph.D. Candidate), Minseung Kim (Ph.D.)

Exploration of bacterial anticipatory responses in the mammalian gut

For organisms to survive in a complex environment such as their natural habitat, they must adapt quickly and, preferably, anticipate upcoming changes. Recent studies have indicated that microbes can anticipate future environmental conditions by using combinations of environmental cues. We hypothesize that one such environment is the mammalian gut, where hundreds of microbial species compete fiercely to colonize and acquire resources. Here, we study natural anticipatory behavior in Escherichia coli, a well-known resident of the gut, by exposing the bacterium to sequences of different cues. Our results show that anticipatory behavior is a natural phenomenon in gut microbiota, and we are working to understand its mechanistic underpinnings through genetic manipulation and subsequent screening.


Trainees involved:

Minseung Kim (Ph.D.), Navneet Rai (Postdoc)


Funding:

Army Research Office (DOD-ARO).

Prediction of cellular state using artificial neural networks

Gene expression prediction is one of the grand challenges in computational biology. The availability of transcriptomics data, combined with recent advances in artificial neural networks, provides an unprecedented opportunity to create predictive models of gene expression with far-reaching applications. We develop methodology that incorporates regulatory relationships among genes into our neural network architecture to achieve high predictive power with minimal training data.


Representative Publication:

Eetemadi A, Tagkopoulos I. Genetic Neural Networks: An artificial neural network architecture for capturing gene expression relationships.


Trainees involved:

Ameen Eetemadi (Ph.D. candidate)


Funding:

National Science Foundation.

Accelerating knowledge discovery from omics data via optimal experimental design

The last decade has witnessed extraordinary progress in genome sequencing technology, with decreasing costs and an increasingly diverse set of sequenced genomes. The various omics data types complement each other, making it possible to understand the genetic etiology of complex traits and diseases comprehensively. However, collecting omics data efficiently is a daunting challenge, because it is hard to design new experiments that fill the gaps in existing data sets. The goal of this project is to develop optimal experimental design (OED) methods that indicate which experimental conditions should be tested next so that the new data yield the most information about the question of interest. Achieving this goal will accelerate knowledge discovery from omics data by reducing the number of experiments to run, and thus the cost in money and time. An exploratory study is under way that focuses on gene expression levels of E. coli under the pressure of antiseptics and antibiotics, and OED methods are being developed to aid this exploration.
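To make the idea of choosing the most informative next experiment concrete, here is a minimal sketch. It is not the lab's method: it assumes a two-feature linear model and uses the standard greedy D-optimal criterion, which picks the candidate condition with the largest predictive variance x^T (X^T X + ridge*I)^-1 x. The feature meanings (antiseptic dose, antibiotic dose), the function names, and all numbers are hypothetical.

```python
def information_matrix(designs, ridge=1e-3):
    """X^T X + ridge * I for a list of 2-feature design vectors."""
    a = ridge + sum(x[0] * x[0] for x in designs)
    b = sum(x[0] * x[1] for x in designs)
    d = ridge + sum(x[1] * x[1] for x in designs)
    return [[a, b], [b, d]]


def invert_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("information matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def predictive_variance(x, inv_info):
    """x^T A^-1 x: how uncertain the linear model is at condition x."""
    v0 = inv_info[0][0] * x[0] + inv_info[0][1] * x[1]
    v1 = inv_info[1][0] * x[0] + inv_info[1][1] * x[1]
    return x[0] * v0 + x[1] * v1


def pick_next_experiment(done, candidates, ridge=1e-3):
    """Greedy D-optimal step: choose the candidate with maximal variance."""
    inv_info = invert_2x2(information_matrix(done, ridge))
    return max(candidates, key=lambda x: predictive_variance(x, inv_info))


# Hypothetical conditions: (antiseptic dose, antibiotic dose), arbitrary units.
done = [(1.0, 0.0), (1.0, 0.1)]
candidates = [(1.0, 0.05), (0.0, 1.0), (0.5, 0.5)]
next_condition = pick_next_experiment(done, candidates)
```

Because the completed experiments barely vary the second feature, the sketch selects the candidate that probes it most strongly. Real omics models are far higher-dimensional and usually nonlinear, so this only illustrates the selection principle.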


Trainees involved:

Xiaokang Wang (Ph.D. candidate); Beatriz Pereira (Ph.D. candidate); Ameen Eetemadi (Ph.D. candidate) and Navneet Rai (Postdoc)

Elucidating the Autophagy network in plants

Autophagy is a dynamic process that entails the engulfment of cellular components that are ultimately targeted for degradation. We provide systems biology analysis and predictive modeling to the Dinesh-Kumar lab to elucidate the key players in autophagy regulation in plants.


Collaborators:

Dinesh-Kumar Laboratory, Plant Biology and Genome Center, UC Davis.


Trainees involved:

Minseung Kim (Ph.D. Candidate).


Funding:

Not funded yet.

Viral resistance signaling networks in tomato

Tomato, the second most important vegetable crop in the world, is threatened by two insect-transmitted viruses, TSWV and TYLCV, which cause annual crop losses in excess of one billion dollars. Although loci of genetic resistance are known in wild tomato relatives, little is known about the mechanisms involved. The goal of this project is to gain insights into the nature of plant resistance mechanisms to insect-transmitted viruses through an integrative transcriptomics and proteomics approach.


Collaborators:

Dinesh-Kumar Laboratory, Plant Biology and Genome Center, UC Davis; Jeff Caplan, University of Delaware; Robert Gilbertson, UC Davis; Diane Ullman, UC Davis.


Trainees involved:

Xiaokang Wang (Ph.D. Candidate), Minseung Kim (Ph.D. Candidate).


Funding:

National Science Foundation (NSF-PGRP).

Systems Biology of TGRL neurodegenerative pathogenesis

Neurovascular inflammation and immune alterations are strongly associated with a number of neurological diseases. We apply a systems biology approach to identify the specific fatty acids and triglyceride-rich lipoprotein (TGRL) lipolysis products that mediate neurovascular injury. We provide bioinformatics support and computational predictive modeling to integrate heterogeneous data sources for target prediction.

Publications:

Aung HH, Tsoukalas A, Rutledge JC, Tagkopoulos I. A systems biology analysis of brain microvascular endothelial cell lipotoxicity. BMC Syst Biol. 2014 Jul 4;8:80. PubMed PMID: 24993133.


Collaborators:

Rutledge Lab, Internal Medicine, UC Davis


Trainees involved:

Athanasios Tsoukalas (Postdoc); Ameen Eetemadi (Ph.D. Candidate).


Funding:

Seed funding from the Center for Information Technology (CITRIS) and the Clinical and Translational Science Center (CTSC).

Design and characterization of optimal synthetic gene circuits

Mathematical modeling and numerical simulation are crucial to support design decisions in synthetic biology. Accurate estimation of parameter values is key, as direct experimental measurements are difficult and time-consuming. Insufficient data, incompatible measurements, and specialized models that lack universal parameters make this task challenging. Here, we have created a database (PAMDB) that integrates data from 135 publications covering 118 circuits and 165 genetic parts of the bacterium Escherichia coli. We used a consensus method and a universal model to estimate parameter values for all parts, as well as their confidence intervals. We then used a computer-aided design (CAD) tool together with optimization techniques to find optimal designs of novel circuits with the desired dynamic characteristics.
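As a rough illustration of pooling parameter estimates for one part across several publications, the sketch below computes a consensus value and an approximate confidence interval. The consensus method used for the database is not described on this page, so the choice here (sample mean with a normal-approximation 95% interval) is an assumption, and the function name and example values are hypothetical.

```python
import math
import statistics


def consensus_estimate(values, z=1.96):
    """Return (mean, (low, high)) for independent estimates of one parameter.

    Assumes equally reliable estimates and a normal approximation; with only
    a handful of values, a t-quantile would give a wider interval.
    """
    if len(values) < 2:
        raise ValueError("need at least two estimates for an interval")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - z * sem, mean + z * sem)


# Hypothetical estimates of one promoter's strength from four publications.
estimates = [2.0, 2.4, 1.8, 2.2]
mean, (low, high) = consensus_estimate(estimates)
```

A weighted variant (for example, inverse-variance weighting) would be a natural alternative when the source publications report their own uncertainties.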


Representative Publication:

Huynh L, Tagkopoulos I. A parts database with consensus parameter estimation for synthetic circuit design. ACS Synthetic Biology. 2016 5 (12), 1412-1420. PubMed PMID: 27454439.


Trainees involved:

Linh Huynh (PhD) and Navneet Rai (Postdoc).


Funding:

National Science Foundation (NSF-CISE).

Population collapse and adaptive rescue during long‐term chemostat fermentation

For decades, E. coli has been the model organism for studying bacterial adaptive evolution under different environmental conditions. The recent development of cost-effective omics tools has boosted studies that aim to understand the genetic and transcriptional basis of long- and short-term bacterial evolution. Most evolution experiments have been performed in batch cultures, by serially transferring the growing culture to fresh medium each day. The problem with batch cultures is that the chemical environment of the medium is dynamic and changes over time, adding a source of noise to the ongoing evolution. An alternative is to study evolution under chemically stable conditions in chemostats. Here, we performed long-term evolution in chemostats using different media and strain combinations, and discovered temporal cell size elongation and valley-like growth profiles in several media and stress combinations. We then performed whole-genome resequencing and transcriptome profiling to identify the genetic and transcriptional basis of the adaptation.


Representative Publication:

Rai N, Huynh L, Kim M, Tagkopoulos I. Population collapse and adaptive rescue during long‐term chemostat fermentation. Biotechnology and Bioengineering. 2019; 116(3):693-703.


Trainees involved:

Navneet Rai (Postdoc), Linh Huynh (Ph.D.) and Minseung Kim (Ph.D.)


Funding:

National Science Foundation, Army Research Office (DOD-ARO).

RiboTALE: a novel class of synthetic regulatory systems

A limiting factor in designing complex synthetic gene circuits is the number of independent and orthogonal regulators. Here, we constructed and validated a novel class of inducible synthetic regulators, coined RiboTALEs. RiboTALEs were constructed by fusing a library of Transcription Activator-Like Effectors (TALEs) with riboswitches. We demonstrated that RiboTALEs built from different combinations of TALE proteins and riboswitches rapidly and reproducibly control the expression of downstream targets with a dynamic range of 34-fold (Rai et al., 2015a).


Representative Publication:

Rai N, Ferreiro A, Neckelmann A, Soon A, Yao A, Siegel J, Facciotti MT, Tagkopoulos I. RiboTALE: A modular, inducible system for accurate gene expression control. Sci Rep. 2015 May 29;5:10658. PubMed PMID: 26023068


Trainees involved:

Navneet Rai (Postdoc) and iGEM-UCD 2013 team.


Funding:

National Science Foundation.

Algorithmic and Computational Foundations for Synthetic Biology

The current state of synthetic biology is reminiscent of the state of electronics in the ’50s: few parts that work well, most of them uncharacterized, and very few computational tools for optimal experimental design. A major effort in our lab is to build the algorithmic and computational infrastructure for engineering biological circuits. Efforts in this area include PAMLib, a library of parts and modules with experimentally characterized parameters; exact and approximate algorithms to mix and match parts for optimal design; the interactive SBROME CAD optimization tool; and OptInfer, a platform for parameter inference from multiple I/O synthetic circuit datasets (see Software page).


Three representative Publications:

  • Huynh L, Tsoukalas A, Köppe M, Tagkopoulos I. SBROME: a scalable optimization and module matching framework for automated biosystems design. ACS Synth Biol. 2013 May 17; 2(5):263-73. PubMed PMID: 23654271.
  • Huynh L, Tagkopoulos I. Optimal part and module selection for synthetic gene circuit design automation. ACS Synth Biol. 2014 Aug 15;3(8):556-64. PubMed PMID: 24933033.
  • Huynh L, Tagkopoulos I. Fast and Accurate Circuit Design Automation through Hierarchical Model Switching. ACS Synth Biol. 2015 Apr 28; PubMed PMID: 25916918.

Trainees involved:

Linh Huynh (Ph.D. Candidate); Navneet Rai (Postdoc); Nasos Tsoukalas (Postdoc)


Funding:

National Science Foundation (NSF-CISE).

Integrative multi-omics, genome-scale modeling

Multi-omics is a term that refers to the integration of multiple genome-scale profiling datasets, which usually include transcriptomics, proteomics and metabolomics. Mining such integrative datasets can provide insights into the information content of genes for any environmental condition or genetic background. These datasets can serve as training data for an integrative multi-scale cellular model that learns to predict molecular expression and traits. Our lab has recently constructed an integrative model based on constraint optimization, and we are currently working on multi-scale HPC simulations with artificial neural networks. Ecomics, a multi-omics dataset, and Clairvoyance, a predictive tool, are the latest products of this work.


Two representative Publications:

  • Kim M, Zorraquino V, Tagkopoulos I. Microbial forensics: predicting phenotypic characteristics and environmental conditions from large-scale gene expression profiles. PLoS Comput Biol. 2015 Mar;11(3):e1004127. PubMed PMID: 25774498.
  • Carrera J, Estrela R, Luo J, Rai N, Tsoukalas A, Tagkopoulos I. An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli. Mol Syst Biol. 2014 Jul 1;10:735. PubMed PMID: 24987114.

Trainees involved:

Minseung Kim (Ph.D. Candidate); Navneet Rai (Postdoc); Nasos Tsoukalas (Postdoc); Violeta Zorraquino (Postdoc)


Funding:

Not funded yet.

Identifying the genetic basis of cross-stress behavior

Exposure to stress A can make an organism more or less vulnerable to stress B. Our laboratory aims to identify the mechanisms, mutations and networks involved in cross-stress behavior in E. coli. To identify the genetic basis and evolutionary potential of this phenomenon, we have profiled and evolved more than a hundred cell lines in abiotic stresses (osmotic, acidic, oxidative, butanol) and their combinations. We then repaired the mutations in the mutant background and assessed their fitness effects using growth/competition profiling. We also used systems and network biology analyses to elucidate latent interactions. Currently, we are evolving E. coli in various antiseptics and antibiotics and quantifying the trajectories, genetic events and fitness effects of the resulting mutations.


Two representative Publications:

  • Dragosits M, Mozhayskiy V, Quinones-Soto S, Park J, Tagkopoulos I. Evolutionary potential, cross-stress behavior and the genetic basis of acquired stress resistance in Escherichia coli. Mol Syst Biol. 2013;9:643. PubMed PMID: 23385483.
  • Zorraquino V, Quinones S, Kim M, Rai N, Tagkopoulos I. Deciphering cross-stress responses in Escherichia coli under complex evolutionary scenarios. bioRxiv, doi:10.1101/010595.

Trainees involved:

Minseung Kim (Ph.D. Candidate); Violeta Zorraquino (Postdoc); Nasos Tsoukalas (Postdoc)


Funding:

National Science Foundation (NSF-MCB), Army Research Office (DOD-ARO).

Secondary cell wall regulation in Arabidopsis

In collaboration with various plant biology labs, we have applied systems and network biology analysis to elucidate the gene regulatory network for secondary cell wall synthesis in Arabidopsis thaliana, as well as its response under various stresses. We performed network inference, network integration and gene ontology analysis.


Publications:

Taylor-Teeples M, Lin L, de Lucas M, Turco G, Toal TW, Tagkopoulos I. et al. An Arabidopsis gene regulatory network for secondary cell wall synthesis. Nature. 2014 Dec 24; PubMed PMID: 25533953.


Collaborators:

Brady Lab, Plant Biology and Genome Center, UC Davis; five other experimental labs contributed to this work.


Trainees involved:

Athanasios Tsoukalas (Postdoc).


Funding:

National Science Foundation (NSF-PGRP).

Predictive analytics for clinical decision support

Sepsis is an overwhelming immune response to infection that damages the body's own tissues and organs. It can happen at any age, regardless of health condition, and often follows seemingly benign incidents. Severe sepsis (sepsis with acute organ dysfunction) strikes about 18 million people annually (750,000 cases in the United States) and has a high short-term mortality risk (28% to 50%). We trained machine learning models on the electronic health records of 1497 patients, collected from the ICU of the UC Davis Medical Center. We tailored Partially Observable Markov Decision Processes (POMDPs) to identify the optimal policy (actions) given the data, and trained regressors and classifiers to predict mortality and length of stay.


Publications:

  • Gultepe E, Green JP, Nguyen H, Adams J, Albertson T, Tagkopoulos I. From vital signs to clinical outcomes for sepsis patients: A clinical decision support system based on discriminative classification. JAMIA 2014 Mar-Apr;21(2):315-25. PubMed PMID: 23959843.
  • Tsoukalas A, Albertson T, Tagkopoulos I. A data-driven, probabilistic machine learning approach to decision support for patients with Sepsis. JMIR Med Inform. 2015 Feb 24;3(1):e11. PubMed PMID: 25710907.

Collaborators:

Tim Albertson M.D. (Chair, Internal Medicine), Hien Nguyen M.D., Jeff Green M.D., Jason Adams M.D.


Trainees involved:

Athanasios Tsoukalas (Postdoc), Eren Gultepe (MS student).


Funding:

CITRIS and CTSC (Seed funding).

Self-regulation of Heterologous Protein Production systems

Recombinant protein production is a pillar of the biotechnology industry. There is a trade-off between the quantity of protein produced and its quality, as high concentrations of the recombinant protein stress the cell and lead to issues such as misfolding and the formation of inclusion bodies. We presented a novel approach stemming from synthetic biology, in which a stress-induced promoter drives a repressor protein that shuts down the recombinant protein production system once significant levels of stress are detected. We demonstrated its efficacy in a proof-of-concept system by quantifying the soluble/insoluble fraction and titers of GFP in our engineered strains.
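The feedback logic described above can be illustrated with a toy simulation. This is a minimal sketch and not the published model: it assumes that cellular stress rises with recombinant protein level through a Hill function, that the stress-induced promoter produces a repressor in proportion to that stress signal, and that the repressor lowers the protein production rate. All rate constants and names are hypothetical, and integration uses a simple forward Euler step.

```python
def simulate(feedback_gain, steps=5000, dt=0.01):
    """Return the protein level at the end of the simulated time course.

    feedback_gain = 0 gives the open-loop system (no stress-induced repressor).
    """
    max_production = 10.0   # protein production rate without repression
    decay = 1.0             # first-order loss of protein and repressor
    half_stress = 5.0       # protein level giving half-maximal stress signal

    protein, repressor = 0.0, 0.0
    for _ in range(steps):
        # Stress signal rises sigmoidally with protein load (Hill, n = 2).
        stress = protein ** 2 / (half_stress ** 2 + protein ** 2)
        # Stress-induced promoter drives the repressor.
        d_repressor = feedback_gain * stress - decay * repressor
        # Repressor throttles recombinant protein production.
        d_protein = max_production / (1.0 + repressor) - decay * protein
        repressor += dt * d_repressor
        protein += dt * d_protein
    return protein


open_loop = simulate(feedback_gain=0.0)
closed_loop = simulate(feedback_gain=5.0)
```

Under these assumptions the closed-loop system settles at a lower protein level than the open-loop one, which is the qualitative behavior the self-regulating design aims for: production is throttled once the stress signal becomes significant.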


Publications:

  • Dragosits M, Nicklas D, Tagkopoulos I. A synthetic biology approach to self-regulatory recombinant protein production in Escherichia coli. J Biol Eng. 2012 Mar 30;6(1):2. PubMed PMID: 22463687.
  • Tagkopoulos I. Microbial factories under control: auto-regulatory control through engineered stress-induced feedback. Bioengineered. 2013 Jan-Feb;4(1):5-8. PubMed PMID: 22922761.

 

Trainees involved:

Martin Dragosits (Postdoc), Dan Nicklas (rotation student).


Funding:

Not funded.