WHO WE ARE
We are an interdisciplinary lab of biologists, computer scientists and engineers who want to push the boundaries of knowledge.
WHAT WE DO
We analyze data to understand complex systems and predict their behavior. We generate and test hypotheses, create algorithms, and build software systems.
HOW WE DO IT
We develop and apply machine learning, optimization and other computational methods, with HPC support, on data generated in our lab and elsewhere.
We serve multiple disciplines, but we are best at biological, medical and agricultural systems and models. Experimentally, we focus mostly on microbes. We are often tagged as ML/AI geeks, HPC pests, and Systems & Synthetic Biology aficionados.
WHERE WE ARE
We are part of the Computer Science department and the UC Davis Genome Center. Our lab and offices are on the 5th floor of the Genome and Biomedical Sciences Facility (GBSF).
WHY WE DO IT
Long hours, low pay, constant pressure over results, publications and funding: what is there not to like? The excitement of working to discover something novel, useful, potentially ground-breaking is difficult to match.
DESIGN, PLANNING AND STRATEGY
This always starts from the science or business question that we want to answer. We identify the challenge, the opportunity and the resources we need to succeed. We go to whatever lengths necessary to engage the right people, design the most informative experiments and remove any intrinsic bias to ensure that success is within grasp.
RESEARCH AND DEVELOPMENT
Once the scope and success criteria have been defined, we approach R&D through an engineering lens. Usually, interdisciplinary teams of 2-5 students, postdoctoral associates and other trainees meet weekly, divide tasks and exchange ideas.
EVALUATION, REFINEMENT AND DISSEMINATION
R&D is not a monolithic feed-forward process; it involves constant feedback, evaluation and refinement, where early failures become the guideposts for future success. Once the scientific aims of a project are completed, the final step includes peer review of our methods, publication of our findings, and making our products available to our collaborators and the public.
The search for life outside planet Earth has been a research topic for many scientists over the years. Human space exploration and the imminent possibility of actually colonizing other planets (Mars being our best candidate) raise concerns among a portion of these scientists, since we could unintentionally contaminate space with our Earth microbes. Multiple missions over the past decades have sent spacecraft and equipment, such as rovers, to Mars, and we should expect even more expeditions over the coming decades. Could it be possible that we are already sending life to Mars?
We know from studies on Earth that microbes on our planet are pretty much everywhere. Scientists have found microorganisms thriving in the most extreme environments, from low-temperature Arctic permafrost to high-pressure, acidic deep-sea hydrothermal vents [1]. Fungi have been isolated from Chernobyl, surviving exposure to high radiation levels. Some species of fungi even seem to grow towards sources of radiation [2]. Other scientists have studied microbes from the International Space Station [3], which demonstrates that they are not only making their way to space but also surviving there.
Concerns about space contamination are not new, and space agencies such as NASA dedicate efforts to minimizing it by following strict decontamination practices and investing in research to improve such techniques. However, no attempt at decontamination may be 100% efficient. The standards currently enforced [4] include a reasonably acceptable level of microbes and spores on surfaces (an acceptability limit). These numbers significantly reduce potential contamination, but they also mean that surfaces may not be completely sterile. Scientists have studied microbes from spacecraft assembly cleanrooms [5, 6], which are designed to keep microbial levels as low as possible. Following strict guidelines reduces the abundance of microbes in the cleanrooms significantly, but does not eliminate them [5]. Researchers have isolated Bacillus strains, a genus of bacteria capable of producing highly resistant spores, from those environments, in the hope of gaining insights into their resistance mechanisms [6]. Studies such as these also have implications for our daily, terrestrial lives. Microbes resistant to cleaning products pose potential risks to human health. The threat emerges from the ability of microorganisms to transfer resistance genes to each other. When disease-causing microbes receive genes that make them better able to survive, controlling and eliminating pathogens becomes even harder.
Sending microbes to Mars, and especially to regions where they could survive and thrive, may lead to future inaccurate "discoveries" of Martian life. Future advancements in the discovery of life outside planet Earth have to balance the benefits of knowledge acquisition against the potential for contamination of the very sites we want to study. How to strike that balance splits opinions and stimulates ample discussion [7, 8]. There is one certainty, though: once we get to the point of sending humans to Mars, contaminating the planet will become unavoidable.
[1] Chénard C, Lauro FM (2017) Microbial Ecology of Extreme Environments. Springer. ISBN 978-3-319-51686-8 (eBook).
[2] Zhdanova NN et al. (2004) Ionizing radiation attracts soil fungi. Mycol Res 108(9):1089–1096.
[3] Lang JM et al. (2017) A microbial survey of the International Space Station (ISS). PeerJ. PMID: 29492330.
[4] Office of Planetary Protection, NASA. https://planetaryprotection.nasa.gov/requirements
[5] Mahnert A et al. (2015) Cleanroom Maintenance Significantly Reduces Abundance but Not Diversity of Indoor Microbiomes. PLoS ONE 10(8):e0134848.
[6] Tirumala MR et al. (2018) Bacillus safensis FO-36b and Bacillus pumilus SAFR-032: a whole-genome comparison of two spacecraft assembly facility isolates. BMC Microbiology 18:57.
[7] Conley CA, Rummel JD (2013) Appropriate protection of Mars. Nature Geoscience 6:587–588.
[8] Macauley MK (2007) Environmentally Sustainable Human Space Activities: Can Challenges of Planetary Protection be Reconciled? Astropolitics 5:209–236.
When training a binary classifier, cross-entropy (CE) loss is usually used, since squared-error loss cannot distinguish bad predictions from extremely bad predictions. The CE loss is defined as follows:

CE = -[y log(p) + (1 - y) log(1 - p)]

where p is the probability of the sample falling in the positive class (y = 1), and p = σ(z), where σ(z) = 1 / (1 + e^(-z)) is the sigmoid function applied to the logit z.
When implementing CE loss, we could calculate p = σ(z) first and then plug it into the definition of CE loss. However, there is a problem with this in practice. At the beginning of training, a positive example (y = 1) might be confidently classified as a negative example (z is a large negative number), implying p ≈ 0. If p is small enough, it could be smaller than the smallest positive floating-point value, i.e. numerically zero. Then we get -∞ if we take the log of 0 when computing the cross-entropy. To tackle this potential numerical stability issue, the logistic function and cross-entropy are usually combined into one operation in both TensorFlow (tf.nn.sigmoid_cross_entropy_with_logits) and PyTorch (torch.nn.BCEWithLogitsLoss).
Still, the numerical stability issue is not completely under control, since the combined expression contains e^(-z), which could blow up (overflow) if z is a large negative number. To tackle this potential problem, the "log-sum-exp" trick is used to shift the center of the exponential sum. The log-sum-exp trick is described as follows:

log(Σ_i e^(x_i)) = m + log(Σ_i e^(x_i - m)), where m = max_i(x_i).
Using this formula, we can force the greatest exponent to be zero, so the sum cannot overflow, even if other values would underflow (they harmlessly become 0). So the combined sigmoid cross-entropy can be computed as max(z, 0) - z·y + log(1 + e^(-|z|)) in practice.
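The reasoning above can be checked numerically. Below is a small sketch in NumPy (the function names are mine, for illustration) that contrasts the naive implementation with the log-sum-exp-shifted one:

```python
import numpy as np

def logsumexp(x):
    # Shift by the maximum so the largest exponent becomes exp(0) = 1:
    # the sum can no longer overflow, and terms that underflow to 0
    # are harmless because the dominant term is preserved.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def ce_naive(z, y):
    # Naive: compute p = sigmoid(z) first, then plug into CE.
    # Blows up to inf when z is a large negative number and y = 1,
    # because p underflows to exactly 0 and log(0) = -inf.
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def ce_stable(z, y):
    # Combined sigmoid + CE with the log-sum-exp shift applied:
    # max(z, 0) - z*y + log(1 + exp(-|z|)).
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
```

For z = -800 and y = 1, the naive version overflows to inf (after floating-point warnings), while the stable version returns the correct loss of 800; in the well-behaved range the two agree.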
One of the concepts that can improve the effectiveness of a machine learning (ML) method is the consideration of sparsity in its design. Here I give a short summary of the benefits of sparsity considerations in ML.
Definition: A set of numbers (e.g. a vector or matrix) is considered sparse when a high percentage of its values are assigned a constant default value.
Consideration of sparsity in ML can have three main benefits:
- Reduced space: when a dataset is known to be sparse, the amount of required memory can be reduced substantially using a sparse representation.
- Reduced runtime: when many numbers have the same default value, the corresponding calculations can be done only once, reducing runtime.
- Reduced generalization error: sometimes, sparsity is a favorable property for a portion of an ML model (e.g. the parameters, an intermediate representation). This may be due to a principle such as Occam's razor or known facts about the problem at hand. In such cases, building the sparsity into the model can reduce generalization error, leading to a more powerful model.
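The first two benefits can be illustrated with a minimal sketch (hypothetical helper names, assuming the default value is zero): store only the non-default entries of a vector as {index: value} pairs.

```python
# Minimal sketch of a sparse vector representation (hypothetical
# helpers, assuming the default value is 0).

def to_sparse(dense, default=0.0):
    # Reduced space: memory scales with the number of non-default
    # entries, not with the full vector length.
    return {i: v for i, v in enumerate(dense) if v != default}

def sparse_dot(a, b):
    # Reduced runtime: default (zero) entries contribute nothing to a
    # dot product, so we only visit the stored entries of the smaller
    # operand instead of every position of the dense vectors.
    smaller, larger = (a, b) if len(a) <= len(b) else (b, a)
    return sum(v * larger[i] for i, v in smaller.items() if i in larger)
```

For example, the dense vectors [0, 0, 3, 0, 4] and [1, 0, 2, 0, 0] each store only two entries, and their dot product touches only the single index they share.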
SparseNN is a library I designed that provides the first two benefits above in the context of artificial neural networks (ANNs). It was initially designed for the protein inference problem described in our DeepPep model, where the input data is substantially sparse. I defined a type of ANN module called SparseBlock for when particular blocks of data are known to be sparse. As shown in the figure above, the sparse representation of the input contains the non-zero blocks, each with a corresponding row id.
M. Kim, A. Eetemadi, and I. Tagkopoulos, "DeepPep: deep proteome inference from peptide profiling", PLoS Computational Biology (2017) [link]
Lab member Minseung Kim graduates with a PhD!
Nicholas Joodi graduates with a MS!