The 2021 ICML Workshop on Computational Biology

Machine learning advances are used in self-driving cars, speech recognition systems, and translation software. However, the COVID-19 pandemic has highlighted the urgency of translating such advances to the domain of biomedicine. Such a pivot requires new machine learning methods to build long-term vaccines and therapeutic strategies, predict immune avoidance, and better repurpose small molecules as drugs.

The ICML Workshop on Computational Biology (WCB) will highlight how machine learning approaches can be tailored to making both translational and basic scientific discoveries with biological data. Practitioners at the intersection of computation, machine learning, and biology are in a unique position to frame problems in biomedicine, from drug discovery to vaccination risk scores, and the Workshop will showcase recent research of this kind. Commodity lab techniques have led to a proliferation of large, complex datasets, and new methods are required to interpret these collections of high-dimensional biological data, such as genetic sequences, cellular features, protein structures, and imaging datasets. These data can be used to predict clinical response, to uncover new biology, or to aid in drug discovery.

This workshop aims to bring together interdisciplinary machine learning researchers working in areas such as computational genomics; neuroscience; metabolomics; proteomics; bioinformatics; cheminformatics; pathology; radiology; evolutionary biology; population genomics; phenomics; ecology; cancer biology; causality; and representation learning and disentanglement, to present recent advances and open questions to the machine learning community. We especially encourage interdisciplinary submissions that might not neatly fit into one of these categories.

Awards

We are pleased to announce the following awards, sponsored by the Chan Zuckerberg Initiative, Columbia IICD, the Python Software Foundation, Moffitt Cancer Center, and JetBrains:

Contributed Talk Award

Best Spotlight Talk

Best Poster Award

Important Dates

Deadline for submissions: May 25th, 2021 (extended from May 22nd, 2021)
Reviewer deadline: June 13th, 2021
Notification of acceptance: June 16th, 2021
Video recording deadline: June 26th, 2021
Camera-ready deadline: July 16th, 2021
Workshop date: July 24th, 2021

Call for submissions

We invite extended abstracts and highlight papers on novel algorithms and computational approaches that are robust, scale to high-dimensional data, and provide interpretable models of biological systems. These can be applications of ML methods or bioinformatics approaches to biological and biomedical data, or novel approaches that enable new analyses. Papers will be presented in poster format, and some will be selected for oral presentation. Through invited talks and presentations by the participants, this workshop will bring together current advances in Computational Biology and set the stage for continuing interdisciplinary research discussions.
Submission
All novel Computational Biology approaches are of interest to the workshop. We welcome original abstracts on recently published work as well as preliminary ideas in two different formats:
         - Extended abstracts not exceeding 4 pages in length (plus 1 optional page for references).
         - Highlight papers not exceeding 2 pages, with an abstract and a link to the recently published paper or code. This avenue can be used to showcase already published articles.

All submissions must use the ICML template with this sty file. Modify the footnote to refer to this workshop by adding " Workshop on Computational Biology" after "(ICML)" on line 131 of the file "icml2021.sty". The submission need not be anonymized. If the submission concerns previously published work, please cite the original paper in the workshop submission.
Submissions should be made through the EasyChair system.
Accepted submissions will have the option of being published on the workshop website. Authors who do not wish their papers to be posted online or become citable should mention this in the workshop submission.

Instructions for revised submission
In your camera-ready submission please:
1) Change the command in the .sty file at lines 120-123 to
\newcommand{\ICML@appearing}{The 2021 ICML Workshop on Computational Biology. Copyright 2021 by the author(s).}
2) Comment out \usepackage{icml2021} in the main .tex file and uncomment \usepackage[accepted]{icml2021}.
Revised submissions should be made through EasyChair by July 16th.
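
As a minimal sketch of step 2, assuming the main .tex file loads the style in its preamble as the standard ICML template does (the surrounding lines in your own file may differ), the change could look like this:

```latex
% Preamble of the main .tex file, camera-ready version.
% Submission mode is disabled by commenting it out:
% \usepackage{icml2021}
% Camera-ready mode is enabled instead:
\usepackage[accepted]{icml2021}
```

Only one of the two \usepackage lines should be active at a time.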

Poster
This year we are using Gather Town Rooms for poster presentations. Each poster is presented by its author at an assigned session. We have scheduled three poster sessions and set up a room for each one. Workshop attendees can virtually walk around the Gather Town Room and talk with the authors and other attendees.
The poster and thumbnail formats are those requested by the ICML logistics team.

We are currently working with the ICML team to finish setting up the submission portal for the poster files and thumbnails, which is expected to be open for your submissions in the coming days. When this is complete, you will be able to submit your files using the following link: https://icml.cc/Conferences/2021/PosterUpload.
Please ensure you are logged in to the ICML account used for registration when submitting your poster. We will post information on our workshop website when the submission system is open.

Awards
All accepted contributions shall be presented at the virtual poster session. There will be Awards for Best Poster Presentations. In addition, a set of best submissions will also have the opportunity to present their work as Contributed Talks and receive awards.

Registration
All participants must register for the Workshop through the ICML 2021 conference.

Contact
For workshop-related queries please contact: workshopcompbio@gmail.com

Schedule

* Times below are in EDT

8:45 - 8:50 am Opening Remarks (Elham Azizi)
Session 1 (Chair: Yubin Xie)
8:50 - 9:30 am  Invited Talk 1: Carola-Bibiane Schönlieb
Lessons from the Pandemic for Machine Learning and Medical Imaging Slideslive
9:30 - 9:50 am Contributed Talk 1
Multigrate: single-cell multi-omic data integration [Paper 44] Slideslive
9:50 - 10:15 am Spotlights
Statistical correction of input gradients for black box models trained with categorical input features [Paper 56] Slideslive
Opportunities and Challenges in Designing Genomic Sequences [Paper 8] Slideslive
pmVAE: Learning Interpretable Single-Cell Representations with Pathway Modules [Paper 24] Slideslive
Deep Contextual Learners for Protein Networks [Paper 46] Slideslive
Multimodal data visualization, denoising and clustering with integrated diffusion [Paper 35] Slideslive
10:15 - 10:30 am Break
Session 2 (Chair: Cassandra Burdziak)
10:30 - 11:00 am Invited Talk 2: Quaid Morris
Anomaly detection to find rare phenotypes Live talk
11:00 - 11:20 am Contributed Talk 2
Light Attention Predicts Protein Location from the Language of Life [Paper 3] Slideslive
11:20 - 11:55 am Highlights
Representation of Features as Images with Neighborhood Dependencies for Compatibility with Convolutional Neural Networks [Paper 2] Slideslive
VoroCNN: Deep Convolutional Neural Network Built on 3D Voronoi Tessellation of Protein Structures [Paper 40] Slideslive
DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction [Paper 4] Slideslive
Spherical Convolutions on Molecular Graphs for Protein Model Quality Assessment [Paper 43] Slideslive
Data-driven Experimental Prioritization via Imputation and Submodular Optimization [Paper 7] Slideslive
Data Inequality, Machine Learning and Health Disparity [Paper 9] Slideslive
Deep neural networks identify sequence context features predictive of transcription factor binding [Paper 26] Slideslive
11:55 am - 1:00 pm Poster session 1 [Virtual World Room 1]
1:00 - 2:00 pm Poster session 2 [Virtual World Room 2]
Session 3 (Chair: Sandhya Prabhakaran)
2:00 - 2:30 pm Invited Talk 3: Kristin Swanson
Every Patient Deserves Their Own Equation Slideslive
2:30 - 2:50 pm Contributed Talk 3
Reconstructing unobserved cellular states from paired single-cell lineage tracing and transcriptomics data [Paper 57] Slideslive
2:50 - 3:15 pm Spotlights
Equivariant Graph Neural Networks for 3D Macromolecular Structure [Paper 15] Slideslive
Viral Evolution and Antibody Escape Mutations using Deep Generative Models [Paper 53] Slideslive
Multi-Scale Representation Learning on Proteins [Paper 21] Slideslive
Immuno-mimetic Deep Neural Networks (Immuno-Net) [Paper 34] Slideslive
Gene expression evolution across species, organs and sexes in Drosophila [Paper 60] Slideslive
3:15 - 4:15 pm Poster session 3 [Virtual World Room 3]
Session 4 (Chair: Amine Remita)
4:15 - 4:45 pm Invited Talk 4: Mathieu Blanchette
Learning from evolution Live talk
4:45 - 5:05 pm Contributed Talk 4
A Bayesian Mutation-Selection Model of Evolutionary Constraints on Coding Sequences [Paper 67] Slideslive
5:05 - 5:30 pm   Award ceremony and closing remarks (Yubin Xie)

Invited speakers

*Listed alphabetically

Mathieu Blanchette

Associate Professor, McGill University
Director of the School of Computer Science at McGill University

Quaid Morris

Group leader, Memorial Sloan Kettering Cancer Center

Carola-Bibiane Schönlieb

Professor of Applied Mathematics
Head of the Cambridge Image Analysis group
University of Cambridge

Kristin Swanson

Vasak and Anna Maria Polak Professor in Cancer Research
Vice Chair, Neurosurgery, Mayo Clinic

Accepted Submissions


Contributed Talks

Paper 3: Hannes Stärk, Christian Dallago, Michael Heinzinger and Burkhard Rost. Light Attention Predicts Protein Location from the Language of Life. [paper]
Paper 44: Mohammad Lotfollahi, Anastasia Litinetskaya and Fabian Theis. Multigrate: single-cell multi-omic data integration [paper]
Paper 57: Khalil Ouardini, Romain Lopez, Matthew G Jones, Sebastian Prillo, Richard Zhang, Michael I. Jordan and Nir Yosef. Reconstructing unobserved cellular states from paired single-cell lineage tracing and transcriptomics data [paper]
Paper 67: Berk Alpay, Mafalda Dias, Jonathan Frazer, and Debora Marks. A Bayesian Mutation-Selection Model of Evolutionary Constraints on Coding Sequences


Spotlights & Posters

Paper 8: Mengyan Zhang and Cheng Soon Ong. Opportunities and Challenges in Designing Genomic Sequences [paper]
Paper 15: Bowen Jing, Stephan Eismann, Pratham Soni and Ron Dror. Equivariant Graph Neural Networks for 3D Macromolecular Structure [paper]
Paper 21: Charlotte Bunne, Vignesh Ram Somnath and Andreas Krause. Multi-Scale Representation Learning on Proteins [paper]
Paper 24: Gilles Gut, Stefan Stark, Gunnar Ratsch and Natalie Davidson. pmVAE: Learning Interpretable Single-Cell Representations with Pathway Modules [paper]
Paper 34: Ren Wang, Tianqi Chen, Stephen Lindsly, Cooper Stansbury, Indika Rajapakse and Alfred Hero. Immuno-mimetic Deep Neural Networks (Immuno-Net) [paper]
Paper 35: Manik Kuchroo, Abhinav Godavarthi, Guy Wolf and Smita Krishnaswamy. Multimodal data visualization, denoising and clustering with integrated diffusion [paper]
Paper 46: Michelle Li and Marinka Zitnik. Deep Contextual Learners for Protein Networks [paper]
Paper 53: Nicole Thadani, Nathan Rollins, Sarah Gurev, Pascal Notin and Debora Marks. Viral Evolution and Antibody Escape Mutations using Deep Generative Models [paper]
Paper 56: Antonio Majdandzic and Peter Koo. Statistical correction of input gradients for black box models trained with categorical input features [paper]
Paper 60: Soumitra Pal, Brian Oliver and Teresa Przytycka. Gene expression evolution across species, organs and sexes in Drosophila


Posters

Paper 10: Jun Cheng, Carolin Lawrence and Mathias Niepert. VEGN: variant effect prediction with graph neural network [paper]
Paper 11: Gabriel Kalweit, Maria Kalweit, Mansour Alyahyay, Zoe Jaeckel, Florian Steenbergen, Stefanie Hardung, Ilka Diester and Joschka Boedecker. NeuRL: Closed-form Inverse Reinforcement Learning for Neural Decoding [paper]
Paper 12: Keisuke Yamada and Michiaki Hamada. Prediction of RNA-protein Interactions Using a Nucleotide Language Model
Paper 13: Hasib Zunair and Abdessamad Hamza. Synthetic COVID-19 Chest X-ray Dataset for Computer-Aided Diagnosis [paper]
Paper 20: Soroor Hediyeh-Zadeh, Yi Xie, Holly Whitfield and Melissa Davis. Reference-free cell type annotation and phenotype characterisation in single cell RNA sequencing by learning geneset representations
Paper 22: Amine Amor, Pietro Liò, Ramon Viñas, Helena Andres Terre and Vikash Ranjan Singh. Graph Representation Learning on Tissue-Specific Multi-Omics [paper]
Paper 25: Subhabrata Majumdar, Saonli Basu, Matt McGue and Snigdhansu Chatterjee. Simultaneous Selection of Multiple Important Single Nucleotide Polymorphisms in Familial Genome Wide Association Studies Data [paper]
Paper 27: Alice Del Vecchio, Andreea Deac, Pietro Liò and Petar Veličković. Neural message passing for joint paratope-epitope prediction
Paper 28: Rohan Ghotra, Nicholas Lee, Rohit Tripathy and Peter Koo. Designing Interpretable Convolution-Based Hybrid Networks for Genomics [paper]
Paper 29: Zhenqin Wu, Bryant Chhun, Galina Schmunk, Syuan-Ming Guo, Chang Kim, Li-Hao Yeh, Tomasz Nowakowski, James Zou and Shalin Mehta. DynaMorph: self-supervised learning of morphodynamic states of live cells [paper]
Paper 30: Rishal Aggarwal, Akash Gupta and U. Deva Priyakumar. APObind: A Dataset of Ligand Unbound Protein Conformations for Machine Learning Applications in De Novo Drug Design [paper]
Paper 38: Soroor Hediyeh-Zadeh, Daryl Wilding-McBride, Ahmed Mohamed, Rune Larsen, Melissa Davis and Andrew Webb. Improving confident peptide identifications across mass spectrometry runs by learning deep representations of TIMS-MS1 features
Paper 39: Xueer Chen, Linyue Fan, Cameron Y. Park, Lauren Friend, Sham Rampersaud, George Plitas, Alexander Y. Rudensky and Elham Azizi. Semi-supervised Deconvolution of Spatial Transcriptomics in Breast Tumors
Paper 41: Arnav Das*, Rui Yang*, Vianne Gao, Alireza Karbalaghareh, William Noble, Jeff Bilmes and Christina Leslie. Epiphany: Predicting the Hi-C Contact Map from 1D Epigenomic Data [paper]
Paper 42: Ji-Eun Park, Wancen Mu, Yining Jiao, Michael Love, Marc Niethammer and Natalie Stanley. MultImp: Multiomics Generative Models for Data Imputation [paper]
Paper 45: Yang An, Felix Drost, Fabian Theis, Benjamin Schubert and Mohammad Lotfollahi. Deconvolution of the T cell immune response using multi-modal learning [paper]
Paper 47: Guojie Zhong, Jiayao Wang, Siyu He and Xi Fu. Towards better understanding of developmental disorders from integration of spatial single-cell transcriptomics and epigenomics [paper]
Paper 48: Ziqi Zhang, Haoran Sun, Ragunathan Mariappan, Xi Chen, Mika Jain, Mirjana Efremova, Sarah Teichmann, Vaibhav Rajan and Xiuwei Zhang. Integrating unpaired scRNA-seq and scATAC-seq with unequal cell type compositions [paper]
Paper 49: Andrea Karlova, Wim Dehaen and Daniel Svozil. Fingerprint VAE [paper]
Paper 50: Elior Rahmani, Eran Halperin, Michael Jordan and Nir Yosef. Identifying systematic variation in gene-gene interactions at the single-cell level by leveraging low-resolution population-level data
Paper 52: Boning Li, Yingce Xia, Shufang Xie, Lijun Wu and Tao Qin. Distance-Enhanced Graph Neural Network for Link Prediction [paper]
Paper 54: Somesh Mohapatra, Joyce An and Rafael Gómez-Bombarelli. Graph attribution methods applied to understanding immunogenicity in glycans [paper]
Paper 55: Bishnu Sarker, Marie-Dominique Devignes, Sabeur Aridhi and Guy Wolf. Prot-A-GAN: Automatic Protein Function Annotation using GAN-inspired Knowledge Graph Embedding [paper]
Paper 58: Nicholas Lee and Peter Koo. Representation learning of genomic sequence motifs via information maximization [paper]
Paper 59: Michael Cai, Seojin Bang and Heewook Lee. TCR-epitope binding affinity prediction using multi-head self attention model [paper]
Paper 61: Nate Gruver, Samuel Stanton, Polina Kirichenko, Marc Finzi, Phillip Maffettone, Vivek Myers, Emily Delaney, Peyton Greenside and Andrew Wilson. Effective Surrogate Models for Protein Design with Bayesian Optimization [paper]
Paper 62: Anirudh Jain, Markus Heinonen and Samuel Kaski. Multi-target optimization for drug discovery using generative models [paper]
Paper 63: Otto Kißig, Martin Taraz, Sarel Cohen and Tobias Friedrich. Drug Repurposing using Link Prediction on Knowledge Graphs [paper]
Paper 64: Mafalda Dias, Pascal Notin, Jonathan Frazer, Sam Berry, Nikki Thadani, Lood van Niekerk, Debora Marks and Yarin Gal. Exploring the latent space of deep generative models: Applications to G-protein coupled receptors
Paper 66: Mika Sarkin Jain, Krzysztof Polanski, Mirjana Efremova and Sarah Teichmann. MultiMAP: Dimensionality Reduction and Integration of Multimodal Data [paper]


Highlights of Published Work

Paper 2: Omid Bazgir, Ruibo Zhang, Saugato Rahman Dhruba, Raziur Rahman, Souparno Ghosh and Ranadip Pal. Representation of Features as Images with Neighborhood Dependencies for Compatibility with Convolutional Neural Networks [paper]
Paper 4: Betul Guvenc Paltun, Samuel Kaski and Hiroshi Mamitsuka. DIVERSE: Bayesian Data IntegratiVE learning for precise drug ResponSE prediction [paper]
Paper 7: Jacob Schreiber and William Noble. Data-driven Experimental Prioritization via Imputation and Submodular Optimization [paper]
Paper 9: Yan Gao and Yan Cui. Data Inequality, Machine Learning and Health Disparity [paper]
Paper 26: An Zheng, Michael Lamkin, Hanqing Zhao, Cynthia Wu, Hao Su and Melissa Gymrek. Deep neural networks identify sequence context features predictive of transcription factor binding
Paper 40: Ilia Igashov, Kliment Olechnovič, Maria Kadukova, Česlovas Venclovas and Sergei Grudinin. VoroCNN: Deep Convolutional Neural Network Built on 3D Voronoi Tessellation of Protein Structures [paper]
Paper 43: Ilia Igashov, Nikita Pavlichenko and Sergei Grudinin. Spherical Convolutions on Molecular Graphs for Protein Model Quality Assessment [paper]

Diversity and Inclusion

We are pleased to announce that we will continue our ICML Workshop on Computational Biology Fellowships (for students) this year. Awards include free registration. We encourage applications from underrepresented groups. The deadline is July 14th. Apply here.

We are also featuring other workshops that you might find helpful for diversity and inclusion.
Queer in AI

Steering committee

Elham Azizi: ea2690@columbia.edu (https://www.azizilab.com/)
Sandhya Prabhakaran: sandhya.prabhakaran@moffitt.org (sandhyaprabhakaran.com)
Abdoulaye Baniré Diallo: diallo.abdoulaye@uqam.ca (labo.bioinfo.uqam.ca)
Wesley Tansey: wesley.tansey@columbia.edu (http://wesleytansey.com)
Julia E. Vogt: julia.vogt@inf.ethz.ch (https://mds.inf.ethz.ch/team/detail/julia-vogt/)
Dana Pe’er: peerd@mskcc.org (https://www.mskcc.org/research/ski/labs/dana-pe-er)
Anshul Kundaje: akundaje@stanford.edu (https://profiles.stanford.edu/anshul-kundaje)
Wajdi Dhifli: wajdi.dhifli@univ-lille.fr (https://sites.google.com/site/wajdidhifli/)
Alexander Anderson: Alexander.Anderson@moffitt.org (labpages.moffitt.org/andersona/)
Engelbert Mephu Nguifo: engelbert.mephu_nguifo@uca.fr

Organizing committee

Debora Marks: debbie@hms.harvard.edu
Yubin Xie: yux2009@med.cornell.edu
Cassandra Burdziak: cnb3001@med.cornell.edu
Amine Remita: remita.amine@courrier.uqam.ca
Mark Robertson-Tessi: Mark.robertsonTessi@moffitt.org
Jaan Altosaar: j@jaan.io
Sabeur Aridhi: sabeur.aridhi@loria.fr
Bishnu Sarker: bishnu.sarker@inria.fr

Program Committee

Elham Azizi, Columbia University
Abdoulaye Baniré Diallo, Université du Québec à Montréal
Cassandra Burdziak, Memorial Sloan Kettering Cancer Center
Yubin Xie, Memorial Sloan Kettering Cancer Center
Amine Remita, Université du Québec à Montréal
Sandhya Prabhakaran, Moffitt Cancer Center
Engelbert Mephu Nguifo, University Clermont Auvergne - LIMOS - CNRS
Adrian Heilbut, Kallyope, Inc
Ahmed Halioui, My Intelligent Machines, Université du Québec à Montréal
Ahmet Coskun, Georgia Institute of Technology/Emory University
Anne Siegel, IRISA -- CNRS
Ayshwarya Subramanian, Harvard University
Bo Yuan, Harvard University
Chetanya Pandya, Sema4 Genomics
Christopher Garay, MITRE Corporation
Doron Haviv, Memorial Sloan Kettering Cancer Center
Elisabetta De Maria, Laboratoire I3S
Filippo Utro, IBM
Hayda Almeida, University of Quebec in Montreal (UQAM)
Ibrahim Chamseddine, Harvard Medical School/Massachusetts General Hospital
Jason Ernst, University of California, Los Angeles
Jean-Philippe Vert, Google
Luis Rueda, University of Windsor
Manu Setty, Memorial Sloan Kettering Cancer Center
Marcilio De Souto, LIFO/University of Orleans
Mathieu Carrière, Inria Sophia Antipolis
Mengting Gu, Yale University
Mervin Fansler, Memorial Sloan Kettering Cancer Center
Mohammed Alquraishi, Columbia University
Mor Nitzan, Racah Institute of Physics, The Hebrew University, Jerusalem
Niina Haiminen, IBM
Sabeur Aridhi, University of Lorraine
Sarvesh Nikumbh, Imperial College London and MRC London Institute of Medical Sciences
Sebastien Lemieux, IRIC / Université de Montréal
Sheng Liu, Indiana University School of Medicine
Xiang Niu, Weill Cornell Medical College