(Biomedical Research Extensive Archive To Help Everyone)

Natural Language Understanding Tools for Biomedical Researchers

AI vs COVID-19 Initiative

Breakthroughs in machine learning make it possible, for the first time, to read and understand complex research at scale. Our goal is to make the BREATHE dataset and BREATHE Deep Literature Search a resource for biomedical researchers, doctors, and virologists, augmenting their ability to sift through biomedical knowledge and existing research, extract novel insights, and make new drug discoveries.

Who will benefit?

  1. Biomedical researchers - those who look for cures to illnesses. In particular, we aim to help researchers looking for treatments and vaccines for COVID-19 and, more generally, those researching novel drugs, vaccines, and treatment protocols.

  2. Virologists - those doing research on viruses and their mechanisms of reproduction, propagation, and infection of host organisms.

  3. Epidemiologists - those studying patterns of frequency and the causes and effects of diseases in the human population.

Why build it?

  1. COVID-19. The virus is rapidly advancing and there is currently no proven therapy or vaccine. BioMedBERT will provide tools to discover novel insights into existing research.

  2. Research Volume & Velocity. There are millions of research documents in a variety of repositories, and the pace of publication is accelerating. The existing volume of research is already too vast for any single individual or small group to master, and it is only growing.

  3. Missed Latent Connections. Nuanced or weak connections between research items could be recognized by our language model, allowing researchers to approach existing problems with a broader set of tools and ideas.


Our approach starts with gathering one of the largest research datasets in the world, 'BREATHE'. The BREATHE (Biomedical Research Extensive Archive To Help Everyone) dataset contains more than 16 million machine-read medical and research publications. Our approach then applies state-of-the-art machine learning techniques, in particular deep neural network models for language understanding, to identify latent insights in the literature. We use emerging language architectures (BERT, T5) to achieve these insights.

Our infrastructure runs on Google Cloud Platform and takes advantage of the massive compute power available via the TensorFlow Research Cloud. A single Cloud TPU v3 Pod can deliver 100+ petaflops (1 petaflop = one thousand million million, or 10^15, floating-point operations per second). By utilizing the compute, storage, and networking of Google Cloud along with standard open source platforms and tools such as TensorFlow, we are able to train, refine, and iterate our models faster than ever before.

Team members and collaborators:

Our team feels a sense of urgency to bring unique tools to help with the current crisis and to help the world become better prepared for the next. The team is composed of machine learning experts, computer scientists, and technology practitioners from across the globe. This includes AI experts who are Machine Learning Google Developer Experts (GDEs) and software developers from 42 Silicon Valley, working in consultation with experts from Google Cloud and TensorFlow Research Cloud.

Dan Goncharov

Francesco Mosconi

Ivan Kozlov

Fabrizio Milo

Souradip Chakraborty

Ekaba Bisong

Shweta Bhatt

Antoine Delorme

Uliana Popov

Khloe Hou

Igor Popov

Gulnozai Khodizoda

Ishmeet Kaur

Simon Ewing

Blaire Hunter

Suzanne Repellin

Danila Kurgan

Colton Ehrman


Dave Elliott

Soonson Kwon

Nazneen Aziz, PhD

Joseph Lehar, PhD

Scientific Advisory Board:

Rachel Leibman, Ph.D. Microbiology | Assoc. Scientific Director of AbbVie

Lisa K. Fitzpatrick MD, MPH, MPA
Founder and CEO, Grapevine Health

Janie F. Shelton, PhD, MPH
Senior Scientist at 23andMe

Ran Gao, PhD
Senior Associate Director Global Epidemiology at Boehringer Ingelheim

Salah Qutaishat, PhD
Infectious Disease Epidemiologist

Mari Takashima
Nurse Researcher Epidemiologist | PhD candidate

David Lilienfeld, MD, MPH
Clinical development pharmacovigilance consultant

Ashley Brenton, PhD
Chief Science Officer at Mycroft Bioanalytics

Jeff Barrett, Ph.D. Senior Advisor, Quantitative Medicine at Critical Path Institute (C-Path)

Toumy Guettouche, Ph.D. Biochemistry and Molecular Biology | Director of Roche Sequencing Solutions

Andrew Whittle, Ph.D. Metabolism Biology | Innovation Lead at Novo Nordisk

Partners and Sponsors: