GDSCN Course Content

Our approach to exciting students about genomic data science involves introducing students to the field, investigating relevant questions, making analyses interactive and easy to follow, and equipping students with tangible skills to reuse and build upon.

As we develop these course modules, we will recruit instructors to test the material and ensure that the content is usable, understandable, and valuable for both students and instructors.

SARS-CoV-2 Variant Detection with Galaxy on AnVIL

Our first lesson introduces a bioinformatics pipeline that detects a variant in a publicly available genetic sample of SARS-CoV-2. Students will be introduced to the sequencing revolution, variants, genetic alignments, and essentials of cloud computing prior to the lab activity. During the lesson, students will work hands-on with the point-and-click Galaxy interface on the AnVIL cloud computing resource to check data, perform an alignment, and visualize their results.


Explore the entire Lesson Book or jump directly into the Lesson Overview, Background Lectures, or Lab Exercise sections.

SARS-CoV-2 icon by Hanna Vega (http://book.bionumbers.org/) is licensed under CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).

Statistics for Genomics

In collaboration with Kasper Hansen, we've developed a series of books based on material in his course Statistics for Genomics taught at the Johns Hopkins School of Public Health. These books introduce more advanced concepts in genomic data science and teach students more programming skills for these analyses, leveraging RStudio and Bioconductor in AnVIL.

Differential Expression

RNA Biology and Sequencing (RNA-seq)

Single-Cell RNA Sequencing (scRNA-seq)

Principal Component Analysis (PCA) and Batch Effects


Additional Course Resources

There is so much exciting work happening outside of the GDSCN to create educational materials on genomics and data science. Explore more resources below: