GDSCN Course Content

To get students excited about genomic data science, we introduce them to the field, investigate relevant questions, make analyses interactive and easy to follow, and equip them with tangible skills they can reuse and build upon.

As we develop these course modules, instructors will be recruited to test the material to ensure that the content is usable, understandable, and valuable for both students and instructors.

BioDIGS in the Classroom

BioDiversity and Informatics for Genomics Scholars (BioDIGS) is an exciting research project that sets up undergraduate students to lead novel data collection and analysis. This distributed soil metagenomics research teaches students the fundamentals of genomics and data science while conducting novel research into the diversity within our soil.

Our book, BioDIGS in the Classroom, describes the project and leads an exploration of the data generated so far. We use the AnVIL and Galaxy platforms for data access and analysis.

SARS-CoV-2 Variant Detection with Galaxy on AnVIL

Our first lesson introduces a bioinformatics pipeline to detect a variant based on a publicly available genetic sample of SARS-CoV-2. Students will be introduced to the sequencing revolution, variants, genetic alignments, and essentials of cloud computing prior to the lab activity. During the lesson, students will work hands-on with the point-and-click Galaxy interface on the AnVIL cloud computing resource to check data, perform an alignment, and visualize their results. 

Explore the entire Lesson Book or jump directly into the Lesson Overview, Background Lectures, or Lab Exercise sections.

SARS-CoV-2 icon by Hanna Vega is licensed under CC BY-SA 4.0

Statistics for Genomics

In collaboration with Kasper Hansen, we've developed a series of books based on material in his course Statistics for Genomics taught at the Johns Hopkins School of Public Health. These books introduce more advanced concepts in genomic data science and teach students more programming skills for these analyses, leveraging RStudio and Bioconductor in AnVIL.

Differential Expression

RNA Biology and Sequencing (RNA-seq)

Single-Cell RNA Sequencing (scRNA-seq)

Principal Component Analysis (PCA) and Batch Effects

Additional Course Resources

There is so much exciting work happening outside of the GDSCN to create educational materials on genomics and data science. Explore more resources below: