Data Science Summer Hangouts Series 2025

RDDSX space outside the Collaborative Classroom
Van Pelt-Dietrich Library

DDDI's 2025 Summer Hangouts program offers students the opportunity to participate in informal, hands-on tutorials led by our postdoctoral research fellows. These tutorials are open to students of all backgrounds and skill levels and cover a range of data science methods and topics. Hangouts will be held twice a week, generally on Tuesdays and Thursdays, from June 10th through June 25th. A pizza lunch will be provided. All talks and tutorials will take place in the RDDSX space, conveniently located near the Collaborative Classroom in Van Pelt-Dietrich Library.

Our Hangouts series this year will explore machine learning, statistics, optimization, and generative AI for scientific discovery across fields. All are welcome!


View the live stream

Recordings of the sessions will be made available here


Schedule

Hangouts will run twice a week, from noon to 1pm, between June 10th and June 25th. A pizza lunch will be provided.

 

Date | Speaker | Title + Description
Tuesday 6/10
noon - 1pm
Sam Dillavou and Kieran Murphy

Introduction to Machine Learning

What is machine learning, how is it used, what does it do well, and where does it go wrong?

Thursday 6/12
noon - 1pm
Tess Cherlin

From Data to Insight: Applying Survival Analysis in Clinical Research

In clinical research, we often want to determine whether an intervention improves patient outcomes, such as whether a new cancer therapy increases survival rates. We also often want to understand how long it takes for certain patient populations to develop a disease, such as the time to cardiovascular disease onset stratified by sex assigned at birth. In this tutorial, we will use simulated clinical data and apply survival analysis methods to explore disease outcomes. Cloud-based RStudio (Posit) will be used for this tutorial; no coding skills are required to participate.
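The session itself uses R in Posit Cloud. As a language-neutral illustration of one standard survival analysis method, the Kaplan–Meier estimator, here is a minimal Python sketch; the session may cover different or additional methods, and the six-subject dataset is made up purely for illustration. At each observed event time t, the running estimate is multiplied by 1 − d/n, where d is the number of events at t and n is the number of subjects still at risk (neither failed nor censored before t).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function S(t).

    times:  follow-up time for each subject
    events: 1 if the event (e.g. disease onset) was observed, 0 if censored
    Returns a list of (t, S(t)) pairs, one per distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Subjects still at risk just before t: those whose follow-up lasted at least t.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve


# Made-up toy data: follow-up in months for six subjects; 0 marks a censored subject.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

With these toy data the estimate steps down to 0.833, 0.667, 0.444, and 0.000 at t = 2, 3, 5, and 9. Stratifying by a variable such as sex assigned at birth amounts to running the same function on each subgroup separately and comparing the curves.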

Tuesday 6/17
noon - 1pm
Coby Viner

Speeding up science: GNU Parallel for bioinformatics and beyond

This session introduces GNU Parallel, a shell utility that can dispatch thousands of jobs across every CPU core (or even multiple machines) with a single command.

We’ll start with the why and a one-command install, then walk through the core syntax (argument lists, replacement strings, streaming with --pipe, and positional parameters) using concise man-page examples. A quick tour of newer features, including color-coded live output, adaptive --delay auto, memory-aware suspension, and seamless SLURM/SSH execution, leads into a benchmark of real-world speed-ups.

To ground it all, we’ll close with a concrete genomics pipeline from my own work, showing how GNU Parallel transforms long, brittle for-loops into reproducible high-throughput workflows.

Thursday 6/19
No session (Juneteenth)
Tuesday 6/24
noon - 1pm
Sourav Dey

Bayesian optimization, with applications in chemical reaction discovery and optimization

Machine learning in the low-data regime: how do you find the optimal solution to a problem when the objective function is expensive to evaluate? We will also discuss how to optimize these kinds of black-box functions when the search space can only be sampled sparsely.
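The question this abstract poses, how to optimize a function that is expensive to evaluate, is what Bayesian optimization addresses: fit a cheap probabilistic surrogate to the few evaluations made so far, then choose the next point by maximizing an acquisition function that balances exploiting promising regions against exploring uncertain ones. The minimal sketch below assumes a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition, two common choices that are not necessarily the ones used in the session. `expensive_objective` is a hypothetical stand-in for a costly experiment such as measuring a reaction yield, and the kernel length scale is an untuned assumption.

```python
import math

LENGTH_SCALE = 0.2  # assumed RBF length scale (not tuned)
JITTER = 1e-6       # small diagonal term for numerical stability


def expensive_objective(x):
    """Hypothetical stand-in for a costly experiment; its maximum is at x = 0.7."""
    return -((x - 0.7) ** 2)


def rbf(a, b):
    return math.exp(-((a - b) ** 2) / (2 * LENGTH_SCALE**2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Back substitution for L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys):
    """Zero-mean GP on centered observations; returns (L, alpha, y_mean)."""
    y_mean = sum(ys) / len(ys)
    K = [[rbf(a, b) + (JITTER if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - y_mean for y in ys]))  # K^-1 (y - mean)
    return L, alpha, y_mean


def predict(model, xs, x):
    """Posterior mean and standard deviation of the GP at x."""
    L, alpha, y_mean = model
    k_star = [rbf(a, x) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """EI for maximization: E[max(f(x) - best, 0)] under the GP posterior."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(n_iter=10):
    grid = [i / 100 for i in range(101)]  # candidate points in [0, 1]
    xs = [0.0, 0.5, 1.0]                  # small initial design
    ys = [expensive_objective(x) for x in xs]
    for _ in range(n_iter):
        model = fit_gp(xs, ys)
        best = max(ys)
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates,
                     key=lambda x: expected_improvement(*predict(model, xs, x), best))
        xs.append(x_next)
        ys.append(expensive_objective(x_next))  # one expensive call per iteration
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]


best_x, best_y = bayes_opt()
print(f"best x = {best_x:.2f}, f(best x) = {best_y:.5f}")
```

Each loop iteration calls `expensive_objective` exactly once; all of the search effort goes into the cheap surrogate. A real reaction-optimization problem would have several input dimensions and fitted kernel hyperparameters, but the fit, acquire, evaluate loop has the same shape.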

Wednesday 6/25 (moved from Thursday 6/26)
noon - 1pm
Supranta S. Boruah

Diffusion Models in Action: From Toy Problems to Dark Matter Maps

Diffusion models have emerged as powerful tools in generative modeling, achieving impressive results across different domains. In this tutorial, we’ll explore how these models can be adapted to tackle a cosmological challenge: mapping dark matter in the Universe from noisy weak lensing observations. We’ll begin with a hands-on introduction to diffusion models using a simple 1D example. Participants will simulate the forward and reverse diffusion processes, first with an analytical score function and then by training a neural network to learn it. Along the way, we’ll build up key intuitions behind denoising score matching.

The second half of the tutorial connects these concepts to a real-world application in cosmology. We'll discuss how diffusion-based posterior sampling can be used to reconstruct dark matter mass maps from lensing data, inspired by recent work in the field.
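The 1D warm-up in this session's first half can be sketched in a few lines. This minimal sketch assumes a variance-exploding forward process, x_t = x_0 + σ(t) z with σ²(t) = σ_max² t, and a hypothetical two-component Gaussian mixture as the data distribution. Because every noised marginal p_t is then also a Gaussian mixture, its score ∇ log p_t(x) has a closed form, which plays the role of the analytical score function mentioned in the abstract. Samples are drawn with Euler–Maruyama steps on the reverse-time SDE; the session's actual setup and parameters may differ.

```python
import math
import random

# Hypothetical toy data distribution: equal-weight mixture of two 1D Gaussians.
WEIGHTS = [0.5, 0.5]
MEANS = [-2.0, 2.0]
DATA_STD = 0.3
SIGMA_MAX = 5.0  # assumed noise scale at t = 1


def noise_var(t):
    """Variance added by the forward process by time t: sigma^2(t) = SIGMA_MAX^2 * t."""
    return SIGMA_MAX**2 * t


def forward_sample(x0, t):
    """Forward (noising) process: x_t = x_0 + sigma(t) * z with z ~ N(0, 1)."""
    return x0 + math.sqrt(noise_var(t)) * random.gauss(0.0, 1.0)


def score(x, t):
    """Analytical score d/dx log p_t(x) of the noised mixture."""
    # Every component of p_t is N(mean_k, DATA_STD^2 + sigma^2(t)); the variances are
    # equal, so the Gaussian normalizing constants cancel in the responsibilities.
    v = DATA_STD**2 + noise_var(t)
    logs = [math.log(w) - (x - m) ** 2 / (2 * v) for w, m in zip(WEIGHTS, MEANS)]
    top = max(logs)  # log-sum-exp shift to avoid underflow
    resp = [math.exp(lg - top) for lg in logs]
    total = sum(resp)
    # Responsibility-weighted average of each component's score -(x - m) / v.
    return sum(r * (-(x - m) / v) for r, m in zip(resp, MEANS)) / total


def reverse_sample(n_steps=200):
    """Euler-Maruyama on the reverse-time SDE, stepping t from 1 back to 0.

    Each step: x <- x + g^2 * score(x, t) * dt + g * sqrt(dt) * z,
    where g^2 = d sigma^2 / dt = SIGMA_MAX^2 for this schedule.
    """
    g2 = SIGMA_MAX**2
    dt = 1.0 / n_steps
    x = random.gauss(0.0, SIGMA_MAX)  # approximate p_1 by N(0, SIGMA_MAX^2)
    for i in range(n_steps):
        t = 1.0 - i * dt
        x += g2 * score(x, t) * dt
        if i < n_steps - 1:  # common heuristic: no noise on the final step
            x += math.sqrt(g2 * dt) * random.gauss(0.0, 1.0)
    return x


random.seed(0)
samples = [reverse_sample() for _ in range(1000)]
print("fraction of samples near +2:", sum(s > 0 for s in samples) / len(samples))
print("mean |x|:", sum(abs(s) for s in samples) / len(samples))
```

Replacing `score` with a small neural network trained by denoising score matching is the second step the abstract describes; the sampling loop itself would not need to change.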