Data Science Hangouts 2025

DDDI's 2025 Summer Hangouts program offers students the opportunity to participate in informal, hands-on tutorials led by our postdoctoral research fellows. These tutorials are open to students of all backgrounds and skill levels and cover a range of data science methods and topics. Hangouts will be held twice a week, from noon to 1pm, between June 10th and June 25th, generally on Tuesdays and Thursdays (see the schedule below for exceptions). A pizza lunch will be provided. All talks and tutorials will take place in the RDDSX space, conveniently located near the Collaborative Classroom in Van Pelt-Dietrich Library.

Our Hangouts series this year will explore machine learning, statistics, optimization, and generative AI for scientific discovery across fields. All are welcome!

Location

All tutorials will take place in the RDDSX space just outside the Collaborative Classroom in Van Pelt-Dietrich Library.

Live Stream

View the live stream

Recordings

Recordings of the sessions will be made available here.

RSVP

Space might be limited.

Please RSVP

Schedule

Hangouts will run twice a week, from noon to 1pm, between June 10th and June 25th. A pizza lunch will be provided.

Date	Speaker	Title	Description
Tuesday 6/10 noon - 1pm	Sam Dillavou + Kieran Murphy	Introduction to Machine Learning	What is machine learning, how is it used, what does it do well, and where does it go wrong?
Thursday 6/12 noon - 1pm	Tess Cherlin	From Data to Insight: Applying Survival Analysis in Clinical Research	In clinical research, we often want to determine whether an intervention improves patient outcomes, such as whether a new cancer therapy increases survival rates. We also want to understand how long it takes for certain patient populations to develop a disease, such as the time to onset of cardiovascular disease stratified by sex assigned at birth. In this tutorial, we will apply survival analysis methods to simulated clinical data to explore disease outcomes. Cloud-based RStudio (Posit) will be used for this tutorial. No coding skills are required to participate.
Tuesday 6/17 noon - 1pm	Coby Viner	Speeding up science: GNU Parallel for bioinformatics and beyond	This session introduces GNU Parallel, the shell utility that can launch thousands of jobs simultaneously, harnessing every CPU core (or even multiple machines) with a single command. We’ll start with the why and a one-command install, then walk through the core syntax (argument lists, replacement strings, streaming with --pipe, and positional parameters) using concise man-page examples. A quick tour of newer features (colour-coded live output, adaptive --delay auto, memory-aware suspension, and seamless SLURM/SSH execution) leads into a benchmark of real-world speed-ups. To ground it all, we’ll close with a concrete genomics pipeline from my own work, showing how GNU Parallel transforms long, brittle for-loops into reproducible, high-throughput workflows.
Thursday 6/19	No session	Juneteenth holiday
Tuesday 6/24 noon - 1pm	Sourav Dey	Bayesian optimization, with applications in chemical reaction discovery and optimization	Machine learning in the low-data regime: how do you find the optimal solution to a problem when the objective function is expensive to evaluate? We will also discuss how to optimize such black-box functions in a sparse space.
Wednesday 6/25 noon - 1pm (rescheduled from Thursday 6/26)	Supranta S. Boruah	Diffusion Models in Action: From Toy Problems to Dark Matter Maps	Diffusion models have emerged as powerful tools in generative modeling, achieving impressive results across many domains. In this tutorial, we’ll explore how these models can be adapted to tackle a cosmological challenge: mapping dark matter in the Universe from noisy weak lensing observations. We’ll begin with a hands-on introduction to diffusion models using a simple 1D example. Participants will simulate the forward and reverse diffusion processes, first with an analytical score function and then by training a neural network to learn it. Along the way, we’ll build up the key intuitions behind denoising score matching. The second half of the tutorial connects these concepts to a real-world application in cosmology: we’ll discuss how diffusion-based posterior sampling can be used to reconstruct dark matter mass maps from lensing data, inspired by recent work in the field.
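The diffusion session opens by simulating forward and reverse diffusion in 1D with an analytical score function. As an unofficial preview, here is a minimal NumPy sketch of that idea under our own assumptions, none of which come from the tutorial materials. The target is a hypothetical two-component Gaussian mixture, the noise schedule is a DDPM-style linear schedule, and sampling uses DDPM ancestral steps. The names `analytic_score`, `forward_diffuse`, and `reverse_sample` are illustrative. Because a noised Gaussian mixture is still a Gaussian mixture, its score can be computed exactly, and that exact score stands in for the neural network trained later in the session.

```python
import numpy as np

# Hypothetical toy target (illustrative assumption): an equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 0.5])

# Assumed DDPM-style linear noise schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def sample_data(n, rng):
    """Draw n samples from the Gaussian-mixture data distribution."""
    k = rng.choice(len(WEIGHTS), size=n, p=WEIGHTS)
    return rng.normal(MEANS[k], STDS[k])


def forward_diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, 1 - abar_t)."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * rng.standard_normal(x0.shape)


def analytic_score(x, t):
    """Exact score d/dx log q_t(x) of the noised mixture at step t."""
    ab = alpha_bars[t]
    m = np.sqrt(ab) * MEANS                # component means after noising
    v = ab * STDS**2 + (1.0 - ab)          # component variances after noising
    # Log of weighted component densities, shape (n, K).
    log_p = (np.log(WEIGHTS) - 0.5 * np.log(2 * np.pi * v)
             - 0.5 * (x[:, None] - m) ** 2 / v)
    log_p -= log_p.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    resp = np.exp(log_p)
    resp /= resp.sum(axis=1, keepdims=True)    # posterior component responsibilities
    # Mixture score = responsibility-weighted sum of Gaussian component scores.
    return np.sum(resp * (-(x[:, None] - m) / v), axis=1)


def reverse_sample(n, rng):
    """DDPM ancestral sampling, using the exact score in place of a trained network."""
    x = rng.standard_normal(n)             # q_T is approximately N(0, 1)
    for t in range(T - 1, -1, -1):
        score = analytic_score(x, t)
        # With eps = -sqrt(1 - abar_t) * score, the DDPM mean simplifies to this form.
        mean = (x + betas[t] * score) / np.sqrt(alphas[t])
        noise = rng.standard_normal(n) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


rng = np.random.default_rng(0)
x0 = sample_data(5000, rng)
xT = forward_diffuse(x0, T - 1, rng)
samples = reverse_sample(5000, rng)
print("forward x_T mean/std:", xT.mean().round(2), xT.std().round(2))
print("reverse fraction > 0:", np.mean(samples > 0).round(2))
```

The forward pass should wash the two modes out into roughly standard normal noise, and the reverse pass should recover two well-separated modes near -2 and 2. Swapping `analytic_score` for a network fit by denoising score matching gives the learned variant the session goes on to build.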