Teaching Phylogenetics in the Cloud

A few weeks ago, I wrote about using JupyterHubs to make computational biology education more accessible to my students. I know teaching with Jupyter Notebooks has it’s detractors, but I’ve always noticed a difference when teaching with notebooks, as opposed to a text editor + the interpreter. The conversations in the classroom stay more focused on the material, rather than what they missed when the interpreter moved too fast, or when I switched from the script to the terminal. And, in fact, multiple students have told me similar – that they’ve felt lost in other courses, but not mine. I feel like there’s an education paper there that I don’t have the experience to write, but would happily collaborate on.

When we think about teaching computation, or phylogenetics, we often think of PhD students. Their questions look like this:

  • “I have data, can you please please please help me analyze it?”
  • “I have data and my advisor says I need a phylogeny like tomorrow, help???”
  • “I read about this technique, can you help me understand if it’s appropriate for my data?”

And so I’ve mostly switched over to a read-try-create model for teaching. Read an example, try to run the example and understand the output, create your own extension or apply the concept to novel data. I find this works better for early-stage MS students, and my undergraduate students.  Their questions often look like this:

  • “I think I might find phylogeny interesting, can I try?”
  • “I’ve heard that computation is the wave of the future, but can I really do it?”
  • “I’m not sure this will be part of my career as a scientist, can you try it with me?”

Read-try-create in a notebook environment puts content over delivery.

I got to wondering if I could see similar shifts in phylogenetics if I adopted this framework for teaching phylogeny. I typically teach phylogeny with RevBayes. There are a few reasons for this – I’ve implemented things in RevBayes, I like the graphical model framework, the analyses that I want to do are implemented there. The tutorial materials are also wonderful. But RevBayes has a framework in which you specify almost all parts of a phylogenetic model, including some concepts that are quite an abstraction from empirical biology, like specifying MCMC moves. Learners get overwhelmed, fast, and switching back and forth between text editor and interpreter is a lot for many of them.

Simon Frost and Michael Landis created a Rev Jupyter Kernel. I recently contributed a pull request that fixes some core functionality of this, and it’s now ready for use. I’ve been using the RevNotebook in a Littlest JupyterHub instance to onboard undergraduates and MS students into phylogeny research, and I’m really happy with it. Here’s what I did:

  1. First, I set up a JupyterHub on Digital Ocean according to these instructions. There’s more on this in my Plan C post. I started an 8 GB RAM instance.
  2. In the terminal of the JupyterHub, I installed RevBayes according to the Linux instructions. Mostly – the build command to build a Jupyter-ready RB is ./build.sh -jupyter true
  3. I cloned the kernel. And followed the instructions. Note that for a JupyterHub, to make RevNotebook available to all your users, any sudo command will need to be sudo -E
  4. Then, I cloned in the repository where I’ve been stashing notebooks.
  5. Next, I added students
  6. Finally, I created a link for students to click on to automatically sync their copy of the lessons with mine.

So far, so good! For the most part, the students are working through the lessons on their own, and it’s going so much better than last year when I was assigning them lessons, and having them work from the PDF at the Rev  interpreter. Anecdotally, I feel like comprehension, and crucially retention, is higher.

But don’t they need to learn command line???

Yes. Yes, they do. But I think if they really understand both phylogeny and Rev before I turn them loose on our friendly local HPC, it will be better. It’s not that hard to run a script. We’ve practiced running some Rev scripts in the terminal of the JupyterHub. I really don’t doubt that we can take those skills are port them to an HPC.  I’ll probably make my second-year undergraduates give the new members the five-cent tour of the HPC. But understanding what’s in the script … that’s hard.

I wanna see the RevHub!

RevBayes is too memory-intense to compile in a MyBinder, but if you want to play around, I can give you access to my RevHub. Just drop me a line.

Additionally, I am unhappy with my workflow for converting our Latex and Markdown tutorials to Notebooks. If you’d like to help, I’d love a buddy. A couple of us have been floating the idea of a SciPy meeting sprint to develop a set of notebooks for teaching phylogenetics in Python and Rev. Get in touch, if you’re interested? No, that doesn’t look right. Get in touch!!! Diversity in contributorship is our strength.



One thought on “Teaching Phylogenetics in the Cloud

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s