This semester, I have the pleasure of teaching a semester-long Computational Biology course for the first time at Southeastern. I’m very excited to help a new generation of students learn to use computation to do their research. We have about 12 students: MS students, undergraduates and even a faculty colleague.
It’s always a challenge to teach these sorts of courses. In terms of the material, computation is just different from the normal biology classes. It’s a different skill set, and many students have had little exposure to scientific and technical computing.
But that’s not what I’m going to talk about today. Today, I’m going to talk about the technical aspects of a course like this. Computing courses aimed at graduate students typically go something like this:
- Show up; say hello; get a cup of coffee.
- Sit down and do installs.
- Do tutorials.
- Do additional installs later in the course, as needed.
This works OK. Graduate students are generally aware that they a) have research data and b) need to analyze it if they’re leaving with a PhD at some point. While it’s nice to make the installs not be horrifically painful, the audience is sort of captive.
Undergraduates aren’t. My class is an upper-division elective; they could do something else. If the installs are too torturous or their computer can’t run the software, they can leave. And that’s a particular issue for the students I serve. Southeastern is in southeastern Louisiana, a historically low-income region of a low-income state. Most students work 20+ hours a week outside class and end up doing homework on a work computer that isn’t their personal machine. Some might not have reasonable computers. Some might be in the military or Coast Guard and might need to keep up with homework on a deployment weekend.
Enter JupyterHub. JupyterHub allows instructors to serve multiple instances of Jupyter notebooks for their classes. The server can then be hooked up to a custom domain, so students can navigate in their browser to a course website, log in, and start an interactive compute session without doing any installs. Need to do homework on a work computer? No problem! Your computer is 6 years old and you might need to finish the class out on a loaner laptop? No problem.
In particular, I used the Littlest JupyterHub (TLJH), a JupyterHub variant for small classes. TLJH is meant to be used in a single-server setup. This works well for me for a couple of reasons:
- What we’re doing is fairly simple: install a few Python packages for working with data. We’re not doing a ton of complicated installs, or working with multiple languages, or compiling a whole lot of finicky FOSS. I don’t have a real need for containers, either.
- I have a newborn at home, and I’m very tired and I need everything explained like I’m five. TLJH docs are very good at this.
- I’d like this course to work, and work smoothly, and work with easily-communicated technology so I can encourage other faculty to adopt the model and infrastructure.
I had originally intended to serve the course off of the State of Louisiana supercomputer, but a major equipment breakage has been taking up all the staff’s time, so they couldn’t set up a JupyterHub to their security specifications for me. I ordered a small server to run the course … and it didn’t arrive in time to start class. TLJH has well-written instructions for deploying on Digital Ocean, a provider of customizable cloud servers. The time from choosing a server to having a working hub was about 5 minutes. I purchased a domain name, linked the nameserver and server address, and had a working web portal in another few minutes.
Once the hub was working, I logged into it, opened a terminal and followed these instructions to start installing packages.
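After installing, it can be handy to confirm from a notebook that the packages are actually visible in the shared environment. Here is a minimal sketch, not part of the TLJH instructions; the function name `missing_packages` and the package list are my own placeholders, so swap in whatever your course needs.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    # find_spec looks a top-level module up without importing it,
    # and returns None when the module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical package list (an assumption, not the course's actual list).
course_packages = ["pandas", "matplotlib"]
print("Missing:", missing_packages(course_packages))
```

An empty list means every package on the list can be imported by any user on the hub who shares that environment.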
Each week, I make my lectures in Jupyter Notebooks, make a homework notebook, and push them to GitHub. [We’re in week three of the class and still working our way through Data Carpentry’s Python materials. As a maintainer for these materials, I am slightly concerned that we’re in instructional hour 12 and still on these lessons … there’s a lot of stuff in there. We do every exercise, though.] Then, I use the nbgitpuller utility to generate a link to my repo. When a student clicks the link, the materials in their hub sync to my personal repo. This way, I can serve materials based on my GitHub and use version control in the way I’m used to, without inundating novice students with git lingo right away. I put the link on the schedule for that day. The students arrive, click the link, sync, and we get started.
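nbgitpuller has its own link generator, but the link is just a URL with a few query parameters, so it can also be built by hand. Below is a rough sketch of that, assuming the classic notebook interface (`tree/` paths). The hub domain, the repo URL and the helper name `make_nbgitpuller_link` are all made up for illustration.

```python
from urllib.parse import urlencode


def make_nbgitpuller_link(hub_url, repo_url, branch="main", open_path=""):
    """Build a link that syncs `repo_url` into the hub of whoever clicks it."""
    # The repo is cloned into a folder named after the repository.
    repo_name = repo_url.rstrip("/").split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    params = {
        "repo": repo_url,
        "branch": branch,
        # Where the student lands after the sync finishes.
        "urlpath": f"tree/{repo_name}/{open_path}",
    }
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


# Hypothetical hub and repo, for illustration only.
print(make_nbgitpuller_link(
    "https://hub.example.org",
    "https://github.com/example-user/example-course",
))
```

Passing `open_path` (say, a notebook file name inside the repo) should drop students straight into that day's notebook instead of the folder listing.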
Overall, this is very easy, and I’m very happy. I have 12 students + me working through the dc-ecology-py materials on a 2 CPU, 4 GB memory system. That seems sufficient, and my Digital Ocean server can be resized on the fly in under 2 minutes if I decide it isn’t. So far, the course has cost like $2.50 to run.
I’m the Bioinformatics and Computational Biology Core organizer for the Southeastern campus, under the state’s INBRE funding. I’m hoping to discuss these experiences more at the INBRE retreat in two weeks, and hopefully drive forward adoption of these types of course set-ups. TLJH is clearly a very important tool in the kit for serving the students that we have.
Edit: Got a good question on Twitter:
And the answer is yes! There is a terminal in JupyterHub, so you can still practice the command line, command-line-based revision management, and running Python scripts at the command line. Below are screenshots of how you access it.
I started with the idea of navigating the file system from within Python last week, then did some shell navigation the following class period. I even had props!
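For a flavor of what navigating the file system from within Python can look like, here is a small sketch using the standard library. It is not the actual class material; the folder and file names are invented, and it works in a throwaway temporary directory so it can be run anywhere.

```python
import os
import tempfile
from pathlib import Path


def list_directory(path):
    """Return sorted entry names, with a trailing slash on folders (like `ls -F`)."""
    entries = []
    for entry in Path(path).iterdir():
        entries.append(entry.name + "/" if entry.is_dir() else entry.name)
    return sorted(entries)


start = Path.cwd()  # like `pwd`
with tempfile.TemporaryDirectory() as scratch:
    # Invented course layout, for illustration only.
    course = Path(scratch) / "course"
    (course / "data").mkdir(parents=True)
    (course / "lecture.ipynb").touch()
    try:
        os.chdir(course)  # like `cd course`
        print("Now in:", Path.cwd())
        print("Contents:", list_directory("."))  # like `ls`
    finally:
        os.chdir(start)  # go back before the scratch folder is deleted
```

Each Python call maps onto a shell command (`pwd`, `cd`, `ls`), which makes the jump to the terminal in the next class period a little less abrupt.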