Recently, a paper came out discussing the prevalence of errors in data spreadsheets associated with publications. Many of my colleagues have taken that as evidence that Excel should not be used, in favor of programmatic solutions like R or Python. And that’s pretty well how I feel – I usually only use Excel to view data. When I do have to use Excel to enter data, such as data that cannot be obtained programmatically, I try to stick to the Data Carpentry guidelines.
That said, I think Greg Wilson made an excellent point here, that many of these same errors could occur in programmatic data analysis. It seems obvious to me that the solution here isn’t to stop using Excel, unless you’re going to make an investment in training your workers to use programmatic tools correctly. To that end, I’d like to introduce our new undergraduate researcher, Krishna Gandikota. Krishna is working with me to develop a small parsing program to pull down taxonomy data from the community resource AntWiki, and parse it to fit into my current data structures.
This will be a useful tool for several reasons:
- We can avoid the errors introduced by entering data by hand
- We can save time compared with looking up tons of ants by hand
- Others will be able to use this tool for their work
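To make the idea concrete, here is a minimal sketch of the parsing half of such a tool. It is not Krishna's program: the `Rank: Name` input format, the `RANKS` column list, and the function names `parse_taxonomy` and `to_csv` are all assumptions made up for illustration, since this post doesn't describe AntWiki's page layout or my data structures. Downloading the pages is left out entirely.

```python
import csv
import io

# Hypothetical column order; the real data structures are not described here.
RANKS = ["subfamily", "tribe", "genus", "species"]


def parse_taxonomy(text):
    """Parse 'Rank: Name' lines into a dict keyed by lowercase rank."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip lines that are not 'Rank: Name' pairs
        rank, name = line.split(":", 1)
        rank = rank.strip().lower()
        if rank in RANKS:
            record[rank] = name.strip()
    return record


def to_csv(records):
    """Return CSV text with one column per rank; missing ranks become 'NA'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=RANKS, restval="NA", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample text standing in for a downloaded page (assumed format).
sample = """Subfamily: Formicinae
Tribe: Camponotini
Genus: Camponotus
"""

if __name__ == "__main__":
    print(to_csv([parse_taxonomy(sample)]))
```

Writing an explicit `NA` for a missing rank, rather than leaving the cell blank, keeps gaps visible when the file is later opened in Excel or read into R or Python.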
And, in his own words, here’s Krishna and what he’d like to accomplish this semester with the Paleantology project:
I am a sophomore in Biomedical Engineering pursuing medical school. As someone who aspires to go into the medical field, I wanted to take a step into the concept of “research”. I hope to widen my understanding of biology and obtain some new skills in software and coding.