Transitioning from wet lab to bioinformatics: My research journey
A brief history of big data
I found myself caught in the era of big data in genetics research while I was doing my Bachelor of Science degree at the University of Adelaide. It was around 2016, in the third year of my genetics course, that I was first introduced to the terms “data science” and “big data”.
The emergence of big data can be traced back as far as 1944, according to a Forbes article, A very short history of big data. From what I understand, the increased capacity to generate biological research data demanded that people in the field develop the skills and knowledge needed to analyse it. As a result, new niches for complex biological data analysis emerged, namely bioinformatics and computational biology.
From the lab bench to the computer
My undergraduate practical sessions focused mainly on essential wet lab techniques, which involve running experiments in a lab setting using both biological samples and chemical reagents. Examples include western blotting to detect proteins, the polymerase chain reaction (PCR) to amplify DNA sequences, and CRISPR-Cas gene editing technology to induce gene mutations. At that time, I was involved in a wet lab placement in which I did a lot of PCRs and genotyping. My bioinformatics lectures, on the other hand, were mostly theory-based. Back then, coding wasn’t one of my strengths. But since coding wasn’t an essential part of my practicals, I was able to complete all of the tasks assigned to me.
I was first introduced to coding when I attended an introductory bioinformatics workshop in 2014 – the same year that the University of Adelaide’s bioinformatics hub and our supercomputing system, “Phoenix,” were established. Little did I know that this was the beginning of my transition from bench work in the lab to bioinformatics, i.e. the science of analysing biological data computationally.
My transitional journey
Although I was able to make the transition to the field of bioinformatics, I have to admit that I had a steep learning curve at the beginning. The first hurdle I had to conquer was learning how to code. This is much like learning how to write a sentence for the first time. After writing the “sentences”, i.e. the command lines, I then needed to learn how to combine them into functional scripts. So in a way, I would say that it was a lot like putting sentences together into a meaningful paragraph. I started learning Bash and R, two coding languages that are commonly used to process sequencing data. Unlike with conversational sentences, I found it difficult to understand the command lines or the reasoning behind their structure. Coming from a non-coding background, I found that the letters and symbols of different coding functions looked like ancient hieroglyphs that I couldn’t comprehend.
It took me a while to get over the fear of “screwing things up” and to start boldly executing command lines to see what each command does. To speed up the learning process, I signed myself up for online coding courses. Whenever I hit a roadblock as my learning progressed, I did not hesitate to consult experts such as bioinformaticians, statisticians, or computational biologists. Over time, with a lot of practice (and even hitting a dead end a few times), I slowly got the hang of it.
The next challenge was to learn and understand data processing. Understanding the rationale behind each data processing step, such as data cleaning and quality checks, is vital for making sound decisions in data analysis. At first, I found it overwhelming to work out what happens once a piece of code is executed. This was because I knew too little at the time to check the processed data files and interpret the output. I also struggled with the terminology used in computational biology – hierarchical clustering, principal component analysis (PCA), and bootstrapping, just to name a few of these terms. However, I gradually got better at this through trial and error, as more experience honed my data analysis skills.
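To make one of those terms a little more concrete, here is a minimal sketch of bootstrapping, written in Python purely for illustration (it is not code from my own project). The idea is to resample the data with replacement many times and look at how much a statistic, here the mean, varies across the resamples. The function name `bootstrap_mean_ci`, its parameters, and the measurement values are all made up for this example.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Estimate a percentile confidence interval for the mean by bootstrapping.

    Hypothetical helper for illustration only; the names and defaults are assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Draw n values WITH replacement from the original sample.
        resample = [rng.choice(values) for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up measurements, standing in for something like expression values.
measurements = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4]
low, high = bootstrap_mean_ci(measurements)
print(f"Approximate 95% interval for the mean: {low:.2f} to {high:.2f}")
```

The interval it prints is only as trustworthy as the sample is representative, which is exactly the kind of reasoning behind a processing step that took me a while to appreciate.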
I find the thought processes involved in computational data analysis very different from my wet lab experience. For instance, during my wet lab placements, I used to plan my experiments beforehand, then carefully follow the protocols to conduct each experiment and interpret the results. The wet lab components were concrete, so I could see and sense that I was making progress. In bioinformatics, by contrast, the data analysis is performed in an abstract coding world, represented by a terminal interface. Sometimes, when I failed to get my code to work, I felt defeated because I blindly believed that I had not made any progress. However, I slowly learnt to keep track of the changes I was making to my code (version control) using online repositories such as GitHub. This also served as a way to remind myself of how much progress I was making, which is very different from keeping a lab book for wet lab experiments.
Aside from learning the technical aspects of my new field, I also discovered that stereotypes exist about research students involved in data analysis. There is a common misunderstanding that bioinformatics or computational biology research students are just “typing using their keyboards” and that their work is not as laborious as that of their wet lab peers, which indirectly implies that they are “not doing much work.” I experienced a mild identity crisis for a while because of this stereotype. To counter it, I talked to my colleagues more about the progress of my work, rather than only the results of my data analysis. I wanted to create an awareness that data analysis requires intensive thinking and coding before the raw data can be presented as a refined data visualisation for further interpretation.
In retrospect, transitioning from a wet lab monkey to a data miner has been rewarding, even though it was a rocky journey. Persistence and determination made the transition possible. I am now a Master of Philosophy (MPhil) research candidate in the Plant Epigenetics and Reproduction Group at the University of Adelaide, working on understanding plant epigenetics via data analysis. I have presented my work via talks and posters at both local seminars and national conferences. The national conferences include the Australian Bioinformatics and Computational Biology Society (ABACBS) Conference 2017 (Adelaide), Australasia Conference for Undergraduate Research (ACUR) 2017, ComBio2018 (Sydney), and Lorne Genome Conference 2019 (Victoria).
In sum, I have come a long way and I am extremely grateful for the mentorship I have received from really fantastic researchers and data scientists here at my current university. However, I believe that there is still more to learn. And I am looking forward to what the future holds for a girl who is curious about the world of bioinformatics and computational biology.