What is Bioinformatics?

This blog begins with bioinformatics, so you may be wondering: what is bioinformatics?

Fundamentally, bioinformatics is the application of information science to understand biological complexity.

Put simply, it is the use of computers to analyse, store and share biological data, and it combines Statistics, Computer Science and Biology.

Bioinformatics allows us to study the function and evolution of genes and genomes.

As a result, bioinformatic methods are routinely used in many important real-world areas such as disease monitoring, biofuel development, drug development, forensic analysis, evolutionary studies, veterinary science and crop improvement.

Why Biology, Statistics AND Computer Science?

Bioinformatic data is complex – and nowadays it can be huge – so we need computers to help us with large analytical tasks.

This means we have to understand some computer science, e.g. how to run computationally expensive jobs so that resources are used efficiently, and how to automate tasks to save time and improve accuracy.

Similarly, we need statistics to make sense of all that data and ensure we are not misleading ourselves through chance associations and patterns.
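
To see how easily chance patterns arise, here is a minimal Python sketch. It assumes a deliberately simplified model that real genomes do not follow exactly (each base A, C, G or T is equally likely and independent of its neighbours), and the function name `expected_chance_hits` is made up for this illustration.

```python
def expected_chance_hits(sequence_length: int, motif_length: int) -> float:
    """Expected number of times one specific motif appears purely by chance.

    Assumption (simplified model): the four bases are equally likely and
    independent, so a given motif matches any start position with
    probability 0.25 ** motif_length.
    """
    if motif_length < 1 or motif_length > sequence_length:
        return 0.0
    # Number of positions at which the motif could start
    start_positions = sequence_length - motif_length + 1
    return start_positions * 0.25 ** motif_length


# One specific 6-base motif in a random sequence of one million bases
print(round(expected_chance_hits(1_000_000, 6)))
```

Under this model, a specific 6-base motif is expected to turn up a couple of hundred times in a million random bases, so simply finding it in your data is not, on its own, evidence of anything biological.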

As you will learn, a large part of bioinformatics involves identifying and classifying patterns in biological data (where the data is usually in the form of DNA sequences).
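
As a small taste of what "finding patterns in DNA sequences" can mean in practice, here is a minimal Python sketch. The helper names `find_motif` and `gc_content` and the example sequence are invented for illustration. Real analyses usually rely on dedicated tools, but the underlying idea is the same.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start positions of every (possibly overlapping) match."""
    sequence, motif = sequence.upper(), motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Search again from the next base so overlapping matches are kept
        start = sequence.find(motif, start + 1)
    return positions


def gc_content(sequence: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


dna = "ATGCGATGATG"  # made-up example sequence
print(find_motif(dna, "ATG"))  # [0, 5, 8]
print(gc_content("ATGC"))      # 0.5
```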

We cannot rely blindly on statistics, however. We also need an understanding of biology and its limitations, i.e. what does and doesn’t make sense biologically when running our analyses and interpreting the results.

Biological data is hard to interpret outside its biological context, which can make bioinformatics a tricksy form of data science: you need to know the underlying biology.

Before you read on, it may be handy to watch this quick recap on DNA, Genes and Genomes if you aren’t already familiar.

Who founded Bioinformatics?

Dr Margaret Dayhoff, a pioneer in applying computer science to biology, is widely regarded as the founder of bioinformatics.

Dayhoff devised the one-letter amino acid code after realising that she could reduce the size of her data files by replacing the three-letter abbreviation for each amino acid with a single letter.
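
To illustrate how a one-letter code shrinks sequence data, here is a minimal Python sketch that converts a protein written with the standard three-letter amino acid abbreviations into one-letter form. The function name `to_one_letter`, the hyphen-separated input format and the example peptide are assumptions made for this illustration, not a description of Dayhoff's original software.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence: str) -> str:
    """Convert e.g. 'Met-Ala-Lys' to 'MAK' (input format is an assumption)."""
    residues = three_letter_sequence.split("-")
    # capitalize() normalises inputs such as 'MET' or 'met' to 'Met'
    return "".join(THREE_TO_ONE[residue.capitalize()] for residue in residues)


peptide = "Met-Ala-Lys-Trp"  # made-up example peptide
print(to_one_letter(peptide))  # MAKW
```

Each residue now takes one character instead of three (plus a separator), which adds up quickly over thousands of proteins.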

She also set up the first protein sequence database, created an atlas of reference proteins and designed one of the first substitution matrices (we will go over what these are in a later post).

Together with Richard Eck, Dayhoff also created the first computer reconstruction of a phylogenetic tree from molecular sequences!

Image taken from Smithsonian.com; Photo courtesy NIH National Library of Medicine / Ruth Dayhoff

Although Dayhoff was a pioneer of her time, bioinformatics was not yet recognised as a scientific field in the 1960s. It was only later that the term ‘bioinformatics’ was coined by Paulien Hogeweg and Ben Hesper.

Initially the field was defined as ‘the study of informatic processes in biotic systems’. You can read more about the development of the term bioinformatics from Paulien Hogeweg herself here.

With the rise of the World Wide Web in the 1990s and the subsequent development of next-generation sequencing (NGS) in the 2000s, there was an explosion of new bioinformatic tools, algorithms and databases.

The big data revolution for biology had begun!

Nowadays the term bioinformatics has become almost exclusively associated with using informatics to prepare and analyse genomic data.

Take a look at some Bioinformatician job descriptions and you will almost always find genomics and NGS experience listed as essential skills.

Bioinformatics isn’t just concerned with genomics, however. The field encompasses methods for generating, analysing, storing and sharing all kinds of biological data.

In the next post on bioinformatics we will explore the different areas of this field in more detail and find out what bioinformatic data looks like.