Why use bioinformatic software via the command line?

Bioinformaticians often need to run multiple programs to achieve a specific task, each with its own commands, arguments and options.

These programs usually run sequentially, i.e. the output of each program is given as input to the next program.

Using the command line we can group these tasks into a single pipeline or workflow.
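As a minimal sketch of passing one program's output to the next, the shell pipe (|) below chains three standard Unix tools. The three short sequences are made-up illustration data, not real reads.

```shell
#!/usr/bin/env bash
# Toy input: three made-up DNA sequences, one per line (illustration only).
# sort places identical lines next to each other, uniq collapses the
# duplicates, and wc -l counts the lines that remain.
unique_count=$(printf 'ACGT\nTTGA\nACGT\n' | sort | uniq | wc -l)

# $(( )) strips any padding that some versions of wc add around the number.
echo "Unique sequences: $((unique_count))"
```

Each tool here does one small job, and the pipe hands its output straight to the next tool without any intermediate files.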

An example

Imagine you have received DNA sequences of bacteria from patients with a known disease.

After an initial DNA barcoding check, you want to understand which bacteria in your patient samples are most closely related.

Your task is to create a phylogenetic tree from these DNA sequences. To achieve this, you carry out the following workflow:

1. Run program A to align the DNA sequences.
2. Run program B to trim the aligned DNA sequences.
3. Run program C to find the best evolutionary model for your trimmed DNA alignment.
4. Run program D to build a phylogenetic tree from the DNA alignment using the best evolutionary model.

In this workflow we can see that each program relies on the output of the program before it.

We could run each program individually, specifying its commands, arguments and options each time. However, to save time we can combine these tasks into a single bioinformatic pipeline.

We can then run this pipeline through the command line using either a bash script or, for more complex workflows, a workflow manager.
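The four-step workflow could be sketched as a bash script along the following lines. This is only an illustration under stated assumptions: program_a to program_d are hypothetical stand-ins written as shell functions that copy files or write placeholder text, and the file names and input sequences are made up. In a real pipeline each function body would be replaced by a call to an actual tool.

```shell
#!/usr/bin/env bash
# Stop at the first failing step so later steps never run on missing input.
set -eu

# Hypothetical stand-ins for programs A-D (not real tools).
program_a() { cp "$1" "$2"; }                               # "align": sequences -> alignment
program_b() { cp "$1" "$2"; }                               # "trim": alignment -> trimmed alignment
program_c() { echo "placeholder_model" > "$2"; }            # "model test": trimmed alignment -> model name
program_d() { echo "tree built with $(cat "$2")" > "$3"; }  # "tree": trimmed alignment + model -> tree

# Work in a temporary directory with made-up input so the sketch is self-contained.
workdir=$(mktemp -d)
printf '>sample1\nACGT\n>sample2\nACGA\n' > "$workdir/sequences.fasta"

# Each step reads the file written by the step before it.
program_a "$workdir/sequences.fasta" "$workdir/aligned.fasta"
program_b "$workdir/aligned.fasta"   "$workdir/trimmed.fasta"
program_c "$workdir/trimmed.fasta"   "$workdir/model.txt"
program_d "$workdir/trimmed.fasta"   "$workdir/model.txt" "$workdir/tree.txt"

echo "Pipeline finished: $workdir/tree.txt"
```

Saved under a name such as pipeline.sh (a hypothetical file name), the whole workflow could then be run with a single command: bash pipeline.sh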

In addition to saving you time, automating these tasks requires less manual intervention and so reduces the chance of human error.