Tips for testing scripts and pipelines

When writing a pipeline or a script there is inevitably some trial and error involved. One way to help yourself out is to design the right test files before you start.

For example, say you want to write a script that can identify and read through (or ‘parse’) each fasta sequence entry in a genome file.

You don’t want to test a script on an entire genome: this takes too long and, as you’ll see, just makes your editor crash.

Instead, make a small multi-fasta file and run your script on it to check that your script is working correctly. This kind of method works for any file type.
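As a rough illustration of the idea, here is a minimal Python sketch that writes a three-entry multi-fasta test file and runs a simple parser over it. The names parse_fasta, make_test_file and test.fasta, and the sequences themselves, are made up for this example; your own script and test data will look different.

```python
import tempfile
from pathlib import Path

# A tiny, made-up multi-fasta file: three entries, one wrapped over two lines.
TEST_FASTA = """>seq1
ACGTACGT
ACGT
>seq2
GGGCCC
>seq3
TTTT
"""


def make_test_file(directory):
    """Write the small test file into `directory` and return its path."""
    path = Path(directory) / "test.fasta"
    path.write_text(TEST_FASTA)
    return path


def parse_fasta(path):
    """Yield (header, sequence) for each entry in a fasta file."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header means the previous entry is complete.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Do not forget the last entry in the file.
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for header, sequence in parse_fasta(make_test_file(tmp)):
            print(header, len(sequence))
```

Because the test file is so small, you can work out by hand what the script should report for each entry and compare that with what it actually prints.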

Test files are particularly useful when working on large computational jobs on the HPC. You don’t want to wait two weeks to realise you made a simple spelling error!

Something else I find handy is to imagine beforehand the different kinds of entries you might encounter and to artificially create a small test file that includes all of them.

This way you can see how your script handles them. This is handy if your script is parsing something and there are characters that might get in the way, like tricky characters in a fasta header.
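One possible way to set this up is sketched below in Python. The edge cases listed (a description after the ID, pipe characters, a second ‘>’ inside a header, lowercase bases, an empty sequence, a wrapped sequence) are only guesses at what might turn up, and the names write_edge_case_file, read_records and sequence_id are hypothetical helpers for this example, not part of any library.

```python
import tempfile
from pathlib import Path

# Made-up entries, each one chosen to exercise a different awkward case.
EDGE_CASES = (
    ">plain_id\n"
    "ACGT\n"
    ">id_with_description some free text here\n"
    "ACGT\n"
    ">db|ID001|gene_A pipes in the header\n"
    "ACGT\n"
    ">contains>another_marker\n"
    "ACGT\n"
    ">lowercase_sequence\n"
    "acgt\n"
    ">empty_sequence\n"
    "\n"
    ">wrapped_sequence\n"
    "AC\n"
    "GT\n"
)


def write_edge_case_file(directory):
    """Write the edge-case test file into `directory` and return its path."""
    path = Path(directory) / "edge_cases.fasta"
    path.write_text(EDGE_CASES)
    return path


def read_records(path):
    """Return a list of (header, sequence) tuples from a fasta file."""
    records = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                records.append([line[1:], ""])
            elif records:
                # Sequence line (possibly empty): append to the current entry.
                records[-1][1] += line.strip()
    return [(header, sequence) for header, sequence in records]


def sequence_id(header):
    """Assume the ID is the first whitespace-delimited token of the header."""
    parts = header.split()
    return parts[0] if parts else ""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for header, sequence in read_records(write_edge_case_file(tmp)):
            print(repr(sequence_id(header)), repr(sequence))
```

Printing with repr makes empty sequences and stray whitespace visible, so it is easier to spot an entry that your script handled differently from what you expected.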