
ASMR’s professional development webinar series for 2024 has concluded.
Below is a summary of our final webinar of the year, Bioinformatics: Considerations and tools for Biologists.

Key information from
Bioinformatics: Considerations and tools for biologists

ASMR’s last professional development webinar of 2024 covered key decision points in the workflow needed to extract biologically significant insights from next-generation sequencing data. Below is a summary of what was presented.

Date: 26 October 2024
Presenter: Dr Dale Watkins, Illumina

The workflow:

  • Sample preparation
  • Library preparation
  • Sequencing
  • Analysis

Key consideration:

  • The application (the question being asked) determines the tools you will need. This covers everything from the number of samples needed to achieve statistical significance to the level of coverage per sample, the number and length of reads needed, and how to analyse the data.
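
As a rough illustration of how coverage, read number and read length relate, the sketch below uses the standard mean-coverage relationship C = N × L / G (reads × read length ÷ genome size). The function names and the example genome size are assumptions made for illustration and were not part of the webinar.

```python
import math

def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Mean coverage C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp

def reads_needed(genome_size_bp, target_coverage, read_length_bp, paired_end=False):
    """Reads (or read pairs, if paired_end) needed to reach a target mean coverage."""
    # A read pair contributes two reads' worth of bases.
    bases_per_unit = read_length_bp * (2 if paired_end else 1)
    return math.ceil(target_coverage * genome_size_bp / bases_per_unit)

# Hypothetical example: a 5 Mb genome sequenced to 30x with 150 bp reads.
single = reads_needed(5_000_000, 30, 150)
pairs = reads_needed(5_000_000, 30, 150, paired_end=True)
print(single, pairs)
```

This gives only a mean value. Real coverage varies across the genome, so published protocols for a similar application remain the better guide to a sensible target.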

ADVICE: Check the scientific literature for similar sequence-based applications, including how the sequence was analysed, down to whether commercial products or more task-specific open-source software was used.

Application classes
RESEQUENCING APPLICATION:
samples are aligned or tested for variation relative to a known reference genome.

DE NOVO APPLICATION:
the sequence of an unknown variant is being assembled for the first time.

Key considerations:

  • The ability to store, curate and process the required amount of data.
  • An understanding of the number of reads needed to distinguish between allelic variation/polymorphisms and sequencing errors/ambiguities.
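
One simplified way to reason about the second point is a binomial model. The sketch below assumes a heterozygous variant appears in about half of the reads at a site, and that sequencing errors occur independently at a fixed per-base rate and all produce the same wrong base (a pessimistic simplification). The 1% error rate, the read-count threshold and the function names are illustrative assumptions, not figures from the webinar.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def call_probabilities(depth, min_alt_reads, error_rate=0.01):
    """Return (detection probability, false-call probability) at one site.

    detection:  a true heterozygous allele (expected in 50% of reads)
                is seen in at least min_alt_reads reads.
    false call: sequencing errors alone reach the same threshold.
    """
    detection = prob_at_least(min_alt_reads, depth, 0.5)
    false_call = prob_at_least(min_alt_reads, depth, error_rate)
    return detection, false_call

# Compare a shallow and a deeper site, each with a hypothetical threshold.
for depth, threshold in [(5, 2), (30, 6)]:
    detection, false_call = call_probabilities(depth, threshold)
    print(f"depth={depth} threshold={threshold} "
          f"detect={detection:.4f} false_call={false_call:.2e}")
```

Under these assumptions, greater depth allows a stricter threshold that still detects true variants while making error-only calls very unlikely. Production variant callers use considerably richer models.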

Single and paired end reads
Whether to read the sequence of a DNA fragment from one or both ends also depends on the application. Typically:

  • Resequencing for SNP detection can use either. In this instance, coverage depth is more important.
  • Resequencing for indel mutations (insertions or deletions) requires paired end reads since sequence assembly is needed.
  • De novo genome or transcriptome assembly requires paired end reads.
  • RNA sequencing can use either.
  • Differential expression of small RNA can use either.

Data analysis overview
There are three steps in the analysis pipeline:

  1. PRIMARY: performed by the sequencing instrument as it converts fluorescence measurements into base calls.
  2. SECONDARY: demultiplexing (separating samples based on their unique barcode) and alignment (for example, matching to a reference sequence or performing a sequence assembly).
  3. TERTIARY: interpretation of the processed data to generate insights about a biological system, such as the cause of a genetic disease. This is the most demanding step analytically.
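
To make the demultiplexing part of secondary analysis concrete, here is a minimal sketch. It assumes a simplified layout in which each read begins with an inline barcode and only exact matches count. The barcodes, sample names and function name are hypothetical. On real instruments the index is usually read separately, and production tools typically tolerate a small number of mismatches.

```python
from collections import defaultdict

def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample using the barcode at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]  # sequence with the barcode trimmed off
        # Reads whose barcode matches no known sample are kept separately.
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGA", "TGCAGGCC", "NNNNAAAA"]
print(demultiplex(reads, barcodes, barcode_length=4))
```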

File types

  1. .fastq
    –  generated by the sequencing instrument
    –  files processed for de novo assembly applications typically remain in FASTQ format
  2. .bam
    –  generated by alignment to a reference genome
  3. .vcf
    –  generated by variant calling software that identifies differences between a sample and reference sequence
    –  two formats are available: VCF and gVCF (genome Variant Call Format)
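
As an illustration of what a FASTQ file contains, the sketch below reads the common four-line record layout (header, sequence, "+" separator, quality string) and decodes the quality characters using the Phred+33 encoding. The parser name and the example record are made up for illustration. Established libraries are the safer choice for real data, which may also be compressed.

```python
import io

def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # "+" separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

# A made-up single record, for illustration only.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, sequence, scores in parse_fastq(example):
    print(read_id, sequence, scores)
```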

Software considerations

There are three main pathways:

  1. Commercially available software packages. Typically for a secondary level analysis relating to standard applications.
  2. Command line coding using open source software. Typically for more specialised or non-standard applications. Also required when there is a need to generate specialised data representations or reports.
  3. Biostatistics expertise. Typically for more complex analytical challenges requiring a biostatistician to ensure that the experimental design, the data and the analysis can achieve statistical significance.

Ultimately, the question boils down to building or buying the software.

Illumina software packages
Illumina’s DRAGEN (Dynamic Read Analysis for GENomics) package is a secondary analysis software kit especially designed for biologists who lack bioinformatics skills.

More information is available at the Illumina website here.

It is suited to a wide range of applications, including whole-genome, exome, methylome and transcriptome analysis. This range means it can replace about 30 open-source tools.

Illumina also offers educational resources and support.

More information
A recording of the webinar is available here.

This webinar is presented with thanks to ​, the exclusive sponsors of ASMR’s 2024 professional development webinar series