Semi-global Alignment

7. Semi-global Alignment#

7.1. Background#

We also have a third type of alignments, semi-global alignments, which are also known as semi-local or glocal alignments.

7.2. Principle#

Here we cross breed the Needleman-Wunsch and the Smith-Waterman algorithm, by initiating the dynamic programming matrix as for a local alignment, but using the same recursion as in the global alignment.

7.3. Implementation#

The implementation of the Semi-global alignment algorithm involves initializing an alignment matrix with zeros, and then iteratively updating its cells based on the scores of adjacent cells and the scoring schema (match, mismatch, and gap penalties).

7.4. Applications#

It is particularly useful in situations where we strive to align a short sequence to a longer sequence.

7.5. The definitions of the problem, and the solution#

Given two sequences \(a_1,\ldots,a_N\) and \(b_1,\ldots,b_M\), a scoring function \(d(x,y)\), find a semi-global alignment that gives an optimal (maximal) score.

The solution can be found by studying the dynamic programming matrix, \(S\), of size \((N+1,M+1)\), by using the recursions defined in equations (7.1) and (7.2).

(7.1)#\[\begin{split}\begin{align*} S_{0,0} = 0, & \\ S_{i,0} =& 0, & \textrm{for all}\ i \\ S_{0,j} =& 0, & \textrm{for all}\ j \end{align*}\end{split}\]

(7.2)#\[\begin{split}S_{i,j}=\max \begin{cases} S_{i-1,j-1} & + d(a_i,b_j)\\ S_{i-1,j} & + d(a_i,-)\\ S_{i,j-1} & + d(-,b_j). \end{cases}\end{split}\]

The optimal alignment is found by backtracing from the maximal bottommost or rightmost element, \(\max(\max_i S_{i,M},\max_j S_{N,j})\), to the first encountered leftmost or topmost (0) element.

7.6. Summary: Comparing Global, Local, and Semi-global Alignment#

Sequence alignment algorithms can be broadly categorized into three types: global (Needleman-Wunsch), local (Smith-Waterman), and semi-global alignment. Each serves a distinct purpose depending on the biological question and the nature of the sequences being compared.

Global alignment (Needleman-Wunsch) aligns two sequences from end to end, optimizing the alignment across their entire lengths. It is best suited for sequences of similar length and overall similarity.
Local alignment (Smith-Waterman) finds the highest-scoring matching region (subsequence) between two sequences, making it ideal for identifying conserved domains or motifs within otherwise divergent sequences.
Semi-global alignment is a hybrid approach, allowing for gaps at the ends of one or both sequences without penalty. This is particularly useful when aligning a shorter sequence (such as a sequencing read or a gene) to a longer reference (such as a chromosome or genome).

The table below summarizes the key differences:

Alignment Type	Matrix Initialization	Recursion Used	Traceback Start	Traceback End	Typical Use Case
Global	Gap penalties	Global (Needleman-Wunsch)	Bottom-right cell	Top-left cell	Full-length comparison of similar sequences
Local	Zeros	Local (Smith-Waterman)	Max cell anywhere	First zero	Motif/domain search, conserved regions
Semi-global	Zeros	Global (Needleman-Wunsch)	Max in last row/column	First zero in row/col	Short-to-long sequence alignment

Understanding these differences helps in selecting the appropriate algorithm for a given biological problem.