Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Semi-global Alignment

Background

We also have a third type of alignments, semi-global alignments, which are also known as semi-local or glocal alignments.

Principle

Here we cross breed the Needleman-Wunsch and the Smith-Waterman algorithm, by initiating the dynamic programming matrix as for a local alignment, but using the same recursion as in the global alignment.

Implementation

The implementation of the Semi-global alignment algorithm involves initializing an alignment matrix with zeros, and then iteratively updating its cells based on the scores of adjacent cells and the scoring schema (match, mismatch, and gap penalties).

Applications

It is particularly useful in situations where we strive to align a short sequence to a longer sequence.

The definitions of the problem, and the solution

Given two sequences a1,,aNa_1,\ldots,a_N and b1,,bMb_1,\ldots,b_M, a scoring function d(x,y)d(x,y), find a semi-global alignment that gives an optimal (maximal) score.

The solution can be found by studying the dynamic programming matrix, SS, of size (N+1,M+1)(N+1,M+1), by using the recursions defined in equations (1) and (2).

S0,0=0,Si,0=0,for all iS0,j=0,for all j\begin{align*} S_{0,0} = 0, & \\ S_{i,0} =& 0, & \textrm{for all}\ i \\ S_{0,j} =& 0, & \textrm{for all}\ j \end{align*}
Si,j=max{Si1,j1+d(ai,bj)Si1,j+d(ai,)Si,j1+d(,bj).S_{i,j}=\max \begin{cases} S_{i-1,j-1} & + d(a_i,b_j)\\ S_{i-1,j} & + d(a_i,-)\\ S_{i,j-1} & + d(-,b_j). \end{cases}

The optimal alignment is found by backtracing from the maximal bottommost or rightmost element, max(maxiSi,M,maxjSN,j)\max(\max_i S_{i,M},\max_j S_{N,j}), to the first encountered leftmost or topmost (0) element.

Summary: Comparing Global, Local, and Semi-global Alignment

Sequence alignment algorithms can be broadly categorized into three types: global (Needleman-Wunsch), local (Smith-Waterman), and semi-global alignment. Each serves a distinct purpose depending on the biological question and the nature of the sequences being compared.

The table below summarizes the key differences:

Alignment TypeMatrix InitializationRecursion UsedTraceback StartTraceback EndTypical Use Case
GlobalGap penaltiesGlobal (Needleman-Wunsch)Bottom-right cellTop-left cellFull-length comparison of similar sequences
LocalZerosLocal (Smith-Waterman)Max cell anywhereFirst zeroMotif/domain search, conserved regions
Semi-globalZerosGlobal (Needleman-Wunsch)Max in last row/columnFirst zero in row/colShort-to-long sequence alignment

Understanding these differences helps in selecting the appropriate algorithm for a given biological problem.