SimSearch: A New Variant of Dynamic Programming Based on Distance
Series for Optimal and Near-Optimal
Similarity Discovery in Biological Sequences
Sérgio A.D. Deusdado1, Paulo M.M. Carvalho2
Escola Superior Agrária1
Instituto Politécnico de Bragança
P-5300 Bragança, Portugal
E-mail: sergiod@ipb.pt
<>Universidade do Minho2
Departamento de Informática
P-4710-057 Braga, Portugal
E-mail: pmc at di.uminho.pt
>
Abstract
In this paper, we propose SimSearch, an algorithm implementing a new
variant of dynamic programming based on distance series for optimal and
near-optimal similarity discovery in biological sequences. The initial
phase of SimSearch is devoted to fulfil the binary similarity matrices
by signalling the distances between occurrences of the same symbol. The
scoring scheme is further applied, when analysed the maximal extension
of the pattern. Employing bit parallelism to analyse the global
similarity matrix’s upper triangle, the new methodology searches the
sequence(s) for all the exact and approximate patterns in regular or
reverse order. The algorithm accepts parameterization to work with
greater seeds for near-optimal results. Performance tests show
significant efficiency improvement over traditional optimal methods
based on dynamic programming. Comparing the new algorithm’s efficiency
against heuristic based methods, equalizing the required sensitivity,
the proposed algorithm remains acceptable.
IWPACBB'08,
Salamanca, Spain, October 2008