pynw
Rust-accelerated Needleman-Wunsch global sequence alignment for precomputed similarity matrices. Python bindings are built with PyO3.
Align two ordered sequences, allowing for gaps (insertions/deletions), given any precomputed pairwise similarity matrix. Useful for strings, time series, token sequences, or any domain where element order matters.
Features
- Fast: Alignment runs in $O(nm)$ time; a
1000×1000matrix takes <10 ms on modern CPUs. - NumPy-first: Accepts NumPy arrays directly — no intermediate Python objects.
- Domain-agnostic: Operates on a precomputed similarity matrix, so any scoring function (distance metrics, semantic similarity, etc.) works out of the box.
- Asymmetric gaps: Optionally set separate insert and delete penalties.
Installation
Requires Python 3.10+ and NumPy 1.21+. Prebuilt wheels for Linux, macOS, and Windows are published on PyPI.
pip install pynw
On platforms without a prebuilt wheel, pip will build from the source distribution. This requires a Rust toolchain (1.85+).
Quick start
Align two DNA sequences using a simple match/mismatch scoring scheme:
import numpy as np
from pynw import needleman_wunsch, alignment_indices
seq1 = np.array(list("GATTACA"))
seq2 = np.array(list("GCATGCA"))
# Build an (n, m) similarity matrix: +1 for match, -1 for mismatch
similarity_matrix = np.where(seq1[:, None] == seq2[None, :], 1.0, -1.0)
score, ops = needleman_wunsch(similarity_matrix, gap_penalty=-1.0)
src_idx, tgt_idx = alignment_indices(ops)
# Reconstruct aligned sequences; masked positions are gaps
aligned1 = np.ma.array(seq1).take(src_idx).filled("-")
aligned2 = np.ma.array(seq2).take(tgt_idx).filled("-")
print(f"Score: {score}\n{''.join(aligned1)}\n{''.join(aligned2)}")
# Score: 2.0
# G-ATTACA
# GCA-TGCA
Details
Precomputed similarity matrix
pynw takes a precomputed (n, m) similarity matrix rather than a scoring
function. This means the entire alignment runs in compiled Rust code with no
Python callbacks, and you can build scores with vectorized NumPy operations
rather than element-wise Python loops.
The trade-off is that you must allocate the full matrix up front, which uses $O(nm)$ memory even when the scoring rule could be expressed more compactly.
Scoring
The total alignment score is the sum of similarity-matrix entries for matched
positions and gap penalties for insertions/deletions. Gap penalties are
typically negative. By default a single gap_penalty applies to both
directions; set insert_penalty and/or delete_penalty to penalise them
independently.
When multiple alignments achieve the same optimal score, pynw breaks ties
deterministically: Align > Delete > Insert.
Edit-distance parameterizations
Needleman-Wunsch can reproduce common metrics with the right similarity-matrix values and gap penalty:
| Metric | S[i,j] match |
S[i,j] mismatch |
gap_penalty |
NW score equals |
|---|---|---|---|---|
| Levenshtein distance | 0 | -1 | -1 | -distance |
| Indel distance | 0 | -2 | -1 | -distance |
| LCS length | 1 | 0 | 0 | lcs_length |
| Hamming distance | 0 | -1 | -(n+1) |
-distance |
For Hamming distance, strings must have equal length.
1""" 2.. include:: ../README.md 3 :start-line: 1 4 :end-before: ## API 5""" 6 7from importlib.metadata import version 8 9from pynw._native import needleman_wunsch 10from pynw._ops import EditOp, alignment_indices 11 12__docformat__ = "numpy" 13__version__ = version("pynw") 14 15__all__ = [ 16 "needleman_wunsch", 17 "EditOp", 18 "alignment_indices", 19]
Align two ordered sequences given a precomputed similarity matrix.
The total alignment score is the sum of similarity-matrix entries for matched positions and gap penalties for insertions/deletions.
Parameters
- similarity_matrix (array_like, shape (n, m)):
similarity_matrix[i, j]is the similarity score for aligning element i of the source sequence with element j of the target sequence. - gap_penalty (float, default -1.0):
Penalty applied when a gap is inserted in either sequence.
Use
insert_penaltyordelete_penaltyto set them independently. - insert_penalty (float, optional):
Penalty for advancing the target sequence without consuming a source
element (gap in source). Defaults to
gap_penalty. - delete_penalty (float, optional):
Penalty for advancing the source sequence without consuming a target
element (gap in target). Defaults to
gap_penalty.
Raises
- ValueError: If
similarity_matrixis not 2-dimensional, or if any value insimilarity_matrixor the gap penalties isNaNorInf.
Returns
- score (float): The optimal alignment score.
- ops (ndarray of uint8, shape (k,)):
Sequence of edit operations describing the alignment. Each element
is of type
EditOp. Usealignment_indicesto reconstruct source and target index arrays.
Examples
Align two DNA sequences using a simple match/mismatch scoring scheme:
>>> import numpy as np
>>> from pynw import alignment_indices
>>> seq1 = np.array(list("GATTACA"))
>>> seq2 = np.array(list("GCATGCA"))
>>> sm = np.where(seq1[:, None] == seq2[None, :], 1.0, -1.0)
>>> score, ops = needleman_wunsch(sm, gap_penalty=-1.0)
>>> score
2.0
>>> src_idx, tgt_idx = alignment_indices(ops)
>>> "".join(np.ma.array(seq1).take(src_idx).filled("-"))
'G-ATTACA'
>>> "".join(np.ma.array(seq2).take(tgt_idx).filled("-"))
'GCA-TGCA'
Notes
When multiple alignments achieve the same optimal score, ties are
broken deterministically: Align > Delete > Insert. This prefers
substitutions over gaps, producing compact alignments. Other tools
may return different co-optimal alignments.
All values in similarity_matrix and the gap penalties must be finite.
18class EditOp(IntEnum): 19 """ 20 Edit operation codes. 21 22 Not guaranteed to be compatible between pynw version. 23 """ 24 25 Align = OP_ALIGN 26 Insert = OP_INSERT 27 Delete = OP_DELETE
Edit operation codes.
Not guaranteed to be compatible between pynw version.
56def alignment_indices( 57 ops: npt.ArrayLike, 58) -> tuple[MaskedIndexArray, MaskedIndexArray]: 59 """Reconstruct source and target indices from an ops array. 60 61 Converts a sequence of edit operations into a pair of masked index arrays. 62 Each array has one entry per alignment position. Positions where the 63 corresponding sequence has a gap are masked out. 64 65 Parameters 66 ---------- 67 ops : array_like of uint8, shape (k,) 68 Edit-operation sequence returned by ``needleman_wunsch``. 69 70 Returns 71 ------- 72 source_idx : masked array of intp, shape (k,) 73 Index into the source sequence at each alignment position. 74 Masked (invalid) at insert positions (gap in source). 75 target_idx : masked array of intp, shape (k,) 76 Index into the target sequence at each alignment position. 77 Masked (invalid) at delete positions (gap in target). 78 79 Raises 80 ------ 81 ValueError 82 If ``ops`` cannot be converted to a 1-D ``uint8`` array, if any 83 element is out of the ``uint8`` range, or if any element is not a 84 valid ``EditOp`` discriminant. 85 86 Examples 87 -------- 88 >>> import numpy as np 89 >>> from pynw import needleman_wunsch, alignment_indices 90 >>> source_seq = list("GAT") 91 >>> target_seq = list("GT") 92 >>> sm = np.where( 93 ... np.array(source_seq)[:, None] == np.array(target_seq)[None, :], 94 ... 1.0, -1.0, 95 ... ) 96 >>> _, ops = needleman_wunsch(sm, gap_penalty=-1.0) 97 >>> src, tgt = alignment_indices(ops) 98 >>> src.tolist() 99 [0, 1, 2] 100 >>> tgt.tolist() 101 [0, None, 1] 102 """ 103 src_idx, src_mask, tgt_idx, tgt_mask = _alignment_indices(ops) 104 return np.ma.array(src_idx, mask=src_mask), np.ma.array(tgt_idx, mask=tgt_mask)
Reconstruct source and target indices from an ops array.
Converts a sequence of edit operations into a pair of masked index arrays. Each array has one entry per alignment position. Positions where the corresponding sequence has a gap are masked out.
Parameters
- ops (array_like of uint8, shape (k,)):
Edit-operation sequence returned by
needleman_wunsch.
Returns
- source_idx (masked array of intp, shape (k,)): Index into the source sequence at each alignment position. Masked (invalid) at insert positions (gap in source).
- target_idx (masked array of intp, shape (k,)): Index into the target sequence at each alignment position. Masked (invalid) at delete positions (gap in target).
Raises
- ValueError: If
opscannot be converted to a 1-Duint8array, if any element is out of theuint8range, or if any element is not a validEditOpdiscriminant.
Examples
>>> import numpy as np
>>> from pynw import needleman_wunsch, alignment_indices
>>> source_seq = list("GAT")
>>> target_seq = list("GT")
>>> sm = np.where(
... np.array(source_seq)[:, None] == np.array(target_seq)[None, :],
... 1.0, -1.0,
... )
>>> _, ops = needleman_wunsch(sm, gap_penalty=-1.0)
>>> src, tgt = alignment_indices(ops)
>>> src.tolist()
[0, 1, 2]
>>> tgt.tolist()
[0, None, 1]