Internal API & Pipeline
This page documents the current implementation pipeline. Everything here is private and may change between releases; the stable package contract is the External API.
The running example below mirrors the worked comparison in Figure 1 of Edwards et al. (2008).
Pipeline Overview
The current code path for a pairwise comparison is:
1. _parse_motif strips whitespace, expands grouping and alternation, parses tokens, and produces a canonical normalized motif string.
2. _expand_variants resolves bounded repeat ranges into concrete motif variants with precomputed information content.
3. _find_precise_match checks exact full-length and exact subsequence relationships first to seed the best candidate before the full overlap search.
4. _evaluate_alignment scores each overlap candidate considered across the expanded variant pairs and relative shifts.
5. _compare_parsed keeps the best candidate and materializes the public ComparisonResult.
Figure 1 Worked Example
1. Parse and normalize the motifs
Parsing is the syntax-level normalization step. It canonicalizes residue classes and wildcard notation, records any alternation branches, and preserves bounded repeats in normalized form so later stages can expand them deliberately. Wildcard aliases are canonicalized to . even where the current oracle has a grouped-alternation quirk, because the package treats x, X, and . as intentionally equivalent syntax. Positive character classes are also treated as sets, so duplicate residues are discarded even though the oracle can score them differently.
julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);

julia> parsed_query = CompariMotif._parse_motif("[KR].L.{0,1}[FYLIVMP]", options);

julia> parsed_search = CompariMotif._parse_motif("R.LE", options);

julia> parsed_query.normalized
"[RK].L.{0,1}[ILMFPYV]"

julia> parsed_search.normalized
"R.LE"
2. Expand concrete variants
Variant expansion converts each parsed branch into one or more concrete motif variants with explicit positions and precomputed information content. This is the stage where repeat ranges become enumerated sequences, so all downstream alignment and scoring logic works with concrete variant objects rather than quantified syntax.
julia> spec = CompariMotif._alphabet_spec(options.alphabet);

julia> query_variants = CompariMotif._expand_variants(parsed_query, options, spec);

julia> search_variants = CompariMotif._expand_variants(parsed_search, options, spec);

julia> [variant.normalized for variant in query_variants]
2-element Vector{String}:
 "[RK].L[ILMFPYV]"
 "[RK].L.[ILMFPYV]"

julia> round.([variant.information for variant in query_variants], digits = 3)
2-element Vector{Float64}:
 2.119
 2.119

julia> only(search_variants).normalized
"R.LE"

julia> round(only(search_variants).information, digits = 3)
3.0
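The expansion step can be illustrated with a minimal sketch that works directly on motif strings instead of the package's internal position objects. The helper name expand_wildcard_range is hypothetical and not part of CompariMotif; it assumes ASCII input and at most one .{m,n} range per motif, and it does not compute information content.

```julia
# Hypothetical helper, not part of CompariMotif: expand the first bounded
# wildcard repeat ".{m,n}" in a normalized motif string into concrete variants.
# Assumes ASCII input and at most one such range per motif.
function expand_wildcard_range(motif::AbstractString)
    m = match(r"\.\{(\d+),(\d+)\}", motif)
    m === nothing && return [String(motif)]   # no range, nothing to expand
    lo = parse(Int, m.captures[1])
    hi = parse(Int, m.captures[2])
    prefix = motif[1:(m.offset - 1)]
    suffix = motif[(m.offset + ncodeunits(m.match)):end]
    # One variant per repeat count, shortest first.
    return [prefix * "."^k * suffix for k in lo:hi]
end

expand_wildcard_range("[RK].L.{0,1}[ILMFPYV]")
# returns ["[RK].L[ILMFPYV]", "[RK].L.[ILMFPYV]"]
```

A motif without a range, such as "R.LE", comes back as a single-element vector, which is consistent with the single search variant in the example above.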
3. Check precise matches before overlap scoring
The precise-match pass looks for exact full-length and exact subsequence relationships among the expanded variants before the broader overlap search runs. The current implementation does not add a separate check for whether two motifs contain enough shared amino acids in any position to merit further comparison; after the exact-match pass, it proceeds directly to the sliding-window overlap search. An exact hit seeds the current best candidate, but it does not short-circuit later evaluation of other overlaps. We tried implementing that, as suggested in the paper, but it did not improve performance in practice.
julia> found_precise, best_precise = CompariMotif._find_precise_match(query_variants, search_variants, options, spec);

julia> found_precise
false
4. Score the best overlap
Alignment scoring evaluates one query variant against one search variant at a specific relative shift. Each candidate carries the matched pattern, matched positions, relationship labels, and the information-content-derived metrics used for ranking. The current implementation ranks candidates by higher match_ic, then matched_positions, then score. If all of those still tie, the first candidate encountered is kept, following the shift-scan order inferred from black-box oracle tie cases. normalized_ic, core_ic, and score are still materialized on the candidate for inspection and output.
julia> query_variant = query_variants[2];

julia> search_variant = only(search_variants);

julia> candidate = CompariMotif._evaluate_alignment(query_variant, search_variant, 0, options, spec);

julia> candidate.matched_pattern
"[rk].Le"

julia> candidate.matched_positions
2

julia> round(candidate.normalized_ic, digits = 3)
0.835
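The ranking rule described above can be sketched as a lexicographic comparison in which ties keep the earlier candidate. This is an illustration only: the named tuples stand in for the internal candidate type, the label field and all numeric values are made up for the example, and the sketch assumes that higher is better for all three keys.

```julia
# Hypothetical stand-ins for internal candidates; values are illustrative only.
candidates = [
    (label = :first,  match_ic = 1.5, matched_positions = 2, score = 1.2),
    (label = :second, match_ic = 1.5, matched_positions = 2, score = 1.2),  # full tie with :first
    (label = :third,  match_ic = 1.5, matched_positions = 3, score = 1.0),  # wins on matched_positions
    (label = :fourth, match_ic = 1.2, matched_positions = 4, score = 2.0),  # loses on match_ic
]

# Ranking key: match_ic first, then matched_positions, then score.
ranking_key(c) = (c.match_ic, c.matched_positions, c.score)

function pick_best(candidates)
    best = first(candidates)
    for c in Iterators.drop(candidates, 1)
        # Strict comparison: a later candidate replaces the current best only
        # if it is strictly better, so full ties keep the first one seen.
        if ranking_key(c) > ranking_key(best)
            best = c
        end
    end
    return best
end

pick_best(candidates).label       # :third
pick_best(candidates[1:2]).label  # :first
```

Tuples compare lexicographically in Julia, so the key order in ranking_key is what encodes the priority of the three metrics.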
5. Materialize the public result
After all precise matches and overlap candidates have been considered, _compare_parsed keeps the strongest candidate and materializes it as the public ComparisonResult. This final step copies the winning alignment's relationships, matched pattern, and information-content summary into the stable API object returned by compare.
julia> result = compare("[KR].L.{0,1}[FYLIVMP]", "R.LE", options);

julia> (result.query_relationship, result.search_relationship)
("Degenerate Parent", "Variant Subsequence")

julia> round(result.match_ic, digits = 3)
1.769

julia> round(result.score, digits = 3)
1.669
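The numbers printed in this worked example are consistent with the arithmetic sketched below. It is a reconstruction inferred from the printed values, not the package's code, and it rests on several assumptions: a 20-letter alphabet, a per-position information content of -log_20(n/20) for a position that allows n residues, normalization by the lower of the two variant totals, and a score equal to normalized_ic times matched_positions. The helper name position_ic is hypothetical.

```julia
# Hypothetical helper: information content of a position that allows
# `n_allowed` of `alphabet_size` residues, using the alphabet size as log base.
position_ic(n_allowed; alphabet_size = 20) = -log(alphabet_size, n_allowed / alphabet_size)

query_ic  = position_ic(2) + position_ic(1) + position_ic(7)  # [RK], L, [ILMFPYV]; wildcards add 0
search_ic = 3 * position_ic(1)                                 # R, L, E
match_ic  = position_ic(2) + position_ic(1)                    # matched [rk] and L
normalized_ic = match_ic / min(query_ic, search_ic)            # assumed: divide by the lower total
score = normalized_ic * 2                                      # matched_positions = 2

round.((query_ic, search_ic, match_ic, normalized_ic, score), digits = 3)
# returns (2.119, 3.0, 1.769, 0.835, 1.669)
```

Rounding only at the end matters here: dividing the already rounded 1.769 by 2.119 and doubling gives 1.670 rather than the reported 1.669.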
Internal Reference
CompariMotif._ParsedMotif — Type
_ParsedMotif
Internal parsed representation of one user-supplied motif.
Fields:
original: motif text exactly as supplied by the caller.
normalized: canonical motif text used for deterministic comparisons.
tokens: token sequence for the first parsed branch.
alternatives: token sequence for every expanded top-level alternation branch.
CompariMotif._MotifVariant — Type
_MotifVariant
Concrete motif variant obtained after expanding bounded repeat ranges.
Fields:
positions: fixed sequence of parsed positions used during alignment.
normalized: canonical motif text for this expanded variant.
information: total information content of the variant.
CompariMotif._parse_motif — Function
_parse_motif(motif::AbstractString, options::ComparisonOptions)::_ParsedMotif
Parse one motif string into canonical internal representation.
CompariMotif._expand_variants — Function
_expand_variants(parsed::_ParsedMotif, options::ComparisonOptions, spec::_AlphabetSpec)::Vector{_MotifVariant}
Expand ranged-repeat motifs into concrete variant sequences.
CompariMotif._find_precise_match — Function
_find_precise_match(query_variants, search_variants, options, spec)
Search only for exact-match and exact-subsequence relationships. Returns (found_precise, best_candidate).
CompariMotif._evaluate_alignment — Function
_evaluate_alignment(query_variant, search_variant, shift, options, spec)
Evaluate one concrete shift between two expanded motif variants. Returns a _Candidate when all thresholds pass, otherwise nothing.
CompariMotif._compare_parsed — Function
_compare_parsed(parsed_query, parsed_search, options)::ComparisonResult
Compare two already-parsed motifs.