CompariMotif.jl

Clean-room, unofficial Julia implementation of the motif–motif comparison strategy described by Edwards, Davey and Shields (Bioinformatics 24(10):1307–1309, 2008). It supports the comparison of protein, DNA and RNA motifs, represented as regular expressions.

API

  • ComparisonOptions(; kwargs...)
  • compare(a::AbstractString, b::AbstractString, options::ComparisonOptions)::ComparisonResult
  • compare(motifs::AbstractVector{<:AbstractString}, db::AbstractVector{<:AbstractString}, options::ComparisonOptions)::Matrix{ComparisonResult}
  • compare(motifs::AbstractVector{<:AbstractString}, options::ComparisonOptions)::Matrix{ComparisonResult}
  • normalize_motif(motif::AbstractString; alphabet = :protein)::String
  • to_column_table(result_or_results)::NamedTuple

Minimal example

using CompariMotif
using DataFrames

motifs = ["RKLI", "R[KR]L[IV]", "[KR]xLx[FYLIMVP]", "RxLE"]
options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0)
results = compare(motifs, options)

results[3, 4]  # single pair summary
table = to_column_table(results)
df = DataFrame(table)

The to_column_table output can also be written to disk with CSV.jl, for example CSV.write("comparimotif_results.tsv", table; delim = '\t').

Allowed regex symbols and syntax

Motif parsing supports a controlled regex-like subset.

  • Fixed residues from the selected alphabet:
    • protein (alphabet=:protein, default): ARNDCQEGHILKMFPSTWYV
    • DNA (alphabet=:dna): ACGT
    • RNA (alphabet=:rna): ACGU
  • Wildcards:
    • x, X, and . are equivalent and mean "any residue in the selected alphabet".
  • Character classes:
    • [KR] includes listed residues.
    • [^P] is negation within the selected alphabet only.
  • Anchors:
    • ^ and $ indicate the N- and C-terminus for protein motifs, or the 5' and 3' ends for nucleic acid motifs.
  • Repeat quantifiers:
    • {n}, {m,n}.
  • Grouping and alternation:
    • (...) for grouping and | for alternatives (for example A(K|Q)LI).
  • Whitespace:
    • ignored inside motifs.
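
The class rules above can be sketched in a few lines of plain Julia. This is an illustration only: ALPHABETS and resolve_class are hypothetical names that are not part of the package, and raising an error for residues outside the alphabet is an assumption about reasonable behavior.

```julia
# Hypothetical helper, not part of CompariMotif.jl: resolve a bracket-class
# body such as "KR" or "^P" to the residues it admits in a given alphabet.
const ALPHABETS = Dict(
    :protein => "ARNDCQEGHILKMFPSTWYV",
    :dna => "ACGT",
    :rna => "ACGU",
)

function resolve_class(body::AbstractString; alphabet::Symbol = :protein)
    letters = ALPHABETS[alphabet]
    negated = startswith(body, "^")
    listed = Set(uppercase(negated ? body[2:end] : body))
    # Assumption: residues outside the selected alphabet are rejected.
    all(in(letters), listed) || throw(ArgumentError("residue outside alphabet $alphabet"))
    # Walk the alphabet in its own order so the result is deterministic.
    return filter(c -> (c in listed) != negated, letters)
end

resolve_class("KR")                    # "RK" (alphabet order, not input order)
resolve_class("^P")                    # every protein residue except P
resolve_class("^A"; alphabet = :dna)   # "CGT"
```

Negation is resolved against the selected alphabet only, so [^A] admits three residues for DNA but nineteen for protein.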

Official implementation

The official CompariMotif implementation is distributed as part of SLiMSuite: https://github.com/slimsuite/SLiMSuite (tool path: tools/comparimotif_V3.py).

Scope differences compared to the original CompariMotif

This package implements the paper-defined motif comparison core, but it does not aim to replicate the full SLiMSuite application surface. In particular:

  • no standalone command-line interface or SLiMSuite pipeline integration;
  • no raw .tdt compatibility/output mode (use to_column_table for tabular outputs);
  • no Name*/Desc* metadata fields in API results or fixtures (regex motifs only);
  • no XGMML/network export outputs.

Fixtures and oracle regeneration

Oracle fixtures, i.e. expected results for black-box tests, are committed under data/fixtures/, so the tests never call the upstream CompariMotif code directly. Only normalized TSV fixtures are committed, not the raw .tdt output. To regenerate the fixtures, see the README.md in data/fixtures/.

Default options parity

Checked against the upstream CompariMotif implementation, used as a black-box executable oracle (without reading its source code), the package defaults match the upstream defaults:

  • min_shared_positions = 2 (minshare=2)
  • normalized_ic_cutoff = 0.5 (normcut=0.5)
  • matchfix = MatchFixNone (matchfix=0)
  • mismatches = 0
  • allow_ambiguous_overlap = true (overlaps=T)

License hygiene

This repository is MIT-licensed. Implementation is derived from the paper and black-box oracle observations only. GPL CompariMotif source code is not used.

Citation

If you use this Julia package in scientific work, please cite the original algorithm paper: Edwards, Davey and Shields, Bioinformatics 24(10):1307–1309 (2008).

Public API

CompariMotif.ComparisonOptions - Type
ComparisonOptions

Reusable configuration object for CompariMotif comparisons.

Construct once with ComparisonOptions(; kwargs...) and reuse across many compare calls.

Keywords

  • alphabet::Symbol = :protein: comparison alphabet (:protein, :dna, or :rna).
  • min_shared_positions::Int = 2: minimum number of matched, non-wildcard positions required for a hit.
  • normalized_ic_cutoff::Real = 0.5: minimum normalized information content.
  • matchfix::Union{MatchFixMode, Symbol, AbstractString} = MatchFixNone: fixed-position matching mode. Accepted symbol/string aliases are: none, query_fixed (query), search_fixed (search), both_fixed (both).
  • mismatches::Int = 0: tolerated count of defined-position mismatches.
  • allow_ambiguous_overlap::Bool = true: whether partial class overlaps are allowed as complex matches.
  • max_variants::Int = 10_000: maximum expanded variants per motif.
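
As a rough sketch of how the two threshold keywords could act together, the check below accepts a candidate alignment only when both minimums are met. The function passes_thresholds is hypothetical and not part of the package, and treating both thresholds as inclusive (>=) is an assumption based on the word "minimum".

```julia
# Hypothetical helper, not part of CompariMotif.jl.
# Assumption: both thresholds are inclusive lower bounds.
function passes_thresholds(; matched_positions::Integer,
                           normalized_ic::Real,
                           min_shared_positions::Integer = 2,
                           normalized_ic_cutoff::Real = 0.5)
    return matched_positions >= min_shared_positions &&
           normalized_ic >= normalized_ic_cutoff
end

# With the default thresholds, one shared position is not enough.
passes_thresholds(; matched_positions = 1, normalized_ic = 1.0)   # false

# Relaxed thresholds, as used in the examples on this page.
passes_thresholds(; matched_positions = 1, normalized_ic = 1.0,
                  min_shared_positions = 1, normalized_ic_cutoff = 0.0)   # true
```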

Examples

julia> using CompariMotif

julia> opts = ComparisonOptions(; alphabet = :rna);

julia> String(opts.alphabet)
"ACGU"

See also MatchFixMode, compare, ComparisonResult.

CompariMotif.ComparisonOptions - Method
ComparisonOptions(; kwargs...) -> ComparisonOptions

Construct a reusable options object for motif comparisons.

julia> using CompariMotif

julia> opts = ComparisonOptions(; alphabet = :dna, min_shared_positions = 1);

julia> String(opts.alphabet)
"ACGT"

CompariMotif.ComparisonResult - Type
ComparisonResult

Result record produced by compare for one query/search motif pair.

Fields:

  • query, search: original input motifs.
  • normalized_query, normalized_search: canonicalized motifs used internally.
  • matched: whether the best-scoring valid alignment passed all thresholds.
  • query_relationship, search_relationship: human-readable relationship labels.
  • matched_pattern: consensus/overlap pattern for the selected alignment.
  • matched_positions: count of matched non-wildcard positions.
  • match_ic: total information content for matched positions.
  • normalized_ic: match_ic normalized by the lower motif information content.
  • core_ic: information content normalized by core overlap length.
  • score: derived summary score (normalized_ic * matched_positions).
  • query_information, search_information: total information content per motif.
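
The derived fields can be worked through with plain arithmetic. The numbers below are made up for illustration and do not come from a real comparison, and reading "normalized by" as a simple division is an assumption.

```julia
# Made-up values for one alignment (not real CompariMotif output).
match_ic = 2.0
query_information = 4.0
search_information = 5.0
matched_positions = 2

# Assumption: "normalized by the lower motif information content" is a division.
normalized_ic = match_ic / min(query_information, search_information)   # 0.5

# score = normalized_ic * matched_positions, as listed in the fields above.
score = normalized_ic * matched_positions                               # 1.0
```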

See also ComparisonOptions, normalize_motif, to_column_table.

CompariMotif.MatchFixMode - Type
MatchFixMode

Fixed-position matching behavior used by CompariMotif:

  • MatchFixNone: no fixed-position requirement.
  • MatchFixQueryFixed: fixed query positions must have exact fixed matches.
  • MatchFixSearchFixed: fixed search positions must have exact fixed matches.
  • MatchFixBothFixed: enforce fixed-position matching on both motifs.

Used by the matchfix keyword in ComparisonOptions.

CompariMotif.compare - Function
compare(a::AbstractString, b::AbstractString, options::ComparisonOptions) -> ComparisonResult
compare(motifs::AbstractVector{<:AbstractString},
        db::AbstractVector{<:AbstractString},
        options::ComparisonOptions) -> Matrix{ComparisonResult}
compare(motifs::AbstractVector{<:AbstractString},
        options::ComparisonOptions) -> Matrix{ComparisonResult}

Compare motifs according to the CompariMotif scoring scheme described in Edwards et al. (2008).

  • Pairwise mode compares one query motif against one search motif.
  • Matrix mode computes all pairwise query-vs-database comparisons.
  • All-vs-all mode is a convenience alias for compare(motifs, motifs, options).

Examples

julia> using CompariMotif

julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);

julia> result = compare("RKLI", "R[KR]L[IV]", options);

julia> result.matched
true

Configure thresholds and matching semantics with ComparisonOptions. Pairwise mode returns a single ComparisonResult; matrix and all-vs-all modes return a Matrix{ComparisonResult} of size (length(motifs), length(db)). Use normalize_motif for deterministic motif canonicalization. Convert results to column tables with to_column_table.
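
Matrix mode has no example above, so here is a short sketch that uses only the signatures and fields documented on this page. The motifs are arbitrary illustrations, and no particular match outcome is claimed.

```julia
using CompariMotif

# Illustrative inputs; any motifs in the supported syntax would do.
queries = ["RKLI", "RxLE"]
db = ["R[KR]L[IV]", "[KR]xLx[FYLIMVP]", "RKLI"]
options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0)

# One row per query motif, one column per database motif.
results = compare(queries, db, options)
size(results)   # (length(queries), length(db))

# Column indices of the database motifs that matched the first query.
hits = findall(r -> r.matched, results[1, :])
```

Reusing the same options object across calls follows the "construct once" advice given for ComparisonOptions.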

CompariMotif.compare - Method
compare(a::AbstractString, b::AbstractString, options::ComparisonOptions) -> ComparisonResult

Pairwise motif comparison.

CompariMotif.compare - Method
compare(motifs, db, options) -> Matrix{ComparisonResult}

Compare all query motifs against all search-database motifs.

CompariMotif.normalize_motif - Method
normalize_motif(motif::AbstractString; alphabet::Symbol = :protein) -> String

Parse and canonicalize a motif expression into a deterministic representation. Supported syntax includes fixed residues from the selected alphabet, bracket classes (including negation), x/X/. wildcards, ^/$ termini, and {n}/{m,n} repeat quantifiers. Grouping with (...) and alternation with | are also supported.

Wildcard tokens x, X, and . are equivalent and each means "any residue" in the selected alphabet (:protein, :dna, or :rna).

Examples

julia> using CompariMotif

julia> normalize_motif("r[kR].{0,1}l")
"R[RK]x{0,1}L"

Configure thresholds and matching semantics with ComparisonOptions. Compute similarities with compare.

CompariMotif.to_column_table - Method
to_column_table(results::AbstractMatrix{<:ComparisonResult}) -> NamedTuple

Convert a result matrix to a column table with query_index and search_index.

CompariMotif.to_column_table - Method
to_column_table(results::AbstractVector{<:ComparisonResult}) -> NamedTuple

Convert a result vector to a column table with result_index.

CompariMotif.to_column_table - Method
to_column_table(results) -> NamedTuple

Convert comparison results into a column-oriented NamedTuple where each key is a column name and each value is a vector column.

  • to_column_table(::ComparisonResult) returns a one-row table.
  • to_column_table(::AbstractVector{<:ComparisonResult}) adds result_index.
  • to_column_table(::AbstractMatrix{<:ComparisonResult}) adds query_index and search_index with one row per matrix cell in deterministic row-major order.
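
Row-major order is worth spelling out because Julia itself stores matrices column-major. The sketch below uses a stand-in matrix of index tuples rather than real ComparisonResult values, and it only illustrates the ordering, not how the package builds its columns.

```julia
# Stand-in for a 2×3 result matrix: each cell holds its own (query, search) index.
cells = [(q, s) for q in 1:2, s in 1:3]

# Julia's native flattening is column-major: the query index varies fastest.
column_major = vec(cells)

# Row-major order, as described above: the search index varies fastest.
row_major = [cells[q, s] for q in axes(cells, 1) for s in axes(cells, 2)]

query_index = first.(row_major)    # [1, 1, 1, 2, 2, 2]
search_index = last.(row_major)    # [1, 2, 3, 1, 2, 3]
```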

The returned object can be converted to a DataFrame or written using CSV.write without requiring either dependency in the package itself.

Examples

julia> using CompariMotif, DataFrames

julia> motifs = ["RKLI", "R[KR]L[IV]"];

julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);

julia> table = to_column_table(compare(motifs, options));

julia> df = DataFrame(table);

julia> show(select(df, [:query_index, :search_index, :query, :search, :query_relationship]), allrows = true, allcols = true, truncate = 0)
4×5 DataFrame
 Row │ query_index  search_index  query       search      query_relationship
     │ Int64        Int64         String      String      String
─────┼───────────────────────────────────────────────────────────────────────
   1 │           1             1  RKLI        RKLI        Exact Match
   2 │           1             2  RKLI        R[KR]L[IV]  Variant Match
   3 │           2             1  R[KR]L[IV]  RKLI        Degenerate Match
   4 │           2             2  R[KR]L[IV]  R[KR]L[IV]  Exact Match

Compute similarities with compare, which returns the ComparisonResult values that this function accepts.


Internal API

CompariMotif._canonical_token - Method
_canonical_token(position::_Position, options::ComparisonOptions) -> String

Render one parsed position into deterministic canonical motif syntax.

CompariMotif._class_mask - Method
_class_mask(raw::AbstractString, options::ComparisonOptions) -> ResidueMask

Parse a bracket class body into a residue mask.

CompariMotif._evaluate_alignment - Method
_evaluate_alignment(query_variant, search_variant, shift, options)

Evaluate one concrete shift between two expanded motif variants. Returns _Candidate when all thresholds pass, otherwise nothing.

CompariMotif._expand_variants - Method
_expand_variants(parsed::_ParsedMotif, options::ComparisonOptions) -> Vector{_MotifVariant}

Expand ranged-repeat motifs into concrete variant sequences.
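
A minimal sketch of what such an expansion could look like, using a simplified token form rather than the package's internal types. The name expand_variants (without underscore), the (symbol, min, max) tuples, and the error raised when the cap is exceeded are all assumptions made for illustration.

```julia
# Hypothetical, simplified expansion; not the package's implementation.
# Each token is (symbol, min_repeats, max_repeats).
function expand_variants(tokens; max_variants::Integer = 10_000)
    variants = [""]
    for (symbol, lo, hi) in tokens
        # Extend every partial variant with each allowed repeat count.
        variants = [prefix * symbol^n for prefix in variants for n in lo:hi]
        # Assumption: exceeding the cap is an error.
        length(variants) <= max_variants ||
            throw(ArgumentError("more than $max_variants variants"))
    end
    return variants
end

# R x{0,1} L expands to two concrete variants.
expand_variants([("R", 1, 1), ("x", 0, 1), ("L", 1, 1)])   # ["RL", "RxL"]
```

The variant count is the product of the repeat-range widths, which is why a cap such as max_variants is useful for motifs with several ranged repeats.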

CompariMotif._is_better - Method
_is_better(candidate::_Candidate, best::Union{Nothing, _Candidate}) -> Bool

Apply deterministic candidate ordering:

  1. higher match_ic;
  2. more matched positions;
  3. more exact fixed matches.
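
This three-level ordering maps naturally onto lexicographic tuple comparison. The sketch below uses a hypothetical Candidate struct with assumed field names. It is not the package's _Candidate type, and letting exact ties keep the current best is an assumption.

```julia
# Hypothetical stand-in for the internal candidate record.
struct Candidate
    match_ic::Float64
    matched_positions::Int
    exact_fixed_matches::Int
end

# Tuples compare lexicographically, so earlier criteria dominate later ones.
rank(c::Candidate) = (c.match_ic, c.matched_positions, c.exact_fixed_matches)

# Any candidate beats "no best yet".
is_better(candidate::Candidate, best::Nothing) = true
# Assumption: an exact tie does not replace the current best.
is_better(candidate::Candidate, best::Candidate) = rank(candidate) > rank(best)

is_better(Candidate(2.5, 2, 0), Candidate(2.0, 3, 3))   # true: match_ic decides
is_better(Candidate(2.0, 3, 0), Candidate(2.0, 2, 2))   # true: positions break the tie
```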

CompariMotif._is_wildcard - Method
_is_wildcard(pos::_Position, options::ComparisonOptions) -> Bool

Return true when pos matches all residues in the selected alphabet.

CompariMotif._mask_to_chars - Method
_mask_to_chars(mask::ResidueMask, options::ComparisonOptions; as_lowercase = false) -> Vector{Char}

Materialize residues represented by a mask in canonical alphabet order.

CompariMotif._mask_to_symbol - Method
_mask_to_symbol(mask::ResidueMask, options::ComparisonOptions; as_lowercase = false, wildcard_symbol = "x") -> String

Render one residue mask as canonical motif syntax.

CompariMotif._match_symbol - Method
_match_symbol(qpos, spos, intersection, relation, mismatch, options) -> String

Render one output symbol for the overlap pattern.

CompariMotif._parse_motif - Method
_parse_motif(motif::AbstractString, options::ComparisonOptions) -> _ParsedMotif

Parse one motif string into canonical internal representation.

CompariMotif._position_ic - Method
_position_ic(pos::_Position, options::ComparisonOptions) -> Float64

Compute information content for one parsed position.
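
This page does not state the formula, so the following is only a sketch under a loudly labeled assumption: a log-scaled information content where a fixed residue scores 1 and a full wildcard scores 0. The function position_ic (without underscore) and its integer arguments are hypothetical and may differ from what the package computes.

```julia
# ASSUMPTION: log-scaled information content; the formula is not given on this page.
# n = number of residues allowed at the position, N = alphabet size.
position_ic(n::Integer, N::Integer) = 1 - log(n) / log(N)

position_ic(1, 20)    # fixed protein residue: 1.0
position_ic(20, 20)   # wildcard: 0.0
position_ic(2, 20)    # two-residue class such as [KR]: about 0.77
position_ic(2, 4)     # two-residue DNA class: 0.5
```

Under this assumption, the same class is less informative in a small alphabet than in a large one, which is why the alphabet size enters the calculation.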

CompariMotif.is_fixed - Method
is_fixed(a::ResidueClass) -> Bool

Return true when the residue class contains exactly one residue.

CompariMotif.is_wildcard - Method
is_wildcard(a::ResidueClass, opts::ComparisonOptions) -> Bool

Return true when the residue class spans the full selected alphabet.

CompariMotif.overlaps - Method
overlaps(a::ResidueClass, b::ResidueClass) -> Bool

Return true when two residue classes share at least one residue.
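
The three residue-class predicates become one-liners if a class is encoded as a bitmask. The sketch below assumes a UInt32 encoding over the protein alphabet; the package's actual ResidueClass and ResidueMask representations are not shown on this page, so the function names here mirror the documented ones for readability only.

```julia
const PROTEIN = "ARNDCQEGHILKMFPSTWYV"

# Hypothetical encoding: bit i is set when the i-th alphabet residue is allowed.
function mask(residues::AbstractString)
    m = UInt32(0)
    for c in residues
        i = findfirst(==(c), PROTEIN)
        i === nothing && throw(ArgumentError("unknown residue $c"))
        m |= UInt32(1) << (i - 1)
    end
    return m
end

is_fixed(m::UInt32) = count_ones(m) == 1                  # exactly one residue
is_wildcard(m::UInt32) = count_ones(m) == length(PROTEIN) # full alphabet
overlaps(a::UInt32, b::UInt32) = (a & b) != 0             # shared residue

is_fixed(mask("K"))                   # true
is_wildcard(mask(PROTEIN))            # true
overlaps(mask("KR"), mask("RH"))      # true, both contain R
overlaps(mask("KR"), mask("DE"))      # false
```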
