External API

The public API is intentionally small: configure a comparison with ComparisonOptions, compare motifs with compare, and turn the results into a table with to_column_table. These functions are built on a clean-room, black-box reproduction of the CompariMotif algorithm described by Edwards et al. (2008).

CompariMotif compares one motif against another and scores how well their positions overlap. The best overlap is returned as a ComparisonResult. If no significant overlap is found, the matched field of the result is false.

Quick Start

Start by loading the package:

julia> using CompariMotif

Then try a single pairwise comparison to see the main workflow. Create ComparisonOptions, call compare, and inspect the returned ComparisonResult.

julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0)
ComparisonOptions(
  alphabet                = CompariMotif.ProteinAlphabet()
  residue_frequencies     = nothing
  min_shared_positions    = 1
  normalized_ic_cutoff    = 0.0
  matchfix                = :none
  mismatches              = 0
  allow_ambiguous_overlap = true
  max_variants            = 10000
)
julia> result = compare("RKLI", "R[KR]L[IV]", options)
ComparisonResult(
  query               = RKLI
  search              = R[KR]L[IV]
  normalized_query    = RKLI
  normalized_search   = R[RK]L[IV]
  matched             = true
  query_relationship  = Variant Match
  search_relationship = Degenerate Match
  matched_pattern     = R[rk]L[iv]
  matched_positions   = 4
  match_ic            = 3.537243573680481
  normalized_ic       = 1.0
  core_ic             = 0.8843108934201203
  score               = 4.0
  query_information   = 4.0
  search_information  = 3.537243573680481
)

Interpreting ComparisonResult

Each ComparisonResult summarizes the best accepted overlap between one query motif and one search motif. The relationship fields are directional: query_relationship uses the query as the reference point, while search_relationship describes the same alignment from the search side. That is why Variant and Degenerate, and Parent and Subsequence, normally appear as complementary pairs. For example, one side of a hit can read Degenerate Parent while the other reads Variant Subsequence.

The first word in the relationship label explains how residue choices compare along the overlap: Exact means the aligned residue sets coincide, Variant means the query is narrower, Degenerate means the query is broader, and Complex means the overlap mixes those cases or uses partly overlapping ambiguous classes. The second word explains coverage: Match for full-length coverage on both motifs, Parent when the query contains the search, Subsequence when the query is contained in the search, and Overlap when the best hit uses only part of each motif. The nomenclature is identical to that of Edwards et al. (2008), who summarize it in Figure 2 of their paper.
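The first-word rule described above can be sketched in a few lines of Julia. This is an illustration only: position_specificity and overlap_specificity are hypothetical helpers, not part of the CompariMotif API, each position is represented as a plain vector of allowed residues, and wildcards, the mismatches option, and the coverage word are all left out.

```julia
# Hypothetical helpers, not part of the CompariMotif API.

# Classify one aligned position from the query's point of view.
function position_specificity(query_set, search_set)
    q, s = Set(query_set), Set(search_set)
    if q == s
        return :exact
    elseif isdisjoint(q, s)
        return :mismatch      # no shared residue at this position
    elseif issubset(q, s)
        return :variant       # query is narrower
    elseif issubset(s, q)
        return :degenerate    # query is broader
    else
        return :complex       # partly overlapping classes
    end
end

# Combine the per-position labels into the first word of the relationship.
function overlap_specificity(query, search)
    labels = [position_specificity(q, s) for (q, s) in zip(query, search)]
    any(==(:mismatch), labels) && return nothing
    kinds = delete!(Set(labels), :exact)
    isempty(kinds) && return "Exact"
    kinds == Set([:variant]) && return "Variant"
    kinds == Set([:degenerate]) && return "Degenerate"
    return "Complex"
end

rkli    = [['R'], ['K'], ['L'], ['I']]            # RKLI
rkrl_iv = [['R'], ['K', 'R'], ['L'], ['I', 'V']]  # R[KR]L[IV]

overlap_specificity(rkli, rkrl_iv)   # "Variant"
overlap_specificity(rkrl_iv, rkli)   # "Degenerate"
```

Swapping the arguments swaps Variant and Degenerate, which mirrors the complementary labels seen in the Quick Start result.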

The score fields then tell you how strong that overlap is. match_ic is the raw information content captured by the best alignment. normalized_ic divides match_ic by the information content of the less informative of the two motifs, which makes hits comparable across motifs of different length and specificity. core_ic is the fraction of aligned-core information retained: match_ic divided by the sum, over the aligned positions, of the larger of the two per-position information values. score is normalized_ic multiplied by matched_positions and is intended for ranking. matched_positions counts informative aligned positions only, so an aligned position involving a wildcard on either side does not contribute to that total. matched_pattern is a compact rendering of the winning overlap: uppercase symbols usually mark exact agreement, whereas lowercase symbols mark positions broadened by ambiguity or wildcard handling. The paper does not define a CoreIC score, so in this package core_ic follows the behavior of the upstream CompariMotif oracle.
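As a worked check, the Quick Start numbers can be reproduced by hand. The sketch below is not the package's implementation: position_ic is a hypothetical helper, and it assumes that a position's information content is -log_N of the summed frequency of its allowed residues (N being the alphabet size) and that each aligned position contributes the smaller of the two per-position values to match_ic. Both assumptions are consistent with the outputs shown on this page for nested (Variant/Degenerate) overlaps, but they are inferred, not documented.

```julia
# Hypothetical helper, not part of the CompariMotif API.
# Assumed formula: IC = -log_N(sum of frequencies of the allowed residues).
function position_ic(allowed, freqs::AbstractDict{Char,<:Real})
    n = length(freqs)                    # alphabet size N
    p = sum(freqs[r] for r in allowed)   # probability mass the position accepts
    return -log(n, p)
end

protein = collect("ARNDCQEGHILKMFPSTWYV")
uniform = Dict(r => 1 / length(protein) for r in protein)

query  = [['R'], ['K'], ['L'], ['I']]            # RKLI
search = [['R'], ['K', 'R'], ['L'], ['I', 'V']]  # R[KR]L[IV]

q_ic = [position_ic(p, uniform) for p in query]
s_ic = [position_ic(p, uniform) for p in search]

# Assumption: the broader side limits what each aligned position contributes.
match_ic      = sum(min.(q_ic, s_ic))                    # about 3.537
normalized_ic = match_ic / min(sum(q_ic), sum(s_ic))     # about 1.0
core_ic       = match_ic / sum(max.(q_ic, s_ic))         # about 0.884
score         = normalized_ic * length(query)            # about 4.0 (4 informative positions)
```

Under these assumptions a fully fixed position carries one unit of information, [KR] carries about 0.77, and a wildcard carries none, which is why RKLI totals 4.0 and R[KR]L[IV] totals about 3.54.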

The same fields become table columns when you call to_column_table. In that tabular form, query and search keep the original motif strings, while normalized_query and normalized_search expose the canonical forms used internally during comparison. Unmatched results keep matched = false, use No Match relationship labels, leave matched_pattern empty, and set matched_positions, all score fields, and both information totals to 0 or 0.0.

You can find more advanced examples of how to use this package in the FAQ / How-To section of the documentation.

Non-Uniform Residue Frequencies

By default, information content uses a uniform residue frequency distribution. To score motifs against a custom background model, pass residue_frequencies = Dict{Char,Float64}(...) when constructing ComparisonOptions.

julia> dna_freqs = Dict('A' => 0.3, 'C' => 0.2, 'G' => 0.2, 'T' => 0.3)
Dict{Char, Float64} with 4 entries:
  'A' => 0.3
  'G' => 0.2
  'T' => 0.3
  'C' => 0.2
julia> weighted = ComparisonOptions(;
           alphabet = DNAAlphabet(),
           residue_frequencies = dna_freqs,
           min_shared_positions = 1,
           normalized_ic_cutoff = 0.0,
       )
ComparisonOptions(
  alphabet                = CompariMotif.DNAAlphabet()
  residue_frequencies     = Dict( 'A' => 0.29999999999999993, 'C' => 0.19999999999999998, 'G' => 0.19999999999999998, 'T' => 0.29999999999999993 )
  min_shared_positions    = 1
  normalized_ic_cutoff    = 0.0
  matchfix                = :none
  mismatches              = 0
  allow_ambiguous_overlap = true
  max_variants            = 10000
)
julia> compare("ATG", "[AGT]TG", weighted)
ComparisonResult(
  query               = ATG
  search              = [AGT]TG
  normalized_query    = ATG
  normalized_search   = [AGT]TG
  matched             = true
  query_relationship  = Variant Match
  search_relationship = Degenerate Match
  matched_pattern     = [agt]TG
  matched_positions   = 3
  match_ic            = 2.190410891970466
  normalized_ic       = 1.0
  core_ic             = 0.7558537172605841
  score               = 3.0
  query_information   = 2.8979296416098874
  search_information  = 2.190410891970466
)
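The reference below states that supplied frequencies must cover every residue, be strictly positive and finite, and are normalized to sum to 1.0. A minimal sketch of such a check follows. normalize_frequencies is a hypothetical name chosen for illustration; the package's actual validation code and error messages may differ.

```julia
# Hypothetical helper, not part of the CompariMotif API.
function normalize_frequencies(freqs::AbstractDict{Char,<:Real}, alphabet)
    for r in alphabet
        # Every residue of the alphabet needs an entry.
        haskey(freqs, r) ||
            throw(ArgumentError("missing frequency for residue '$r'"))
        v = freqs[r]
        # Zero, negative, Inf, and NaN values are rejected.
        (isfinite(v) && v > 0) ||
            throw(ArgumentError("frequency for '$r' must be positive and finite"))
    end
    total = sum(freqs[r] for r in alphabet)
    return Dict(r => freqs[r] / total for r in alphabet)
end

# Raw counts work as well as probabilities, because the values are rescaled.
counts = Dict('A' => 3, 'C' => 2, 'G' => 2, 'T' => 3)
normalized = normalize_frequencies(counts, "ACGT")
```

Rescaling by a floating-point sum is one plausible reason the echoed options above show values such as 0.29999999999999993 in place of 0.3.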

Matrix Comparisons

When you have many motifs, you can compare the entire collection in a single call. The result is a square matrix of all-vs-all pairwise comparisons, which can be converted into a column-oriented table for storage or downstream analysis. The table can be easily converted into a DataFrame from the DataFrames package or saved into a comma-separated values (CSV) file with the CSV package.

julia> motifs = ["RKLI", "R[KR]L[IV]", "R.LE"]
3-element Vector{String}:
 "RKLI"
 "R[KR]L[IV]"
 "R.LE"
julia> results = compare(motifs, options);
julia> size(results)
(3, 3)
julia> table = to_column_table(results)
(query_index = [1, 1, 1, 2, 2, 2, 3, 3, 3], search_index = [1, 2, 3, 1, 2, 3, 1, 2, 3], query = ["RKLI", "RKLI", "RKLI", "R[KR]L[IV]", "R[KR]L[IV]", "R[KR]L[IV]", "R.LE", "R.LE", "R.LE"], search = ["RKLI", "R[KR]L[IV]", "R.LE", "RKLI", "R[KR]L[IV]", "R.LE", "RKLI", "R[KR]L[IV]", "R.LE"], normalized_query = ["RKLI", "RKLI", "RKLI", "R[RK]L[IV]", "R[RK]L[IV]", "R[RK]L[IV]", "R.LE", "R.LE", "R.LE"], normalized_search = ["RKLI", "R[RK]L[IV]", "R.LE", "RKLI", "R[RK]L[IV]", "R.LE", "RKLI", "R[RK]L[IV]", "R.LE"], matched = Bool[1, 1, 0, 1, 1, 0, 0, 0, 1], query_relationship = ["Exact Match", "Variant Match", "No Match", "Degenerate Match", "Exact Match", "No Match", "No Match", "No Match", "Exact Match"], search_relationship = ["Exact Match", "Degenerate Match", "No Match", "Variant Match", "Exact Match", "No Match", "No Match", "No Match", "Exact Match"], matched_pattern = ["RKLI", "R[rk]L[iv]", "", "R[rk]L[iv]", "R[RK]L[IV]", "", "", "", "R.LE"], matched_positions = [4, 4, 0, 4, 4, 0, 0, 0, 3], match_ic = [4.0, 3.537243573680481, 0.0, 3.537243573680481, 3.537243573680481, 0.0, 0.0, 0.0, 3.0], normalized_ic = [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], core_ic = [1.0, 0.8843108934201203, 0.0, 0.8843108934201203, 1.0, 0.0, 0.0, 0.0, 1.0], score = [4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 3.0], query_information = [4.0, 4.0, 0.0, 3.537243573680481, 3.537243573680481, 0.0, 0.0, 0.0, 3.0], search_information = [4.0, 3.537243573680481, 0.0, 4.0, 3.537243573680481, 0.0, 0.0, 0.0, 3.0])
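Because the table is a plain NamedTuple of vectors, it can be filtered with Base Julia alone. The sketch below copies four of the columns from the output above into a literal so that it runs without the package; the variable names keep and hits are arbitrary.

```julia
# A subset of the columns printed above, copied verbatim.
table = (
    query_index  = [1, 1, 1, 2, 2, 2, 3, 3, 3],
    search_index = [1, 2, 3, 1, 2, 3, 1, 2, 3],
    matched      = Bool[1, 1, 0, 1, 1, 0, 0, 0, 1],
    score        = [4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 3.0],
)

# Keep matched rows that are not self-comparisons on the diagonal.
keep = [i for i in eachindex(table.matched)
        if table.matched[i] && table.query_index[i] != table.search_index[i]]

# Apply the same row selection to every column; the result is again a NamedTuple.
hits = map(col -> col[keep], table)

hits.query_index    # [1, 2]
hits.search_index   # [2, 1]
```

With the full table from to_column_table, the same two lines select the RKLI and R[KR]L[IV] pair in both directions.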

When you want to search a single query motif against a database of targets, pass the query as a string and the targets as a vector. This is useful when you already know the motif you want to look up and want to inspect or export all hits against a target set.

julia> query_hits = compare("RKLI", ["RKLI", "R.LE"], options);
julia> query_hits[1, 1].matched
true
julia> query_hits[1, 2].matched
false

Reference

The full public API is listed below. Use these docstrings when you need the precise meaning of an option, result field, or helper function.

CompariMotif.ProteinAlphabet — Type

ProteinAlphabet

Select the standard protein alphabet for ComparisonOptions. Allowed residues: ARNDCQEGHILKMFPSTWYV. Use as ProteinAlphabet().

CompariMotif.DNAAlphabet — Type

DNAAlphabet

Select the DNA alphabet for ComparisonOptions. Allowed residues: ACGT. Use as DNAAlphabet().

CompariMotif.RNAAlphabet — Type

RNAAlphabet

Select the RNA alphabet for ComparisonOptions. Allowed residues: ACGU. Use as RNAAlphabet().

CompariMotif.ComparisonOptions — Type

ComparisonOptions

Reusable configuration object for CompariMotif comparisons.

Construct once with ComparisonOptions(; kwargs...) and reuse across many compare calls.

Keywords

alphabet = ProteinAlphabet(): comparison alphabet (ProteinAlphabet(), DNAAlphabet(), or RNAAlphabet()).
residue_frequencies::Union{Nothing, AbstractDict{Char,<:Real}} = nothing: optional background residue frequencies for information-content scoring. When omitted, CompariMotif uses a uniform frequency distribution. Provided dictionaries must define every residue in the selected alphabet and use strictly positive, finite values. Frequencies are normalized internally to sum to 1.0.
min_shared_positions::Int = 2: minimum number of matched, non-wildcard positions required for a hit.
normalized_ic_cutoff::Real = 0.5: minimum normalized information content.
matchfix::Symbol = :none: fixed-position matching mode. Accepted values are exactly :none, :query_fixed, :search_fixed, and :both_fixed.
mismatches::Int = 0: tolerated count of defined-position mismatches.
allow_ambiguous_overlap::Bool = true: whether partial class overlaps are allowed as complex matches.
max_variants::Int = 10_000: maximum expanded variants per motif.
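To make the two threshold keywords concrete, the following sketch shows how min_shared_positions and normalized_ic_cutoff could gate a candidate hit. passes_thresholds is a hypothetical helper, not part of the package, and treating both limits as inclusive (>=) is an assumption that the text above does not settle.

```julia
# Hypothetical helper, not part of the CompariMotif API.
# Assumption: both limits are inclusive.
function passes_thresholds(hit; min_shared_positions = 2, normalized_ic_cutoff = 0.5)
    return hit.matched_positions >= min_shared_positions &&
           hit.normalized_ic >= normalized_ic_cutoff
end

# Stand-ins carrying only the two fields the check needs.
strong = (matched_positions = 4, normalized_ic = 1.0)
weak   = (matched_positions = 1, normalized_ic = 1.0)

passes_thresholds(strong)                          # true
passes_thresholds(weak)                            # false with the default minimum of 2
passes_thresholds(weak; min_shared_positions = 1)  # true
```

This is why the examples on this page pass min_shared_positions = 1 and normalized_ic_cutoff = 0.0: the relaxed settings let short or weak overlaps through for inspection.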

Examples

julia> using CompariMotif

julia> options = ComparisonOptions();

julia> options.alphabet isa ProteinAlphabet
true

julia> options.matchfix == :none
true

See also compare, ComparisonResult.

CompariMotif.ComparisonResult — Type

ComparisonResult

Result record produced by compare for one query/search motif pair.

Fields:

query, search: original input motifs.
normalized_query, normalized_search: canonicalized motifs used internally.
matched: whether the best-scoring valid alignment passed all thresholds.
query_relationship, search_relationship: directional two-word labels. The first word describes specificity (Exact, Variant, Degenerate, Complex); the second describes coverage (Match, Parent, Subsequence, Overlap).
matched_pattern: compact rendering of the selected overlap; lowercase symbols mark positions broadened by ambiguity or wildcard handling.
matched_positions: number of informative aligned positions; any aligned position involving a wildcard on either side is excluded.
match_ic: raw information content captured by the selected alignment.
normalized_ic: match_ic scaled by the less informative of the two motif variants, making hits easier to compare across motif lengths and specificity.
core_ic: fraction of aligned-core information retained by the selected alignment (match_ic / sum(max(position_ic(query_i), position_ic(search_i))) across non-dual-wildcard aligned positions). This field is oracle-defined as the paper does not define it.
score: derived summary score (normalized_ic * matched_positions).
query_information, search_information: total information content for the winning query and search motif variants.

The relationship fields are asymmetric by design, so one side of the same hit can read Variant Subsequence while the other reads Degenerate Parent. When matched == false, the relationship labels are No Match, matched_pattern is empty, and all position/score/information totals stay at their zero defaults.

CompariMotif.compare — Function

compare(a::AbstractString, b::AbstractString, options::ComparisonOptions)::ComparisonResult
compare(motif::AbstractString,
        db::AbstractVector{<:AbstractString},
        options::ComparisonOptions)::Matrix{ComparisonResult}
compare(motifs::AbstractVector{<:AbstractString},
        db::AbstractVector{<:AbstractString},
        options::ComparisonOptions)::Matrix{ComparisonResult}
compare(motifs::AbstractVector{<:AbstractString},
        options::ComparisonOptions)::Matrix{ComparisonResult}

Compare motifs according to the CompariMotif scoring scheme described by Edwards et al. (2008).

Pairwise mode compares one query motif against one target motif.
Query-search mode compares one query motif against a database of target motifs.
Database mode compares one motif collection against another.
Passing a single motif collection allows for all-vs-all comparison within that collection.

Examples

julia> using CompariMotif

julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);

julia> result = compare("RKLI", "R[KR]L[IV]", options);

julia> result.matched
true

References

(Edwards et al., 2008) Edwards et al. Bioinformatics 24(10):1307-1309 (2008)

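Configure thresholds and matching semantics with ComparisonOptions. Pairwise mode returns a single ComparisonResult. The other modes return a matrix of size (length(motifs), length(db)): one row when a single query motif is searched against a database, and a square matrix when one collection is compared against itself. Convert results to column tables with to_column_table.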

CompariMotif.to_column_table — Function

to_column_table(results)::NamedTuple

Convert comparison results into a column-oriented NamedTuple where each key is a column name and each value is a vector column.

to_column_table(::ComparisonResult) returns a one-row table.
to_column_table(::AbstractVector{<:ComparisonResult}) adds result_index.
to_column_table(::AbstractMatrix{<:ComparisonResult}) adds query_index and search_index with one row per matrix cell in deterministic row-major order.
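Julia stores matrices in column-major order, so row-major flattening needs an explicit transpose. The sketch below illustrates the ordering with a stand-in matrix whose cells record their own coordinates. It shows the row order only and makes no claim about how to_column_table is implemented.

```julia
# A 2×3 stand-in for a result matrix: each cell stores its own (row, column).
cells = [(i, j) for i in 1:2, j in 1:3]

# vec walks columns first, so transpose the matrix before flattening
# to visit the cells row by row.
rows = vec(permutedims(cells))

query_index  = first.(rows)   # [1, 1, 1, 2, 2, 2]
search_index = last.(rows)    # [1, 2, 3, 1, 2, 3]
```

The index pattern matches the query_index and search_index columns in the Matrix Comparisons example: all search motifs for the first query come first, then all for the second.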

The returned object can be converted to a DataFrame or written using CSV.write without requiring either dependency in the package itself.

Column names mirror the fields of ComparisonResult. In matrix form, query_index and search_index preserve the original row/column coordinates, and the relationship/score columns can be interpreted exactly as on the underlying result objects.

Examples

julia> using CompariMotif

julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);

julia> table = to_column_table(compare("RKLI", "R[KR]L[IV]", options));

julia> table.query_relationship[1]
"Variant Match"

Compute results with compare. Returns a NamedTuple.

to_column_table(results::AbstractVector{<:ComparisonResult})::NamedTuple

Convert a result vector to a column table with result_index.

to_column_table(results::AbstractMatrix{<:ComparisonResult})::NamedTuple

Convert a result matrix to a column table with query_index and search_index.
