External API
The public API is intentionally small: configure a comparison with ComparisonOptions, compare motifs with compare, and turn the results into a table with to_column_table. These functions are built on a clean-room, black-box reproduction of the CompariMotif algorithm described by Edwards et al. (2008).
CompariMotif compares one motif against another and scores how well their positions overlap. The best overlap is returned as a ComparisonResult. If no overlap passes the thresholds configured in ComparisonOptions, the matched field of the result is false.
Quick Start
Start by loading the package:
julia> using CompariMotif

Then try a single pairwise comparison to see the main workflow: create ComparisonOptions, call compare, and inspect the returned ComparisonResult.
julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0)
ComparisonOptions(
  alphabet = CompariMotif.ProteinAlphabet()
  residue_frequencies = nothing
  min_shared_positions = 1
  normalized_ic_cutoff = 0.0
  matchfix = :none
  mismatches = 0
  allow_ambiguous_overlap = true
  max_variants = 10000
)

julia> result = compare("RKLI", "R[KR]L[IV]", options)
ComparisonResult(
  query = RKLI
  search = R[KR]L[IV]
  normalized_query = RKLI
  normalized_search = R[RK]L[IV]
  matched = true
  query_relationship = Variant Match
  search_relationship = Degenerate Match
  matched_pattern = R[rk]L[iv]
  matched_positions = 4
  match_ic = 3.537243573680481
  normalized_ic = 1.0
  core_ic = 0.8843108934201203
  score = 4.0
  query_information = 4.0
  search_information = 3.537243573680481
)
Interpreting ComparisonResult
Each ComparisonResult summarizes the best accepted overlap between one query motif and one search motif. The relationship fields are directional: query_relationship uses the query as the reference point, while search_relationship describes the same alignment from the search side. That is why Variant and Degenerate, and Parent and Subsequence, normally appear as complementary pairs. For example, one side of a hit can read Degenerate Parent while the other reads Variant Subsequence.
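To make the directional pairing concrete, here is a minimal Julia sketch that rewrites a query-side label as the matching search-side label. The helper mirror_relationship is hypothetical and not part of the package. It assumes, consistent with the examples on this page, that only Variant/Degenerate and Parent/Subsequence swap, while Exact, Complex, Match and Overlap read the same from both sides.

```julia
# Hypothetical helper, not part of CompariMotif's API: rewrite a query-side
# relationship label from the search side by swapping the directional words.
# Assumption: Exact, Complex, Match and Overlap read the same from both sides.
const MIRRORED_WORDS = Dict(
    "Variant" => "Degenerate",
    "Degenerate" => "Variant",
    "Parent" => "Subsequence",
    "Subsequence" => "Parent",
)

function mirror_relationship(label::AbstractString)
    words = split(label)  # e.g. "Variant Match" -> ["Variant", "Match"]
    # Words without a mirrored counterpart are kept unchanged.
    return join((get(MIRRORED_WORDS, w, w) for w in words), " ")
end

println(mirror_relationship("Variant Match"))      # Degenerate Match
println(mirror_relationship("Degenerate Parent"))  # Variant Subsequence
```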
The first word in the relationship label explains how residue choices compare along the overlap: Exact means the aligned residue sets coincide, Variant means the query is narrower, Degenerate means the query is broader, and Complex means the overlap mixes those cases or uses partly overlapping ambiguous classes. The second word explains coverage: Match for full-length coverage on both motifs, Parent when the query contains the search, Subsequence when the query is contained in the search, and Overlap when the best hit uses only part of each motif. This nomenclature follows Edwards et al. (2008); Figure 2 of that paper gives a compact summary.
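The specificity word can be illustrated with set comparisons over an overlap that has already been aligned. This Julia sketch is a simplification written for this page, not the package's implementation: parse_positions and specificity are hypothetical helpers, wildcards and coverage are ignored, and every aligned pair is assumed to share at least one residue. Partly overlapping classes, the case that allow_ambiguous_overlap governs, fall through to Complex.

```julia
# Hypothetical sketch, not the package's algorithm. Wildcards, mismatches,
# coverage and thresholds are deliberately ignored.

# Parse a motif such as "R[KR]L[IV]" into one Set{Char} per position.
function parse_positions(motif::AbstractString)
    chars = collect(motif)
    positions = Set{Char}[]
    i = 1
    while i <= length(chars)
        if chars[i] == '['
            j = findnext(==(']'), chars, i)  # assumes a well-formed class
            push!(positions, Set(chars[i+1:j-1]))
            i = j + 1
        else
            push!(positions, Set([chars[i]]))
            i += 1
        end
    end
    return positions
end

# Return the first word of the query-side label for aligned position sets.
function specificity(query::Vector{Set{Char}}, search::Vector{Set{Char}})
    narrower = broader = partial = false
    for (q, s) in zip(query, search)
        isempty(intersect(q, s)) && error("mismatch: sketch assumes shared residues")
        if q == s
            continue            # identical residue sets
        elseif issubset(q, s)
            narrower = true     # query allows fewer residues here
        elseif issubset(s, q)
            broader = true      # query allows more residues here
        else
            partial = true      # classes overlap only partly
        end
    end
    (partial || (narrower && broader)) && return "Complex"
    narrower && return "Variant"
    broader && return "Degenerate"
    return "Exact"
end

println(specificity(parse_positions("RKLI"), parse_positions("R[KR]L[IV]")))  # Variant
```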
The score fields then tell you how strong that overlap is. match_ic is the raw information captured by the best alignment. normalized_ic divides match_ic by the information content of the less informative of the two motifs, which puts hits on a comparable scale across motifs of different length and specificity. core_ic is the fraction of aligned-core information retained relative to the more informative side at each aligned position. score is normalized_ic multiplied by matched_positions and is intended for ranking. matched_positions counts informative aligned positions only, so any aligned position involving a wildcard on either side does not contribute to that total. matched_pattern is a compact rendering of the winning overlap: uppercase symbols usually mark clean exact agreement, whereas lowercase symbols mark positions broadened by ambiguity or wildcard handling. The paper does not define a CoreIC score, so in this package core_ic follows the behavior of the upstream CompariMotif oracle.
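The arithmetic behind these fields can be checked against the Quick Start output. This Julia sketch reproduces the numbers for RKLI versus R[KR]L[IV] under a uniform protein background. The per-position formula log2(N / k) / log2(N) is an assumption inferred from the printed values, not taken from the package source, and using the per-position minimum as shared information only holds here because each query residue lies inside the corresponding search class.

```julia
# Assumption inferred from the Quick Start output: with a uniform background
# over N residues, a position allowing k residues carries log2(N / k) / log2(N)
# units of information, so a fixed residue scores exactly 1.0.
position_ic(k::Int, N::Int) = log2(N / k) / log2(N)

N = 20                        # protein alphabet size
query_sizes  = [1, 1, 1, 1]   # R  K     L  I
search_sizes = [1, 2, 1, 2]   # R  [KR]  L  [IV]

q_ic = position_ic.(query_sizes, N)
s_ic = position_ic.(search_sizes, N)

query_information  = sum(q_ic)
search_information = sum(s_ic)

# Each query residue lies inside the search class, so the shared information
# at each position is taken as the smaller of the two per-position values.
match_ic = sum(min.(q_ic, s_ic))
normalized_ic = match_ic / min(query_information, search_information)
core_ic = match_ic / sum(max.(q_ic, s_ic))
matched_positions = length(query_sizes)  # no wildcards in this example
score = normalized_ic * matched_positions

println((; match_ic, normalized_ic, core_ic, score))
```

Because the less informative motif is captured completely, normalized_ic is 1.0 and score equals the four matched positions, in line with the result printed in the Quick Start.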
The same fields become table columns when you call to_column_table. In that tabular form, query and search keep the original motif strings, while normalized_query and normalized_search expose the canonical forms used internally during comparison. Unmatched results keep matched = false, use No Match relationship labels, leave matched_pattern empty, and set matched_positions, all score fields, and both information totals to 0 or 0.0.
You can find more advanced examples of how to use this package in the FAQ / How-To section of the documentation.
Non-Uniform Residue Frequencies
By default, information content uses a uniform residue frequency distribution. To score motifs against a custom background model, pass residue_frequencies = Dict{Char,Float64}(...) when constructing ComparisonOptions.
julia> dna_freqs = Dict('A' => 0.3, 'C' => 0.2, 'G' => 0.2, 'T' => 0.3)
Dict{Char, Float64} with 4 entries:
  'A' => 0.3
  'G' => 0.2
  'T' => 0.3
  'C' => 0.2

julia> weighted = ComparisonOptions(;
           alphabet = DNAAlphabet(),
           residue_frequencies = dna_freqs,
           min_shared_positions = 1,
           normalized_ic_cutoff = 0.0,
       )
ComparisonOptions(
  alphabet = CompariMotif.DNAAlphabet()
  residue_frequencies = Dict(
    'A' => 0.29999999999999993,
    'C' => 0.19999999999999998,
    'G' => 0.19999999999999998,
    'T' => 0.29999999999999993
  )
  min_shared_positions = 1
  normalized_ic_cutoff = 0.0
  matchfix = :none
  mismatches = 0
  allow_ambiguous_overlap = true
  max_variants = 10000
)

julia> compare("ATG", "[AGT]TG", weighted)
ComparisonResult(
  query = ATG
  search = [AGT]TG
  normalized_query = ATG
  normalized_search = [AGT]TG
  matched = true
  query_relationship = Variant Match
  search_relationship = Degenerate Match
  matched_pattern = [agt]TG
  matched_positions = 3
  match_ic = 2.190410891970466
  normalized_ic = 1.0
  core_ic = 0.7558537172605841
  score = 3.0
  query_information = 2.8979296416098874
  search_information = 2.190410891970466
)
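The totals printed above are consistent with a simple rule: a position allowing a residue set S contributes -log2(p(S)) / log2(N), where p(S) is the summed background frequency of S and N is the alphabet size. That rule is inferred from these numbers rather than taken from the package source, so treat this Julia sketch as a way to sanity-check a custom background, not as the reference implementation. weighted_position_ic is a hypothetical helper.

```julia
# Assumption inferred from the DNA example (not the package source): a position
# allowing residue set S carries -log2(sum of p(r) for r in S) / log2(N),
# with the background frequencies normalized to sum to 1.0.
function weighted_position_ic(allowed::Vector{Char}, freqs::Dict{Char,Float64})
    total = sum(values(freqs))                   # normalize the background
    p = sum(freqs[r] for r in allowed) / total   # probability of the class
    return -log2(p) / log2(length(freqs))
end

dna_freqs = Dict('A' => 0.3, 'C' => 0.2, 'G' => 0.2, 'T' => 0.3)
query_sets  = [['A'], ['T'], ['G']]             # ATG
search_sets = [['A', 'G', 'T'], ['T'], ['G']]   # [AGT]TG

q_ic = [weighted_position_ic(s, dna_freqs) for s in query_sets]
s_ic = [weighted_position_ic(s, dna_freqs) for s in search_sets]

query_information  = sum(q_ic)
search_information = sum(s_ic)
match_ic = sum(min.(q_ic, s_ic))   # valid here because A lies inside [AGT]
core_ic  = match_ic / sum(max.(q_ic, s_ic))

println((; query_information, search_information, match_ic, core_ic))
```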
Matrix Comparisons
When you have many motifs, you can compare the entire collection in a single call. The result is a square matrix of all-vs-all pairwise comparisons, which can be converted into a column-oriented table for storage or downstream analysis. The table can be easily converted into a DataFrame from the DataFrames package or saved into a comma-separated values (CSV) file with the CSV package.
julia> motifs = ["RKLI", "R[KR]L[IV]", "R.LE"]
3-element Vector{String}:
 "RKLI"
 "R[KR]L[IV]"
 "R.LE"

julia> results = compare(motifs, options);

julia> size(results)
(3, 3)

julia> table = to_column_table(results)
(query_index = [1, 1, 1, 2, 2, 2, 3, 3, 3], search_index = [1, 2, 3, 1, 2, 3, 1, 2, 3], query = ["RKLI", "RKLI", "RKLI", "R[KR]L[IV]", "R[KR]L[IV]", "R[KR]L[IV]", "R.LE", "R.LE", "R.LE"], search = ["RKLI", "R[KR]L[IV]", "R.LE", "RKLI", "R[KR]L[IV]", "R.LE", "RKLI", "R[KR]L[IV]", "R.LE"], normalized_query = ["RKLI", "RKLI", "RKLI", "R[RK]L[IV]", "R[RK]L[IV]", "R[RK]L[IV]", "R.LE", "R.LE", "R.LE"], normalized_search = ["RKLI", "R[RK]L[IV]", "R.LE", "RKLI", "R[RK]L[IV]", "R.LE", "RKLI", "R[RK]L[IV]", "R.LE"], matched = Bool[1, 1, 0, 1, 1, 0, 0, 0, 1], query_relationship = ["Exact Match", "Variant Match", "No Match", "Degenerate Match", "Exact Match", "No Match", "No Match", "No Match", "Exact Match"], search_relationship = ["Exact Match", "Degenerate Match", "No Match", "Variant Match", "Exact Match", "No Match", "No Match", "No Match", "Exact Match"], matched_pattern = ["RKLI", "R[rk]L[iv]", "", "R[rk]L[iv]", "R[RK]L[IV]", "", "", "", "R.LE"], matched_positions = [4, 4, 0, 4, 4, 0, 0, 0, 3], match_ic = [4.0, 3.537243573680481, 0.0, 3.537243573680481, 3.537243573680481, 0.0, 0.0, 0.0, 3.0], normalized_ic = [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0], core_ic = [1.0, 0.8843108934201203, 0.0, 0.8843108934201203, 1.0, 0.0, 0.0, 0.0, 1.0], score = [4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 3.0], query_information = [4.0, 4.0, 0.0, 3.537243573680481, 3.537243573680481, 0.0, 0.0, 0.0, 3.0], search_information = [4.0, 3.537243573680481, 0.0, 4.0, 3.537243573680481, 0.0, 0.0, 0.0, 3.0])
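Because the table is a NamedTuple of equal-length vectors, Base Julia is enough for simple downstream filtering. This sketch copies four columns from the output above into a literal and keeps the matched rows that are not self-comparisons; the column subset and the filtering rule are illustrative choices, not package behavior.

```julia
# Four columns copied from the 3×3 example output; a real table from
# to_column_table has more columns, all of the same length.
table = (
    query_index = [1, 1, 1, 2, 2, 2, 3, 3, 3],
    search_index = [1, 2, 3, 1, 2, 3, 1, 2, 3],
    matched = Bool[1, 1, 0, 1, 1, 0, 0, 0, 1],
    score = [4.0, 4.0, 0.0, 4.0, 4.0, 0.0, 0.0, 0.0, 3.0],
)

# Keep rows that matched and are not a motif compared with itself.
keep = table.matched .& (table.query_index .!= table.search_index)

# map over a NamedTuple applies the function to every column and keeps the keys.
hits = map(col -> col[keep], table)
println(hits)
```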
When you want to search a single query motif against a database of targets, pass the query as a string and the targets as a vector. The result is still a matrix, with a single row, so the hit against target j is query_hits[1, j]. This is useful when you already know the motif you want to look up and want to inspect or export all hits against a target set.
julia> query_hits = compare("RKLI", ["RKLI", "R.LE"], options);

julia> query_hits[1, 1].matched
true

julia> query_hits[1, 2].matched
false
Reference
The full public API is listed below. Use these docstrings when you need the precise meaning of an option, result field, or helper function.
CompariMotif.ProteinAlphabet — Type
ProteinAlphabet

Select the standard protein alphabet for ComparisonOptions. Allowed residues: ARNDCQEGHILKMFPSTWYV. Use as ProteinAlphabet().
CompariMotif.DNAAlphabet — Type
DNAAlphabet

Select the DNA alphabet for ComparisonOptions. Allowed residues: ACGT. Use as DNAAlphabet().
CompariMotif.RNAAlphabet — Type
RNAAlphabet

Select the RNA alphabet for ComparisonOptions. Allowed residues: ACGU. Use as RNAAlphabet().
CompariMotif.ComparisonOptions — Type
ComparisonOptions

Reusable configuration object for CompariMotif comparisons.
Construct once with ComparisonOptions(; kwargs...) and reuse across many compare calls.
Keywords
- alphabet = ProteinAlphabet(): comparison alphabet (ProteinAlphabet(), DNAAlphabet(), or RNAAlphabet()).
- residue_frequencies::Union{Nothing, AbstractDict{Char,<:Real}} = nothing: optional background residue frequencies for information-content scoring. When omitted, CompariMotif uses a uniform frequency distribution. A provided dictionary must define every residue in the selected alphabet, and every value must be strictly positive and finite. Frequencies are normalized internally to sum to 1.0.
- min_shared_positions::Int = 2: minimum number of matched, non-wildcard positions required for a hit.
- normalized_ic_cutoff::Real = 0.5: minimum normalized information content (normalized_ic) required for a hit.
- matchfix::Symbol = :none: fixed-position matching mode. Accepted values are exactly :none, :query_fixed, :search_fixed, and :both_fixed.
- mismatches::Int = 0: tolerated count of defined-position mismatches.
- allow_ambiguous_overlap::Bool = true: whether partial class overlaps are allowed as complex matches.
- max_variants::Int = 10_000: maximum expanded variants per motif.
Examples
julia> using CompariMotif
julia> options = ComparisonOptions();
julia> options.alphabet isa ProteinAlphabet
true
julia> options.matchfix == :none
true

See also compare, ComparisonResult.
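The residue_frequencies rules listed under Keywords can be restated as a small validation step. normalize_frequencies is a hypothetical helper written for illustration, not the package's own code. How the package treats extra keys outside the alphabet is not documented here, so this sketch simply ignores them.

```julia
# Hypothetical restatement of the documented residue_frequencies rules:
# every alphabet residue must be present, each value must be strictly positive
# and finite, and the result is normalized to sum to 1.0.
function normalize_frequencies(freqs::AbstractDict{Char,<:Real}, alphabet::AbstractString)
    for r in alphabet
        haskey(freqs, r) || throw(ArgumentError("missing frequency for residue '$r'"))
        v = freqs[r]
        (isfinite(v) && v > 0) ||
            throw(ArgumentError("frequency for residue '$r' must be positive and finite"))
    end
    total = sum(Float64(freqs[r]) for r in alphabet)
    # Keys outside the alphabet are ignored (an assumption of this sketch).
    return Dict{Char,Float64}(r => freqs[r] / total for r in alphabet)
end

# Relative weights are accepted and rescaled, e.g. 3:2:2:3 for A:C:G:T.
dna = normalize_frequencies(Dict('A' => 3, 'C' => 2, 'G' => 2, 'T' => 3), "ACGT")
println(dna)
```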
CompariMotif.ComparisonResult — Type
ComparisonResult

Result record produced by compare for one query/search motif pair.
Fields:
- query, search: original input motifs.
- normalized_query, normalized_search: canonicalized motifs used internally.
- matched: whether the best-scoring valid alignment passed all thresholds.
- query_relationship, search_relationship: directional two-word labels. The first word describes specificity (Exact, Variant, Degenerate, Complex); the second describes coverage (Match, Parent, Subsequence, Overlap).
- matched_pattern: compact rendering of the selected overlap; lowercase symbols mark positions broadened by ambiguity or wildcard handling.
- matched_positions: number of informative aligned positions; any aligned position involving a wildcard on either side is excluded.
- match_ic: raw information content captured by the selected alignment.
- normalized_ic: match_ic scaled by the less informative of the two motif variants, making hits easier to compare across motif lengths and specificity.
- core_ic: fraction of aligned-core information retained by the selected alignment, computed as match_ic / sum(max(position_ic(query_i), position_ic(search_i))) across aligned positions that are not wildcards on both sides. The paper does not define this field; its behavior follows the upstream CompariMotif oracle.
- score: derived summary score (normalized_ic * matched_positions).
- query_information, search_information: total information content for the winning query and search motif variants.
The relationship fields are asymmetric by design, so one side of the same hit can read Variant Subsequence while the other reads Degenerate Parent. When matched == false, the relationship labels are No Match, matched_pattern is empty, and all position/score/information totals stay at their zero defaults.
See also ComparisonOptions, compare, to_column_table.
CompariMotif.compare — Function
compare(a::AbstractString, b::AbstractString, options::ComparisonOptions)::ComparisonResult
compare(motif::AbstractString,
db::AbstractVector{<:AbstractString},
options::ComparisonOptions)::Matrix{ComparisonResult}
compare(motifs::AbstractVector{<:AbstractString},
db::AbstractVector{<:AbstractString},
options::ComparisonOptions)::Matrix{ComparisonResult}
compare(motifs::AbstractVector{<:AbstractString},
options::ComparisonOptions)::Matrix{ComparisonResult}

Compare motifs according to the CompariMotif scoring scheme described by Edwards et al. (2008).
- Pairwise mode compares one query motif against one target motif.
- Query-search mode compares one query motif against a database of target motifs.
- Database mode compares one motif collection against another.
- All-vs-all mode compares every motif in a single collection against every motif in that same collection.
Examples
julia> using CompariMotif
julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);
julia> result = compare("RKLI", "R[KR]L[IV]", options);
julia> result.matched
true

References
- (Edwards et al., 2008) Edwards et al. Bioinformatics 24(10):1307-1309 (2008)
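Configure thresholds and matching semantics with ComparisonOptions. Pairwise mode returns a single ComparisonResult. The other modes return a Matrix{ComparisonResult} of size (length(motifs), length(db)); a single query motif counts as one row, and all-vs-all mode uses the same collection for both dimensions. Convert results to column tables with to_column_table.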
CompariMotif.to_column_table — Function
to_column_table(results)::NamedTuple

Convert comparison results into a column-oriented NamedTuple where each key is a column name and each value is a vector column.
- to_column_table(::ComparisonResult) returns a one-row table.
- to_column_table(::AbstractVector{<:ComparisonResult}) adds result_index.
- to_column_table(::AbstractMatrix{<:ComparisonResult}) adds query_index and search_index, with one row per matrix cell in deterministic row-major order.
The returned object can be converted to a DataFrame or written using CSV.write without requiring either dependency in the package itself.
Column names mirror the fields of ComparisonResult. In matrix form, query_index and search_index preserve the original row/column coordinates, and the relationship/score columns can be interpreted exactly as on the underlying result objects.
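Row-major order is worth spelling out in Julia because matrices are stored column-major, so vec(results) would list cells column by column. This sketch uses the hypothetical helper row_major_cells to walk a matrix with the row index in the outer loop, which reproduces the query_index and search_index columns of the 3×3 example earlier on this page. An integer matrix stands in for a Matrix{ComparisonResult}.

```julia
# Sketch of row-major flattening (not the package source). The row index is
# placed in the outer loop so it changes slowest.
function row_major_cells(m::AbstractMatrix)
    query_index = Int[]
    search_index = Int[]
    cells = eltype(m)[]
    for i in axes(m, 1), j in axes(m, 2)   # i is the outer loop
        push!(query_index, i)
        push!(search_index, j)
        push!(cells, m[i, j])
    end
    return (; query_index, search_index, cells)
end

# Stand-in for a 3×3 result matrix: entry (i, j) stores 10i + j.
m = [10i + j for i in 1:3, j in 1:3]
t = row_major_cells(m)
println(t.query_index)   # [1, 1, 1, 2, 2, 2, 3, 3, 3]
println(t.search_index)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
println(t.cells)         # [11, 12, 13, 21, 22, 23, 31, 32, 33]
```

For a quick check without the loop, vec(permutedims(m)) yields the same cell order; permutedims works for any element type, including non-numeric ones.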
Examples
julia> using CompariMotif
julia> options = ComparisonOptions(; min_shared_positions = 1, normalized_ic_cutoff = 0.0);
julia> table = to_column_table(compare("RKLI", "R[KR]L[IV]", options));
julia> table.query_relationship[1]
"Variant Match"

Compute results with compare, which returns a single ComparisonResult in pairwise mode.
to_column_table(results::AbstractVector{<:ComparisonResult})::NamedTuple

Convert a result vector to a column table with result_index.

to_column_table(results::AbstractMatrix{<:ComparisonResult})::NamedTuple

Convert a result matrix to a column table with query_index and search_index.