Information
MIToS.Information
— ModuleThe Information
module of MIToS defines types and functions useful to calculate information measures (e.g. Mutual Information (MI) and Entropy) over a Multiple Sequence Alignment (MSA). This module was designed to count Residue
s (defined in the MSA
module) in special contingency tables (as fast as possible) and to derive probabilities from this counts. Also, includes methods for applying corrections to that tables, e.g. pseudocounts and pseudo frequencies. Finally, Information
allows to use this probabilities and counts to estimate information measures and other frequency based values.
Features
- Estimate multi dimensional frequencies and probabilities tables from sequences, MSAs, etc...
- Correction for small number of observations
- Correction for data redundancy on a MSA
- Estimate information measures
- Calculate corrected mutual information between residues
using MIToS.Information
Contents
Types
MIToS.Information.AdditiveSmoothing
— TypeAdditive Smoothing or fixed pseudocount λ
for ResidueCount
(in order to estimate probabilities when the number of samples is low).
Common values of λ
are:
0
: No cell frequency prior, gives you the maximum likelihood estimator.0.05
is the optimum value forλ
found in Buslje et al. 2009, similar results was obtained forλ
in the range [0.025, 0.075].1 / p
: Perks prior (Perks, 1947) wherep
the number of parameters (i.e. residues, pairs of residues) to estimate. Ifp
is the number of residues (20
without counting gaps), this gives you0.05
.sqrt(n) / p
: Minimax prior (Trybula, 1958) wheren
is the number of samples andp
the number of parameters to estimate. If the number of samplesn
is 400 (minimum number of sequence clusters for achieve good performance in Buslje et al. 2009) for estimating 400 parameters (pairs of residues without counting gaps) this gives you0.05
.0.5
: Jeffreys prior (Jeffreys, 1946).1
: Bayes-Laplace uniform prior, aka. Laplace smoothing.
References
- Buslje, Cristina Marino, et al. "Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information." Bioinformatics 25.9 (2009): 1125-1131.
- Perks, Wilfred. "Some observations on inverse probability including a new indifference rule." Journal of the Institute of Actuaries 73.2 (1947): 285-334.
- Trybula, Stanislaw. "Some problems of simultaneous minimax estimation." The Annals of Mathematical Statistics 29.1 (1958): 245-253.
- Jeffreys, Harold. "An invariant form for the prior probability in estimation problems." Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186.1007 (1946): 453-461.
MIToS.Information.BLOSUM_Pseudofrequencies
— TypeBLOSUM_Pseudofrequencies
type. It takes to arguments/fields:
α
: Usually the number of sequences or sequence clusters in the MSA.β
: The weight of the pseudofrequencies, a value close to 8.512 whenα
is the number of sequence clusters.
MIToS.Information.ContingencyTable
— TypeA ContingencyTable
is a multidimensional array. It stores the contingency matrix, its marginal values and total. The type also has an internal and private temporal array and an alphabet object. It's a parametric type, taking three ordered parameters:
T
: The element type of the multidimensional array.N
: It's the dimension of the array and should be anInt
.A
: This should be a type, subtype ofResidueAlphabet
, i.e.:UngappedAlphabet
,GappedAlphabet
orReducedAlphabet
.
A ContingencyTable
can be created from an alphabet if all the parameters are given. Otherwise, you need to give a type, a number (Val
) and an alphabet. You can also create a ContingencyTable
using a matrix and a alphabet. For example:
ContingencyTable{Float64,2,UngappedAlphabet}(UngappedAlphabet())
ContingencyTable(Float64, Val{2}, UngappedAlphabet())
ContingencyTable(zeros(Float64, 20, 20), UngappedAlphabet())
MIToS.Information.Frequencies
— TypeA Frequencies
object wraps a ContingencyTable
storing counts/frequencies.
MIToS.Information.NoPseudocount
— TypeYou can use NoPseudocount()
to avoid pseudocount corrections where a Pseudocount
type is needed.
MIToS.Information.NoPseudofrequencies
— TypeYou can use NoPseudofrequencies()
to avoid pseudocount corrections where a Pseudofrequencies
type is needed.
MIToS.Information.Probabilities
— TypeA Probabilities
object wraps a ContingencyTable
storing probabilities. It doesn't perform any check. If the total isn't one, you must use normalize
or normalize!
on the ContingencyTable
before wrapping it to make the sum of the probabilities equal to one.
MIToS.Information.Pseudocount
— TypeParametric abstract type to define pseudocount types
MIToS.Information.Pseudofrequencies
— TypeParametric abstract type to define pseudofrequencies types
Constants
MIToS.Information.BLOSUM62_Pi
— ConstantBLOSUM62 probabilities P(aa) for each residue on the UngappedAlphabet
. SUM: 0.9987
MIToS.Information.BLOSUM62_Pij
— ConstantTable with conditional probabilities of residues based on BLOSUM62. The normalization is done row based. The firts row contains the P(aa|A) and so one.
Macros
Methods and functions
Base.count!
— MethodIt populates a ContingencyTable
(first argument) using the frequencies in the sequences (last positional arguments). The dimension of the table must match the number of sequences and all the sequences must have the same length. You must indicate the used weights and pseudocounts as second and third positional arguments respectively. You can use NoPseudofrequencies()
and NoClustering()
to avoid the use of sequence weighting and pseudocounts, respectively.
DEPRECATED: Use frequencies!
instead. Note that frequencies!
defines the weigths and pseudocounts using keyword arguments instead of positional arguments.
Base.count
— MethodIt returns a ContingencyTable
wrapped in a Frequencies
type with the frequencies of residues in the sequences that takes as arguments. The dimension of the table is equal to the number of sequences. You can use the keyword arguments alphabet
, weights
and pseudocounts
to indicate the alphabet of the table (default to UngappedAlphabet()
), a clustering result (default to NoClustering()
) and the pseudocounts (default to NoPseudocount()
) to be used during the estimation of the frequencies.
DEPRECATED: Use frequencies
instead. Note that frequencies
defines the alphabet, weigths and pseudocounts using keyword arguments instead of positional arguments.
LinearAlgebra.normalize!
— Methodnormalize!
makes the sum of the frequencies to be one, in place.
LinearAlgebra.normalize
— Methodnormalize
returns another table where the sum of the frequencies is one.
MIToS.Information.APC!
— MethodMIToS.Information.BLMI
— MethodBLMI
takes an MSA and calculates a Z score (ZBLMI) and a corrected MI/MIp as described on Busjle et al. 2009 but using using BLOSUM62 pseudo frequencies instead of a fixed pseudocount.
Keyword argument, type, default value and descriptions:
- beta Float64 8.512 β for BLOSUM62 pseudo frequencies
- lambda Float64 0.0 Low count value
- threshold 62 Percent identity threshold for sequence clustering (Hobohm I)
- maxgap Float64 0.5 Maximum fraction of gaps in positions included in calculation
- apc Bool true Use APC correction (MIp)
- samples Int 50 Number of samples for Z-score
- fixedgaps Bool true Fix gaps positions for the random samples
This function returns:
- Z score (ZBLMI)
- MI or MIp using BLOSUM62 pseudo frequencies (BLMI/BLMIp)
References
MIToS.Information.apply_pseudocount!
— MethodIt adds the pseudocount
value to the table cells.
MIToS.Information.apply_pseudofrequencies!
— Methodapply_pseudofrequencies!{T}(Pab::ContingencyTable{T,2,UngappedAlphabet}, pseudofrequencies::BLOSUM_Pseudofrequencies)
When a BLOSUM_Pseudofrequencies(α,β)
is used, this function applies pseudofrequencies Gab
over Pab
, as a weighted mean of both. It uses the conditional probability matrix BLOSUM62_Pij
and the real frequencies/probabilities Pab
to estimate the pseudofrequencies Gab
. α is the weight of the real frequencies Pab
and β the weight of the pseudofrequencies.
Gab = Σcd Pcd ⋅ BLOSUM62( a | c ) ⋅ BLOSUM62( b | d )
Pab = (α ⋅ Pab + β ⋅ Gab )/(α + β)
MIToS.Information.buslje09
— Methodbuslje09
takes a MSA and calculates a Z score and a corrected MI/MIp as described on Busjle et al. 2009.
keyword argument, type, default value and descriptions:
- lambda Float64 0.05 Low count value
- clustering Bool true Sequence clustering (Hobohm I)
- threshold 62 Percent identity threshold for clustering
- maxgap Float64 0.5 Maximum fraction of gaps in positions included in calculation
- apc Bool true Use APC correction (MIp)
- samples Int 100 Number of samples for Z-score
- fixedgaps Bool true Fix gaps positions for the random samples
- alphabet ResidueAlphabet UngappedAlphabet() Residue alphabet to be used
This function returns:
- Z score
- MI or MIp
MIToS.Information.cumulative
— Methodcumulative
allows to calculate cumulative scores (i.e. cMI) as defined in Marino Buslje et al. 2010:
"We calculated a cumulative mutual information score (cMI) for each residue as the sum of MI values above a certain threshold for every amino acid pair where the particular residue appears. This value defines to what degree a given amino acid takes part in a mutual information network."
References
MIToS.Information.delete_dimensions!
— Methoddelete_dimensions!(out::ContingencyTable, in::ContingencyTable, dimensions::Int...)
This function fills a ContingencyTable with the counts/probabilities on in
after the deletion of dimensions
. i.e. This is useful for getting Pxy from Pxyz.
MIToS.Information.delete_dimensions
— Methoddelete_dimensions(in::ContingencyTable, dimensions::Int...)
This function creates a ContingencyTable with the counts/probabilities on in
after the deletion of dimensions
. i.e. This is useful for getting Pxy from Pxyz.
MIToS.Information.frequencies!
— Methodfrequencies!(table, seqs...; weights::WeightTypes, pseudocounts::Pseudocount)
It populates a ContingencyTable
or Frequencies
table (first argument) using the frequencies in the given sequences (last positional arguments). The dimension of the table must match the number of sequences and all the sequences must have the same length. You must indicate the used weights
and pseudocounts
as keyword arguments. Those arguments default to NoClustering()
and NoPseudocount()
respectively, to avoid the use of sequence weighting and pseudocounts.
MIToS.Information.frequencies
— Methodfrequencies(seqs...; alphabet=UngappedAlphabet(), weights=NoClustering(), pseudocounts=NoPseudocount()
This function returns a Frequencies
object wrapping a ContingencyTable
with the frequencies of residues in the sequences that takes as arguments. The dimension of the table is equal to the number of sequences. You can use the keyword arguments alphabet
, weights
and pseudocounts
to indicate the alphabet of the table, a clustering result and the pseudocounts to be used during the estimation of the frequencies.
MIToS.Information.gap_intersection_percentage
— MethodIt calculates the gap intersection as percentage from a table of Frequencies
.
MIToS.Information.gap_union_percentage
— MethodIt calculates the gap union as percentage from a table of Frequencies
.
MIToS.Information.gaussdca
— MethodWrapper function to GaussDCA.gDCA
. You need to install GaussDCA:
using Pkg
Pkg.add(PackageSpec(url = "https://github.com/carlobaldassi/GaussDCA.jl", rev = "master"))
Look into GaussDCA.jl README for further information. If you use this wrapper, please cite the GaussDCA publication and the package's doi.
It's possible to indicate the path to the julia binary where GaussDCA is installed. However, it's recommended to use the same version where MIToS is installed. That is because this function use serialize
/deserialize
to transfer data between the processes.
GaussDCA Publication: Baldassi, Carlo, Marco Zamparo, Christoph Feinauer, Andrea Procaccini, Riccardo Zecchina, Martin Weigt, and Andrea Pagnani. "Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners." PloS one 9, no. 3 (2014): e92721.
MIToS.Information.getalphabet
— Methodgetalphabet
allows to access the stored alphabet object.
MIToS.Information.getcontingencytable
— Methodgetcontingencytable
allows to access the wrapped ContingencyTable
in a Probabilities
or Frequencies
object.
MIToS.Information.getmarginals
— Methodgetmarginals
allows to access the array with the marginal values (NamedArray
).
MIToS.Information.getmarginalsarray
— Methodgetmarginalsarray
allows to access the array with the marginal values (Array
without names).
MIToS.Information.gettable
— Methodgettable
allows to access the table (NamedArray
).
MIToS.Information.gettablearray
— Methodgettablearray
allows to access the table (Array
without names).
MIToS.Information.gettotal
— Methodgettotal
allows to access the stored total value.
MIToS.Information.kullback_leibler
— Methodkullback_leibler(msa::AbstractArray{Residue}; background::Union{Array{T,N}, Probabilities{T,N,A}, ContingencyTable{T,N,A}}=BLOSUM62_Pi, base::Number=ℯ, kargs...)
It calculates the Kullback-Leibler (KL) divergence from a multiple sequence alignment (MSA). You can use the keyword argument background
to set the background distribution. This argument can take an Array
, Probabilities
, or ContingencyTable
object. The background distribution must have the same size and alphabet as the probabilities. The default is the BLOSUM62_Pi
table. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits.
The other keyword arguments are passed to the mapfreq
function.
MIToS.Information.kullback_leibler
— Methodkullback_leibler(probabilities::Probabilities{T,N,A}, background::Union{
AbstractArray{T,N}, Probabilities{T,N,A}, ContingencyTable{T,N,A}}=BLOSUM62_Pi,
base::Number=ℯ)
It calculates the Kullback-Leibler (KL) divergence from a table of Probabilities
. You can use the keyword argument background
to set the background distribution. This argument can take an Array
, Probabilities
, or ContingencyTable
object. The background distribution must have the same size and alphabet as the probabilities. The default is the BLOSUM62_Pi
table. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits.
MIToS.Information.mapcolfreq!
— MethodIt efficiently map a function (first argument) that takes a table of Frequencies
or Probabilities
(third argument). The table is filled in place with the counts or probabilities of each column from the msa
(second argument).
weights
(default:NoClustering()
): Weights to be used for table counting.pseudocounts
(default:NoPseudocount()
):Pseudocount
object to be applied to table.pseudofrequencies
(default:NoPseudofrequencies()
):Pseudofrequencies
to be applied to the normalized (probabilities) table.
MIToS.Information.mapcolpairfreq!
— MethodIt efficiently map a function (first argument) that takes a table of Frequencies
or Probabilities
(third argument). The table is filled in place with the counts or probabilities of each pair of columns from the msa
(second argument).
weights
(default:NoClustering()
): Weights to be used for table counting.pseudocounts
(default:NoPseudocount()
):Pseudocount
object to be applied to table.pseudofrequencies
(default:NoPseudofrequencies()
):Pseudofrequencies
to be applied to the normalized (probabilities) table.usediagonal
(default:true
): Iftrue
, the function will be also applied to the diagonal elements.diagonalvalue
(default:zero
): Value to fill diagonal elements ifusediagonal
isfalse
.
MIToS.Information.mapfreq
— Methodmapfreq(f, msa; rank = 1, dims = 2, alphabet = UngappedAlphabet(),
weights = NoClustering(), pseudocounts = NoPseudocount(),
pseudofrequencies = NoPseudofrequencies(), probabilities = true,
usediagonal = false, diagonalvalue = NaN, kargs...)
It efficiently map a function (first argument) that takes a table of Frequencies
or Probabilities
(depending on the probabilities
keyword argument) calculated on sequences (dims = 1
) or columns (dims = 2
, the default) of an msa
(second argument). If rank = 1
, the default, the function is applied to each sequence or column. If rank = 2
, the function is applied to each pair of sequences or columns. In that case, we can set the usediagonal
keyword argument to true
to apply the function to pairs of the same sequence or column. The diagonalvalue
keyword argument is used to set the value of the diagonal elements if usediagonal
is false
. By default, the function is not applied to the diagonal elements (i.e. usediagonal = false
) and the diagonalvalue
is set to NaN
. The alphabet
keyword argument can be used to set the alphabet used to construct the contingency table. The function also accepts the following keyword arguments:
weights
(default:NoClustering()
): Weights to be used for table counting.pseudocounts
(default:NoPseudocount()
):Pseudocount
object to be applied to table.pseudofrequencies
(default:NoPseudofrequencies()
):Pseudofrequencies
to be applied to the normalized (probabilities) table.
Note that the pseudofrequencies
argument is only valid if probabilities = true
. All the other keyword arguments are passed to the function f
.
julia> using Random, MIToS.MSA, MIToS.Information
julia> msa = rand(Random.MersenneTwister(1), Residue, 3, 6) # random MSA as an example
3×6 Matrix{Residue}:
F A F D E V
T R R G F I
N V S W Q T
julia> mapfreq(sum, msa) # default: rank=1, dims=2, probabilities=true
1×6 Named Matrix{Float64}
Function ╲ Col │ 1 2 3 4 5 6
───────────────┼─────────────────────────────
sum │ 1.0 1.0 1.0 1.0 1.0 1.0
julia> mapfreq(sum, msa, probabilities=false)
1×6 Named Matrix{Float64}
Function ╲ Col │ 1 2 3 4 5 6
───────────────┼─────────────────────────────
sum │ 3.0 3.0 3.0 3.0 3.0 3.0
julia> mapfreq(sum, msa, dims=1)
3×1 Named Matrix{Float64}
Seq ╲ Function │ sum
───────────────┼────
1 │ 1.0
2 │ 1.0
3 │ 1.0
MIToS.Information.mapseqfreq!
— MethodIt efficiently map a function (first argument) that takes a table of Frequencies
or Probabilities
(third argument). The table is filled in place with the counts or probabilities of each sequence from the msa
(second argument).
weights
(default:NoClustering()
): Weights to be used for table counting.pseudocounts
(default:NoPseudocount()
):Pseudocount
object to be applied to table.pseudofrequencies
(default:NoPseudofrequencies()
):Pseudofrequencies
to be applied to the normalized (probabilities) table.
MIToS.Information.mapseqpairfreq!
— MethodIt efficiently map a function (first argument) that takes a table of Frequencies
or Probabilities
(third argument). The table is filled in place with the counts or probabilities of each pair of sequences from the msa
(second argument).
weights
(default:NoClustering()
): Weights to be used for table counting.pseudocounts
(default:NoPseudocount()
):Pseudocount
object to be applied to table.pseudofrequencies
(default:NoPseudofrequencies()
):Pseudofrequencies
to be applied to the normalized (probabilities) table.usediagonal
(default:true
): Iftrue
, the function will be also applied to the diagonal elements.diagonalvalue
(default:zero
): Value to fill diagonal elements ifusediagonal
isfalse
.
MIToS.Information.marginal_entropy
— Methodmarginal_entropy(table::Union{Frequencies{T,N,A},Probabilities{T,N,A}}; margin::Int=1,
base::Number=ℯ)
It calculates marginal entropy (H) from a table of Frequencies
or Probabilities
. It takes two keyword arguments: margin
and base
. The first one is used to indicate the margin used to calculate the entropy, e.g. it estimates the entropy H(X) if margin is 1, H(Y) for 2, etc. The default value of margin
is 1. The second keyword argument is used to change the base of the log. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits.
MIToS.Information.mutual_information
— Methodmutual_information(msa::AbstractArray{Residue}; base::Number=ℯ, kargs...)
It calculates Mutual Information (MI) from a multiple sequence alignment (MSA). The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits. The minimum value for rank
is 2 (the default value). By defualt, it uses counts/frequencies to calculate the MI, as it's faster. You can use the keyword argument probabilities = true
to calculate the MI from probabilities.
julia> using Random, MIToS.MSA, MIToS.Information
julia> msa = rand(Random.MersenneTwister(37), Residue, 3, 4)
3×4 Matrix{Residue}:
T R F K
S H C I
G G R V
julia> mi = mutual_information(msa);
julia> mi[1, 2]
1.0986122886681098
MIToS.Information.mutual_information
— Methodmutual_information(table::Union{Frequencies{T,2,A},Probabilities{T,2,A}}; base::Number=ℯ)
It calculates Mutual Information (MI) from a table of Frequencies
or Probabilities
. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits. Note that calculating MI from Frequencies
is faster than from Probabilities
.
MIToS.Information.mutual_information
— Methodmutual_information(table::Union{Frequencies{T,3,A},Probabilities{T,3,A}}; base::Number=ℯ)
It calculates Mutual Information (MI) from a table of Frequencies
or Probabilities
with three dimensions. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits.
julia> using Random, MIToS.MSA, MIToS.Information
julia> msa = rand(Random.MersenneTwister(37), Residue, 3, 4)
3×4 Matrix{Residue}:
T R F K
S H C I
G G R V
julia> Nxyz = frequencies(msa[:, 1], msa[:, 2], msa[:, 3]);
julia> mutual_information(Nxyz)
1.0986122886681093
MIToS.Information.normalized_mutual_information
— Methodnormalized_mutual_information(msa::AbstractArray{Residue}; kargs...)
This function calculates the Normalized Mutual Information (nMI) from a multiple sequence alignment using the mapfreq
function—all the keyword arguments are passed to mapfreq
. The mutual information score is normalized by the joint entropy of the two variables: $nMI(X, Y) = MI(X, Y) / H(X, Y)$ By default, it uses counts/frequencies to estimate the nMI, as it's faster than using probabilities.
MIToS.Information.normalized_mutual_information
— MethodIt calculates a Normalized Mutual Information (nMI) from a table of Frequencies
or Probabilities
. The mutual information score is normalized by the joint entropy of the two variables: $nMI(X, Y) = MI(X, Y) / H(X, Y)$
MIToS.Information.pairwisegapfraction
— MethodIt takes a MSA or a file and a FileFormat
as first arguments. It calculates the percentage of gaps on columns pairs (union and intersection) using sequence clustering (Hobohm I).
Argument, type, default value and descriptions:
- clustering Bool true Sequence clustering (Hobohm I)
- threshold 62 Percent identity threshold for sequence clustering (Hobohm I)
This function returns:
- pairwise gap union as percentage
- pairwise gap intersection as percentage
MIToS.Information.probabilities!
— MethodIt populates a ContingencyTable
(first argument) using the probabilities in the sequences (last positional arguments). The dimension of the table must match the number of sequences and all the sequences must have the same length. You must indicate the used weights, pseudocounts and pseudofrequencies as second, third and fourth positional arguments respectively. You can use NoClustering()
, NoPseudocount()
and NoPseudofrequencies()
to avoid the use of sequence weighting, pseudocounts and pseudofrequencies, respectively.
MIToS.Information.probabilities
— MethodIt returns a ContingencyTable
wrapped in a Probabilities
type with the probabilities of residues in the sequences that takes as arguments. The dimension of the table is equal to the number of sequences. You can use the keyword arguments alphabet
, weights
, pseudocounts
and pseudofrequencies
to indicate the alphabet of the table (default to UngappedAlphabet()
), a clustering result (default to NoClustering()
), the pseudocounts (default to NoPseudocount()
) and the pseudofrequencies (default to NoPseudofrequencies()
) to be used during the estimation of the probabilities.
MIToS.Information.shannon_entropy
— Methodshannon_entropy(msa::AbstractArray{Residue}; base::Number=ℯ,
probabilities::Bool=false, usediagonal::Bool=true, kargs...)
It calculates the Shannon entropy (H) on a MSA. You can use the keyword argument base
to change the base of the log. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits. It uses mapfreq
under the hood, so it takes the same keyword arguments. By default, it measures the entropy of each column in the MSA. You can use dims = 1
to measure the entropy of each sequence. You can also set rank = 2
to measure the joint entropy of each pair of sequences or columns. This function sets by default the probabilities
keyword argument to false
because it's faster to calculate the entropy from counts/frequencies. It also sets usediagonal = true
to also calculate the entropy of the individual variables (sequences or columns).
julia> using MIToS.MSA, MIToS.Information
julia> msa = Residue['C' 'G'; 'C' 'L'; 'C' 'I']
3×2 Matrix{Residue}:
C G
C L
C I
julia> shannon_entropy(msa)
1×2 Named Matrix{Float64}
Function ╲ Col │ 1 2
────────────────┼─────────────────
shannon_entropy │ 0.0 1.09861
MIToS.Information.shannon_entropy
— Methodshannon_entropy(table::Union{Frequencies{T,N,A},Probabilities{T,N,A}}; base::Number=ℯ)
It calculates the Shannon entropy (H) from a table of Frequencies
or Probabilities
. Use last and optional positional argument to change the base of the log. The default base for the log is ℯ (base=ℯ
), so the result is in nats. You can use base = 2
to get the result in bits.