GPCRAnalysis
Documentation for GPCRAnalysis.
GPCRAnalysis.AccessionCode
GPCRAnalysis.BWScheme
GPCRAnalysis.NWGapCosts
GPCRAnalysis.SequenceMapping
GPCRAnalysis.StructAlign
GPCRAnalysis.align
GPCRAnalysis.align_closest
GPCRAnalysis.align_nw
GPCRAnalysis.align_nw
GPCRAnalysis.align_ranges
GPCRAnalysis.align_to_axes
GPCRAnalysis.align_to_membrane
GPCRAnalysis.alphacarbon_coordinates
GPCRAnalysis.alphacarbon_coordinates_matrix
GPCRAnalysis.alphafoldfile
GPCRAnalysis.alphafoldfiles
GPCRAnalysis.alphafoldfiles
GPCRAnalysis.chargelocations
GPCRAnalysis.chimerax_script
GPCRAnalysis.columnwise_entropy
GPCRAnalysis.download_alphafolds
GPCRAnalysis.features_from_structure
GPCRAnalysis.filter_long!
GPCRAnalysis.filter_species!
GPCRAnalysis.forcecomponents
GPCRAnalysis.forcedict
GPCRAnalysis.getchain
GPCRAnalysis.inward_ecl_residues
GPCRAnalysis.inward_tm_residues
GPCRAnalysis.lookupbw
GPCRAnalysis.map_closest
GPCRAnalysis.map_uniprot_retrieve
GPCRAnalysis.map_uniprot_status
GPCRAnalysis.map_uniprot_submit
GPCRAnalysis.marker
GPCRAnalysis.optimize_weights
GPCRAnalysis.pLDDTcolor
GPCRAnalysis.project_sequences
GPCRAnalysis.query_alphafold_latest
GPCRAnalysis.query_ebi_proteins
GPCRAnalysis.query_ncbi
GPCRAnalysis.query_uniprot_accession
GPCRAnalysis.residue_centroid
GPCRAnalysis.residue_centroid_matrix
GPCRAnalysis.residueindex
GPCRAnalysis.residueindex
GPCRAnalysis.sortperm_msa
GPCRAnalysis.species
GPCRAnalysis.try_download_alphafold
GPCRAnalysis.uniprotX
GPCRAnalysis.AccessionCode
— Methodac = AccessionCode(msa, seqname)
Return the Uniprot accession code associated with seqname
.
GPCRAnalysis.BWScheme
— MethodBWScheme(conserved_idx, tmspans)
Specify the Ballesteros-Weinstein scheme used for a particular protein. conserved_idx
is a list of 7 "most conserved" residues per helix (rhodopsin family: N1, D2, R3, W4, P5, P6, P7) and the span of each helix.
Examples
For mouse rhodopsin (P15409),
julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
[37:61, 74:96, 111:133, 153:173, 203:224, 253:274, 287:308]);
GPCRAnalysis.NWGapCosts
— Methodgapcosts = NWGapCosts{T}(; extend1=0, extend2=0, open1=0, open2=0)
Create an affine cost for gaps in Needleman-Wunsch alignment. The cost of a gap of length k
is
extend * k + open
All costs must be nonnegative.
gapcosts(ϕ, idxs1, idxs2)
computes the contribution of gaps to the cost of alignment ϕ
between two sequences with idxs1 = eachindex(seq1)
and idxs2 = eachindex(seq2)
. (The indices are needed to determine whether the alignment starts or ends with a gap.)
GPCRAnalysis.SequenceMapping
— Methodsm = SequenceMapping([4, 5, 0, ...])
sm = SequenceMapping(seq::AnnotatedAlignedSequence)
A SequenceMapping
is a vector of indexes within a full sequence that map to a reference. Specifically, sm[i]
is the index of the residue in the full sequence that maps to the i
-th position in the reference. 0 is a placeholder for a position in the reference that has no mapping to the full sequence.
Example
SequenceMapping([4, 5, 0, ...])
indicates that:
- the first position in the reference maps to the fourth residue in the full sequence,
- the second position in the reference maps to the fifth residue in the full sequence, and
- the third position in the reference lacks a corresponding residue in the full sequence.
GPCRAnalysis.StructAlign
— MethodStructAlign(struct1::ChainLike, struct2::ChainLike, filename::AbstractString)
Create a structure-based alignment between struct1
and struct2
. filename
is the name of the TM-align "results" file (e.g., https://zhanggroup.org//TM-align/example/873772.html).
See also residueindex
.
GPCRAnalysis.align
— Methodtform = align(fixedpos::AbstractMatrix{Float64}, moving::Chain, sm::SequenceMapping)
tform = align(fixed::Chain, moving::Chain, sm::SequenceMapping)
Return a rotated and shifted version of moving
so that the centroids of residues moving[sm]
have least mean square error deviation from positions fixedpos
or those of residues fixed
. fixed
or fixedpos
should include just the residues or positions you want to align to, but moving
should be an entire chain.
GPCRAnalysis.align_closest
— Methodtform = align_closest(mapto::StructLike, mapfrom::StructLike; Dthresh=5)
tform = align_closest(coordsto, coordsfrom; Dthresh=5)
Return the rigid transformation best aligning mapfrom
to mapto
. The transformation is computed from residues matched by map_closest
, using only residues closer than Dthresh
.
Because the mapping is determined by distance, this can only "tweak" an already-close alignment.
GPCRAnalysis.align_nw
— Methodϕ = align_nw(D, gapcosts::NWGapCosts)
Given a pairwise penalty matrix D
(e.g., a pairwise distance matrix) and costs for opening and extending gaps, find the optimal pairings
ϕ = [(i1, j1), (i2, j2), ...]
that minimize
sum(D[ϕk...] for ϕk in ϕ) + gapcosts(ϕ, axes(D)...)
subject to the constraint that all(ϕ[k+1] .> ϕ[k])
for all k
.
GPCRAnalysis.align_nw
— Methodϕ = align_nw(seq1, seq2, gapcosts::NWGapCosts; mode=:distance_orientation)
Find the optimal ϕ
matching seq1[ϕ[k][1]]
to seq2[ϕ[k][2]]
for all k
. mode
controls the computation of pairwise matching penalties, and can be either :distance
or :distance_orientation
, where the latter adds any mismatch in sidechain orientation to the distance penalty.
seq1
and seq2
must be aligned to each other in 3D space before calling this function. See align
.
GPCRAnalysis.align_ranges
— Methodseqtms = align_ranges(seq1, seq2, seq2ranges::AbstractVector{<:AbstractUnitRange})
Transfer refranges
, a list of reside index spans in seq2
, to seq1
. seq1
and seq2
must be spatially aligned, and the assignment is made by minimizing inter-chain distance subject to the constraint of preserving sequence order.
GPCRAnalysis.align_to_axes
— Methodtform = align_to_axes(strct)
Compute the transformation needed to align the principle axes of inertia of strct
with the coordinate axes.
GPCRAnalysis.align_to_membrane
— Methodtform = align_to_membrane(chain::ChainLike, tms; extracellular=true)
Compute the rigid transformation tform
needed to align chain
to the membrane, given the transmembrane segments tms
as residue-indexes (e.g., [37:61, 74:96, ...]
). extracelluar
should be true if the N-terminus of the protein is extracellular (chain[first(tms[1])]
is at the extracellular face), and false otherwise.
applytransform!(chain, tform)
(or the model
that includes chain
) will re-orient chain
so that the center of the membrane is z=0
and extracellular is positive.
The algorithm finds the membrane normal u
by maximizing the ratio
sumᵢ (u ⋅ vᵢ)²
sumᵢ (u ⋅ δᵢ)²
where vᵢ
is a vector parallel to the i
th TM helix, and δᵢ
is a within-leaflet atomic displacement.
GPCRAnalysis.alphacarbon_coordinates
— Methodalphacarbon_coordinates(res::AbstractResidue)
Return the coordinates of the α-carbon in res
.
GPCRAnalysis.alphacarbon_coordinates_matrix
— Methodalphacarbon_coordinates_matrix(seq)
Return a matrix of αC coordinates as columns across all residues. See also alphacarbon_coordinates
.
GPCRAnalysis.alphafoldfile
— Functionfns = alphafoldfile(uniprotXname, dirname=pwd(); join=false)
Return the latest version of the AlphaFold file for uniprotXname
in dirname
. If join
is true
, then the full path is returned.
GPCRAnalysis.alphafoldfiles
— Functionmsacode2structfile = alphafoldfiles(msa::AnnotatedMultipleSequenceAlignment, dirname=pwd())
Return a dictionary mapping MSACode
s to the corresponding AlphaFold structure files.
GPCRAnalysis.alphafoldfiles
— Functionfns = alphafoldfiles(dirname=pwd(); join=false)
Return the latest version of all AlphaFold files in dirname
. If join
is true
, then the full paths are returned.
GPCRAnalysis.chargelocations
— Methodchargelocations(chain::ChainLike; include_his::Bool=false)
Return a list of potential charge locations in the protein structure. Each is a tuple (position, residueindex, AAname)
. The positions are those of N
(in positively-charged residues like Arg & Lys) or O
(in negatively-charged residues like Asp and Glu). N- and C-termini are not included in the list. While each residue will carry a net total charge of ±1, the location of each potential charge will be listed (1 for Lys, 2 each for Arg, Asp, and Glu).
By default, histidine is not considered charged, but you can include it by setting include_His=true
.
GPCRAnalysis.chimerax_script
— Methodchimerax_script(scriptfilename, uprot_list, msa::AnnotatedMultipleSequenceAlignment, colidxs;
dir=pwd(), align=true, chain_transparency=80, styles=Dict{Int,String}(), extras=String[])
Create a chimerax visualization script with name scriptfilename
. uprot_list
is a list of UniProtX names that you want to visualize. msa
is a Multiple Sequence alignment and colidxs
specifies the column indices in msa
corresponding to amino acid side chains that you'd like to visualize.
Keyword arguments:
dir
is the directory with the protein structure filesalign
determines whether to align the structures to the first one (uses thematchmaker
tool)chain_transparency
sets the transparency on the ribbon diagrams (0 = not
transparent)
styles
can be used to affect the display, e.g.,Dict(k => "@SD sphere")
would cause methionines at column index k
to be displayed with the sulfur in sphere mode.
extras
can be used to hand-specify a number of additional commands; this can
be useful if, for example, the msa
has occasional misalignments.
Examples
Suppose you have the msa
for rhodopsin (mouse: P15409), then:
chimerax_script("myscript.cxc", ["P15409"], msa, [i1, i2, i3])
where i1
through i3
are column-indices in the msa
that you'd like to view.
GPCRAnalysis.columnwise_entropy
— Functioncolumnwise_entropy(msa::AbstractMultipleSequenceAlignment, aacode = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP"))
Compute the entropy of each column in an MSA. Low entropy indicates high conservation.
Unmatched entries ('-'
residues) contribute to the entropy calculation as if they were an ordinary residue.
GPCRAnalysis.download_alphafolds
— Methoddownload_alphafolds(msa::AbstractMultipleSequenceAlignment; dirname=pwd())
Download all available AlphaFold structures for the sequences in msa
. Missing entries are silently skipped.
GPCRAnalysis.features_from_structure
— Functionmgmm = features_from_structure(seq::ChainLike, idxs=1:length(seq); combined=false)
Construct an IsotropicMultiGMM
from seq
by adding features for each residue in idxs
.
combined=true
causes all atoms sharing the same feature to be combined, reducing the total number of features in the resulting model. The default creates features for each atom separately.
The σfun
and ϕfun
keyword arguments are functions that determine the standard deviation and amplitude of each gaussian feature, respectively, and take arguments (atom, residue, feature)
.
The output mgmm
may include the following features:
:Steric
(the "hard center", a proxy for the repulsive core of Lennard-Jones potentials):Hydrophobe
(van der Waals interactions, a proxy for the attractive part of Lennard-Jones potentials):Aromatic
(the aromatic ring of phenylalanine, tyrosine, and tryptophan):PosIonizable
(the positively charged nitrogen of histidine, arginine, and lysine; histidine gets a fractional charge of +0.1):NegIonizable
(the negatively charged oxygen of aspartate and glutamate; each oxygen gets a fractional charge of -0.5):Donor
(the hydrogen of a hydrogen bond donor):Acceptor
(the oxygen of a hydrogen bond acceptor)
GPCRAnalysis.filter_long!
— Methodfilter_long!(msa::AbstractMultipleSequenceAlignment, minres::Real)
Remove all sequences from msa
with fewer than minres
matching residues.
GPCRAnalysis.filter_species!
— Methodfilter_species!(msa::AbstractMultipleSequenceAlignment, speciesname::AbstractString)
Remove all sequences from msa
except those with species(sequencename)
equal to speciesname
.
GPCRAnalysis.forcecomponents
— Functionforces = forcecomponents(seq, interactions::AbstractVector, residueindexes=eachindex(seq); kwargs...)
Calculate the forces between residues in seq
based on the features of each residue and the given interactions
, which must be a list of 2-tuples (:field1, :field2)
or pairs (:field1, :field2) => coef
, where :field1
and :field2
are the names of the features that interact and coef
is the coefficient of the force (defaults to 1). The optional keyword arguments kwargs
are as described in features_from_structure
.
The feature-names can be the ones used in features_from_structure
. Each pair of interactions should be listed, but only in one order (symmetry is automatically enforced). Optionally, you can also "bundle" features together: :Ionic
is a bundle of :PosIonizable
and :NegIonizable
, where like charges repel and opposite charges attract with the same magnitude of force. Using (:Ionic,:Ionic)
instead of listing all three interactions (:PosIonizable,:PosIonizable
, :NegIonizable,:NegIonizable
, and :PosIonizable,:NegIonizable
) separately ensures that any tuning of ionic forces will satisfy the symmetries of the real world.
Upon return, there is one force-matrix for each residue in seq
listed in residueindexes
. Each force-matrix is a 3×n
matrix where each row corresponds to a force component (x, y, z)
and the k
th column corresponds to interactions[k]
.
Examples
julia> seq = getchain("1GZM.pdb")
interactions = [(:Steric, :Steric) => 1, # repulsive
(:Hydrophobe, :Hydrophobe) => -1, # attractive
(:Donor, :Acceptor) => -1 # attractive
]
julia> forces = forcecomponents(seq, interactions)
The output is a list of 3×4 matrices, one for each residue in seq
.
See also: optimize_weights
, forcedict
.
GPCRAnalysis.forcedict
— Methodinteractiondict = forcedict(interactions::AbstractVector, w = ones(length(interactions)))
Create a dictionary of interactions, where interactions[i]
should be weighted by w[i]
.
GPCRAnalysis.getchain
— Methodgetchain(filename::AbstractString; model=1, chain="A")
Read a PDB or mmCIF file filename
and extract the specified chain.
GPCRAnalysis.inward_ecl_residues
— Methodinward_ecl_residues(seq, eclidxs)
Return an array of boolean[] indicating which residues (of those specified by eclidxs
) are inward-facing (i.e. downward toward the opening of the binding pocket).
eclidxs
is a vector (with each entry corresponding to an extracellular loop) of ranges of residue indices.
GPCRAnalysis.inward_tm_residues
— Methodinward_tm_residues(seq, tmidxs)
Return an array of boolean[] indicating which residues (of those specified by tmidxs
) are inward-facing.
tmidxs
is a vector (typically of length 7, with each entry corresponding to a transmembrane region) of ranges of residue indices.
GPCRAnalysis.lookupbw
— Methodhelix, residue_position = lookupbw(idx::Integer, scheme::BWScheme)
Calculate the Ballesteros-Weinstein residue number coresponding to residue idx
. tmspans
describes the transmembrane regions in the reference, and bwconserved
the index of the most-conserved residue.
Examples
For mouse rhodopsin (P15409),
julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
[34:64, 73:99, 107:139, 150:173, 200:229, 246:277, 285:309]);
julia> lookupbw(160, opsd_scheme)
(4, 49)
julia> lookupbw((4, 49), opsd_scheme)
160
This is the residue just before the most-conserved residue of helix 4.
GPCRAnalysis.map_closest
— Methodmapping = map_closest(mapto::StructureLike, mapfrom::StructureLike)
Return a vector mapping[i] = (j, distij)
, matching the i
th residue in mapto
to the j
th residue in mapfrom
and reporting the distance between them. The mapping minimizes the sum of distances. mapto
and mapfrom
must already be aligned for this to be meaningful.
If mapfrom
is shorter than mapto
, some j
s will be 0, indicating a skipped residue in mapto
.
GPCRAnalysis.map_uniprot_retrieve
— Methodresult = GPCRAnalysis.map_uniprot_retrieve(jobId)
Retrieve the results of a Uniprot ID mapping job.
GPCRAnalysis.map_uniprot_status
— Methodstatus = GPCRAnalysis.map_uniprot_status(jobId)
Check the status of a Uniprot ID mapping job. Returns true
if the results are ready. Otherwise, returns the status object.
GPCRAnalysis.map_uniprot_submit
— FunctionjobID = GPCRAnalysis.map_uniprot_submit(ids, from, to="UniProtKB")
Submit a list of ids
to the Uniprot ID mapping service, to convert from ID convention from
to to
. The jobID can be used to check the status (map_uniprot_status
) and retrieve the results (map_uniprot_retrieve
).
Examples
jldoctest` julia> jobID = GPCRAnalysis.map_uniprot_submit(["ENSMUSG00000067064", "ENSMUSG00000057464"], "Ensembl");
GPCRAnalysis.marker
— Methodstr = marker(modelnum, pos, radius, color)
Return a string that represents a marker in ChimeraX. modelnum
is the model number, pos
is a 3D position, radius
is the radius of the marker, and color
is the color of the marker.
GPCRAnalysis.optimize_weights
— Methodw = optimize_weights(forces)
Tune the weights w
to approximately satisfy force balance, i.e., solve
min w.r.t. w sum(sum(abs2, f*w) for f in forces)
subject to sum(w) == 1, w .>= 0
This is based on the notion that the protein structure is presumably at an energy minimum.
This seems to work best for ubiquitous interactions, like :Steric
, :Hydrophobe
, and hydrogen-bonding. Rarer interactions (:Ionic
, :Aromatic
) may need to be tuned via different principles.
This function requires that you manually load JuMP and HiGHS, e.g., using JuMP, HiGHS
.
GPCRAnalysis.pLDDTcolor
— MethodpLDDTcolor(r::Residue)
pLDDTcolor(score::Real)
Return the color corresponding to the pLDDT score (a measure of confidence) of a residue.
GPCRAnalysis.project_sequences
— MethodX = project_sequences(msa::AbstractMultipleSequenceAlignment; fracvar::Real = 0.9)
Perform a classical multidimensional scaling analysis to project the sequences in msa
to a space in which pairwise distances approximately reproduce 100 - percentsimilarity(seq1, seq2)
. The dimensionality is chosen to reconstruction fracvar
of the variance.
GPCRAnalysis.query_alphafold_latest
— Methodurl = query_alphafold_latest(uniprotXname)
Query the AlphaFold API for the latest structure of uniprotXname
.
GPCRAnalysis.query_ebi_proteins
— Methodresult = query_ebi_proteins(id)
Query the EBI Proteins API for a protein with the specified id
, which must be the Uniprot accession code. You can also supply several proteins as a comma-separated list.
result
is a JSON3 object with many fields.
GPCRAnalysis.query_ncbi
— Methodresult = query_ncbi(id)
Query the NCBI API for a gene with the specified id
, which must be the gene accession code. result
is a JSON3 object with many fields.
GPCRAnalysis.query_uniprot_accession
— Methodaccession_code = query_uniprot_accession(id)
Perform a Uniprot search for id
, returning the canonical accession code.
Examples
julia> query_uniprot_accession("T2R38_MOUSE")
"Q7TQA6"
GPCRAnalysis.residue_centroid
— Methodresidue_centroid(r::AbstractResidue)
Compute the mean position of all atoms in r
excluding hydrogen.
Residue centers may yield a more reliable measure of "comparable residues" than the α-carbons (see CAmatrix
) because they incorporate the orientation of the side chain relative to the overall fold.
See also residue_centroid_matrix
.
GPCRAnalysis.residue_centroid_matrix
— Methodresidue_centroid_matrix(seq)
Return a matrix of all residue centroids as columns. See also residue_centroid
.
GPCRAnalysis.residueindex
— Methodresidueindex(sa::StructAlign, idx1, nothing)
residueindex(sa::StructAlign, idx1, nothing, ±1)
Calculate the residue index in structure 2 corresponding to residue idx1
in structure 1. The second form allows you to find the nearest corresponding residue in the forward (+1) or reverse (-1) directions, if idx1
is not among the mapped residues.
GPCRAnalysis.residueindex
— Methodresidueindex(sa::StructAlign, nothing, idx2)
residueindex(sa::StructAlign, nothing, idx2, ±1)
Calculate the residue index in structure 1 corresponding to residue idx2
in structure 2. The second form allows you to find the nearest corresponding residue in the forward (+1) or reverse (-1) directions, if idx2
is not among the mapped residues.
GPCRAnalysis.sortperm_msa
— Methodtour = sortperm_msa(msa::AbstractMultipleSequenceAlignment)
Order the sequences in msa
to minimize the "tour length" visiting each sequence once. The length between sequences is defined at 100 - percentsimilarity(seq1, seq2)
.
This can be useful for graphical or alignment display by grouping obviously-similar sequences near one another.
GPCRAnalysis.species
— Methodspecies(name)
Extract the species identifier from a UniProt "X_Y" entry or elaborated variant (e.g., PFAM sequence name).
See also uniprotX
.
Examples
julia> species("Q8VGW6_MOUSE/31-308")
"MOUSE"
GPCRAnalysis.try_download_alphafold
— Functiontry_download_alphafold(uniprotXname, path=alphafoldfilename(uniprotXname); version=4)
Attempt to download an AlphaFold structure. Returns nothing
if no entry corresponding to uniprotXname
exists; otherwise it returns path
, the pathname of the saved file.
In general, a better approach is to use download_alphafolds
for multiple proteins, or query_alphafold_latest
combined with Downloads.download
for a single protein.
GPCRAnalysis.uniprotX
— MethoduniprotX(name)
Extract the UniProt "X" entry from an X_Y entry or elaborated variant (e.g., PFAM sequence name).
See also species
.
Examples
julia> uniprotX("Q8VGW6_MOUSE/31-308")
"Q8VGW6"