GPCRAnalysis

Documentation for GPCRAnalysis.

GPCRAnalysis.BWSchemeMethod
BWScheme(conserved_idx, tmspans)

Specify the Ballesteros-Weinstein scheme used for a particular protein. conserved_idx is a list of 7 "most conserved" residues per helix (rhodopsin family: N1, D2, R3, W4, P5, P6, P7) and the span of each helix.

Examples

For mouse rhodopsin (P15409),

julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
                              [37:61, 74:96, 111:133, 153:173, 203:224, 253:274, 287:308]);
source
GPCRAnalysis.NWGapCostsMethod
gapcosts = NWGapCosts{T}(; extend1=0, extend2=0, open1=0, open2=0)

Create an affine cost for gaps in Needleman-Wunsch alignment. The cost of a gap of length k is

extend * k + open

All costs must be nonnegative.

gapcosts(ϕ, idxs1, idxs2) computes the contribution of gaps to the cost of alignment ϕ between two sequences with idxs1 = eachindex(seq1) and idxs2 = eachindex(seq2). (The indices are needed to determine whether the alignment starts or ends with a gap.)

source
GPCRAnalysis.SequenceMappingMethod
sm = SequenceMapping([4, 5, 0, ...])
sm = SequenceMapping(seq::AnnotatedAlignedSequence)

A SequenceMapping is a vector of indexes within a full sequence that map to a reference. Specifically, sm[i] is the index of the residue in the full sequence that maps to the i-th position in the reference. 0 is a placeholder for a position in the reference that has no mapping to the full sequence.

Example

SequenceMapping([4, 5, 0, ...]) indicates that:

  • the first position in the reference maps to the fourth residue in the full sequence,
  • the second position in the reference maps to the fifth residue in the full sequence, and
  • the third position in the reference lacks a corresponding residue in the full sequence.
source
GPCRAnalysis.StructAlignMethod
StructAlign(struct1::ChainLike, struct2::ChainLike, filename::AbstractString)

Create a structure-based alignment between struct1 and struct2. filename is the name of the TM-align "results" file (e.g., https://zhanggroup.org//TM-align/example/873772.html).

See also residueindex.

source
GPCRAnalysis.alignMethod
tform = align(fixedpos::AbstractMatrix{Float64}, moving::Chain, sm::SequenceMapping)
tform = align(fixed::Chain, moving::Chain, sm::SequenceMapping)

Return a rotated and shifted version of moving so that the centroids of residues moving[sm] have least mean square error deviation from positions fixedpos or those of residues fixed. fixed or fixedpos should include just the residues or positions you want to align to, but moving should be an entire chain.

source
GPCRAnalysis.align_closestMethod
tform = align_closest(mapto::StructLike, mapfrom::StructLike; Dthresh=5)
tform = align_closest(coordsto, coordsfrom; Dthresh=5)

Return the rigid transformation best aligning mapfrom to mapto. The transformation is computed from residues matched by map_closest, using only residues closer than Dthresh.

Because the mapping is determined by distance, this can only "tweak" an already-close alignment.

source
GPCRAnalysis.align_nwMethod
ϕ = align_nw(D, gapcosts::NWGapCosts)

Given a pairwise penalty matrix D (e.g., a pairwise distance matrix) and costs for opening and extending gaps, find the optimal pairings

ϕ = [(i1, j1), (i2, j2), ...]

that minimize

sum(D[ϕk...] for ϕk in ϕ) + gapcosts(ϕ, axes(D)...)

subject to the constraint that all(ϕ[k+1] .> ϕ[k]) for all k.

source
GPCRAnalysis.align_nwMethod
ϕ = align_nw(seq1, seq2, gapcosts::NWGapCosts; mode=:distance_orientation)

Find the optimal ϕ matching seq1[ϕ[k][1]] to seq2[ϕ[k][2]] for all k. mode controls the computation of pairwise matching penalties, and can be either :distance or :distance_orientation, where the latter adds any mismatch in sidechain orientation to the distance penalty.

seq1 and seq2 must be aligned to each other in 3D space before calling this function. See align.

source
GPCRAnalysis.align_rangesMethod
seqtms = align_ranges(seq1, seq2, seq2ranges::AbstractVector{<:AbstractUnitRange})

Transfer refranges, a list of reside index spans in seq2, to seq1. seq1 and seq2 must be spatially aligned, and the assignment is made by minimizing inter-chain distance subject to the constraint of preserving sequence order.

source
GPCRAnalysis.align_to_axesMethod
tform = align_to_axes(strct)

Compute the transformation needed to align the principle axes of inertia of strct with the coordinate axes.

source
GPCRAnalysis.align_to_membraneMethod
tform = align_to_membrane(chain::ChainLike, tms; extracellular=true)

Compute the rigid transformation tform needed to align chain to the membrane, given the transmembrane segments tms as residue-indexes (e.g., [37:61, 74:96, ...]). extracelluar should be true if the N-terminus of the protein is extracellular (chain[first(tms[1])] is at the extracellular face), and false otherwise.

applytransform!(chain, tform) (or the model that includes chain) will re-orient chain so that the center of the membrane is z=0 and extracellular is positive.

The algorithm finds the membrane normal u by maximizing the ratio

sumᵢ (u ⋅ vᵢ)²

sumᵢ (u ⋅ δᵢ)²

where vᵢ is a vector parallel to the ith TM helix, and δᵢ is a within-leaflet atomic displacement.

source
GPCRAnalysis.alphafoldfileFunction
fns = alphafoldfile(uniprotXname, dirname=pwd(); join=false)

Return the latest version of the AlphaFold file for uniprotXname in dirname. If join is true, then the full path is returned.

source
GPCRAnalysis.alphafoldfilesFunction
msacode2structfile = alphafoldfiles(msa::AnnotatedMultipleSequenceAlignment, dirname=pwd())

Return a dictionary mapping MSACodes to the corresponding AlphaFold structure files.

source
GPCRAnalysis.alphafoldfilesFunction
fns = alphafoldfiles(dirname=pwd(); join=false)

Return the latest version of all AlphaFold files in dirname. If join is true, then the full paths are returned.

source
GPCRAnalysis.chargelocationsMethod
chargelocations(chain::ChainLike; include_his::Bool=false)

Return a list of potential charge locations in the protein structure. Each is a tuple (position, residueindex, AAname). The positions are those of N (in positively-charged residues like Arg & Lys) or O (in negatively-charged residues like Asp and Glu). N- and C-termini are not included in the list. While each residue will carry a net total charge of ±1, the location of each potential charge will be listed (1 for Lys, 2 each for Arg, Asp, and Glu).

By default, histidine is not considered charged, but you can include it by setting include_His=true.

source
GPCRAnalysis.chimerax_scriptMethod
chimerax_script(scriptfilename, uprot_list, msa::AnnotatedMultipleSequenceAlignment, colidxs;
                dir=pwd(), align=true, chain_transparency=80, styles=Dict{Int,String}(), extras=String[])

Create a chimerax visualization script with name scriptfilename. uprot_list is a list of UniProtX names that you want to visualize. msa is a Multiple Sequence alignment and colidxs specifies the column indices in msa corresponding to amino acid side chains that you'd like to visualize.

Keyword arguments:

  • dir is the directory with the protein structure files
  • align determines whether to align the structures to the first one (uses the matchmaker tool)
  • chain_transparency sets the transparency on the ribbon diagrams (0 = not

transparent)

  • styles can be used to affect the display, e.g., Dict(k => "@SD sphere")

would cause methionines at column index k to be displayed with the sulfur in sphere mode.

  • extras can be used to hand-specify a number of additional commands; this can

be useful if, for example, the msa has occasional misalignments.

Examples

Suppose you have the msa for rhodopsin (mouse: P15409), then:

chimerax_script("myscript.cxc", ["P15409"], msa, [i1, i2, i3])

where i1 through i3 are column-indices in the msa that you'd like to view.

source
GPCRAnalysis.columnwise_entropyFunction
columnwise_entropy(msa::AbstractMultipleSequenceAlignment, aacode = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP"))

Compute the entropy of each column in an MSA. Low entropy indicates high conservation.

Unmatched entries ('-' residues) contribute to the entropy calculation as if they were an ordinary residue.

source
GPCRAnalysis.features_from_structureFunction
mgmm = features_from_structure(seq::ChainLike, idxs=1:length(seq); combined=false)

Construct an IsotropicMultiGMM from seq by adding features for each residue in idxs.

combined=true causes all atoms sharing the same feature to be combined, reducing the total number of features in the resulting model. The default creates features for each atom separately.

The σfun and ϕfun keyword arguments are functions that determine the standard deviation and amplitude of each gaussian feature, respectively, and take arguments (atom, residue, feature).

The output mgmm may include the following features:

  • :Steric (the "hard center", a proxy for the repulsive core of Lennard-Jones potentials)
  • :Hydrophobe (van der Waals interactions, a proxy for the attractive part of Lennard-Jones potentials)
  • :Aromatic (the aromatic ring of phenylalanine, tyrosine, and tryptophan)
  • :PosIonizable (the positively charged nitrogen of histidine, arginine, and lysine; histidine gets a fractional charge of +0.1)
  • :NegIonizable (the negatively charged oxygen of aspartate and glutamate; each oxygen gets a fractional charge of -0.5)
  • :Donor (the hydrogen of a hydrogen bond donor)
  • :Acceptor (the oxygen of a hydrogen bond acceptor)
source
GPCRAnalysis.filter_long!Method
filter_long!(msa::AbstractMultipleSequenceAlignment, minres::Real)

Remove all sequences from msa with fewer than minres matching residues.

source
GPCRAnalysis.forcecomponentsFunction
forces = forcecomponents(seq, interactions::AbstractVector, residueindexes=eachindex(seq); kwargs...)

Calculate the forces between residues in seq based on the features of each residue and the given interactions, which must be a list of 2-tuples (:field1, :field2) or pairs (:field1, :field2) => coef, where :field1 and :field2 are the names of the features that interact and coef is the coefficient of the force (defaults to 1). The optional keyword arguments kwargs are as described in features_from_structure.

The feature-names can be the ones used in features_from_structure. Each pair of interactions should be listed, but only in one order (symmetry is automatically enforced). Optionally, you can also "bundle" features together: :Ionic is a bundle of :PosIonizable and :NegIonizable, where like charges repel and opposite charges attract with the same magnitude of force. Using (:Ionic,:Ionic) instead of listing all three interactions (:PosIonizable,:PosIonizable, :NegIonizable,:NegIonizable, and :PosIonizable,:NegIonizable) separately ensures that any tuning of ionic forces will satisfy the symmetries of the real world.

Upon return, there is one force-matrix for each residue in seq listed in residueindexes. Each force-matrix is a 3×n matrix where each row corresponds to a force component (x, y, z) and the kth column corresponds to interactions[k].

Examples

julia> seq = getchain("1GZM.pdb")
interactions = [(:Steric, :Steric) => 1,           # repulsive
                (:Hydrophobe, :Hydrophobe) => -1,  # attractive
                (:Donor, :Acceptor) => -1          # attractive
                ]
julia> forces = forcecomponents(seq, interactions)

The output is a list of 3×4 matrices, one for each residue in seq.

See also: optimize_weights, forcedict.

source
GPCRAnalysis.forcedictMethod
interactiondict = forcedict(interactions::AbstractVector, w = ones(length(interactions)))

Create a dictionary of interactions, where interactions[i] should be weighted by w[i].

source
GPCRAnalysis.getchainMethod
getchain(filename::AbstractString; model=1, chain="A")

Read a PDB or mmCIF file filename and extract the specified chain.

source
GPCRAnalysis.inward_ecl_residuesMethod
inward_ecl_residues(seq, eclidxs)

Return an array of boolean[] indicating which residues (of those specified by eclidxs) are inward-facing (i.e. downward toward the opening of the binding pocket).

eclidxs is a vector (with each entry corresponding to an extracellular loop) of ranges of residue indices.

source
GPCRAnalysis.inward_tm_residuesMethod
inward_tm_residues(seq, tmidxs)

Return an array of boolean[] indicating which residues (of those specified by tmidxs) are inward-facing.

tmidxs is a vector (typically of length 7, with each entry corresponding to a transmembrane region) of ranges of residue indices.

source
GPCRAnalysis.lookupbwMethod
helix, residue_position = lookupbw(idx::Integer, scheme::BWScheme)

Calculate the Ballesteros-Weinstein residue number coresponding to residue idx. tmspans describes the transmembrane regions in the reference, and bwconserved the index of the most-conserved residue.

Examples

For mouse rhodopsin (P15409),

julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
           [34:64, 73:99, 107:139, 150:173, 200:229, 246:277, 285:309]);

julia> lookupbw(160, opsd_scheme)
(4, 49)

julia> lookupbw((4, 49), opsd_scheme)
160

This is the residue just before the most-conserved residue of helix 4.

source
GPCRAnalysis.map_closestMethod
mapping = map_closest(mapto::StructureLike, mapfrom::StructureLike)

Return a vector mapping[i] = (j, distij), matching the ith residue in mapto to the jth residue in mapfrom and reporting the distance between them. The mapping minimizes the sum of distances. mapto and mapfrom must already be aligned for this to be meaningful.

If mapfrom is shorter than mapto, some js will be 0, indicating a skipped residue in mapto.

source
GPCRAnalysis.map_uniprot_statusMethod
status = GPCRAnalysis.map_uniprot_status(jobId)

Check the status of a Uniprot ID mapping job. Returns true if the results are ready. Otherwise, returns the status object.

source
GPCRAnalysis.map_uniprot_submitFunction
jobID = GPCRAnalysis.map_uniprot_submit(ids, from, to="UniProtKB")

Submit a list of ids to the Uniprot ID mapping service, to convert from ID convention from to to. The jobID can be used to check the status (map_uniprot_status) and retrieve the results (map_uniprot_retrieve).

Examples

jldoctest` julia> jobID = GPCRAnalysis.map_uniprot_submit(["ENSMUSG00000067064", "ENSMUSG00000057464"], "Ensembl");

source
GPCRAnalysis.markerMethod
str = marker(modelnum, pos, radius, color)

Return a string that represents a marker in ChimeraX. modelnum is the model number, pos is a 3D position, radius is the radius of the marker, and color is the color of the marker.

source
GPCRAnalysis.optimize_weightsMethod
w = optimize_weights(forces)

Tune the weights w to approximately satisfy force balance, i.e., solve

min w.r.t. w  sum(sum(abs2, f*w) for f in forces)
subject to    sum(w) == 1, w .>= 0

This is based on the notion that the protein structure is presumably at an energy minimum.

This seems to work best for ubiquitous interactions, like :Steric, :Hydrophobe, and hydrogen-bonding. Rarer interactions (:Ionic, :Aromatic) may need to be tuned via different principles.

Note

This function requires that you manually load JuMP and HiGHS, e.g., using JuMP, HiGHS.

source
GPCRAnalysis.pLDDTcolorMethod
pLDDTcolor(r::Residue)
pLDDTcolor(score::Real)

Return the color corresponding to the pLDDT score (a measure of confidence) of a residue.

source
GPCRAnalysis.project_sequencesMethod
X = project_sequences(msa::AbstractMultipleSequenceAlignment; fracvar::Real = 0.9)

Perform a classical multidimensional scaling analysis to project the sequences in msa to a space in which pairwise distances approximately reproduce 100 - percentsimilarity(seq1, seq2). The dimensionality is chosen to reconstruction fracvar of the variance.

source
GPCRAnalysis.query_ebi_proteinsMethod
result = query_ebi_proteins(id)

Query the EBI Proteins API for a protein with the specified id, which must be the Uniprot accession code. You can also supply several proteins as a comma-separated list.

result is a JSON3 object with many fields.

source
GPCRAnalysis.query_ncbiMethod
result = query_ncbi(id)

Query the NCBI API for a gene with the specified id, which must be the gene accession code. result is a JSON3 object with many fields.

source
GPCRAnalysis.query_uniprot_accessionMethod
accession_code = query_uniprot_accession(id)

Perform a Uniprot search for id, returning the canonical accession code.

Examples

julia> query_uniprot_accession("T2R38_MOUSE")
"Q7TQA6"
source
GPCRAnalysis.residue_centroidMethod
residue_centroid(r::AbstractResidue)

Compute the mean position of all atoms in r excluding hydrogen.

Residue centers may yield a more reliable measure of "comparable residues" than the α-carbons (see CAmatrix) because they incorporate the orientation of the side chain relative to the overall fold.

See also residue_centroid_matrix.

source
GPCRAnalysis.residueindexMethod
residueindex(sa::StructAlign, idx1, nothing)
residueindex(sa::StructAlign, idx1, nothing, ±1)

Calculate the residue index in structure 2 corresponding to residue idx1 in structure 1. The second form allows you to find the nearest corresponding residue in the forward (+1) or reverse (-1) directions, if idx1 is not among the mapped residues.

source
GPCRAnalysis.residueindexMethod
residueindex(sa::StructAlign, nothing, idx2)
residueindex(sa::StructAlign, nothing, idx2, ±1)

Calculate the residue index in structure 1 corresponding to residue idx2 in structure 2. The second form allows you to find the nearest corresponding residue in the forward (+1) or reverse (-1) directions, if idx2 is not among the mapped residues.

source
GPCRAnalysis.sortperm_msaMethod
tour = sortperm_msa(msa::AbstractMultipleSequenceAlignment)

Order the sequences in msa to minimize the "tour length" visiting each sequence once. The length between sequences is defined at 100 - percentsimilarity(seq1, seq2).

This can be useful for graphical or alignment display by grouping obviously-similar sequences near one another.

source
GPCRAnalysis.try_download_alphafoldFunction
try_download_alphafold(uniprotXname, path=alphafoldfilename(uniprotXname); version=4)

Attempt to download an AlphaFold structure. Returns nothing if no entry corresponding to uniprotXname exists; otherwise it returns path, the pathname of the saved file.

In general, a better approach is to use download_alphafolds for multiple proteins, or query_alphafold_latest combined with Downloads.download for a single protein.

source