GPCRAnalysis

Documentation for GPCRAnalysis.

GPCRAnalysis.AccessionCode
GPCRAnalysis.BWScheme
GPCRAnalysis.NWGapCosts
GPCRAnalysis.SequenceMapping
GPCRAnalysis.StructAlign
GPCRAnalysis.align
GPCRAnalysis.align_closest
GPCRAnalysis.align_nw
GPCRAnalysis.align_nw
GPCRAnalysis.align_ranges
GPCRAnalysis.align_to_axes
GPCRAnalysis.align_to_membrane
GPCRAnalysis.alphacarbon_coordinates
GPCRAnalysis.alphacarbon_coordinates_matrix
GPCRAnalysis.alphafoldfile
GPCRAnalysis.alphafoldfiles
GPCRAnalysis.alphafoldfiles
GPCRAnalysis.chargelocations
GPCRAnalysis.chimerax_script
GPCRAnalysis.columnwise_entropy
GPCRAnalysis.download_alphafolds
GPCRAnalysis.features_from_structure
GPCRAnalysis.filter_long!
GPCRAnalysis.filter_species!
GPCRAnalysis.forcecomponents
GPCRAnalysis.forcedict
GPCRAnalysis.getchain
GPCRAnalysis.inward_ecl_residues
GPCRAnalysis.inward_tm_residues
GPCRAnalysis.lookupbw
GPCRAnalysis.map_closest
GPCRAnalysis.map_uniprot_retrieve
GPCRAnalysis.map_uniprot_status
GPCRAnalysis.map_uniprot_submit
GPCRAnalysis.marker
GPCRAnalysis.optimize_weights
GPCRAnalysis.pLDDTcolor
GPCRAnalysis.project_sequences
GPCRAnalysis.query_alphafold_latest
GPCRAnalysis.query_ebi_proteins
GPCRAnalysis.query_ncbi
GPCRAnalysis.query_uniprot_accession
GPCRAnalysis.residue_centroid
GPCRAnalysis.residue_centroid_matrix
GPCRAnalysis.residueindex
GPCRAnalysis.residueindex
GPCRAnalysis.sortperm_msa
GPCRAnalysis.species
GPCRAnalysis.try_download_alphafold
GPCRAnalysis.uniprotX

GPCRAnalysis.AccessionCode — Method

ac = AccessionCode(msa, seqname)

Return the Uniprot accession code associated with seqname.

source

GPCRAnalysis.BWScheme — Method

BWScheme(conserved_idx, tmspans)

Specify the Ballesteros-Weinstein scheme used for a particular protein. conserved_idx is a list of 7 "most conserved" residues per helix (rhodopsin family: N1, D2, R3, W4, P5, P6, P7) and the span of each helix.

Examples

For mouse rhodopsin (P15409),

julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
                              [37:61, 74:96, 111:133, 153:173, 203:224, 253:274, 287:308]);

source

GPCRAnalysis.NWGapCosts — Method

gapcosts = NWGapCosts{T}(; extend1=0, extend2=0, open1=0, open2=0)

Create an affine cost for gaps in Needleman-Wunsch alignment. The cost of a gap of length k is

extend * k + open

All costs must be nonnegative.

gapcosts(ϕ, idxs1, idxs2) computes the contribution of gaps to the cost of alignment ϕ between two sequences with idxs1 = eachindex(seq1) and idxs2 = eachindex(seq2). (The indices are needed to determine whether the alignment starts or ends with a gap.)

source

GPCRAnalysis.SequenceMapping — Method

sm = SequenceMapping([4, 5, 0, ...])
sm = SequenceMapping(seq::AnnotatedAlignedSequence)

A SequenceMapping is a vector of indexes within a full sequence that map to a reference. Specifically, sm[i] is the index of the residue in the full sequence that maps to the i-th position in the reference. 0 is a placeholder for a position in the reference that has no mapping to the full sequence.

Example

SequenceMapping([4, 5, 0, ...]) indicates that:

the first position in the reference maps to the fourth residue in the full sequence,
the second position in the reference maps to the fifth residue in the full sequence, and
the third position in the reference lacks a corresponding residue in the full sequence.

source

GPCRAnalysis.StructAlign — Method

StructAlign(struct1::ChainLike, struct2::ChainLike, filename::AbstractString)

Create a structure-based alignment between struct1 and struct2. filename is the name of the TM-align "results" file (e.g., https://zhanggroup.org//TM-align/example/873772.html).

See also residueindex.

source

GPCRAnalysis.align — Method

tform = align(fixedpos::AbstractMatrix{Float64}, moving::Chain, sm::SequenceMapping)
tform = align(fixed::Chain, moving::Chain, sm::SequenceMapping)

Return a rotated and shifted version of moving so that the centroids of residues moving[sm] have least mean square error deviation from positions fixedpos or those of residues fixed. fixed or fixedpos should include just the residues or positions you want to align to, but moving should be an entire chain.

source

GPCRAnalysis.align_closest — Method

tform = align_closest(mapto::StructLike, mapfrom::StructLike; Dthresh=5)
tform = align_closest(coordsto, coordsfrom; Dthresh=5)

Return the rigid transformation best aligning mapfrom to mapto. The transformation is computed from residues matched by map_closest, using only residues closer than Dthresh.

Because the mapping is determined by distance, this can only "tweak" an already-close alignment.

source

GPCRAnalysis.align_nw — Method

ϕ = align_nw(D, gapcosts::NWGapCosts)

Given a pairwise penalty matrix D (e.g., a pairwise distance matrix) and costs for opening and extending gaps, find the optimal pairings

ϕ = [(i1, j1), (i2, j2), ...]

that minimize

sum(D[ϕk...] for ϕk in ϕ) + gapcosts(ϕ, axes(D)...)

subject to the constraint that all(ϕ[k+1] .> ϕ[k]) for all k.

source

GPCRAnalysis.align_nw — Method

ϕ = align_nw(seq1, seq2, gapcosts::NWGapCosts; mode=:distance_orientation)

Find the optimal ϕ matching seq1[ϕ[k][1]] to seq2[ϕ[k][2]] for all k. mode controls the computation of pairwise matching penalties, and can be either :distance or :distance_orientation, where the latter adds any mismatch in sidechain orientation to the distance penalty.

seq1 and seq2 must be aligned to each other in 3D space before calling this function. See align.

source

GPCRAnalysis.align_ranges — Method

seqtms = align_ranges(seq1, seq2, seq2ranges::AbstractVector{<:AbstractUnitRange})

Transfer refranges, a list of reside index spans in seq2, to seq1. seq1 and seq2 must be spatially aligned, and the assignment is made by minimizing inter-chain distance subject to the constraint of preserving sequence order.

source

GPCRAnalysis.align_to_axes — Method

tform = align_to_axes(strct)

Compute the transformation needed to align the principle axes of inertia of strct with the coordinate axes.

source

GPCRAnalysis.align_to_membrane — Method

tform = align_to_membrane(chain::ChainLike, tms; extracellular=true)

Compute the rigid transformation tform needed to align chain to the membrane, given the transmembrane segments tms as residue-indexes (e.g., [37:61, 74:96, ...]). extracelluar should be true if the N-terminus of the protein is extracellular (chain[first(tms[1])] is at the extracellular face), and false otherwise.

applytransform!(chain, tform) (or the model that includes chain) will re-orient chain so that the center of the membrane is z=0 and extracellular is positive.

The algorithm finds the membrane normal u by maximizing the ratio

sumᵢ (u ⋅ vᵢ)²

sumᵢ (u ⋅ δᵢ)²

where vᵢ is a vector parallel to the ith TM helix, and δᵢ is a within-leaflet atomic displacement.

source

GPCRAnalysis.alphacarbon_coordinates — Method

alphacarbon_coordinates(res::AbstractResidue)

Return the coordinates of the α-carbon in res.

source

GPCRAnalysis.alphacarbon_coordinates_matrix — Method

alphacarbon_coordinates_matrix(seq)

Return a matrix of αC coordinates as columns across all residues. See also alphacarbon_coordinates.

source

GPCRAnalysis.alphafoldfile — Function

fns = alphafoldfile(uniprotXname, dirname=pwd(); join=false)

Return the latest version of the AlphaFold file for uniprotXname in dirname. If join is true, then the full path is returned.

source

GPCRAnalysis.alphafoldfiles — Function

fns = alphafoldfiles(dirname=pwd(); join=false)

Return the latest version of all AlphaFold files in dirname. If join is true, then the full paths are returned.

source

GPCRAnalysis.alphafoldfiles — Function

msacode2structfile = alphafoldfiles(msa::AnnotatedMultipleSequenceAlignment, dirname=pwd())

Return a dictionary mapping MSACodes to the corresponding AlphaFold structure files.

source

GPCRAnalysis.chargelocations — Method

chargelocations(chain::ChainLike; include_his::Bool=false)

Return a list of potential charge locations in the protein structure. Each is a tuple (position, residueindex, AAname). The positions are those of N (in positively-charged residues like Arg & Lys) or O (in negatively-charged residues like Asp and Glu). N- and C-termini are not included in the list. While each residue will carry a net total charge of ±1, the location of each potential charge will be listed (1 for Lys, 2 each for Arg, Asp, and Glu).

By default, histidine is not considered charged, but you can include it by setting include_His=true.

source

GPCRAnalysis.chimerax_script — Method

chimerax_script(scriptfilename, uprot_list, msa::AnnotatedMultipleSequenceAlignment, colidxs;
                dir=pwd(), align=true, chain_transparency=80, styles=Dict{Int,String}(), extras=String[])

Create a chimerax visualization script with name scriptfilename. uprot_list is a list of UniProtX names that you want to visualize. msa is a Multiple Sequence alignment and colidxs specifies the column indices in msa corresponding to amino acid side chains that you'd like to visualize.

Keyword arguments:

dir is the directory with the protein structure files
align determines whether to align the structures to the first one (uses the matchmaker tool)
chain_transparency sets the transparency on the ribbon diagrams (0 = not

transparent)

styles can be used to affect the display, e.g., Dict(k => "@SD sphere")

would cause methionines at column index k to be displayed with the sulfur in sphere mode.

extras can be used to hand-specify a number of additional commands; this can

be useful if, for example, the msa has occasional misalignments.

Examples

Suppose you have the msa for rhodopsin (mouse: P15409), then:

chimerax_script("myscript.cxc", ["P15409"], msa, [i1, i2, i3])

where i1 through i3 are column-indices in the msa that you'd like to view.

source

GPCRAnalysis.columnwise_entropy — Function

columnwise_entropy(msa::AbstractMultipleSequenceAlignment, aacode = ReducedAlphabet("(AILMV)(NQST)(RHK)(DE)(FWY)CGP"))

Compute the entropy of each column in an MSA. Low entropy indicates high conservation.

Unmatched entries ('-' residues) contribute to the entropy calculation as if they were an ordinary residue.

source

GPCRAnalysis.download_alphafolds — Method

download_alphafolds(msa::AbstractMultipleSequenceAlignment; dirname=pwd())
download_alphafolds(ids; dirname=pwd())

Download all available AlphaFold structures for the sequences in msa. Missing entries are silently skipped.

If a AbstractMultipleSequenceAlignment is provided, the downloaded PDB file is checked to ensure that the residues in the MSA sequence match those in the PDB file. If they do not match, the PDB file is removed.

source

GPCRAnalysis.features_from_structure — Function

mgmm = features_from_structure(seq::ChainLike, idxs=1:length(seq); combined=false)

Construct an IsotropicMultiGMM from seq by adding features for each residue in idxs.

combined=true causes all atoms sharing the same feature to be combined, reducing the total number of features in the resulting model. The default creates features for each atom separately.

The σfun and ϕfun keyword arguments are functions that determine the standard deviation and amplitude of each gaussian feature, respectively, and take arguments (atom, residue, feature).

The output mgmm may include the following features:

:Steric (the "hard center", a proxy for the repulsive core of Lennard-Jones potentials)
:Hydrophobe (van der Waals interactions, a proxy for the attractive part of Lennard-Jones potentials)
:Aromatic (the aromatic ring of phenylalanine, tyrosine, and tryptophan)
:PosIonizable (the positively charged nitrogen of histidine, arginine, and lysine; histidine gets a fractional charge of +0.1)
:NegIonizable (the negatively charged oxygen of aspartate and glutamate; each oxygen gets a fractional charge of -0.5)
:Donor (the hydrogen of a hydrogen bond donor)
:Acceptor (the oxygen of a hydrogen bond acceptor)

source

GPCRAnalysis.filter_long! — Method

filter_long!(msa::AbstractMultipleSequenceAlignment, minres::Real)

Remove all sequences from msa with fewer than minres matching residues.

source

GPCRAnalysis.filter_species! — Method

filter_species!(msa::AbstractMultipleSequenceAlignment, speciesname::AbstractString)

Remove all sequences from msa except those with species(sequencename) equal to speciesname.

source

GPCRAnalysis.forcecomponents — Function

forces = forcecomponents(seq, interactions::AbstractVector, residueindexes=eachindex(seq); kwargs...)

Calculate the forces between residues in seq based on the features of each residue and the given interactions, which must be a list of 2-tuples (:field1, :field2) or pairs (:field1, :field2) => coef, where :field1 and :field2 are the names of the features that interact and coef is the coefficient of the force (defaults to 1). The optional keyword arguments kwargs are as described in features_from_structure.

The feature-names can be the ones used in features_from_structure. Each pair of interactions should be listed, but only in one order (symmetry is automatically enforced). Optionally, you can also "bundle" features together: :Ionic is a bundle of :PosIonizable and :NegIonizable, where like charges repel and opposite charges attract with the same magnitude of force. Using (:Ionic,:Ionic) instead of listing all three interactions (:PosIonizable,:PosIonizable, :NegIonizable,:NegIonizable, and :PosIonizable,:NegIonizable) separately ensures that any tuning of ionic forces will satisfy the symmetries of the real world.

Upon return, there is one force-matrix for each residue in seq listed in residueindexes. Each force-matrix is a 3×n matrix where each row corresponds to a force component (x, y, z) and the kth column corresponds to interactions[k].

Examples

julia> seq = getchain("1GZM.pdb")
interactions = [(:Steric, :Steric) => 1,           # repulsive
                (:Hydrophobe, :Hydrophobe) => -1,  # attractive
                (:Donor, :Acceptor) => -1          # attractive
                ]
julia> forces = forcecomponents(seq, interactions)

The output is a list of 3×4 matrices, one for each residue in seq.

See also: optimize_weights, forcedict.

source

GPCRAnalysis.forcedict — Method

interactiondict = forcedict(interactions::AbstractVector, w = ones(length(interactions)))

Create a dictionary of interactions, where interactions[i] should be weighted by w[i].

source

GPCRAnalysis.getchain — Method

getchain(filename::AbstractString; model=1, chain="A")

Read a PDB or mmCIF file filename and extract the specified chain.

source

GPCRAnalysis.inward_ecl_residues — Method

inward_ecl_residues(seq, eclidxs)

Return an array of boolean[] indicating which residues (of those specified by eclidxs) are inward-facing (i.e. downward toward the opening of the binding pocket).

eclidxs is a vector (with each entry corresponding to an extracellular loop) of ranges of residue indices.

source

GPCRAnalysis.inward_tm_residues — Method

inward_tm_residues(seq, tmidxs)

Return an array of boolean[] indicating which residues (of those specified by tmidxs) are inward-facing.

tmidxs is a vector (typically of length 7, with each entry corresponding to a transmembrane region) of ranges of residue indices.

source

GPCRAnalysis.lookupbw — Method

helix, residue_position = lookupbw(idx::Integer, scheme::BWScheme)

Calculate the Ballesteros-Weinstein residue number coresponding to residue idx. tmspans describes the transmembrane regions in the reference, and bwconserved the index of the most-conserved residue.

Examples

For mouse rhodopsin (P15409),

julia> opsd_scheme = BWScheme([55, 83, 135, 161, 215, 267, 303],
           [34:64, 73:99, 107:139, 150:173, 200:229, 246:277, 285:309]);

julia> lookupbw(160, opsd_scheme)
(4, 49)

julia> lookupbw((4, 49), opsd_scheme)
160

This is the residue just before the most-conserved residue of helix 4.

source

GPCRAnalysis.map_closest — Method

mapping = map_closest(mapto::StructureLike, mapfrom::StructureLike)

Return a vector mapping[i] = (j, distij), matching the ith residue in mapto to the jth residue in mapfrom and reporting the distance between them. The mapping minimizes the sum of distances. mapto and mapfrom must already be aligned for this to be meaningful.

If mapfrom is shorter than mapto, some js will be 0, indicating a skipped residue in mapto.

source

GPCRAnalysis.map_uniprot_retrieve — Method

result = GPCRAnalysis.map_uniprot_retrieve(jobID)

Retrieve the results of a Uniprot ID mapping job.

source

GPCRAnalysis.map_uniprot_status — Method

status = GPCRAnalysis.map_uniprot_status(jobID)

Check the status of a Uniprot ID mapping job. Returns true if the results are ready. Otherwise, returns the status object.

source

GPCRAnalysis.map_uniprot_submit — Function

jobID = GPCRAnalysis.map_uniprot_submit(ids, from="UniProtKB_AC-ID", to="UniProtKB")

Submit a list of ids to the Uniprot ID mapping service, to convert from ID convention from to to. The jobID can be used to check the status (map_uniprot_status) and retrieve the results (map_uniprot_retrieve).

Examples

jldoctest` julia> jobID = GPCRAnalysis.map_uniprot_submit(["ENSMUSG00000067064", "ENSMUSG00000057464"], "Ensembl");

source

GPCRAnalysis.marker — Method

str = marker(modelnum, pos, radius, color)

Return a string that represents a marker in ChimeraX. modelnum is the model number, pos is a 3D position, radius is the radius of the marker, and color is the color of the marker.

source

GPCRAnalysis.optimize_weights — Method

w = optimize_weights(forces)

Tune the weights w to approximately satisfy force balance, i.e., solve

min w.r.t. w  sum(sum(abs2, f*w) for f in forces)
subject to    sum(w) == 1, w .>= 0

This is based on the notion that the protein structure is presumably at an energy minimum.

This seems to work best for ubiquitous interactions, like :Steric, :Hydrophobe, and hydrogen-bonding. Rarer interactions (:Ionic, :Aromatic) may need to be tuned via different principles.

Note

This function requires that you manually load JuMP and HiGHS, e.g., using JuMP, HiGHS.

source

GPCRAnalysis.pLDDTcolor — Method

pLDDTcolor(r::Residue)
pLDDTcolor(score::Real)

Return the color corresponding to the pLDDT score (a measure of confidence) of a residue.

source

GPCRAnalysis.project_sequences — Method

X = project_sequences(msa::AbstractMultipleSequenceAlignment; fracvar::Real = 0.9)

Perform a classical multidimensional scaling analysis to project the sequences in msa to a space in which pairwise distances approximately reproduce 100 - percentsimilarity(seq1, seq2). The dimensionality is chosen to reconstruction fracvar of the variance.

source

GPCRAnalysis.query_alphafold_latest — Method

url = query_alphafold_latest(uniprotXname)

Query the AlphaFold API for the latest structure of uniprotXname.

source

GPCRAnalysis.query_ebi_proteins — Method

result = query_ebi_proteins(id)

Query the EBI Proteins API for a protein with the specified id, which must be the Uniprot accession code. You can also supply several proteins as a comma-separated list.

result is a JSON3 object with many fields.

source

GPCRAnalysis.query_ncbi — Method

result = query_ncbi(id)

Query the NCBI API for a gene with the specified id, which must be the gene accession code. result is a JSON3 object with many fields.

source

GPCRAnalysis.query_uniprot_accession — Method

accession_code = query_uniprot_accession(id)

Perform a Uniprot search for id, returning the canonical accession code.

Examples

julia> query_uniprot_accession("T2R38_MOUSE")
"Q7TQA6"

source

GPCRAnalysis.residue_centroid — Method

residue_centroid(r::AbstractResidue)

Compute the mean position of all atoms in r excluding hydrogen.

Residue centers may yield a more reliable measure of "comparable residues" than the α-carbons (see CAmatrix) because they incorporate the orientation of the side chain relative to the overall fold.