ocelot.schema package¶
Submodules¶
ocelot.schema.configuration module¶
-
class
ocelot.schema.configuration.
Config
(molconformers, unwrap_clean_pstructure: pymatgen.core.structure.Structure, occu=1.0)[source]¶ Bases:
object
-
__init__
(molconformers, unwrap_clean_pstructure: pymatgen.core.structure.Structure, occu=1.0)[source]¶ - Parameters
unwrap_clean_pstructure – Structure without disorder
-
classmethod
from_labeled_clean_pstructure
(pstructure: pymatgen.core.structure.Structure, occu=1.0)[source]¶
-
classmethod
from_pstructure
(pstructure: pymatgen.core.structure.Structure, occu=1.0, assign_siteids=False)[source]¶
-
get_bone_config
()[source]¶ - Returns
a configuration that has only terminated backbones
a pmg structure that has only terminated backbones
a list of pmg molecules
-
get_dimers_array
(maxfold=2, fast=False, symm=False)[source]¶ - Parameters
fast –
symm –
maxfold – translation vector in fc can be [h, h, h] where -maxfold <= h <= maxfold
- Returns
dimers_array, a z x z x n array where dimers_array[i][j][k] is the dimer of omols[i] and omols[j] with translation vector transv_fcs[k]
transv_fcs
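The set of translation vectors can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes each fractional component ranges independently over the integers from -maxfold to maxfold, and the helper name `translation_vectors` is hypothetical.

```python
from itertools import product


def translation_vectors(maxfold=2):
    """Enumerate integer translation vectors in fractional coordinates.

    Assumption: each of the three components takes every integer value
    in [-maxfold, maxfold], giving (2 * maxfold + 1) ** 3 vectors.
    """
    span = range(-maxfold, maxfold + 1)
    return [list(v) for v in product(span, repeat=3)]


transv_fcs = translation_vectors(maxfold=1)
print(len(transv_fcs))  # number of candidate translations for maxfold=1
```

Under this assumption the third dimension n of dimers_array grows cubically with maxfold, which is why a small default is sensible.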
-
molconformers
: [MolConformer] = None¶
-
ocelot.schema.conformer module¶
-
class
ocelot.schema.conformer.
BasicConformer
(sites, siteids=False)[source]¶ Bases:
ocelot.schema.conformer.SiteidOperation
-
__init__
(sites, siteids=False)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
property
atomic_numbers
¶ List of atomic numbers.
-
property
bondmat
¶
-
property
can_rdmol
¶
-
property
cart_coords
¶
-
property
composition
¶
-
property
distmat
¶
-
classmethod
from_siteids
(siteids, sites, copy=False)[source]¶ build up a ring based on a list of siteid and a list of sites
the sites should have been assigned siteids
-
property
geoc
¶ geometric center
- Returns
3x1 np array
-
static
get_bondmat
(sites, distmat, co=1.3)[source]¶ Bij = whether there is a bond between si and sj, i is NOT bonded with itself
if site is not a legit atom (e.g. bq in nics), it cannot have any bond
- Parameters
sites –
distmat –
co – coefficient for cutoff, default 1.3, based on covalent rad
- Returns
bool matrix
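A covalent-radius cutoff of this kind can be sketched as below. The criterion used here (bonded when the distance is below co times the sum of the two covalent radii) is an assumption about what "based on covalent rad" means, and the radii table and the name `bond_matrix` are illustrative only.

```python
# Illustrative covalent radii in angstrom; not taken from the library.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def bond_matrix(symbols, distmat, co=1.3):
    """Return a symmetric bool matrix; the diagonal is always False."""
    n = len(symbols)
    bmat = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            ri = COVALENT_RADII.get(symbols[i])
            rj = COVALENT_RADII.get(symbols[j])
            if ri is None or rj is None:
                # Not a recognised atom (e.g. a ghost atom): no bonds.
                continue
            if distmat[i][j] < co * (ri + rj):
                bmat[i][j] = bmat[j][i] = True
    return bmat


# Water-like geometry: O-H about 0.96 A, H-H about 1.51 A.
dm = [[0.0, 0.96, 0.96], [0.96, 0.0, 1.51], [0.96, 1.51, 0.0]]
print(bond_matrix(["O", "H", "H"], dm))
```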
-
get_closest_sites
(other)[source]¶ other_sites – self_border_site — distmin — other_border_site – other_sites
this can be used to see if a dimer is interesting or not
- Parameters
other (AtomList) –
- Returns
i, j, distmin
conformer[i] is self_border_site, conformer[j] is other_border_site,
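The search for the two border sites amounts to a closest-pair query between two coordinate sets. A brute-force sketch, assuming plain coordinate lists rather than site objects (the name `closest_pair` is hypothetical):

```python
import math


def closest_pair(coords_self, coords_other):
    """Return (i, j, distmin) for the closest pair across two coordinate lists."""
    best = None
    for i, a in enumerate(coords_self):
        for j, b in enumerate(coords_other):
            d = math.dist(a, b)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


i, j, distmin = closest_pair([(0, 0, 0), (1, 0, 0)], [(4, 0, 0), (3, 0, 0)])
print(i, j, distmin)
```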
-
static
get_distmat
(coordmat)[source]¶ distance matrix
- Parameters
coordmat – coordinates matrix nx3
- Returns
distmat[i][j] is the euclid distance between coordmat[i] and coordmat[j]
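For reference, a pairwise Euclidean distance matrix can be written with the standard library alone. This is a sketch of the idea, not the library's code, and `distance_matrix` is a hypothetical name.

```python
import math


def distance_matrix(coordmat):
    """distmat[i][j] is the Euclidean distance between rows i and j of an n x 3 list."""
    return [[math.dist(a, b) for b in coordmat] for a in coordmat]


print(distance_matrix([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]))
```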
-
static
get_nbrmap
(bmat)[source]¶ - Parameters
bmat (np.ndarray) – bool bond matrix, i is not bonded to itself
- Returns
nbmap[i] is the index list of i’s neighbors
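The neighbor map follows directly from the bool bond matrix. A small sketch, assuming a nested-list matrix instead of an np.ndarray (`neighbor_map` is a hypothetical name):

```python
def neighbor_map(bmat):
    """Map each row index i to the list of column indices j where bmat[i][j] is True."""
    return {
        i: [j for j, bonded in enumerate(row) if bonded]
        for i, row in enumerate(bmat)
    }


bmat = [
    [False, True, True],
    [True, False, False],
    [True, False, False],
]
print(neighbor_map(bmat))
```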
-
get_site_by_coords
(coords, tol=1e-05)[source]¶ will only return one site
- Parameters
coords –
tol –
- Returns
-
property
index2siteid
¶
-
intersection
(other, copy=False)[source]¶ - Parameters
other –
copy (bool) – whether generate a list of deepcopied sites or not
- Returns
a list of sites in conformer that belong to both
-
property
nbrmap
¶
-
orient
(pqo, origin='geoc')[source]¶ p, q, o are 3 orthonormal vectors in x, y, z basis
basis transformation from xyz to pqo, notice sites are deepcopied
- Parameters
pqo – 3x3 array
origin – default geoc, otherwise a 3d coord
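The basis change can be illustrated with plain dot products. This sketch assumes the rows of pqo are the orthonormal vectors p, q and o, and it returns new coordinates instead of touching any site objects; `to_pqo_basis` is a hypothetical name.

```python
def to_pqo_basis(coords, pqo, origin):
    """Express each coordinate in the (p, q, o) basis, relative to origin."""
    transformed = []
    for x in coords:
        shifted = [c - o for c, o in zip(x, origin)]
        # Component along each basis vector is the dot product with it.
        transformed.append(
            [sum(s * b for s, b in zip(shifted, basis)) for basis in pqo]
        )
    return transformed


# p points along +y, q along -x, o along +z.
pqo = [[0, 1, 0], [-1, 0, 0], [0, 0, 1]]
print(to_pqo_basis([(1, 0, 0)], pqo, origin=(0, 0, 0)))
```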
-
property
pmgmol
¶
-
rotate_along
(theta, end1, end2, unit='degree')[source]¶ for each site, rotate the vector defined by (site.coords - end1) along (end2 - end1) and add this vector to site.coords
end1 - end2 | site
notice the coords are changed in-place
- Parameters
theta – angle of rotation
end1 – 3x1 float list/array
end2 – 3x1 float list/array
unit (str) – degree/radian
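Rotation about the axis running from end1 to end2 can be sketched with Rodrigues' formula. This is a standalone illustration that returns a new point rather than modifying coordinates in place, and it may differ in detail from the library routine; `rotate_about_axis` is a hypothetical name.

```python
import math


def rotate_about_axis(point, theta, end1, end2, unit="degree"):
    """Rotate point by theta about the axis through end1 and end2."""
    if unit == "degree":
        theta = math.radians(theta)
    # Unit vector k along the rotation axis.
    axis = [b - a for a, b in zip(end1, end2)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]
    # Vector from end1 to the point; this is what gets rotated.
    v = [p - a for p, a in zip(point, end1)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    # Rodrigues: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1 - cos_t)
        for i in range(3)
    ]
    return [r + a for r, a in zip(rotated, end1)]


print(rotate_about_axis((1, 0, 0), 90, (0, 0, 0), (0, 0, 1)))
```

Precomputing the rotation matrix once and reusing it for every site is the usual optimization when many points share the same axis and angle.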
-
static
rotate_along_de_matrix
(theta, end1, end2, unit='degree')[source]¶ rotation matrix
rotate_along()
-
rotate_along_with_matrix
(matrix, end1)[source]¶ a quicker version of rotate_along if we know the rotation matrix; end2 is included in the matrix
-
property
siteid2index
¶
-
property
symbols
¶
-
to_graph
(nodename='siteid', graphtype='MolGraph', partition_scheme=None, joints: dict = None)[source]¶ get a Graph, default siteid –> nodename
otherwise nodename is assigned based on the order in self.sites
-
to_rdmol
(charge=0, sani=True, charged_fragments=None, force_single=False, expliciths=True)[source]¶ generate a rdmol obj with current conformer
siteid –dict–> atomidx in rdmol == index in conformer
- Parameters
charge –
sani –
charged_fragments –
force_single –
expliciths –
- Returns
-
property
volume_slow
¶ http://wiki.bkslab.org/index.php/Calculate_volume_of_the_binding_site_and_molecules
First, Lay a grid over the spheres.
Count the number of points contained in the spheres (Ns).
Count the number of points in the grid box (Ng).
Calculate the volume of the grid box (Vb).
don’t use this as it’s slow…
- Returns
volume in A^3
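The grid-counting recipe can be sketched as below, with the volume estimated as Vb * Ns / Ng. The grid spacing, the use of cell-centre sample points, and the name `grid_volume` are assumptions of this sketch rather than details taken from the library.

```python
def grid_volume(centers, radii, spacing=0.1):
    """Estimate the volume of a union of spheres as Vb * Ns / Ng."""
    # Bounding box that encloses every sphere.
    lo = [min(c[d] - r for c, r in zip(centers, radii)) for d in range(3)]
    hi = [max(c[d] + r for c, r in zip(centers, radii)) for d in range(3)]
    # Number of grid cells per axis and the resulting step sizes.
    counts = [max(1, round((hi[d] - lo[d]) / spacing)) for d in range(3)]
    steps = [(hi[d] - lo[d]) / counts[d] for d in range(3)]

    n_grid = counts[0] * counts[1] * counts[2]  # Ng
    box_volume = 1.0  # Vb
    for d in range(3):
        box_volume *= hi[d] - lo[d]

    n_inside = 0  # Ns
    for ix in range(counts[0]):
        x = lo[0] + (ix + 0.5) * steps[0]
        for iy in range(counts[1]):
            y = lo[1] + (iy + 0.5) * steps[1]
            for iz in range(counts[2]):
                z = lo[2] + (iz + 0.5) * steps[2]
                # A point counts once even if several spheres contain it.
                if any(
                    (x - c[0]) ** 2 + (y - c[1]) ** 2 + (z - c[2]) ** 2 <= r * r
                    for c, r in zip(centers, radii)
                ):
                    n_inside += 1
    return box_volume * n_inside / n_grid


# A single unit sphere should come out close to 4/3 * pi.
print(grid_volume([(0.0, 0.0, 0.0)], [1.0]))
```

The cost grows with the cube of the inverse spacing times the number of spheres, which is consistent with the warning that this approach is slow.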
-
-
class
ocelot.schema.conformer.
BondConformer
(sites, siteids=False)[source]¶ Bases:
ocelot.schema.conformer.BasicConformer
-
__init__
(sites, siteids=False)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
property
length
¶
-
-
class
ocelot.schema.conformer.
BoneConformer
(sites, siteids=False, conformer_properties=None, graph: ocelot.schema.graph.BackboneGraph = None)[source]¶ Bases:
ocelot.schema.conformer.FragConformer
-
__init__
(sites, siteids=False, conformer_properties=None, graph: ocelot.schema.graph.BackboneGraph = None)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
classmethod
from_sites
(sites, graph=None, siteids=False, conformer_properties=None)[source]¶ we use this as the parent constructor
-
property
lp
¶ maxdiff( (s.coord - ref) proj at vp )
- Returns
short axis length
-
property
lq
¶ maxdiff( (s.coord - ref) proj at vq )
- Returns
long axis length
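Both lp and lq are described as the largest difference among projections onto an axis. A sketch of that measure, assuming the axis is a unit vector and the coordinates are plain tuples (`extent_along` is a hypothetical name):

```python
def extent_along(coords, ref, axis):
    """Largest minus smallest projection of (coord - ref) onto a unit axis."""
    projections = [
        sum((c - r) * a for c, r, a in zip(x, ref, axis)) for x in coords
    ]
    return max(projections) - min(projections)


coords = [(-2.0, 0.5, 0.0), (0.0, -0.5, 0.0), (3.0, 0.0, 0.0)]
print(extent_along(coords, ref=(0.0, 0.0, 0.0), axis=(1.0, 0.0, 0.0)))
print(extent_along(coords, ref=(0.0, 0.0, 0.0), axis=(0.0, 1.0, 0.0)))
```

Because only a difference of projections is taken, the result does not depend on the choice of ref.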
-
-
class
ocelot.schema.conformer.
ConformerDimer
(conformer_ref: ocelot.schema.conformer.MolConformer, conformer_var: ocelot.schema.conformer.MolConformer, label='')[source]¶ Bases:
object
-
__init__
(conformer_ref: ocelot.schema.conformer.MolConformer, conformer_var: ocelot.schema.conformer.MolConformer, label='')[source]¶ basically 2 conformers
- Parameters
conformer_ref –
conformer_var –
label (str) – mainly used to distinguish
- Attributes:
vslip: slip vector in cart
vslipnorm: normalized vslip
pslip: projection of vslip along vp
qslip: projection of vslip along vq
oslip: projection of vslip along vo
pangle: angle between the vp vectors
qangle: angle between the vq vectors
oangle: angle between the vo vectors, always acute
jmol: jmol draw arrow string in console
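The slip attributes can be illustrated with a few dot products. This sketch assumes that vslip is the vector from the geometric center of the reference conformer to that of the variable conformer, and that the projections are signed and taken on the reference axes; neither assumption is confirmed by the documentation above. `slip_components` is a hypothetical name.

```python
import math


def slip_components(geoc_ref, geoc_var, vp, vq, vo):
    """Decompose an assumed slip vector along unit axes vp, vq and vo."""

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Assumption: slip vector is the displacement between geometric centers.
    vslip = [b - a for a, b in zip(geoc_ref, geoc_var)]
    norm = math.sqrt(dot(vslip, vslip))
    vslipnorm = [c / norm for c in vslip] if norm > 0 else list(vslip)
    return {
        "vslip": vslip,
        "vslipnorm": vslipnorm,
        "pslip": dot(vslip, vp),
        "qslip": dot(vslip, vq),
        "oslip": dot(vslip, vo),
    }


print(slip_components((0, 0, 0), (3, 4, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))
```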
-
as_dict
()[source]¶ keys are
vslipnorm, pslip, qslip, oslip, pangle, qangle, oangle, jmol, label, omol_ref, omol_var
-
bone_overlap
(algo='concave')[source]¶ project the var backbone onto the plane of the ref backbone
concave hull generation depends on the choice of alpha value, so convex may be a better default; see hull_test
- Parameters
algo – concave/convex
- Returns
area of the overlap, ref omol area, var omol area
-
property
is_bone_close
¶ use to identify whether this dimer can have minimum wf overlap, ONLY consider bone distance
this should be called is_not_faraway…
- Parameters
cutoff – minbonedist less than which will be considered close
- Returns
bool
-
property
is_close
¶ use to identify whether this dimer can have minimum wf overlap
this should be called is_not_faraway…
- Parameters
cutoff –
- Returns
bool
-
property
is_identical
¶ whether two omols are identical, based on norm(vslip) < 1e-5
-
property
is_not_identical
¶
-
property
minbonedist
¶ - Returns
minimum dist between sites on different bones
-
property
mindist
¶ - Returns
minimum dist between sites on different conformers
-
mol_overlap
(algo='concave')[source]¶ project var mol onto the plane of ref mol
- Parameters
algo – concave/convex
- Returns
area of the overlap, ref omol area, var omol area
-
plt_bone_overlap
(algo='convex', output='bone_overlap.eps')[source]¶ plot a 2d graph of how backbones overlap
using concave or convex hull
- Parameters
algo – concave/convex
output – output filename
-
property
sites
¶
-
-
class
ocelot.schema.conformer.
DimerCollection
(dimers: [<class 'ocelot.schema.conformer.ConformerDimer'>])[source]¶ Bases:
object
-
class
ocelot.schema.conformer.
FragConformer
(sites, siteids=False, conformer_properties=None, graph: Union[ocelot.schema.graph.FragmentGraph, ocelot.schema.graph.BackboneGraph, ocelot.schema.graph.SidechainGraph] = None)[source]¶ Bases:
ocelot.schema.conformer.BasicConformer
-
__init__
(sites, siteids=False, conformer_properties=None, graph: Union[ocelot.schema.graph.FragmentGraph, ocelot.schema.graph.BackboneGraph, ocelot.schema.graph.SidechainGraph] = None)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
classmethod
from_pmgmol
(m: pymatgen.core.structure.Molecule, siteids=False, conformer_properties=None, graph=None)[source]¶
-
classmethod
from_siteids
(siteids, sites, graph=None, copy=False, conformer_properties=None)[source]¶ build up a ring based on a list of siteid and a list of sites
the sites should have been assigned siteids
-
-
class
ocelot.schema.conformer.
MolConformer
(sites, siteids=False, prop=None, graph: ocelot.schema.graph.MolGraph = None, rdmol: rdkit.Chem.rdchem.Mol = None, smiles: str = None, siteid2atomidx: dict = None, atomidx2siteid: dict = None, backbone: ocelot.schema.conformer.BoneConformer = None, sccs: [<class 'ocelot.schema.conformer.SidechainConformer'>] = None, backbone_graph: ocelot.schema.graph.BackboneGraph = None, scgs: [<class 'ocelot.schema.graph.SidechainGraph'>] = None, coplane_cutoff=30.0, chrombone: ocelot.schema.conformer.FragConformer = None, chromsccs: [<class 'ocelot.schema.conformer.FragConformer'>] = None, chrombone_graph: ocelot.schema.graph.FragmentGraph = None, chromscgs: [<class 'ocelot.schema.graph.FragmentGraph'>] = None)[source]¶ Bases:
ocelot.schema.conformer.BasicConformer
-
__init__
(sites, siteids=False, prop=None, graph: ocelot.schema.graph.MolGraph = None, rdmol: rdkit.Chem.rdchem.Mol = None, smiles: str = None, siteid2atomidx: dict = None, atomidx2siteid: dict = None, backbone: ocelot.schema.conformer.BoneConformer = None, sccs: [<class 'ocelot.schema.conformer.SidechainConformer'>] = None, backbone_graph: ocelot.schema.graph.BackboneGraph = None, scgs: [<class 'ocelot.schema.graph.SidechainGraph'>] = None, coplane_cutoff=30.0, chrombone: ocelot.schema.conformer.FragConformer = None, chromsccs: [<class 'ocelot.schema.conformer.FragConformer'>] = None, chrombone_graph: ocelot.schema.graph.FragmentGraph = None, chromscgs: [<class 'ocelot.schema.graph.FragmentGraph'>] = None)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
static
chrom_partition
(mc: ocelot.schema.conformer.BasicConformer, rdmol, atomidx2siteid, molgraph: ocelot.schema.graph.MolGraph, withhalogen=True)[source]¶
-
classmethod
from_sites
(sites, siteids=False, prop=None, coplane_cutoff=30.0, withhalogen=True)[source]¶ we use this as the parent constructor
-
static
geo_partition
(bc: ocelot.schema.conformer.BasicConformer, molgraph: ocelot.schema.graph.MolGraph, coplane_cutoff=30.0)[source]¶
-
rings
: List[RingConformer] = None¶
-
-
class
ocelot.schema.conformer.
RingConformer
(sites, siteids=False)[source]¶ Bases:
ocelot.schema.conformer.BasicConformer
-
__init__
(sites, siteids=False)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
property
bonds_in_ring
¶ bond objects that can be extracted from this ring, e.g. for benzene the C-H bonds are NOT here
- Returns
a list of bonds
-
iscoplane_with
(other, tol=20.0, tolunit='degree')[source]¶ whether two rings are on the same plane with tol
- Parameters
other (Ring) –
tol – degree default 20
tolunit – degree/radian
- Returns
bool
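One way to read this test is as a comparison of plane normals: two rings count as coplanar when the acute angle between their normals is below tol. The sketch below assumes exactly that and ignores any offset between the planes; `is_coplanar` is a hypothetical name.

```python
import math


def is_coplanar(n1, n2, tol=20.0, tolunit="degree"):
    """True if the acute angle between two plane normals is below tol."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # abs() treats n and -n as the same plane normal; clamp guards acos.
    cos_angle = min(1.0, abs(dot) / norm)
    angle = math.acos(cos_angle)
    if tolunit == "degree":
        angle = math.degrees(angle)
    return angle < tol


print(is_coplanar((0, 0, 1), (0, 0, -1)))
print(is_coplanar((0, 0, 1), (1, 0, 0)))
```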
-
iscoplane_with_norm
(v2, tol=20.0, tolunit='degree')[source]¶ whether two rings are on the same plane with tol
- Parameters
v2 –
tol – degree default 20
tolunit – degree/radian
- Returns
bool
-
normal_along
(refnormal, tol=45.0)[source]¶ get the n1 or n2 that is along the direction of refnormal within a certain tol
this is useful to identify 2 plane normals for a partially-bent structure
- Parameters
refnormal – 3x1 array
tol (float) – default 45 in degree
- Returns
None if no normal along refnormal found
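A plane has two opposite normals, and the selection described above can be sketched as picking whichever one lies within tol of the reference direction. This is a simplified illustration in which n1 and n2 are taken to be a normal and its negative; the function names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def pick_normal_along(normal, refnormal, tol=45.0):
    """Return the normal or its negative if within tol degrees of refnormal."""
    for candidate in (list(normal), [-c for c in normal]):
        if angle_between(candidate, refnormal) < tol:
            return candidate
    return None


print(pick_normal_along((0, 0, -1), (0, 0, 1)))
print(pick_normal_along((1, 0, 0), (0, 0, 1)))
```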
-
-
class
ocelot.schema.conformer.
SidechainConformer
(sites, siteids=False, conformer_properties=None, graph: ocelot.schema.graph.SidechainGraph = None)[source]¶ Bases:
ocelot.schema.conformer.FragConformer
this can be considered as a special backbone with just one joint
-
__init__
(sites, siteids=False, conformer_properties=None, graph: ocelot.schema.graph.SidechainGraph = None)[source]¶ - Parameters
sites –
siteids – a list of siteids, siteids[index]=siteid of self[index]
-
-
class
ocelot.schema.conformer.
SiteidOperation
(sites: [<class 'pymatgen.core.sites.Site'>])[source]¶ Bases:
object
-
__init__
(sites: [<class 'pymatgen.core.sites.Site'>])[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
assign_siteid
(siteids)[source]¶ - Parameters
siteids – default None means siteid = index, otherwise siteid = siteids[i]
- Returns
-
property
sdict
¶
-
property
siteids
¶
-
property
status
¶ check status based on conformer.siteids
- Returns
-
-
ocelot.schema.conformer.
conformer_addh
(c: ocelot.schema.conformer.BasicConformer, joints=None, original: ocelot.schema.conformer.BasicConformer = None)[source]¶ all sites will be deep copied
if we know nothing about the joints, we have to find the under-coordinated sites, then terminate them based on # of valence electrons
if we know the joints as a list or set of siteids but not the original conformer, we just terminate them based on # of valence electrons
if we know the joints as a dict but not the original conformer, we terminate them based on len(joints[siteid])
if we know the joints and the original conformer, we can terminate them based on # of bonds broken during fragmenting, in this case joints is a dict
- Parameters
c –
joints –
original –
- Returns
d[siteid_of_the_joint] = a list of hydrogen sites
-
ocelot.schema.conformer.
conformer_addhmol
(c: ocelot.schema.conformer.BasicConformer, joints=None, original: ocelot.schema.conformer.BasicConformer = None)[source]¶
ocelot.schema.graph module¶
-
class
ocelot.schema.graph.
BasicGraph
(graph: networkx.classes.graph.Graph)[source]¶ Bases:
object
-
__init__
(graph: networkx.classes.graph.Graph)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
draw
(output: str, dpi=600)[source]¶ draw graph, using jmol color scheme for elements
- Parameters
output –
dpi –
- Returns
-
classmethod
from_rdmol
(rdmol: rdkit.Chem.rdchem.Mol, atomidx2nodename=None)[source]¶ - Parameters
rdmol –
atomidx2nodename – the dict to convert atomidx in rdmol to graph node, atomidx2nodename[rdmolid] == nodename
if None then nodename will be set just based on atomidx
-
graph_similarity
(other)[source]¶ calculate graph similarity (GED) between two MolGraphs
- Parameters
other –
- Returns
-
property
props_for_hash
¶ see https://stackoverflow.com/questions/46999771/ use with caution…
-
property
symbols
¶ a dict s.t. symbols[node] gives symbol
-
to_rdmol
(sani=True, charge=0, charged_fragments=None, force_single=False, expliciths=True)[source]¶ convert to rdmol, it will first try to convert to a radical, if failed b/c of rdkit valence rules, it will try to convert to charged fragment
- Parameters
sani – switch to call default sanitizer of rdkit
charge (int) – molecule charge
charged_fragments (bool) – switch to assign atomic charge
force_single (bool) – switch to force all single bonds in the resulting molecule
expliciths (bool) – add hydrogen explicitly
- Returns
a rdmol object, canonical smiles, atomidx2nodename, nodename2atomidx
-
-
class
ocelot.schema.graph.
FragmentGraph
(graph: networkx.classes.graph.Graph, joints: dict, partition_scheme: str)[source]¶ Bases:
ocelot.schema.graph.BasicGraph
-
__init__
(graph: networkx.classes.graph.Graph, joints: dict, partition_scheme: str)[source]¶ fragment is a BasicGraph with joints
I feel it’s not necessary to add parent BasicGraph as an attribute, but I maybe wrong
- Parameters
graph –
joints – a dict, keys are nodes in the subgraph that have parent graph edges not in this subgraph
-
-
class
ocelot.schema.graph.
MolGraph
(graph)[source]¶ Bases:
ocelot.schema.graph.BasicGraph
-
classmethod
from_basicgraph
(basicgraph: ocelot.schema.graph.BasicGraph)[source]¶
-
static
get_bone_and_frags_from_nxgraph
(bone_graph: networkx.classes.graph.Graph, fragments: [<class 'networkx.classes.graph.Graph'>], scheme)[source]¶
-
static
get_joints_and_subgraph
(subg1_nodes: [<class 'int'>], g: networkx.classes.graph.Graph)[source]¶ we assume subg1 cap subg2 == empty and subg1 + subg2 == g
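Finding joints between a subgraph and the rest of the graph can be sketched with a plain adjacency dict instead of a networkx graph. The shape of the result (each joint node mapped to the list of its neighbors outside the subgraph) is an assumption of this sketch, and `find_joints` is a hypothetical name.

```python
def find_joints(subg_nodes, adjacency):
    """Map each node in the subgraph to its neighbors outside the subgraph.

    Nodes with no outside neighbors are omitted, so the keys are the joints.
    """
    inside = set(subg_nodes)
    joints = {}
    for node in sorted(inside):
        outside = [nbr for nbr in adjacency[node] if nbr not in inside]
        if outside:
            joints[node] = outside
    return joints


# Path graph 0 - 1 - 2 - 3, split into {0, 1} and {2, 3}.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(find_joints([0, 1], adjacency))
print(find_joints([2, 3], adjacency))
```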
-
get_rings
(k='nconnect', threshold=1)[source]¶ get connected rings between which # of edges is larger than a threshold
- Parameters
k –
threshold –
- Returns
-
partition
(bone_selection='lgfr', additional_ring_criteria=None, with_halogen=True)[source]¶ partition the molecule into a backbone graph and a list of fragments (graphs)
- Parameters
with_halogen –
bone_selection –
additional_ring_criteria – this should be a function to check whether a ring (a set of siteids) meets additional conditions; this does nothing for the chrom scheme
- Returns
-
classmethod
-
class
ocelot.schema.graph.
SidechainGraph
(graph, joints, partition_scheme)[source]¶ Bases:
ocelot.schema.graph.FragmentGraph
-
__init__
(graph, joints, partition_scheme)[source]¶ - it is not possible for a side chain to have two frag-joints connected to one bone-joint, as this should already be in the backbone
- a side chain (frag) cannot have 2 side joints, each connected to one bone-joint, as this would be a ring fused to the backbone (a side chain is a connected component)
- Variables
self.rankmap – this gives the rank for each node, rank is the shortest path length from the node to sc_joint
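A rank of this kind is a shortest-path length in an unweighted graph, which a breadth-first search computes directly. The sketch below uses a plain adjacency dict rather than a networkx graph; `rank_map` is a hypothetical name.

```python
from collections import deque


def rank_map(adjacency, sc_joint):
    """Shortest path length (in edges) from sc_joint to every reachable node."""
    ranks = {sc_joint: 0}
    queue = deque([sc_joint])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in ranks:
                ranks[nbr] = ranks[node] + 1
                queue.append(nbr)
    return ranks


# Branched side chain: joint 0 - 1, then 1 - 2 and 1 - 3, then 3 - 4.
adjacency = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
print(rank_map(adjacency, sc_joint=0))
```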
-
ocelot.schema.rdfunc module¶
-
class
ocelot.schema.rdfunc.
RdFunc
[source]¶ Bases:
object
-
static
conf2xyz
(conf: rdkit.Chem.rdchem.Conformer, outputname, atom_list: list, comment_line='')[source]¶
-
static
fp_similarity
(m1, m2, metric='Tanimoto')[source]¶ use RDK fingerprint similarity based on different metrics
TODO add args to customize RDKfp, see https://www.rdkit.org/docs/source/rdkit.Chem.rdmolops.html#rdkit.Chem.rdmolops.RDKFingerprint
see Landrum2012 for more details
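For intuition, the Tanimoto and Dice metrics can be written on sets of on-bit indices. This sketch does not call RDKit; the function names and the convention of returning 0.0 for two empty fingerprints are choices made here for illustration.

```python
def tanimoto(bits_a, bits_b):
    """|A & B| / |A | B| on sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention of this sketch for two empty fingerprints
    return len(a & b) / len(union)


def dice(bits_a, bits_b):
    """2 |A & B| / (|A| + |B|) on sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # convention of this sketch for two empty fingerprints
    return 2 * len(a & b) / total


print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(dice({1, 2, 3}, {2, 3, 4}))
```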
- Parameters
m1 –
m2 –
metric (str) – “Tanimoto”, “Dice”, “Cosine”, “Sokal”, “Russel”, “RogotGoldberg”, “AllBit”, “Kulczynski”, “McConnaughey”, “Asymmetric”, “BraunBlanquet”,
- Returns
-
static
from_string
(fmt, string)[source]¶ construct a rdmol
- Parameters
fmt – ‘smiles’, ‘hsmiles’, ‘smarts’, ‘inchi’
string –
- Returns
-
static
get_3ddescriptors
(mol)[source]¶ using methods from rdkit
https://www.rdkit.org/docs/source/rdkit.Chem.Descriptors3D.html
Asphericity: 0 corresponds to spherical top molecules and 1 to linear molecules. For prolate (cigar-shaped) molecules, ~ 0.25, whereas for oblate (disk-shaped) molecules, ~ 1.
Eccentricity: 0 corresponds to spherical top molecules and 1 to linear molecules.
RadiusOfGyration: a size descriptor based on the distribution of atomic masses in a molecule, a measure of molecular compactness for long-chain molecules; specifically, small values are obtained when most of the atoms are close to the center of mass
SpherocityIndex: spherosity index varies from zero for flat molecules, such as benzene, to unity for totally spherical molecules
- Key
Asphericity, Eccentricity, InertialShapeFactor, NPR1, NPR2, PMI1, PMI2, PMI3, RadiusOfGyration, SpherocityIndex
- Returns
dict
-
static
mol2xyz_by_confid
(molecule: rdkit.Chem.rdchem.Mol, prefix='rdmol', confid=0, comment_line='')[source]¶
-
static
Module contents¶
this contains the schema for dealing with conjugated organics, molecules and beyond
- Graph:
atom index: nodename
- Molecule:
RdMol
format: smiles, smarts, inchi
atom index: atomidx
- Conformer:
Dimer –> OrganicMolecule, Backbone, SideGroup, Ring –> BasicConformer
format: xyz
atom index: siteid
- Configuration:
SuperCell –> UnitCell –> AsymmUnit –> Configuration
format: cif, poscar
atom index: siteid
graph <–> molecule <–> conformer <–> configuration
#TODO Granular Schema