## Installation

0. make sure `conda` works properly
1. clone the latest version (e.g. `v0.2`) with
   ```bash
   git clone https://github.com/caer200/ocelot_api.git
   ```
2. this yields a folder called `ocelot_api`; inside it, create a new conda environment with
   ```bash
   conda env create -f venv/environment.yaml
   ```
   or
   ```bash
   conda create --name ocelot --file venv/spec-file.txt
   ```
3. run `python setup.py install` in the same folder

## Schema

A `MolGraph` is a graph consisting of integer nodes (nodename) and the edges connecting them.
Each node has a string attribute that denotes its element.

A `MolGraph` can be partitioned into a set of `FragmentGraph`.
A `FragmentGraph` contains the information about the `joints` at which fragmentation happened.
The `FragmentGraph` is used to represent the different functional fragments
commonly seen in functional organic molecules.

One level above the `MolGraph` is the molecular graph that contains details of bonds and
basic information about the electronic system.
This is the "molecule" drawn by chemists, and it can be nicely described by the molecule class in `rdkit`.
Note that converting a `MolGraph` to `rdkit.mol` is not trivial.
We use the method from [xyz2mol](https://github.com/jensengroup/xyz2mol) by the
[Jensen Group](https://github.com/jensengroup).

With conformational information, a `MolGraph`/`FragmentGraph` becomes a
`MolConformer`/`FragConformer` that is uniquely defined by the Cartesian coordinates (xyz) of its atoms.
Basically, they are `pymatgen.Molecule` objects, except that each `site` has a property `siteid`
that can be mapped to the nodes in the `MolGraph`.

Adding periodicity to a `MolConformer` yields a `Config`,
which is just a `pymatgen.Structure` with no disordered sites.
Disorder should be represented by a set of weighted `Config`,
as this API is used primarily for handling organic molecular crystals,
in which the number of possible configurations is limited by molecular structure.

## Disorder

Let's say you have a CIF file named `x17059.cif`.
It looks like this in `Jmol`.

![tipgebw][tipgebw_jmol]

[tipgebw_jmol]: ./tipgebw.png

The problem here is the disordered sites at the TIPGe groups:
if you look into the CIF file you will find 2 disorder groups.
To extract disorder-free `Config` objects from the CIF file, use

```python
from ocelot.routines.disparser import DisParser

ciffile = 'x17059.cif'
dp = DisParser.from_ciffile(ciffile)
dp.to_configs(write_files=True)  # writes conf_x.cif
```

Here the `to_configs` method writes *all* possible configurations of a unit cell;
the disorder is treated in a "max-entropy" fashion.
That is, there is no correlation between disordered sites
located in different asymmetric units
(or even in the same asymmetric unit, if they are far away from each other).

This method comes with many limitations,
primarily because of the various notations used when CIF files are generated.
For more info see [disorder_test](../tests/disorder_test)

## Bone Config

A challenging task in analyzing molecular crystal structures is to classify the packing pattern of
the molecular "backbones".
This can be done if we have a "clean" configuration
(no disorder and no fractional occupancies, e.g. from solvent).
The idea is first to strip the side groups:

```python
from ocelot.schema.configuration import Config

config = Config.from_file('conf_1.cif')
bc, boneonly_pstructure, terminated_backbone_hmols = config.get_bone_config()
boneonly_pstructure.to('cif', 'boneonly.cif')  # a backbone only configuration
```

which gives you a backbone-only configuration

![tipgebw][tipgebw_bone]

[tipgebw_bone]: ./boneonly.png

...and the bone-only configuration `bc` can be used as input for an identifier

```python
from ocelot.task.pkid import PackingIdentifier

pid = PackingIdentifier(bc)
packingd = pid.identify_heuristic()
print(packingd[0]['packing'])
# brickwork
```

## Backbone and Sidechain

Usually, we work with organic molecules that have a conjugated backbone and a set of side groups.
This allows us to partition the molecule, based either on its chemical structure (`MolGraph`)
or on its conformation (`Conformer`).

Take rubrene as an example. If you do not have conformational information, you can start with *SMILES*:

```python
from rdkit.Chem import MolFromSmiles
from ocelot.schema.graph import MolGraph

smiles = 'c1ccc(cc1)c7c2ccccc2c(c3ccccc3)c8c(c4ccccc4)c5ccccc5c(c6ccccc6)c78'
rdmol = MolFromSmiles(smiles)
mg = MolGraph.from_rdmol(rdmol)
backbone, sidegroups = mg.partition_to_bone_frags('lgfr')
print(backbone)
for sg in sidegroups:
    print(sg)
# BackboneGraph:; 6 C; 7 C; 8 C; 9 C; 10 C; 11 C; 12 C; 13 C; 20 C; 21 C; 28 C; 29 C; 30 C; 31 C; 32 C; 33 C; 34 C; 41 C
# SidechainGraph:; 0 C; 1 C; 2 C; 3 C; 4 C; 5 C
# SidechainGraph:; 14 C; 15 C; 16 C; 17 C; 18 C; 19 C
# SidechainGraph:; 22 C; 23 C; 24 C; 25 C; 26 C; 27 C
# SidechainGraph:; 35 C; 36 C; 37 C; 38 C; 39 C; 40 C
```

Here we first create an `rdmol` from SMILES, then convert it to a `MolGraph` and partition it into
a `BackboneGraph` and a list of `SidechainGraph`.
The `lgfr` argument of the partition method means
"extract the backbone based on the largest fused ring present in the molecule".
For other backbone extraction schemes, see the API doc for `MolGraph`.

You can also start from an xyz file of rubrene:

```python
from ocelot.schema.conformer import MolConformer

mc = MolConformer.from_file('rub.xyz')
bone_conformer, sccs, bg, scgs = mc.partition(coplane_cutoff=20)
print(bone_conformer)
for sc in sccs:
    print(sc)
```

This gives the fragment conformers of the molecule.
You can use `coplane_cutoff` to control whether the phenyl rings are included in the backbone.
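
To make the "largest fused ring" idea behind `lgfr` more concrete, here is a minimal, standard-library-only sketch. It is not how `ocelot` implements `partition_to_bone_frags`; it is a simplified stand-in that assumes a ring system can be approximated as a group of atoms that stays connected after every bridge bond (a bond not in any ring) is removed. This also lumps spiro-linked rings together and ignores conjugation. All function names below (`build_adjacency`, `find_bridges`, `components`, `partition_by_largest_ring_system`) are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an edge list into a node -> set-of-neighbours mapping."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def find_bridges(adj):
    """Return all bridges: edges whose removal disconnects the graph."""
    order, low, bridges = {}, {}, set()

    def dfs(node, parent):
        # Recursive DFS; fine for molecule-sized graphs.
        order[node] = low[node] = len(order)
        for nxt in adj[node]:
            if nxt == parent:
                continue
            if nxt in order:
                low[node] = min(low[node], order[nxt])
            else:
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > order[node]:
                    bridges.add(frozenset((node, nxt)))

    for node in adj:
        if node not in order:
            dfs(node, None)
    return bridges


def components(nodes, adj, skipped_edges=frozenset()):
    """Connected components among `nodes`, ignoring `skipped_edges`."""
    nodes, seen, found = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, todo = set(), [start]
        while todo:
            node = todo.pop()
            if node in comp:
                continue
            comp.add(node)
            todo.extend(
                n for n in adj[node]
                if n in nodes and frozenset((node, n)) not in skipped_edges
            )
        seen |= comp
        found.append(comp)
    return found


def partition_by_largest_ring_system(edges):
    """Split a graph into (backbone nodes, list of side-group node sets)."""
    adj = build_adjacency(edges)
    # Removing bridges leaves ring systems (size > 1) and lone atoms.
    ring_systems = [
        c for c in components(adj, adj, find_bridges(adj)) if len(c) > 1
    ]
    backbone = max(ring_systems, key=len, default=set())
    sidegroups = components(set(adj) - backbone, adj)
    return backbone, sidegroups


# Toy example (not rubrene): a naphthalene-like core (nodes 0-9)
# carrying one phenyl-like ring (nodes 10-15) attached at node 0.
edges = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),        # ring A
    (4, 6), (6, 7), (7, 8), (8, 9), (9, 5),                # ring B, fused at 4-5
    (10, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 10),
    (0, 10),                                               # single linking bond
]
backbone, sidegroups = partition_by_largest_ring_system(edges)
print(sorted(backbone))
print([sorted(sg) for sg in sidegroups])
```

In this toy graph the only bridge is the bond between nodes 0 and 10, so the ten-atom fused core becomes the backbone and the six-membered ring becomes the single side group, which mirrors the shape of the rubrene output shown earlier.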