7.3. RDKit topology parser¶
Converts an RDKit rdkit.Chem.rdchem.Mol
into a MDAnalysis.core.Topology
.
See also
7.3.1. Classes¶
-
class
MDAnalysis.topology.RDKitParser.
RDKitParser
(filename)[source]¶ For RDKit structures
- Creates the following Attributes:
- Atomids
- Atomnames
- Aromaticities
- Elements
- Masses
- Bonds
- Resids
- Resnums
- Segids
- Guesses the following:
- Atomtypes
- Depending on RDKit’s input, the following Attributes might be present:
- Charges
- Resnames
- AltLocs
- ChainIDs
- ICodes
- Occupancies
- Tempfactors
Attributes table:
RDKit attribute MDAnalysis equivalent atom.GetMonomerInfo().GetAltLoc() altLocs atom.GetIsAromatic() aromaticities atom.GetMonomerInfo().GetChainId() chainIDs atom.GetDoubleProp(‘_GasteigerCharge’) atom.GetDoubleProp(‘_TriposPartialCharge’) charges atom.GetSymbol() elements atom.GetMonomerInfo().GetInsertionCode() icodes atom.GetIdx() indices atom.GetMass() masses atom.GetMonomerInfo().GetName() atom.GetProp(‘_TriposAtomName’) names atom.GetMonomerInfo().GetOccupancy() occupancies atom.GetMonomerInfo().GetResidueName() resnames atom.GetMonomerInfo().GetResidueNumber() resnums atom.GetMonomerInfo().GetTempFactor() tempfactors atom.GetProp(‘_TriposAtomType’) types New in version 2.0.0.
-
close
()¶ Close the trajectory file.
-
convert_forces_from_native
(force, inplace=True)¶ Conversion of forces array force from native to base units
Parameters: - force (array_like) – Forces to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
-
convert_forces_to_native
(force, inplace=True)¶ Conversion of force array force from base to native units.
Parameters: - force (array_like) – Forces to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input force is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.7.
-
convert_pos_from_native
(x, inplace=True)¶ Conversion of coordinate array x from native units to base units.
Parameters: - x (array_like) – Positions to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
-
convert_pos_to_native
(x, inplace=True)¶ Conversion of coordinate array x from base units to native units.
Parameters: - x (array_like) – Positions to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input x is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
-
convert_time_from_native
(t, inplace=True)¶ Convert time t from native units to base units.
Parameters: - t (array_like) – Time values to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned (although note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.) In-place operations improve performance because allocating new arrays is avoided.
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
-
convert_time_to_native
(t, inplace=True)¶ Convert time t from base units to native units.
Parameters: - t (array_like) – Time values to transform
- inplace (bool, optional) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input t is modified in place and also returned. (Also note that scalar values t are passed by value in Python and hence an in-place modification has no effect on the caller.)
Changed in version 0.7.5: Keyword inplace can be set to
False
so that a modified copy is returned unless no conversion takes place, in which case the reference to the unmodified x is returned.
-
convert_velocities_from_native
(v, inplace=True)¶ Conversion of velocities array v from native to base units
Parameters: - v (array_like) – Velocities to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
-
convert_velocities_to_native
(v, inplace=True)¶ Conversion of coordinate array v from base to native units
Parameters: - v (array_like) – Velocities to transform
- inplace (bool (optional)) – Whether to modify the array inplace, overwriting previous data
Note
By default, the input v is modified in place and also returned. In-place operations improve performance because allocating new arrays is avoided.
New in version 0.7.5.
7.4. RDKit molecule I/O — MDAnalysis.coordinates.RDKit
¶
Read coordinates data from an RDKit rdkit.Chem.rdchem.Mol
with
RDKitReader
into an MDAnalysis Universe. Convert it back to a
rdkit.Chem.rdchem.Mol
with RDKitConverter
.
Example
To read an RDKit molecule and then convert the AtomGroup back to an RDKit molecule:
>>> from rdkit import Chem
>>> import MDAnalysis as mda
>>> mol = Chem.MolFromMol2File("docking_poses.mol2", removeHs=False)
>>> u = mda.Universe(mol)
>>> u
<Universe with 42 atoms>
>>> u.trajectory
<RDKitReader with 10 frames of 42 atoms>
>>> u.atoms.convert_to("RDKIT")
<rdkit.Chem.rdchem.Mol object at 0x7fcebb958148>
7.4.1. Classes¶
-
class
MDAnalysis.coordinates.RDKit.
RDKitReader
(filename, **kwargs)[source]¶ Coordinate reader for RDKit.
New in version 2.0.0.
Read coordinates from an RDKit molecule. Each conformer in the original RDKit molecule will be read as a frame in the resulting universe.
Parameters: filename (rdkit.Chem.rdchem.Mol) – RDKit molecule
-
class
MDAnalysis.coordinates.RDKit.
RDKitConverter
[source]¶ Convert MDAnalysis
AtomGroup
orUniverse
to RDKitMol
MDanalysis attributes are stored in each RDKit
Atom
of the resulting molecule in two different ways:- in an
AtomPDBResidueInfo
object available through theGetMonomerInfo()
method if it’s an attribute that is typically found in a PDB file, - directly as an atom property available through the
GetProp()
methods for the others.
Supported attributes:
MDAnalysis attribute RDKit altLocs atom.GetMonomerInfo().GetAltLoc() chainIDs atom.GetMonomerInfo().GetChainId() icodes atom.GetMonomerInfo().GetInsertionCode() names atom.GetMonomerInfo().GetName() occupancies atom.GetMonomerInfo().GetOccupancy() resnames atom.GetMonomerInfo().GetResidueName() resids atom.GetMonomerInfo().GetResidueNumber() segindices atom.GetMonomerInfo().GetSegmentNumber() tempfactors atom.GetMonomerInfo().GetTempFactor() bfactors atom.GetMonomerInfo().GetTempFactor() charges atom.GetDoubleProp(“_MDAnalysis_charge”) indices atom.GetIntProp(“_MDAnalysis_index”) segids atom.GetProp(“_MDAnalysis_segid”) types atom.GetProp(“_MDAnalysis_type”) Example
To access MDAnalysis properties:
>>> import MDAnalysis as mda >>> from MDAnalysis.tests.datafiles import PDB_full >>> u = mda.Universe(PDB_full) >>> mol = u.select_atoms('resname DMS').convert_to('RDKIT') >>> mol.GetAtomWithIdx(0).GetMonomerInfo().GetResidueName() 'DMS'
To create a molecule for each frame of a trajectory:
from MDAnalysisTests.datafiles import PSF, DCD from rdkit.Chem.Descriptors3D import Asphericity u = mda.Universe(PSF, DCD) elements = mda.topology.guessers.guess_types(u.atoms.names) u.add_TopologyAttr('elements', elements) ag = u.select_atoms("resid 1-10") for ts in u.trajectory: mol = ag.convert_to("RDKIT") x = Asphericity(mol)
Notes
The converter requires the
Elements
attribute to be present in the topology, else it will fail.It also requires the bonds attribute, although they will be automatically guessed if not present.
If both
tempfactors
andbfactors
attributes are present, the conversion will fail, since only one of these should be present. Refer to Issue #1901 for a solutionHydrogens should be explicit in the topology file. If this is not the case, use the parameter
NoImplicit=False
when using the converter to allow implicit hydrogens and disable inferring bond orders and charges.Since one of the main use case of the converter is converting trajectories and not just a topology, creating a new molecule from scratch for every frame would be too slow so the converter uses a caching system. The cache only remembers the id of the last AtomGroup that was converted, as well as the arguments that were passed to the converter. This means that using
u.select_atoms("protein").convert_to("RDKIT")
will not benefit from the cache since the selection is deleted from memory as soon as the conversion is finished. Instead, users should do this in two steps by first saving the selection in a variable and then converting the saved AtomGroup. It also means thatag.convert_to("RDKIT")
followed byag.convert_to("RDKIT", NoImplicit=False)
will not use the cache. Finally if you’re modifying the AtomGroup in place between two conversions, the id of the AtomGroup won’t change and thus the converter will use the cached molecule. For this reason, you can pass acache=False
argument to the converter to bypass the caching system. Note that the cached molecule doesn’t contain the coordinates of the atoms.New in version 2.0.0.
-
atomgroup_to_mol
(ag, NoImplicit=True, max_iter=200)[source]¶ Converts an AtomGroup to an RDKit molecule.
Parameters: - ag (MDAnalysis.core.groups.AtomGroup) – The AtomGroup to convert
- NoImplicit (bool) – Prevent adding hydrogens to the molecule
- max_iter (int) – Maximum number of iterations to standardize conjugated systems.
See
_rebuild_conjugated_bonds()
-
convert
(obj, cache=True, NoImplicit=True, max_iter=200)[source]¶ Write selection at current trajectory frame to
Mol
.Parameters: - obj (
AtomGroup
orUniverse
) – - cache (bool) – Use a cached copy of the molecule’s topology when available. To be used, the cached molecule and the new one have to be made from the same AtomGroup object (same id) and with the same arguments passed to the converter (with the exception of this cache argument)
- NoImplicit (bool) – Prevent adding hydrogens to the molecule
- max_iter (int) – Maximum number of iterations to standardize conjugated systems.
See
_rebuild_conjugated_bonds()
- obj (
- in an
-
MDAnalysis.coordinates.RDKit.
_infer_bo_and_charges
(mol)[source]¶ Infer bond orders and formal charges from a molecule.
Since most MD topology files don’t explicitly retain information on bond orders or charges, it has to be guessed from the topology. This is done by looping over each atom and comparing its expected valence to the current valence to get the Number of Unpaired Electrons (NUE). If an atom has a negative NUE, it needs a positive formal charge (-NUE). If two neighbouring atoms have UEs, the bond between them most likely has to be increased by the value of the smallest NUE. If after this process, an atom still has UEs, it needs a negative formal charge of -NUE.
Parameters: mol (rdkit.Chem.rdchem.RWMol) – The molecule is modified inplace and must have all hydrogens added Notes
This algorithm is order dependant. For example, for a carboxylate group R-C(-O)-O the first oxygen read will receive a double bond and the other one will be charged. It will also affect more complex conjugated systems.
-
MDAnalysis.coordinates.RDKit.
_standardize_patterns
(mol, max_iter=200)[source]¶ Standardizes functional groups
Uses
_rebuild_conjugated_bonds()
to standardize conjugated systems, and SMARTS reactions for other functional groups. Due to the way reactions work, we first have to split the molecule by fragments. Then, for each fragment, we apply the standardization reactions. Finally, the fragments are recombined.Parameters: - mol (rdkit.Chem.rdchem.RWMol) – The molecule to standardize
- max_iter (int) – Maximum number of iterations to standardize conjugated systems
Returns: mol – The standardized molecule
Return type: Notes
The following functional groups are transformed in this order:
Name Reaction conjugated [*-;!O:1]-[*:2]=[*:3]-[*-:4]>>[*+0:1]=[*:2]-[*:3]=[*+0:4]
conjugated-N+ [N;X3;v3:1]-[*:2]=[*:3]-[*-:4]>>[N+:1]=[*:2]-[*:3]=[*+0:4]
conjugated-O- [O:1]=[#6:2]-[*:3]=[*:4]-[*-:5]>>[O-:1]-[*:2]=[*:3]-[*:4]=[*+0:5]
Cterm [C-;X2:1]=[O:2]>>[C+0:1]=[O:2]
Nterm [N-;X2;H1:1]>>[N+0:1]
keto-enolate [#6-:1]-[#6:2]=[O:3]>>[#6+0:1]=[#6:2]-[O-:3]
arginine [N;H1:1]-[C-;X3;H0:2](-[N;H2:3])-[N;H2:4]>>[N:1]-[C+0:2](-[N:3])=[N+:4]
histidine [#6+0;H0:1]=[#6+0:2]-[#7;X3:3]-[#6-;X3:4]>>[#6:1]=[#6:2]-[#7+:3]=[#6+0:4]
sulfone [S;X4;v4:1](-[O-;X1:2])-[O-;X1:3]>>[S:1](=[O+0:2])=[O+0:3]
nitro [N;X3;v3:1](-[O-;X1:2])-[O-;X1:3]>>[N+:1](-[O-:2])=[O+0:3]
-
MDAnalysis.coordinates.RDKit.
_rebuild_conjugated_bonds
(mol, max_iter=200)[source]¶ Rebuild conjugated bonds without negatively charged atoms at the beginning and end of the conjugated system
Depending on the order in which atoms are read during the conversion, the
_infer_bo_and_charges()
function might write conjugated systems with a double bond less and both edges of the system as anions instead of the usual alternating single and double bonds. This function corrects this behaviour by using an iterative procedure. The problematic molecules always follow the same pattern:anion[-*=*]n-anion
instead of*=[*-*=]n*
, wheren
is the number of successive single and double bonds. The goal of the iterative procedure is to maken
as small as possible by consecutively transforminganion-*=*
into*=*-anion
until it reaches the smallest pattern withn=1
. This last pattern is then transformed fromanion-*=*-anion
to*=*-*=*
. Sinceanion-*=*
is the same as*=*-anion
in terms of SMARTS, we can control that we don’t transform the same triplet of atoms back and forth by adding their indices to a list. The molecule needs to be kekulized first to also cover systems with aromatic rings.Parameters: - mol (rdkit.Chem.rdchem.RWMol) – The molecule to transform, modified inplace
- max_iter (int) – Maximum number of iterations performed by the function