6.9. H5MD trajectories — `MDAnalysis.coordinates.H5MD`¶

The H5MD trajectory file format is based upon the general, high performance HDF5 file format. HDF5 files are self documenting and can be accessed with the h5py library. HDF5 can make use of parallel file system features through the MPI-IO interface of the HDF5 library to improve parallel reads and writes.

The HDF5 library and h5py must be installed; otherwise, H5MD files cannot be read by MDAnalysis. If h5py is not installed, a RuntimeError is raised.

6.9.1. Units¶

H5MD files are very flexible and can store data in a wide range of physical units. The H5MDReader will attempt to match the units in order to convert all data to the standard MDAnalysis units (see MDAnalysis.units).

Units are read from the attributes of the position, velocity, force, and time datasets provided by the H5MD file. The unit string is translated from H5MD notation to MDAnalysis notation. If MDAnalysis does not recognize the provided unit (likely because that unit string is not defined in MDAnalysis.units), a RuntimeError is raised. If no units are provided, MDAnalysis stores a value of None for each unit. If the H5MD file does not contain units and convert_units=True, MDAnalysis raises a ValueError. To load a universe from an H5MD file with no units, set convert_units=False.
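The rules above can be sketched in plain Python. This is only an illustration of the described behaviour, not the reader's actual code: the function name translate_unit and the small H5MD_TO_MDA_UNITS table are hypothetical, and the real translation table inside MDAnalysis covers many more units.

```python
# Hypothetical, illustrative subset of a unit translation table.
# The real table used by MDAnalysis is larger and lives inside the library.
H5MD_TO_MDA_UNITS = {
    "Angstrom": "Angstrom",
    "nm": "nm",
    "ps": "ps",
    "fs": "fs",
}


def translate_unit(h5md_unit, convert_units=True):
    """Translate one H5MD unit string, mirroring the rules described above."""
    if h5md_unit is None:
        # No unit stored in the file: conversion is impossible.
        if convert_units:
            raise ValueError("file contains no units; use convert_units=False")
        return None
    try:
        return H5MD_TO_MDA_UNITS[h5md_unit]
    except KeyError:
        # Unit string present but unknown to the translation table.
        raise RuntimeError(f"unit {h5md_unit!r} is not recognized") from None
```

With this sketch, a missing unit only becomes an error when conversion is requested, which matches the advice to pass convert_units=False for files without units.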

6.9.2. Example: Loading an H5MD simulation¶

To load an H5MD simulation from an H5MD trajectory data file (using the H5MDReader), pass the topology and trajectory files to Universe:

import MDAnalysis as mda
u = mda.Universe("topology.tpr", "trajectory.h5md")

It is also possible to pass an open h5py.File file stream into the reader:

import h5py
import MDAnalysis as mda
with h5py.File("trajectory.h5md", 'r') as f:
    u = mda.Universe("topology.tpr", f)

Note

Directly using an h5py.File does not work yet. See issue #2884.

6.9.3. Example: Opening an H5MD file in parallel¶

The parallel features of HDF5 can be accessed through h5py (see parallel h5py docs for more detail) by using the mpi4py Python package with a parallel build of HDF5. To load an H5MD simulation with parallel HDF5, pass the driver and comm arguments to Universe:

import MDAnalysis as mda
from mpi4py import MPI
u = mda.Universe("topology.tpr", "trajectory.h5md",
                 driver="mpio", comm=MPI.COMM_WORLD)

Note

h5py must be built with parallel features enabled on top of a parallel HDF5 build, and HDF5 and mpi4py must be built with a working MPI implementation. See instructions below.
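Once a trajectory is opened with driver="mpio", each MPI rank typically works on its own subset of frames. The following is a minimal sketch of one way to split frames into contiguous blocks; the helper name frames_for_rank is hypothetical and not part of MDAnalysis, and the commented usage assumes a working parallel h5py/mpi4py installation.

```python
def frames_for_rank(n_frames, rank, size):
    """Return the contiguous block of frame indices assigned to one rank.

    Frames are divided as evenly as possible; the first
    ``n_frames % size`` ranks each receive one extra frame.
    """
    base, extra = divmod(n_frames, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Possible usage (requires MDAnalysis, mpi4py, and parallel h5py):
#
# import MDAnalysis as mda
# from mpi4py import MPI
# comm = MPI.COMM_WORLD
# u = mda.Universe("topology.tpr", "trajectory.h5md",
#                  driver="mpio", comm=comm)
# for i in frames_for_rank(u.trajectory.n_frames,
#                          comm.Get_rank(), comm.Get_size()):
#     ts = u.trajectory[i]
```

Contiguous blocks are only one choice; a strided split (every size-th frame) would balance the load equally well.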

6.9.3.1. Building parallel h5py and HDF5 on Linux¶

Building a working parallel HDF5/h5py/mpi4py environment can be challenging and is often specific to your local computing resources, e.g., the supercomputer that you’re running on typically already has its preferred MPI installation. As a starting point we provide instructions that worked in a specific, fairly generic environment.

These instructions successfully built parallel HDF5/h5py with OpenMPI 4.0.4, HDF5 1.10.6, h5py 2.9.0, and mpi4py 3.0.3 on Ubuntu 16.0.6. You may have to play around with different combinations of versions of h5py/HDF5 to get a working parallel build.

1. Build MPI from sources.

2. Build HDF5 from sources with parallel settings enabled:

   ./configure --enable-parallel --enable-shared
   make
   make install

3. Install mpi4py, making sure to point mpicc to where you’ve installed your MPI implementation:

   env MPICC=/path/to/mpicc pip install mpi4py

4. Build h5py from sources, making sure to enable MPI and to point to your parallel build of HDF5:

   export HDF5_PATH=path-to-parallel-hdf5
   python setup.py clean --all
   python setup.py configure -r --hdf5-version=X.Y.Z --mpi --hdf5=$HDF5_PATH
   export gcc=gcc
   CC=mpicc HDF5_DIR=$HDF5_PATH python setup.py build
   python setup.py install
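After the build steps above, it can help to confirm that the installed h5py was really compiled with MPI support. The sketch below uses h5py.get_config().mpi for that; the function name parallel_h5py_status is hypothetical, and the check says nothing about whether mpi4py or the MPI installation itself work.

```python
import importlib.util


def parallel_h5py_status():
    """Report whether h5py is missing, serial, or built with MPI support."""
    if importlib.util.find_spec("h5py") is None:
        return "h5py not installed"
    import h5py

    # get_config().mpi is True only for h5py built against parallel HDF5.
    return "parallel" if h5py.get_config().mpi else "serial"


if __name__ == "__main__":
    print(parallel_h5py_status())
```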

If you have questions or want to share how you managed to build parallel hdf5/h5py/mpi4py please let everyone know on the MDAnalysis forums.

6.9.4. Classes¶

class MDAnalysis.coordinates.H5MD.Timestep(n_atoms, **kwargs)[source]¶

H5MD Timestep

Create a Timestep, representing a frame of a trajectory

Parameters:

n_atoms (int) – The total number of atoms this Timestep describes
positions (bool, optional) – Whether this Timestep has position information [True]
velocities (bool (optional)) – Whether this Timestep has velocity information [False]
forces (bool (optional)) – Whether this Timestep has force information [False]
reader (Reader (optional)) – A weak reference to the owning Reader. Used for when attributes require trajectory manipulation (e.g. dt)
dt (float (optional)) – The time difference between frames (ps). If time is set, then dt will be ignored.
time_offset (float (optional)) – The starting time from which to calculate time (in ps)

Changed in version 0.11.0: Added keywords for positions, velocities and forces. Can add and remove position/velocity/force information by using the has_* attribute.

positions¶: coordinates of the atoms as a numpy.ndarray of shape (n_atoms, 3)

velocities¶: velocities of the atoms as a numpy.ndarray of shape (n_atoms, 3); only available if the trajectory contains velocities or if the velocities = True keyword has been supplied.

forces¶: forces of the atoms as a numpy.ndarray of shape (n_atoms, 3); only available if the trajectory contains forces or if the forces = True keyword has been supplied.

dimensions¶

unitcell dimensions (A, B, C, alpha, beta, gamma)

lengths A, B, C are in the MDAnalysis length unit (Å), and angles are in degrees.

Setting dimensions will populate the underlying native format description (triclinic box vectors). If edges is a matrix, the box is of triclinic shape with the edge vectors given by the rows of the matrix.
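The relationship between the rows of an edge matrix and the (A, B, C, alpha, beta, gamma) representation can be illustrated with a small standard-library sketch. The function name box_from_edges is hypothetical and this is not the routine MDAnalysis uses internally; it assumes three non-zero edge vectors and the usual crystallographic convention (alpha between b and c, beta between a and c, gamma between a and b).

```python
import math


def box_from_edges(edges):
    """Convert three edge vectors (matrix rows) to (A, B, C, alpha, beta, gamma).

    Lengths keep the unit of the input vectors; angles are in degrees.
    """
    a, b, c = edges

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def angle(u, v):
        cos = sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
        # Clamp to [-1, 1] to guard against rounding errors before acos.
        return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

    return (norm(a), norm(b), norm(c), angle(b, c), angle(a, c), angle(a, b))


# A cubic box with 10 Å edges has three 90 degree angles.
cubic = box_from_edges([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]])
```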

class MDAnalysis.coordinates.H5MD.H5MDReader(filename, convert_units=True, driver=None, comm=None, **kwargs)[source]¶

Reader for the H5MD format.

See h5md documentation for a detailed overview of the H5MD file format.

The reader attempts to convert units in the trajectory file to the standard MDAnalysis units (MDAnalysis.units) if convert_units is set to True.

Additional data in the observables group of the H5MD file are loaded into the Timestep.data dictionary.

Only 3D-periodic boxes or no periodicity are supported; for no periodicity, Timestep.dimensions will return None.

Although H5MD can store varying numbers of particles per time step as produced by, e.g., GCMC simulations, MDAnalysis can currently only process a fixed number of particles per step. If the number of particles changes a ValueError is raised.

The H5MDReader reads .h5md files with the following HDF5 hierarchy:

Notation:
(name) is an HDF5 group that the reader recognizes
{name} is an HDF5 group with arbitrary name
[variable] is an HDF5 dataset
<dtype> is dataset datatype
+-- is an attribute of a group or dataset

H5MD root
 \-- (h5md)
    +-- version <int>
    \-- author
        +-- name <str>, author's name
        +-- email <str>, optional email address
    \-- creator
        +-- name <str>, file that created .h5md file
        +-- version
 \-- (particles)
    \-- {group1}
        \-- (box)
            +-- dimension : <int>, number of spatial dimensions
            +-- boundary : <str>, boundary conditions of unit cell
            \-- (edges)
                \-- [step] <int>, gives frame
                \-- [value] <float>, gives box dimensions
                    +-- unit <str>
        \-- (position)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of positions
                                 with shape (n_atoms, 3)
                +-- unit <str>
        \-- (velocity)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of velocities
                                 with shape (n_atoms, 3)
                +-- unit <str>
        \-- (force)
            \-- [step] <int>, gives frame
            \-- [time] <float>, gives time
                +-- unit <str>
            \-- [value] <float>, gives numpy array of forces
                                 with shape (n_atoms, 3)
                +-- unit <str>
 \-- (observables)
    \-- (lambda)
        \-- [step] <int>, gives frame
        \-- [time] <float>, gives time
        \-- [value] <float>
    \-- (step)
        \-- [step] <int>, gives frame
        \-- [time] <float>, gives time
        \-- [value] <int>, gives integration step
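To check which parts of this hierarchy a given file provides, the particles group can be walked with ordinary mapping operations. The sketch below is illustrative only: the function name available_data is hypothetical, it inspects every subgroup of particles (which is not necessarily how the reader selects its group), and it relies only on iteration, indexing, and membership tests, so it accepts an open h5py.File as well as nested dictionaries.

```python
def available_data(root):
    """Map each subgroup of 'particles' to the data groups it contains.

    ``root`` may be an open h5py.File or any nested mapping with the
    same layout; only 'position', 'velocity', and 'force' are reported.
    """
    found = {}
    particles = root["particles"]
    for name in particles:
        group = particles[name]
        found[name] = [
            key for key in ("position", "velocity", "force") if key in group
        ]
    return found


# Stand-in for a file whose single particles group stores positions and forces.
example = {"particles": {"trajectory": {"box": {}, "position": {}, "force": {}}}}
summary = available_data(example)
```

With a real file, the same call would be made inside a with h5py.File("trajectory.h5md", "r") as f: block, passing f as root.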

Note

The reader does not currently read mass or charge data.

Note

If the driver and comm arguments were used to open the hdf5 file (specifically, driver="mpio") then the _reopen() method does not close and open the file like most readers because the information about the MPI communicator would be lost; instead it rewinds the trajectory back to the first timestep.

New in version 2.0.0.

Parameters:

filename (str or h5py.File) – trajectory filename or open h5py file
convert_units (bool (optional)) – convert units to MDAnalysis units
driver (str (optional)) – H5PY file driver used to open H5MD file
comm (MPI.Comm (optional)) – MPI communicator used to open H5MD file. Must be passed with the ‘mpio’ file driver.
**kwargs (dict) – General reader arguments.

Raises:

RuntimeError – when H5PY is not installed
RuntimeError – when a unit is not recognized by MDAnalysis
ValueError – when n_atoms changes values between timesteps
ValueError – when convert_units=True but the H5MD file contains no units
ValueError – when dimension of unitcell is not 3
ValueError – when an MPI communicator object is passed to the reader but driver != 'mpio'
NoDataError – when the H5MD file has no ‘position’, ‘velocity’, or ‘force’ group

_reopen()[source]¶: reopen trajectory

Note

If the driver and comm arguments were used to open the hdf5 file (specifically, driver="mpio") then this method does not close and open the file like most readers because the information about the MPI communicator would be lost; instead it rewinds the trajectory back to the first timestep.

close()[source]¶: close reader

has_forces¶: True if ‘force’ group is in trajectory.

has_positions¶: True if ‘position’ group is in trajectory.

has_velocities¶: True if ‘velocity’ group is in trajectory.

n_frames¶: number of frames in trajectory

open_trajectory()[source]¶: opens the trajectory file using h5py library

class MDAnalysis.coordinates.H5MD.H5PYPicklable(name, mode=None, driver=None, libver=None, userblock_size=None, swmr=False, rdcc_nslots=None, rdcc_nbytes=None, rdcc_w0=None, track_order=None, **kwds)[source]¶

H5PY file object (read-only) that can be pickled.

This class provides a file-like object (as returned by h5py.File) that, unlike standard Python file objects, can be pickled. Only read mode is supported.

When the file is pickled, the filename, mode, driver, and comm of the underlying h5py.File are saved. On unpickling, the file is reopened using the filename, mode, and driver. This means that for a successful unpickle, the original file still has to be accessible under its filename.

Parameters:

filename (str or file-like) – a filename given as a text or byte string.
driver (str (optional)) – H5PY file driver used to open H5MD file

Example

f = H5PYPicklable('filename', 'r')
print(f['particles/trajectory/position/value'][0])
f.close()

It can also be used as a context manager:

with H5PYPicklable('filename', 'r') as f:
    print(f['particles/trajectory/position/value'][0])

Note

Pickling of an h5py.File opened with driver="mpio" and an MPI communicator is currently not supported.
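The pickling behaviour described for H5PYPicklable (save the arguments needed to reopen the file, then reopen on unpickling) is a general Python pattern. The sketch below shows it for an ordinary text file using only the standard library; the class PicklableTextFile is hypothetical and is not how H5PYPicklable itself is implemented.

```python
import pickle


class PicklableTextFile:
    """Read-only text file wrapper that survives pickling by reopening."""

    def __init__(self, name, mode="r"):
        if mode != "r":
            raise ValueError("only read mode is supported")
        self.name = name
        self.mode = mode
        self._fh = open(name, mode)

    def read(self):
        return self._fh.read()

    def close(self):
        self._fh.close()

    def __getstate__(self):
        # The open handle cannot be pickled; keep only what is needed to reopen.
        return {"name": self.name, "mode": self.mode}

    def __setstate__(self, state):
        # Reopen by filename, so the file must still exist at that path.
        self.__init__(state["name"], state["mode"])
```

A copy made with pickle.loads(pickle.dumps(f)) holds its own file handle, so both the original and the copy must be closed separately.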
