Input and Output Routines

The jungle of file formats

There are a lot of file formats used and produced by chemistry applications. Each program has his way to store geometries, trajectories, energies and properties etc. chemlab tries to encompass all of those different properties by using a lightweight way to handle such differences.

Reading and writing data

The classes responsible for the I/O are subclasses of chemlab.io.handlers.IOHandler. These handlers take a file-like object as the first argument and they work all in the same way, here is an example of GroHandler:

from chemlab.io.handlers import GromacsIO

fd = open('waterbox.gro', 'rb')
infile = GromacsIO(fd)
system = infile.read('system')

# Modify system as you wish...
fd = open('waterbox_out.gro', 'w')
outfile = GromacsIO(fd)
outfile.write('system', system)

You first create the handler instance for a certain format and then you can read a certain feature provided by the handler. In this example we read and write the system feature.

Some file formats may have some extra data for each atom, molecule or system. For example the ”.gro” file formats have his own way to call the atoms in a water molecule: OW, HW1, HW2. To handle such issues, you can write this information in the export arrays contained in the data structures, such as Atom.export, Molecule.export, and their array-based counterparts Molecule.atom_export_array, System.mol_export and System.atom_export_array.

Those attributes are especially important where you write in some data format, since you may have to provide those attribute when you initialize your Atom, Molecule and System.

You can easily open a data file without even having to search his format handler by using the utility function chemlab.io.datafile() this is the recommended way for automatically opening a file:

from chemlab.io import datafile

# For reading
sys = datafile('waterbox.gro').read('system')
t, coords = datafile('traj.xtc').read('trajectory')

# For writing
datafile("output.gro", "w").write("system", sys)

Implementing your own IOHandler

Implementing or improving an existing IOHandler is a great way to participate in chemlab development. Fortunately, it’s extremely easy to set one up.

It boils down to a few steps:

  1. Subclass IOHandler;
  2. Define the class attributes can_read and can_write;
  3. Implement the write and read methods for the features that you added in can_read and can_write;
  4. Write the documentation for each feature.

Here is an example of the xyz handler:

import numpy as np
from chemlab.io.handlers import IOHandler
from chemlab.core import Molecule

class XyzIO(IOHandler):
    '''The XYZ format is described in this wikipedia article
    http://en.wikipedia.org/wiki/XYZ_file_format.

    **Features**

    .. method:: read("molecule")

       Read the coordinates in a :py:class:`~chemlab.core.Molecule` instance.

    .. method:: write("molecule", mol)

       Writes a :py:class:`~chemlab.core.Molecule` instance in the XYZ format.
    '''

    can_read = ['molecule']
    can_write = ['molecule']

    def read(self, feature):
        self.check_feature(feature, "read")
        lines = self.fd.readlines()

        num = int(lines[0])
        title = lines[1]

        if feature == 'title':
            return title

        if feature == 'molecule':
            type_array = []
            r_array = []
            for l in lines[2:]:
                type, x, y, z = l.split()
                r_array.append([float(x),float(y),float(z)])
                type_array.append(type)

            r_array = np.array(r_array)/10 # To nm
            type_array = np.array(type_array)

            return Molecule.from_arrays(r_array=r_array, type_array=type_array)


    def write(self, feature, mol):
        self.check_feature(feature, "write")
        lines = []
        if feature == 'molecule':
            lines.append(str(mol.n_atoms))

            lines.append('Generated by chemlab')
            for t, (x, y, z) in zip(mol.type_array, mol.r_array):
                lines.append('    %s       %.6f      %.6f      %.6f' %
                             (t, x*10, y*10, z*10))

            self.fd.write('\n'.join(lines))

A few remarks:

  • It is recommended to use the method

    check_feature() before performing read/write. This will check that the feature is present in the can_read/can_write list;

  • If you want to squeeze out performance you should use

    Molecule.from_arrays() and System.from_arrays();

  • You can read whatever data you wish, for example the

    EdrIO handler does not read Molecule or System at all;

  • You can definitely take inspiration from the handlers included in chemlab, Supported File Formats.