Database#

The main goals of the atooms.database package are

  • standardize interaction models for classical molecular dynamics and Monte Carlo simulations

  • provide an extensible database of models and trajectory samples to quickly start new simulations

  • provide a simple Database class that you can use to tag, browse and organize simulation data

In a nutshell, these are the basic operations:

  1. database.models(): get all available models

  2. database.model("lennard_jones"): get a specific model (here, Lennard-Jones)

  3. database.samples(): get all available samples

  4. database.sample("lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"): get a specific sample

Under the hood, there are two default databases:

  • a custom TinyDB instance called database.models to store interaction models

  • a Database instance called database.samples to store samples of trajectories

Interaction models#

Let’s see how to browse the models database.

Available models#

The available models can be listed by iterating over atooms.database.models. Each entry is a dictionary whose keys are specified below.

To see which models are currently defined, iterate over models like this

import atooms.database as db

for model in db.models:
    print(model['name'], model['reference'])

You can pretty print the available models like this

db.pprint(db.models, columns=['name', 'reference'], sort_by='name')
name                                 reference
--------------------------------------------------------------------------------
Bernu-Hiwatari-Hansen-Pastore        Phys. Rev. A 36, 4891 (1987)
Coslovich-Pastore                    J. Phys.: Condens. Matter 21, 285107 (2009)
Coslovich-Pastore                    J. Phys.: Condens. Matter 21, 285107 (2009)
Dellavalle-Gazzillo-Frattini-Pastore Phys. Rev. B 49, 12625 (1994)
Dellavalle-Gazzillo-Frattini-Pastore Phys. Rev. B 49, 12625 (1994)
Gaussian core                        Phys. Rev. Lett. 106, 015701 (2011)
Grigera-Cavagna-Giardina-Parisi      Phys. Rev. Lett. 88, 055502 (2002)
Kob-Andersen                         Phys. Rev. Lett. 73, 1376 (1994)
Kob-Andersen                         Phys. Rev. Lett. 73, 1376 (1994)
Lennard-Jones                        Proc. R. Soc. Lond. A, 106, 463 (1924)
Roux-Barrat-Hansen                   J. Phys.: Condens. Matter 1, 7171 (1989)
Wahnstrom                            Phys. Rev. A 44, 3752 (1991)
harmonic spheres                     Phys. Rev. E 80, 021502 (2009)

To get the columns (fields) of the database

print(db.models.columns())
['name', 'potential', 'reference', 'schema_version', 'version']

To get a specific model, use database.model(name), where name is for instance lennard_jones.

Define a new model#

To define a new model, you can use a simple dictionary like this

{
    "potential": [
        {
            "type": "lennard_jones",
            "parameters": {"epsilon": [[1.0]], "sigma": [[1.0]]}
        }
    ],
    "cutoff": [
        {
            "type": "cut_shift",
            "parameters": {"rcut": [[2.5]]}
        }
    ]
}

The default schema requires only the potential and cutoff fields. A few more optional fields define metadata, such as publication details about the model.

from pprint import pprint
pprint(db.schemas[1])
{'$schema': 'https://json-schema.org/draft/2020-12/schema',
 'properties': {'cutoff': {'items': {'properties': {'parameters': {'type': 'object'},
                                                    'type': {'type': 'string'}},
                                     'required': ['type', 'parameters'],
                                     'type': 'object'},
                           'type': 'array'},
                'doi': {'type': 'string'},
                'name': {'type': 'string'},
                'notes': {'type': 'string'},
                'potential': {'items': {'properties': {'parameters': {'type': 'object'},
                                                       'type': {'type': 'string'}},
                                        'required': ['type', 'parameters'],
                                        'type': 'object'},
                              'type': 'array'},
                'reference': {'type': 'string'},
                'version': {'type': 'integer'}},
 'required': ['potential', 'cutoff'],
 'type': 'object'}

To choose or build your own model, you need a few more details about the schema layout, so keep reading.

Potentials and cutoffs#

Each model is associated to a list of interaction potentials along with their parameters. If we limit ourselves to two-body potentials, we write \(u_{\alpha\beta}(r) = u_{\alpha\beta}^{(1)}(r) + u_{\alpha\beta}^{(2)}(r) + \dots\), where \(\alpha\) and \(\beta\) are chemical species in a mixture.
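For instance, the lennard_jones potential type corresponds to the familiar 12-6 form \(u_{\alpha\beta}(r) = 4\epsilon_{\alpha\beta}\left[(\sigma_{\alpha\beta}/r)^{12} - (\sigma_{\alpha\beta}/r)^{6}\right]\), with one \(\epsilon_{\alpha\beta}\) and one \(\sigma_{\alpha\beta}\) per pair of species.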

Get the Lennard-Jones one-component model and inspect the parameters

pprint(db.model("lennard_jones"))
{'cutoff': [{'parameters': {'rcut': [[2.5]]}, 'type': 'cut_shift'}],
 'doi': '10.1098/rspa.1924.0082',
 'name': 'Lennard-Jones',
 'potential': [{'parameters': {'epsilon': [[1.0]], 'sigma': [[1.0]]},
                'type': 'lennard_jones'}],
 'reference': 'Proc. R. Soc. Lond. A, 106, 463 (1924)',
 'schema_version': 1,
 'version': 0}

Notice how the epsilon and sigma parameters are both 2d arrays, as they depend on the chemical species of the interacting particles. There is a cutoff field as well, since the potentials are typically cut off at some distance.
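For a binary mixture, the parameter arrays are 2x2 and symmetric. As an illustration, the Kob-Andersen parameters (shown later under schema version 2) would look like this in the schema version 1 layout (a sketch):

"parameters": {"epsilon": [[1.0, 1.5], [1.5, 0.5]],
               "sigma":   [[1.0, 0.8], [0.8, 0.88]]}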

The version field allows one to introduce several variants of a model (e.g., changes in the cutoff). The default model always has version=0.

The potentials and cutoffs have standardized names

for potential in sorted(db.potentials):
    print(potential)
fene
gaussian
harmonic
inverse_power
lennard_jones
sum_inverse_power
yukawa

The cutoffs have standardized names too.

for cutoff in sorted(db.cutoffs):
    print(cutoff)
cubic_spline
cut
cut_shift
cut_shift_linear
cut_shift_quadratic
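To make the layout concrete, here is a minimal, self-contained sketch (not the package's own implementation) that evaluates the cut-and-shifted Lennard-Jones potential of the default model for a single pair of species:

import atooms.database as db

model = db.model("lennard_jones")
eps = model['potential'][0]['parameters']['epsilon'][0][0]
sig = model['potential'][0]['parameters']['sigma'][0][0]
rcut = model['cutoff'][0]['parameters']['rcut'][0][0]

def lj(r):
    # Plain 12-6 Lennard-Jones pair potential
    return 4 * eps * ((sig / r)**12 - (sig / r)**6)

def lj_cut_shift(r):
    # cut_shift: shift the potential so that it vanishes at the cutoff
    return lj(r) - lj(rcut) if r < rcut else 0.0

print(lj_cut_shift(1.0), lj_cut_shift(3.0))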

Import a new model#

New models can be added programmatically.

Here we add a new version of the Lennard-Jones model, which differs from the default one only in the cutoff distance.

import atooms.database as db

model = {
    "name": "lennard_jones",
    "version": 1,
    "potential": [{
        "type": "lennard_jones",
        "parameters": {"epsilon": [[1.0]], "sigma": [[1.0]]}
    }],
    "cutoff": [{
        "type": "cut_shift",
        "parameters": {"rcut": [[4.0]]}
    }]
}

We autodetect the schema version using the function db.hooks.version(), which returns a dict containing the schema_version field.

model.update(db.hooks.version(model))

If the model does not match any schema version, an exception will be raised.
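You can also perform this check yourself by validating the dictionary against one of the schemas in db.schemas, for instance with the third-party jsonschema package (a sketch, assuming jsonschema is installed):

import jsonschema

# Raises jsonschema.ValidationError if the model does not match schema version 1
jsonschema.validate(instance=model, schema=db.schemas[1])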

We can now insert the model

db.models.insert(model)
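To check that the new entry is there, we can query the models database directly (a sketch, assuming db.models exposes the same TinyDB search interface as db.samples):

hits = db.models.search((db.query.name == "lennard_jones") & (db.query.version == 1))
print(len(hits))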

The models are documented in an org mode file at the root of the package. The whole database can be recreated from scratch by executing the corresponding source blocks.

Schema versions#

There are currently two schema versions for the models. The default is version 1, which is shown above. Version 2 stores the interaction parameters by pair of species, instead of as a single array per parameter.

pprint(db.model("kob_andersen", schema_version=2))
[{'name': 'Kob-Andersen',
  'potential': [{'cutoff': {'parameters': {'1-1': {'rcut': 2.5},
                                           '1-2': {'rcut': 2.0},
                                           '2-2': {'rcut': 2.2}},
                            'type': 'cut_shift'},
                 'parameters': {'1-1': {'epsilon': 1.0, 'sigma': 1.0},
                                '1-2': {'epsilon': 1.5, 'sigma': 0.8},
                                '2-2': {'epsilon': 0.5, 'sigma': 0.88}},
                 'type': 'lennard_jones'}],
  'reference': 'Phys. Rev. Lett. 73, 1376 (1994)',
  'schema_version': 2,
  'version': 0}]

This is useful to generate interaction potentials for some backends, as discussed in the next section.

Potentials for backends#

In addition to specifying the interaction models, atooms.database can also generate actual potentials for different interaction backends. In particular, there are implementations for the native f90 atooms backend and for RUMD.

Create a Lennard-Jones model interaction and compute the potential energy of the configuration stored in local_file (a local copy of a sample; see the Trajectory samples section below for how to obtain it)

import atooms.database as db
from atooms.backends.f90.trajectory import Trajectory
from atooms.backends.f90 import Interaction

with Trajectory(local_file) as th:
    system = th[0]

model = db.model("lennard_jones")
system.interaction = Interaction(model)
print(system.potential_energy(per_particle=True))
-3.807977629191845

We can compare it to the reference value stored in the samples database and check that they agree. This is useful if you want to test a new implementation of a backend.

query = db.query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"
print(db.samples.search(query)[0]["potential_energy_per_particle"])
-3.8079776291909284
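For instance, a minimal agreement check could look like this (a sketch; the tolerance is an arbitrary choice):

import math

computed = system.potential_energy(per_particle=True)
reference = db.samples.search(query)[0]["potential_energy_per_particle"]
# The two values should agree to within numerical precision
assert math.isclose(computed, reference, rel_tol=1e-10)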

In addition to the native f90 backend, there is good support for RUMD. It is possible to generate potential objects for RUMD like this

model = db.model("kob_andersen", schema_version=2)
potentials = db.rumd.potential("kob_andersen")

The potentials list can then be passed to a Simulation instance defined in RUMD to carry out a simulation.

Trajectory samples#

We store trajectory samples to make it easy to

  • start a new simulation

  • check the implementation of interactions in third-party code

  • provide a little repository for further analysis

Each sample is associated with one of the models defined above, but can of course be used with any other.

Browsing the database#

Browse the database of trajectory samples.

db.pprint(db.samples, columns=['path'])
path
----------------------------------------------------------------------
bernu_hiwatari_hansen_pastore/0_f61d7e58b9656cf9640f6e5754441930.xyz
coslovich_pastore/0_488db481cdac35e599922a26129c3e35.xyz
grigera_cavagna_giardina_parisi/0_0ac97fa8c69c320e48bd1fca80855e8a.xyz
kob_andersen/0_8f4a9fe755e5c1966c10b50c9a53e6bf.xyz
lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz
lennard_jones/0_5cc3b80bc415fa5c262e83410ca65779.xyz

Get a local copy of a Lennard-Jones fluid sample (by default, stored in a temporary directory)

query = db.query
local_file = db.samples.copy(query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz")[0]

The local_file can then be used to start a simulation, benchmark your code, or perform further analysis with other atooms packages.
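For a quick sanity check of the local copy, you can open it with the generic Trajectory class (a minimal sketch):

from atooms.trajectory import Trajectory

with Trajectory(local_file) as th:
    system = th[0]
    # Number of particles and density of the first frame
    print(len(system.particle), system.density)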

We rely on TinyDB to perform queries on the available samples. For instance, here we look for samples of the Lennard-Jones model with unit density

samples = db.samples.search((db.query.model == 'lennard_jones') &
                            (db.query.density == 1.0))
db.pprint(samples, ['density', 'number_of_particles', 'path'])
density number_of_particles path
--------------------------------------------------------------------------------
1.0     108                 lennard_jones/0_5cc3b80bc415fa5c262e83410ca65779.xyz
1.0     256                 lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz

Here we inspect the full metadata of a given sample (identified by the file path)

query = db.query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"
pprint(db.samples.search(query))
[{'density': 1.0,
  'format': None,
  'md5_hash': '13ce47602b259f7802e89e23ffd57f19',
  'model': 'lennard_jones',
  'number_of_particles': 256,
  'path': 'lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz',
  'potential_energy_per_particle': -3.8079776291909284,
  'version': 0}]

For more advanced queries, check out what TinyDB can do!
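For instance, assuming the standard TinyDB Query interface exposed as db.query, conditions can be combined and paths matched with regular expressions (a sketch):

# Samples with at least 200 particles
large = db.samples.search(db.query.number_of_particles >= 200)
# Kob-Andersen samples, selected by a regular expression on the path
ka = db.samples.search(db.query.path.matches(r'kob_andersen/.*\.xyz'))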

The package currently provides some provenance information via the optional notes field. We do not aim at “full reproducibility” (workflow management) here.

Adding metadata on the fly#

Hooks allow you to add or extract additional metadata from trajectory files. Here we use a hook to inspect the trajectory properties using atooms.trajectory.

The columns get populated with additional fields

db.samples.add_hook(db.hooks.metadata_from_atooms)
print(db.samples.columns())
['cell side', 'cell volume', 'composition', 'density', 'format', 'frames', 'md5_hash', 'megabytes', 'model', 'number_of_particles', 'particles', 'path', 'potential_energy_per_particle', 'size dispersion', 'species', 'version']

We can inspect some of them now

db.pprint(db.samples, ['model', 'number_of_particles', 'composition', 'density'])
model                           number_of_particles composition          density
-------------------------------------------------------------------------------------
bernu_hiwatari_hansen_pastore   216                 {'2': 108, '1': 108} 0.5341880342
coslovich_pastore               300                 {'A': 99, 'B': 201}  1.655
grigera_cavagna_giardina_parisi 1000                {'A': 500, 'B': 500} 1.0
kob_andersen                    150                 {'1': 120, '2': 30}  1.2
lennard_jones                   108                 {'A': 108}           1.0
lennard_jones                   256                 {'A': 256}           1.0000019881

You can write your own hooks as functions that take the database entry as input and return a dictionary of metadata.
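As an illustration, here is a hypothetical hook that tags samples whose density exceeds unity (both the function name and the dense field are made up for this example):

def tag_dense(entry):
    # Hypothetical hook: flag entries denser than 1.0
    return {'dense': entry.get('density', 0.0) > 1.0}

db.samples.add_hook(tag_dense)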

Setting up your own simulation database#

You can use the Database object to keep track of your own simulations, independently of the built-in samples database. Let us first create a bunch of trajectories

from atooms.core.utils import mkdir
from atooms.system import System
from atooms.trajectory import Trajectory

# Create 300 dummy trajectories of 100 frames each in nested folders.
# Note: the density is set to 1.0 for all of them; rho only enters the folder name.
for rho in [1.0, 1.1, 1.2]:
    for n in range(100):
        outdir = f"/tmp/dataset/rho{rho}/n{n}"
        mkdir(outdir)
        system = System(N=100)
        system.density = 1.0
        with Trajectory(f"{outdir}/config.xyz", "w") as th:
            for i in range(100):
                th.write(system, step=i)

We now create a database and import all the trajectories contained in the folder /tmp/dataset/

import atooms.database
from atooms.database import Database, pprint
from atooms.core.utils import rmf

db = Database('/tmp/db.json')
db.insert_glob('/tmp/dataset/**/config.xyz', model="ideal_gas")
pprint(db.all(), max_rows=5)
100% 300/300 [00:02<00:00, 145.81it/s]
absolute_path                      md5_hash                     model     path
-----------------------------------------------------------------------------------------------------------
/tmp/dataset/rho1.1/n30/config.xyz fea825c0e8d4e3f ... 9bb4f863 ideal_gas /tmp/dataset/rho1.1/n30/config.xyz
/tmp/dataset/rho1.1/n32/config.xyz f543cebeedddcdc ... 92ec669e ideal_gas /tmp/dataset/rho1.1/n32/config.xyz
/tmp/dataset/rho1.1/n27/config.xyz 62927adaaff255c ... 3fcad4b0 ideal_gas /tmp/dataset/rho1.1/n27/config.xyz
/tmp/dataset/rho1.1/n65/config.xyz 1b1aa34a4f5997f ... cf7bb6da ideal_gas /tmp/dataset/rho1.1/n65/config.xyz
... 296 entries not shown ...

The insertion methods db.insert() (and its bulk versions db.insert_glob() and db.insert_multiple()) leave the trajectories in place. You can copy them into the database root folder (i.e., the dirname of the json database file) by adding copy=True.
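For instance, starting from a fresh database, an import that also copies the files could look like this (a sketch; the database path is arbitrary):

db_copy = Database('/tmp/db_copy.json')
db_copy.insert_glob('/tmp/dataset/**/config.xyz', model="ideal_gas", copy=True)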

If we now try to insert the same files again, no new entries will be created (this behavior can be changed by setting db.require to an empty tuple). However, we can populate the entries with additional metadata by adding some hooks

from atooms.database.hooks import metadata_from_atooms, metadata_from_path

db.add_hook(metadata_from_atooms)
db.add_hook(metadata_from_path)
pprint(db.all(), columns=('density', 'frames', 'n', 'path'), sort_by='n', max_rows=5)
density frames n  path
----------------------------------------------------
1.0     100    0  /tmp/dataset/rho1.1/n0/config.xyz
1.0     100    0  /tmp/dataset/rho1.2/n0/config.xyz
1.0     100    0  /tmp/dataset/rho1.0/n0/config.xyz
1.0     100    1  /tmp/dataset/rho1.1/n1/config.xyz
... 295 entries not shown

Note

The results of the metadata_from_atooms() hook are cached for efficiency: if the trajectory file has not changed since the last execution of the hook, the hook will skip evaluation of the metadata.

By default, the hooks do not actually store the metadata in the database. However, we can update the database to store them persistently

db.update()

Now the metadata are permanently stored in the database, as we can see by reopening the database.

db = Database('/tmp/db.json')
pprint(db.all(), columns=('density', 'frames', 'n', 'path'), sort_by=('path', 'n'), max_rows=5)
density frames n  path
----------------------------------------------------
1.0     100    0  /tmp/dataset/rho1.0/n0/config.xyz
1.0     100    1  /tmp/dataset/rho1.0/n1/config.xyz
1.0     100    10 /tmp/dataset/rho1.0/n10/config.xyz
1.0     100    11 /tmp/dataset/rho1.0/n11/config.xyz
... 295 entries not shown

Note

Updating the database with update() will also remove entries whose path does not exist anymore.
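For instance, removing a trajectory file and then calling update() should prune the corresponding entry (a sketch, using the rmf utility from atooms.core.utils):

from atooms.core.utils import rmf

# Delete one trajectory file, then drop stale entries from the database
rmf('/tmp/dataset/rho1.0/n0/config.xyz')
db.update()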