# Database

The main goals of the atooms.database package are to:

- standardize interaction models for classical molecular dynamics and Monte Carlo simulations
- provide an extensible database of models and trajectory samples to quickly start new simulations
- provide a simple `Database` class that you can use to tag, browse and organize simulation data

In a nutshell, these are the basic operations:
- `database.models()`: get all available models
- `database.model("lennard_jones")`: get a specific model (here, Lennard-Jones)
- `database.samples()`: get all available samples
- `database.sample("lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz")`: get a specific sample
Under the hood, there are two default databases:

- a custom `TinyDB` instance called `database.models` to store interaction models
- a `Database` instance called `database.samples` to store samples of trajectories
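Both databases can be thought of as collections of JSON-like records that are searched by field. As a rough illustration in plain Python (a sketch of the idea, not the actual TinyDB or `Database` API):

```python
# Plain-Python sketch of a searchable record store (illustration only,
# not the actual TinyDB / Database API used by atooms)
records = [
    {"name": "lennard_jones", "version": 0},
    {"name": "kob_andersen", "version": 0},
]

def search(items, **criteria):
    """Return the records whose fields match all the given criteria."""
    return [r for r in items
            if all(r.get(k) == v for k, v in criteria.items())]

print(search(records, name="lennard_jones"))
```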
## Interaction models
Let’s see how to browse the models database.
### Available models
Available models are returned as a list by iterating over `atooms.database.models`. Each entry is a dictionary, whose keys are specified below. To see which models are currently defined, iterate over `models` like this:
```python
import atooms.database as db
for model in db.models:
    print(model['name'], model['reference'])
```
You can pretty-print the available models like this:

```python
db.pprint(db.models, columns=['name', 'reference'], sort_by='name')
```
```
name                                  reference
--------------------------------------------------------------------------------
Bernu-Hiwatari-Hansen-Pastore         Phys. Rev. A 36, 4891 (1987)
Coslovich-Pastore                     J. Phys.: Condens. Matter 21, 285107 (2009)
Coslovich-Pastore                     J. Phys.: Condens. Matter 21, 285107 (2009)
Dellavalle-Gazzillo-Frattini-Pastore  Phys. Rev. B 49, 12625 (1994)
Dellavalle-Gazzillo-Frattini-Pastore  Phys. Rev. B 49, 12625 (1994)
Gaussian core                         Phys. Rev. Lett. 106, 015701 (2011)
Grigera-Cavagna-Giardina-Parisi       Phys. Rev. Lett. 88, 055502 (2002)
Kob-Andersen                          Phys. Rev. Lett. 73, 1376 (1994)
Kob-Andersen                          Phys. Rev. Lett. 73, 1376 (1994)
Lennard-Jones                         Proc. R. Soc. Lond. A, 106, 463 (1924)
Roux-Barrat-Hansen                    J. Phys.: Condens. Matter 1, 7171 (1989)
Wahnstrom                             Phys. Rev. A 44, 3752 (1991)
harmonic spheres                      Phys. Rev. E 80, 021502 (2009)
```
To get the columns (fields) of the database:

```python
print(db.models.columns())
```
```
['name', 'potential', 'reference', 'schema_version', 'version']
```
To get a specific model, use `database.model(name)`, where `name` is for instance `lennard_jones`.
### Define a new model
To define a new model, you can use a simple dictionary like this:

```python
{
    "potential": [
        {
            "type": "lennard_jones",
            "parameters": {"epsilon": [[1.0]], "sigma": [[1.0]]}
        }
    ],
    "cutoff": [
        {
            "type": "cut_shift",
            "parameters": {"rcut": [[2.5]]}
        }
    ]
}
```
The default schema requires only the `potential` and `cutoff` fields. A few more optional fields define metadata, such as publication details about the model.
```python
from pprint import pprint
pprint(db.schemas[1])
```
```
{'$schema': 'https://json-schema.org/draft/2020-12/schema',
 'properties': {'cutoff': {'items': {'properties': {'parameters': {'type': 'object'},
                                                    'type': {'type': 'string'}},
                                     'required': ['type', 'parameters'],
                                     'type': 'object'},
                           'type': 'array'},
                'doi': {'type': 'string'},
                'name': {'type': 'string'},
                'notes': {'type': 'string'},
                'potential': {'items': {'properties': {'parameters': {'type': 'object'},
                                                       'type': {'type': 'string'}},
                                        'required': ['type', 'parameters'],
                                        'type': 'object'},
                              'type': 'array'},
                'reference': {'type': 'string'},
                'version': {'type': 'integer'}},
 'required': ['potential', 'cutoff'],
 'type': 'object'}
```
To choose or build your own model, you will need some more details about the schema layout, so keep reading.
### Potentials and cutoffs

Each model is associated with a list of interaction potentials along with their parameters. If we limit ourselves to two-body potentials, we write \(u_{\alpha\beta}(r) = u_{\alpha\beta}^{(1)}(r) + u_{\alpha\beta}^{(2)}(r) + \dots\), where \(\alpha\) and \(\beta\) are chemical species in a mixture.
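For instance, for the one-component Lennard-Jones model the pair potential is \(u(r) = 4\epsilon[(\sigma/r)^{12} - (\sigma/r)^{6}]\). A minimal stand-alone sketch (plain Python, not using the atooms backends):

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential u(r) = 4*eps*[(sigma/r)**12 - (sigma/r)**6]."""
    return 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6)

# The minimum lies at r = 2**(1/6) * sigma, where u is close to -epsilon
print(lennard_jones(2**(1 / 6)))
```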
Get the Lennard-Jones one-component model and inspect the parameters:

```python
pprint(db.model("lennard_jones"))
```
```
{'cutoff': [{'parameters': {'rcut': [[2.5]]}, 'type': 'cut_shift'}],
 'doi': '10.1098/rspa.1924.0082',
 'name': 'Lennard-Jones',
 'potential': [{'parameters': {'epsilon': [[1.0]], 'sigma': [[1.0]]},
                'type': 'lennard_jones'}],
 'reference': 'Proc. R. Soc. Lond. A, 106, 463 (1924)',
 'schema_version': 1,
 'version': 0}
```
Notice how the `epsilon` and `sigma` parameters are both 2d arrays, as they depend on the chemical species of the interacting particles. There is a `cutoff` field as well, since the potentials are typically cut off at some distance.
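As an illustration, for a binary mixture the parameter arrays are \(2\times 2\) matrices indexed by the (zero-based) species indices. Here we use the Kob-Andersen `epsilon` values shown further below:

```python
# epsilon[i][j] is the interaction strength between species i+1 and j+1
# (values taken from the Kob-Andersen model, for illustration)
epsilon = [[1.0, 1.5],
           [1.5, 0.5]]

# Interaction strength between species 1 and 2
print(epsilon[0][1])
```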
The `version` field allows one to introduce several variants of a model (e.g. changes in the cutoff). The default model always has `version=0`.
The potentials and cutoffs have standardized names:

```python
for potential in sorted(db.potentials):
    print(potential)
```
```
fene
gaussian
harmonic
inverse_power
lennard_jones
sum_inverse_power
yukawa
```
The cutoffs have some aliases too:

```python
for cutoff in sorted(db.cutoffs):
    print(cutoff)
```
```
cubic_spline
cut
cut_shift
cut_shift_linear
cut_shift_quadratic
```
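The `cut_shift` cutoff used by several models above is the standard cut-and-shift construction: the potential is truncated at `rcut` and shifted so that it vanishes continuously there, \(u_{cs}(r) = u(r) - u(r_{cut})\) for \(r < r_{cut}\), and zero beyond. A stand-alone sketch of the idea (not the atooms implementation):

```python
def cut_shift(u, rcut):
    """Return the cut-and-shifted version of the pair potential u(r)."""
    def u_cs(r):
        # Shift by u(rcut) inside the cutoff, zero outside
        return u(r) - u(rcut) if r < rcut else 0.0
    return u_cs

# Example: Lennard-Jones cut and shifted at rcut = 2.5
lj = lambda r: 4 * ((1 / r)**12 - (1 / r)**6)
lj_cs = cut_shift(lj, 2.5)
print(lj_cs(1.0), lj_cs(3.0))
```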
### Import a new model
New models are stored programmatically. Here we add a new version of the Lennard-Jones model, which differs from the default one only in the cutoff distance.
```python
import atooms.database as db

model = {
    "name": "lennard_jones",
    "version": 1,
    "potential": [{
        "type": "lennard_jones",
        "parameters": {"epsilon": [[1.0]], "sigma": [[1.0]]}
    }],
    "cutoff": [{
        "type": "cut_shift",
        "parameters": {"rcut": [[4.0]]}
    }]
}
```
We autodetect the schema version using the function `db.hooks.version()`, which returns a `dict` containing `schema_version`:

```python
model.update(db.hooks.version(model))
```
If the `model` does not match any schema version, an exception will be raised.
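For illustration, here is the essence of the check against schema version 1; the real validation uses the full JSON schema shown earlier, and `matches_schema_1` is our own helper name:

```python
def matches_schema_1(model):
    """Simplified check of schema version 1: 'potential' and 'cutoff' are
    required, and both are arrays of entries with 'type' and 'parameters'
    fields (a sketch, not the actual atooms validation)."""
    for field in ("potential", "cutoff"):
        entries = model.get(field)
        if not isinstance(entries, list):
            return False
        if not all("type" in e and "parameters" in e for e in entries):
            return False
    return True

print(matches_schema_1({"potential": [], "cutoff": []}))
print(matches_schema_1({"potential": []}))
```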
We can now insert the model:

```python
db.models.insert(model)
```
The models are documented in an org mode file at the root of the package. The whole database can be recreated from scratch by executing the corresponding source blocks.
### Schema versions
There are currently two schema versions for the models. The default is version 1, which is shown above. Version 2 stores the interaction parameters by pairs of species, instead of a single array.
```python
pprint(db.model("kob_andersen", schema_version=2))
```
```
[{'name': 'Kob-Andersen',
  'potential': [{'cutoff': {'parameters': {'1-1': {'rcut': 2.5},
                                           '1-2': {'rcut': 2.0},
                                           '2-2': {'rcut': 2.2}},
                            'type': 'cut_shift'},
                 'parameters': {'1-1': {'epsilon': 1.0, 'sigma': 1.0},
                                '1-2': {'epsilon': 1.5, 'sigma': 0.8},
                                '2-2': {'epsilon': 0.5, 'sigma': 0.88}},
                 'type': 'lennard_jones'}],
  'reference': 'Phys. Rev. Lett. 73, 1376 (1994)',
  'schema_version': 2,
  'version': 0}]
```
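The two layouts carry the same information. A sketch of the conversion from the schema-version-1 matrix layout to the per-pair keys of version 2 (the helper name `to_pairs` is ours):

```python
def to_pairs(matrix):
    """Convert an (n x n) symmetric parameter matrix to per-pair keys
    like '1-2', as used by schema version 2 (illustrative helper)."""
    n = len(matrix)
    return {f"{i + 1}-{j + 1}": matrix[i][j]
            for i in range(n) for j in range(i, n)}

# Kob-Andersen epsilon values in the two layouts
print(to_pairs([[1.0, 1.5], [1.5, 0.5]]))
```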
This is useful to generate interaction potentials for some simulation backends, as described in the next section.
## Potentials for backends
In addition to specifying the interaction models, `atooms.models` can also generate actual potentials for different interaction backends. In particular, there are implementations for the native `f90` atooms backend and for RUMD.
Create a Lennard-Jones model interaction and compute the potential energy of the configuration stored in `local_file` (a local copy of a sample, obtained as shown in the Trajectory samples section below):

```python
import atooms.database as db
from atooms.backends.f90.trajectory import Trajectory
from atooms.backends.f90 import Interaction

with Trajectory(local_file) as th:
    system = th[0]
model = db.model("lennard_jones")
system.interaction = Interaction(model)
print(system.potential_energy(per_particle=True))
```
```
-3.807977629191845
```
We can compare it to the reference value stored in the samples database and check that they agree. This is useful if you want to test a new implementation of a backend.

```python
query = db.query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"
print(db.samples.search(query)[0]["potential_energy_per_particle"])
```
```
-3.8079776291909284
```
In addition to the native `f90` backend, there is good support for RUMD. It is possible to generate potential objects for RUMD like this:

```python
model = db.model("kob_andersen", schema_version=2)
potentials = db.rumd.potential("kob_andersen")
```

The `potentials` list can then be passed to a `Simulation` instance defined in RUMD to carry out a simulation.
## Trajectory samples
We store trajectory samples to make it easy to:

- start a new simulation
- check the implementation of interactions in third-party code
- provide a little repository for further analysis

Each sample is associated with one of the models defined above, but it can of course be used with any other.
### Browsing the database
Browse the database of trajectory samples:

```python
db.pprint(db.samples, columns=['path'])
```
```
path
----------------------------------------------------------------------
bernu_hiwatari_hansen_pastore/0_f61d7e58b9656cf9640f6e5754441930.xyz
coslovich_pastore/0_488db481cdac35e599922a26129c3e35.xyz
grigera_cavagna_giardina_parisi/0_0ac97fa8c69c320e48bd1fca80855e8a.xyz
kob_andersen/0_8f4a9fe755e5c1966c10b50c9a53e6bf.xyz
lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz
lennard_jones/0_5cc3b80bc415fa5c262e83410ca65779.xyz
```
Get a local copy of a Lennard-Jones fluid sample (by default, stored in a temporary directory):

```python
query = db.query
local_file = db.samples.copy(query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz")[0]
```

The `local_file` can then be used to start a simulation, to benchmark your code, or for further analysis with other `atooms` packages.
We rely on `TinyDB` to perform queries on the available samples. For instance, here we look for samples of the Lennard-Jones model with unit density:

```python
samples = db.samples.search((db.query.model == 'lennard_jones') &
                            (db.query.density == 1.0))
db.pprint(samples, ['density', 'number_of_particles', 'path'])
```
```
density  number_of_particles  path
--------------------------------------------------------------------------------
1.0      108                  lennard_jones/0_5cc3b80bc415fa5c262e83410ca65779.xyz
1.0      256                  lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz
```
Here we inspect the full metadata of a given sample (identified by its file `path`):

```python
query = db.query.path == "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"
pprint(db.samples.search(query))
```
```
[{'density': 1.0,
  'format': None,
  'md5_hash': '13ce47602b259f7802e89e23ffd57f19',
  'model': 'lennard_jones',
  'number_of_particles': 256,
  'path': 'lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz',
  'potential_energy_per_particle': -3.8079776291909284,
  'version': 0}]
```
For more advanced queries, check out what `TinyDB` can do!

The package currently provides some provenance information via the optional `notes` field. We do not aim at "full reproducibility" (workflow management) here.
### Adding metadata on the fly
Adding hooks allows you to add or find additional metadata for trajectory files. Here we use a hook to inspect the trajectory properties using `atooms.trajectory`. The columns get populated with additional fields:
```python
db.samples.add_hook(db.hooks.metadata_from_atooms)
print(db.samples.columns())
```
```
['cell side', 'cell volume', 'composition', 'density', 'format', 'frames', 'md5_hash', 'megabytes', 'model', 'number_of_particles', 'particles', 'path', 'potential_energy_per_particle', 'size dispersion', 'species', 'version']
```
We can inspect some of them now:

```python
db.pprint(db.samples, ['model', 'number_of_particles', 'composition', 'density'])
```
```
model                            number_of_particles  composition           density
-------------------------------------------------------------------------------------
bernu_hiwatari_hansen_pastore    216                  {'2': 108, '1': 108}  0.5341880342
coslovich_pastore                300                  {'A': 99, 'B': 201}   1.655
grigera_cavagna_giardina_parisi  1000                 {'A': 500, 'B': 500}  1.0
kob_andersen                     150                  {'1': 120, '2': 30}   1.2
lennard_jones                    108                  {'A': 108}            1.0
lennard_jones                    256                  {'A': 256}            1.0000019881
```
You can write your own hooks as functions that take the database entry as input parameter and return a dictionary of metadata.
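For instance, here is a sketch of a custom hook that extracts the file extension from the entry's path (the `extension` field and the hook name are our own example, not one of the built-in hooks):

```python
import os

def metadata_extension(entry):
    """Custom hook: receive a database entry (a dict) and return
    additional metadata to attach to it (illustrative example)."""
    _, ext = os.path.splitext(entry["path"])
    return {"extension": ext.lstrip(".")}

entry = {"path": "lennard_jones/0_13ce47602b259f7802e89e23ffd57f19.xyz"}
print(metadata_extension(entry))
```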
## Setting up your own simulation database
You can use the `Database` object to keep track of your simulations, independent of the `atooms.samples` database. Let us first create a bunch of trajectories:
```python
from atooms.core.utils import mkdir
from atooms.system import System
from atooms.trajectory import Trajectory

for rho in [1.0, 1.1, 1.2]:
    for n in range(100):
        outdir = f"/tmp/dataset/rho{rho}/n{n}"
        mkdir(outdir)
        system = System(N=100)
        system.density = 1.0
        with Trajectory(f"{outdir}/config.xyz", "w") as th:
            for i in range(100):
                th.write(system, step=i)
```
We now create a database and import all the trajectories contained in the folder `/tmp/dataset/`:

```python
import atooms.database
from atooms.database import Database, pprint
from atooms.core.utils import rmf

db = Database('/tmp/db.json')
db.insert_glob('/tmp/dataset/**/config.xyz', model="ideal_gas")
pprint(db.all(), max_rows=5)
```
```
100% 300/300 [00:02<00:00, 145.81it/s]
absolute_path                       md5_hash                      model      path
-----------------------------------------------------------------------------------------------------
/tmp/dataset/rho1.1/n30/config.xyz  fea825c0e8d4e3f ... 9bb4f863  ideal_gas  /tmp/dataset/rho1.1/n30/config.xyz
/tmp/dataset/rho1.1/n32/config.xyz  f543cebeedddcdc ... 92ec669e  ideal_gas  /tmp/dataset/rho1.1/n32/config.xyz
/tmp/dataset/rho1.1/n27/config.xyz  62927adaaff255c ... 3fcad4b0  ideal_gas  /tmp/dataset/rho1.1/n27/config.xyz
/tmp/dataset/rho1.1/n65/config.xyz  1b1aa34a4f5997f ... cf7bb6da  ideal_gas  /tmp/dataset/rho1.1/n65/config.xyz
... 296 entries not shown ...
```
The insertion methods `db.insert()` (and its bulk versions `db.insert_glob()` and `db.insert_multiple()`) leave the trajectories in place. You can copy them to the database root folder (i.e., the `dirname` of the json database file) by adding `copy=True`.
If we now try to insert the same files again, no new entries will be created (this behavior can be changed by setting `db.require` to an empty tuple). However, we can populate the entries with additional metadata by adding some hooks:
```python
from atooms.database.hooks import metadata_from_atooms, metadata_from_path

db.add_hook(metadata_from_atooms)
db.add_hook(metadata_from_path)
pprint(db.all(), columns=('density', 'frames', 'n', 'path'), sort_by='n', max_rows=5)
```
```
density  frames  n  path
----------------------------------------------------
1.0      100     0  /tmp/dataset/rho1.1/n0/config.xyz
1.0      100     0  /tmp/dataset/rho1.2/n0/config.xyz
1.0      100     0  /tmp/dataset/rho1.0/n0/config.xyz
1.0      100     1  /tmp/dataset/rho1.1/n1/config.xyz
... 295 entries not shown
```
Note: The results of the `metadata_from_atooms()` hook are cached for efficiency: if the trajectory file has not changed since the last execution of the hook, the hook will skip evaluation of the metadata.
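The caching logic can be sketched as follows: store the file's modification time together with the computed metadata, and recompute only when the file has changed (an illustration of the idea, not the actual implementation):

```python
import os

_cache = {}

def cached_metadata(path, compute):
    """Return compute(path), recomputing only if the file at `path` has
    changed (by modification time) since the last call (a sketch)."""
    mtime = os.path.getmtime(path)
    if path in _cache and _cache[path][0] == mtime:
        return _cache[path][1]
    value = compute(path)
    _cache[path] = (mtime, value)
    return value
```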
By default, the hooks do not actually store the metadata in the database. However, we can update the database to store them persistently:

```python
db.update()
```
Now the metadata are permanently stored in the database, as we can see by reopening it:

```python
db = Database('/tmp/db.json')
pprint(db.all(), columns=('density', 'frames', 'n', 'path'), sort_by=('path', 'n'), max_rows=5)
```
```
density  frames  n   path
----------------------------------------------------
1.0      100     0   /tmp/dataset/rho1.0/n0/config.xyz
1.0      100     1   /tmp/dataset/rho1.0/n1/config.xyz
1.0      100     10  /tmp/dataset/rho1.0/n10/config.xyz
1.0      100     11  /tmp/dataset/rho1.0/n11/config.xyz
... 295 entries not shown
```
Note: Updating the database with `update()` will also remove entries whose `path` does not exist anymore.