parsnip
Overview
An interface for reading CIF files in Python.
Importing parsnip allows users to read CIF 1.1 files, as well as many features from the CIF 2.0 and mmCIF formats.
Creating a CifFile object provides easy access to name-value pairs, as well
as loop_-delimited loops. Data entries can be extracted as python primitives or
numpy arrays for further use.
The CIF Format
This is an example of a simple CIF file. A key (data name or tag) must start with
an underscore, and is separated from the data value with whitespace characters.
A table begins with the loop_ keyword and contains a header block and a data
block. The vertical position of a tag in the table headings corresponds to the
horizontal position of the associated column in the table values.
# A header describing this portion of the file
data_cif_Cu-FCC
# Several key-value pairs
_journal_year 1999
_journal_page_first 0
_journal_page_last 123
_chemical_name_mineral 'Copper FCC'
_chemical_formula_sum 'Cu'
# Key-value pairs describing the unit cell (Å and °)
_cell_length_a 3.6
_cell_length_b 3.6
_cell_length_c 3.6
_cell_angle_alpha 90.0
_cell_angle_beta 90.0
_cell_angle_gamma 90.0
# A table with 6 columns and one row
loop_
_atom_site_label
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_type_symbol
_atom_site_Wyckoff_label
Cu1 0.0000000000 0.0000000000 0.0000000000 Cu a
_symmetry_space_group_name_H-M 'Fm-3m' # One more key-value pair
# A table with two columns and four rows:
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 x,y,z
96 z,y+1/2,x+1/2
118 z+1/2,-y,x+1/2
192 z+1/2,y+1/2,x
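To make the rules above concrete, here is a minimal pure-Python sketch that reads key-value pairs and loop_ tables from a shortened copy of the example file. It is not parsnip's implementation: `parse_simple_cif` and `CIF_TEXT` are hypothetical names used only for illustration, the tokenizer is the standard library's `shlex`, and multi-line (semicolon-delimited) values are not handled. Unlike the parsnip output shown later on this page, `shlex` strips the quotes around quoted values.

```python
import shlex

# A shortened copy of the example file above.
CIF_TEXT = """\
# A header describing this portion of the file
data_cif_Cu-FCC
_journal_year 1999
_chemical_name_mineral 'Copper FCC'
loop_
_atom_site_label
_atom_site_fract_x
Cu1 0.0000000000
_symmetry_space_group_name_H-M 'Fm-3m' # One more key-value pair
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 x,y,z
96 z,y+1/2,x+1/2
"""


def parse_simple_cif(text):
    """Return (pairs, loops); each loop is a (headers, rows) tuple."""
    pairs, loops = {}, []
    headers = rows = None  # The loop currently being read, if any.
    in_header = False
    for raw in text.splitlines():
        # comments=True drops everything after '#'; quotes group tokens.
        tokens = shlex.split(raw, comments=True)
        if not tokens or tokens[0].lower().startswith("data_"):
            continue
        if tokens[0].lower() == "loop_":
            headers, rows = [], []
            loops.append((headers, rows))
            in_header = True
        elif tokens[0].startswith("_"):
            if in_header and len(tokens) == 1:
                headers.append(tokens[0])  # A column label.
            else:
                # A key-value pair closes any open loop.
                in_header = False
                headers = rows = None
                pairs[tokens[0]] = " ".join(tokens[1:])
        elif headers is not None:
            in_header = False
            rows.append(tokens)  # One table row.
    return pairs, loops


pairs, loops = parse_simple_cif(CIF_TEXT)
```

Each table row has one token per header, which is the vertical-to-horizontal correspondence described above.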
Classes:
CifFile: Lightweight, performant parser for CIF files.
- class CifFile(fn, cast_values=False)
Bases: object
Lightweight, performant parser for CIF files.
Example
To get started, simply provide a filename:
>>> from parsnip import CifFile
>>> cif = CifFile("example_file.cif")
>>> print(cif)
CifFile(fn=example_file.cif) : 12 data entries, 2 data loops
Data entries are accessible via the pairs and loops attributes:
>>> cif.pairs
{'_journal_year': '1999', '_journal_page_first': '0', ...}
>>> cif.loops[0]
array([[('Cu1', '0.0000000000', '0.0000000000', '0.0000000000', 'Cu', 'a')]],
      dtype=...)
>>> cif.loops[1]
array([[('1', 'x,y,z')],
       [('96', 'z,y+1/2,x+1/2')],
       [('118', 'z+1/2,-y,x+1/2')],
       [('192', 'z+1/2,y+1/2,x')]],
      dtype=...)
Tip
See the docs for __getitem__ and get_from_loops to query for data by key or column label!
- Parameters:
Attributes:
pairs: A dict containing key-value pairs extracted from the file.
loops: A list of data tables (loop_'s) extracted from the file.
lattice_vectors: The lattice vectors of the unit cell, with \(\vec{a_1}\perp[100]\).
loop_labels: A list of column labels for each data array.
symops: Extract the symmetry operations from a CIF file.
wyckoff_positions: Extract symmetry-irreducible, fractional x,y,z coordinates from a CIF file.
cast_values: Whether to cast "number-like" values to ints & floats.
PATTERNS: Regex patterns used when parsing files.
Methods:
__getitem__(index): Return an item or list of items from pairs and loops.
get_from_pairs(index): Return an item or items from the dictionary of key-value pairs.
get_from_loops(index): Return a column or columns from the matching table in loops.
read_cell_params([degrees, normalize]): Read the unit cell parameters (lengths and angles) from a CIF file.
build_unit_cell([n_decimal_places, verbose]): Reconstruct fractional atomic positions from Wyckoff sites and symops.
structured_to_unstructured(arr): Convert a structured (column-labeled) array to a standard unstructured array.
- property pairs
A dict containing key-value pairs extracted from the file.
Numeric values will be parsed to int or float if possible. In these cases, precision specifiers will be stripped.
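As a rough illustration of what "number-like" casting with stripped precision specifiers can look like, the sketch below converts strings such as `3.600(2)` to numbers and leaves everything else untouched. The regular expression and the `cast_number` helper are assumptions made for this example; parsnip's own casting rules may differ.

```python
import re

# Optional sign, digits with optional decimal part and exponent, followed by
# an optional precision specifier such as "(2)". Hypothetical pattern.
NUMBER = re.compile(r"([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)(?:\(\d+\))?")


def cast_number(value):
    """Return an int or float for number-like strings, else the input."""
    match = NUMBER.fullmatch(value.strip())
    if match is None:
        return value  # Not number-like: leave the string as-is.
    text = match.group(1)  # The number without its precision specifier.
    return int(text) if re.fullmatch(r"[+-]?\d+", text) else float(text)


examples = [cast_number(v) for v in ("1999", "3.600(2)", "'Cu'", "x,y,z")]
```

Note that the uncertainty in parentheses is discarded, so the original string cannot be recovered from the cast value.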
- property loops
A list of data tables (loop_’s) extracted from the file.
These are stored as numpy structured arrays, which can be indexed by column labels. See the structured_to_unstructured helper function below for details on converting to standard arrays.
- Returns:
A list of structured arrays containing table data from the file.
- Return type:
list[numpy.ndarray[str]]
- __getitem__(index)
Return an item or list of items from pairs and loops.
This getter searches the entire CIF state to identify the input keys, returning None if the key does not match any data. Matching columns from loop tables are returned as 1D arrays.
Tip
This method of accessing data is recommended for most uses, as it ensures data is returned wherever possible. get_from_loops() may be useful when multi-column slices of an array are needed.
Example
Indexing the class with a single key:
>>> cif["_journal_year"]
'1999'
>>> cif["_atom_site_label"]
array([['Cu1']], dtype='<U12')
Indexing with a list of keys:
>>> cif[["_chemical_name_mineral", "_symmetry_equiv_pos_as_xyz"]]
["'Copper FCC'",
 array([['x,y,z'],
        ['z,y+1/2,x+1/2'],
        ['z+1/2,-y,x+1/2'],
        ['z+1/2,y+1/2,x']], dtype='<U14')]
Wildcards are supported for lookups with this method:
>>> cif[["_journal*", "_atom_site_fract_?"]]
[['1999', '0', '123'],
 ...array([['0.0000000000', '0.0000000000', '0.0000000000']], dtype='<U12')]
- get_from_pairs(index)
Return an item or items from the dictionary of key-value pairs.
Tip
This method supports a few unix-style wildcards. Use * to match any number of any character, and ? to match any single character. If a wildcard matches more than one key, a list is returned for that index.
Indexing with a string returns the value from the pairs dict. Indexing with an Iterable of strings returns a list of values, with None as a placeholder for keys that did not match any data.
Example
Indexing the class with a single key:
>>> cif.get_from_pairs("_journal_year")
'1999'
Indexing with a list of keys:
>>> cif.get_from_pairs(["_journal_page_first", "_journal_page_last"])
['0', '123']
Indexing with wildcards:
>>> cif.get_from_pairs("_journal*")
['1999', '0', '123']
Single-character wildcards can generalize keys across CIF and mmCIF files:
>>> cif.get_from_pairs("_symmetry?space_group_name_H-M")
"'Fm-3m'"
- get_from_loops(index)
Return a column or columns from the matching table in loops.
If index is a single string, a single column will be returned from the matching table. If index is an Iterable of strings, the corresponding table slices will be returned. Slices from the same table will be grouped in the output array, but slices from different tables will be returned separately.
Tip
It is highly recommended that queries across multiple loops are provided in separate calls to this function. This helps ensure output data is ordered as expected and allows for easier handling of cases where non-matching keys are provided.
Example
Extract a single column from a single table:
>>> cif.get_from_loops("_symmetry_equiv_pos_as_xyz")
array([['x,y,z'],
       ['z,y+1/2,x+1/2'],
       ['z+1/2,-y,x+1/2'],
       ['z+1/2,y+1/2,x']], dtype='<U14')
Extract multiple columns from a single table:
>>> table_1_cols = ["_symmetry_equiv_pos_site_id", "_symmetry_equiv_pos_as_xyz"]
>>> table_1 = cif.get_from_loops(table_1_cols)
>>> table_1
array([['1', 'x,y,z'],
       ['96', 'z,y+1/2,x+1/2'],
       ['118', 'z+1/2,-y,x+1/2'],
       ['192', 'z+1/2,y+1/2,x']], dtype='<U14')
Wildcard patterns are accepted for single input keys:
>>> assert (cif.get_from_loops("_symmetry_equiv_pos*") == table_1).all()
Extract multiple columns from multiple loops:
>>> table_1_cols = ["_symmetry_equiv_pos_site_id", "_symmetry_equiv_pos_as_xyz"]
>>> table_2_cols = ["_atom_site_type_symbol", "_atom_site_Wyckoff_label"]
>>> [cif.get_from_loops(cols) for cols in (table_1_cols, table_2_cols)]
[array([['1', 'x,y,z'],
        ['96', 'z,y+1/2,x+1/2'],
        ['118', 'z+1/2,-y,x+1/2'],
        ['192', 'z+1/2,y+1/2,x']], dtype='<U14'),
 array([['Cu', 'a']], dtype='<U12')]
Caution
Returned arrays will match the ordering of input index keys if all indices correspond to a single table. Indices that match multiple loops will return all possible matches, in the order of the input loops. Lists of input that correspond with multiple loops will return data from those loops in the order they were read from the file.
Case where ordering of output matches the input file, not the provided keys:
>>> cif.get_from_loops([*table_1_cols, *table_2_cols])
[array([['Cu', 'a']], dtype='<U12'),
 array([['1', 'x,y,z'],
        ['96', 'z,y+1/2,x+1/2'],
        ['118', 'z+1/2,-y,x+1/2'],
        ['192', 'z+1/2,y+1/2,x']], dtype='<U14')]
- Parameters:
index (str | Iterable[str]) – A column name or list of column names.
- Returns:
A list of unstructured arrays corresponding with matches from the input keys. If the resulting list would have length 1, the data is returned directly instead. See the note above for data ordering.
- Return type:
list[numpy.ndarray] | numpy.ndarray
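The ordering rules in the caution above can be modeled with plain lists and dicts. This sketch is only a mental model, not parsnip's code: tables are visited in file order, columns inside one table follow the order of the requested keys, and a single result is returned directly. The `loops` data and the `get_columns` helper are hypothetical.

```python
# Two tables in the order they appear in the example file.
loops = [
    {"_atom_site_type_symbol": ["Cu"], "_atom_site_Wyckoff_label": ["a"]},
    {
        "_symmetry_equiv_pos_site_id": ["1", "96"],
        "_symmetry_equiv_pos_as_xyz": ["x,y,z", "z,y+1/2,x+1/2"],
    },
]


def get_columns(keys):
    """Return rows for the requested columns, grouped per table."""
    if isinstance(keys, str):
        keys = [keys]
    results = []
    for table in loops:  # File order decides the order of the groups.
        cols = [key for key in keys if key in table]  # Input order within a table.
        if cols:
            results.append([list(row) for row in zip(*(table[c] for c in cols))])
    # Mirror the documented behavior: unwrap a single result.
    return results[0] if len(results) == 1 else results


same_table = get_columns(["_symmetry_equiv_pos_as_xyz", "_symmetry_equiv_pos_site_id"])
mixed = get_columns(["_symmetry_equiv_pos_as_xyz", "_atom_site_type_symbol"])
```

In `mixed`, the atom-site column comes first even though it was requested last, which is why separate calls per table are easier to reason about.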
- read_cell_params(degrees=True, normalize=False)
Read the unit cell parameters (lengths and angles) from a CIF file.
- Parameters:
- Returns:
The box vector lengths (in angstroms) and angles (in degrees or radians) \((L_1, L_2, L_3, \alpha, \beta, \gamma)\).
- Return type:
- Raises:
ValueError – If the stored data cannot form a valid box.
- build_unit_cell(n_decimal_places=4, verbose=False)
Reconstruct fractional atomic positions from Wyckoff sites and symops.
Rather than storing an entire unit cell’s atomic positions, CIF files instead include the data required to recreate those positions based on symmetry rules. Symmetry operations (stored as strings of x,y,z position permutations) are applied to the Wyckoff (symmetry irreducible) positions to create a list of possible atomic sites. These are then wrapped into the unit cell and filtered for uniqueness to yield the final crystal.
Warning
Reconstructing positions requires several floating point calculations that can be impacted by low-precision data in CIF files. Typically, at least four decimal places are required to accurately reconstruct complicated unit cells: less precision than this can yield cells with duplicate or missing positions.
- Parameters:
n_decimal_places (int) – (int, optional) The number of decimal places to round each position to for the uniqueness comparison. Values higher than 4 may not work for all CIF files. Default value = 4
verbose (bool) – (bool, optional) Whether to print debug information about the uniqueness checks. Default value = False
- Returns:
The full unit cell of the crystal structure.
- Return type:
\((N, 3)\) numpy.ndarray[float]
- Raises:
ValueError – If the stored data cannot form a valid box.
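The reconstruction described for build_unit_cell (apply each symmetry operation to each Wyckoff position, wrap into the unit cell, then filter for uniqueness after rounding) can be sketched in pure Python as follows. This is an illustration under simplifying assumptions, not parsnip's algorithm: `apply_symop` and `build_cell` are hypothetical helpers, only terms of the form `x`, `-y`, or `+1/2` are parsed, and exact `Fraction` arithmetic is used in place of floating point.

```python
import re
from fractions import Fraction


def apply_symop(symop, position):
    """Apply a symop string such as 'z,y+1/2,x+1/2' to a fractional position."""
    coords = dict(zip("xyz", position))
    result = []
    for expr in symop.replace(" ", "").split(","):
        total = Fraction(0)
        # Split 'y+1/2' into signed terms: ['y', '+1/2'].
        for term in re.findall(r"[+-]?[^+-]+", expr):
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("+-")
            total += sign * (coords[body] if body in coords else Fraction(body))
        result.append(total % 1)  # Wrap into the unit cell, [0, 1).
    return tuple(result)


def build_cell(symops, wyckoff_positions, n_decimal_places=4):
    """Return the sorted, unique fractional positions as tuples of floats."""
    unique = set()
    for position in wyckoff_positions:
        for symop in symops:
            image = apply_symop(symop, position)
            # Round before comparing so near-duplicates collapse together.
            unique.add(tuple(round(float(c), n_decimal_places) % 1.0 for c in image))
    return sorted(unique)


symops = ["x,y,z", "z,y+1/2,x+1/2", "z+1/2,-y,x+1/2", "z+1/2,y+1/2,x"]
wyckoff = [tuple(Fraction(s) for s in ("0.0000000000",) * 3)]
cell = build_cell(symops, wyckoff)
```

For the copper example this yields the four sites of the face-centered cubic cell. With low-precision input, the rounding step is where duplicate or missing positions can appear, as the warning above notes.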
- property box
Read the unit cell as a freud or HOOMD box-like object.
Important
cif.box returns box extents and tilt factors, while CifFile.read_cell_params returns unit cell vector lengths and angles. See the box-like documentation linked above for more details.
Example
This method provides a convenient interface to create box objects.
>>> box = cif.box
>>> print(box)
(3.6, 3.6, 3.6, 0.0, 0.0, 0.0)
>>> import freud, hoomd
>>> freud.Box(*box)
freud.box.Box(Lx=3.6, Ly=3.6, Lz=3.6, xy=0, xz=0, yz=0, ...)
>>> hoomd.Box(*box)
hoomd.box.Box(Lx=3.6, Ly=3.6, Lz=3.6, xy=0.0, xz=0.0, yz=0.0)
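To clarify the difference between the two representations, the sketch below converts cell lengths and angles into box extents and tilt factors. It assumes the usual HOOMD/freud convention, in which the lattice vectors are (Lx, 0, 0), (xy·Ly, Ly, 0), and (xz·Lz, yz·Lz, Lz). The `cell_to_box` function is a hypothetical helper written for this example, not part of parsnip.

```python
import math


def cell_to_box(a, b, c, alpha, beta, gamma):
    """Convert lengths and angles (in degrees) to (Lx, Ly, Lz, xy, xz, yz)."""
    cos_a, cos_b, cos_g = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sin_g = math.sin(math.radians(gamma))
    # Lattice vectors with a1 along x and a2 in the xy plane.
    a2x, a2y = b * cos_g, b * sin_g
    a3x = c * cos_b
    a3y = c * (cos_a - cos_b * cos_g) / sin_g
    a3z = math.sqrt(c * c - a3x * a3x - a3y * a3y)
    lx, ly, lz = a, a2y, a3z
    # Tilt factors are the off-axis components scaled by the box extents.
    return (lx, ly, lz, a2x / ly, a3x / lz, a3y / lz)


cubic = cell_to_box(3.6, 3.6, 3.6, 90.0, 90.0, 90.0)
hexagonal = cell_to_box(1.0, 1.0, 2.0, 90.0, 90.0, 120.0)
```

For the cubic example the tilt factors vanish (up to floating point error), so extents and lengths coincide. For the hexagonal cell, Ly is shorter than b and xy is nonzero.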
- property lattice_vectors
The lattice vectors of the unit cell, with \(\vec{a_1}\perp[100]\).
Important
The lattice vectors are stored as columns of the returned matrix, similar to freud to_matrix(). This matrix must be transposed when creating a freud box or transforming fractional coordinates to absolute.
Example
The box matrix can be used to transform fractional coordinates to absolute coordinates after transposing to row-major form.
>>> lattice_vectors = cif.lattice_vectors
>>> lattice_vectors
array([[3.6, 0.0, 0.0],
       [0.0, 3.6, 0.0],
       [0.0, 0.0, 3.6]])
>>> cif.build_unit_cell() @ lattice_vectors.T # Calculate absolute positions
array([[0.0, 0.0, 0.0],
       [0.0, 1.8, 1.8],
       [1.8, 0.0, 1.8],
       [1.8, 1.8, 0.0]])
- Returns:
The lattice vectors of the unit cell \(\vec{a_1}, \vec{a_2},\vec{a_3}\).
- Return type:
\((3, 3)\) numpy.ndarray
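Because the lattice vectors are stored as columns, an absolute position is the sum of each column scaled by the matching fractional coordinate. The pure-Python sketch below spells out that sum; it is equivalent to multiplying a row of fractional coordinates by the transposed matrix. The `fractional_to_absolute` helper and the skewed test matrix are assumptions for illustration only.

```python
def fractional_to_absolute(frac_positions, lattice):
    """Map fractional positions to absolute ones for a column-vector lattice."""
    # r_i = sum_j lattice[i][j] * f_j, where column j of lattice is vector a_j.
    return [
        [sum(lattice[i][j] * frac[j] for j in range(3)) for i in range(3)]
        for frac in frac_positions
    ]


# The cubic lattice from the example above.
cubic = [[3.6, 0.0, 0.0], [0.0, 3.6, 0.0], [0.0, 0.0, 3.6]]
cubic_sites = fractional_to_absolute([(0.0, 0.5, 0.5)], cubic)

# A skewed lattice whose second column is a2 = (-0.5, 0.866, 0.0).
skewed = [[1.0, -0.5, 0.0], [0.0, 0.866, 0.0], [0.0, 0.0, 2.0]]
skewed_sites = fractional_to_absolute([(0.0, 1.0, 0.0)], skewed)
```

In the skewed case the fractional position (0, 1, 0) lands exactly on a2, which only happens when the columns (not the rows) are treated as lattice vectors.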
- property loop_labels
A list of column labels for each data array.
This property is equivalent to [arr.dtype.names for arr in self.loops].
- property symops
Extract the symmetry operations from a CIF file.
Example
>>> cif.symops
array([['x,y,z'],
       ['z,y+1/2,x+1/2'],
       ['z+1/2,-y,x+1/2'],
       ['z+1/2,y+1/2,x']], dtype='<U14')
- Returns:
An array of strings containing the symmetry operations in a parsable algebraic form.
- Return type:
\((N,)\) numpy.ndarray[str]
- property wyckoff_positions
Extract symmetry-irreducible, fractional x,y,z coordinates from a CIF file.
- Returns:
Symmetry-irreducible positions of atoms in fractional coordinates.
- Return type:
\((N, 3)\) numpy.ndarray[float]
- property cast_values
Whether to cast “number-like” values to ints & floats.
Note
When set to True after construction, the values are modified in-place. This action cannot be reversed.
- Type:
bool
- classmethod structured_to_unstructured(arr)
Convert a structured (column-labeled) array to a standard unstructured array.
This is useful when extracting entire loops from loops for use in other programs. This classmethod simply calls np.lib.recfunctions.structured_to_unstructured on the input data to ensure the resulting array is properly laid out in memory. See this page in the structured array docs for more information.
- Parameters:
arr (numpy.ndarray | numpy.recarray) – The structured array to convert.
- Returns:
An unstructured array containing a copy of the data from the input.
- Return type:
- PATTERNS: ClassVar = {'block_delimiter': '([Dd][Aa][Tt][Aa]_)[ |\\t]*([^\\n]*)', 'key_list': '_[\\w_\\.*]+[\\[\\d\\]]*', 'key_value_general': '^(_[\\w\\.\\-/\\[\\d\\]]+)\\s+([^#]+)', 'loop_delimiter': '([Ll][Oo][Oo][Pp]_)[ |\\t]*([^\\n]*)', 'space_delimited_data': '(\\;[^\\;]*\\;|\\\'[^\\\']*\\\'|\\"[^\\"]*\\"]|[^\\\'\\"\\;\\s]*)\\s*'}
Regex patterns used when parsing files.
This dictionary can be modified to change parsing behavior, although doing so is not recommended. Changes to this variable are shared across all instances of the class.
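To see what one of these patterns captures, the sketch below applies the key_value_general expression, copied from the dictionary above, to a line from the example file using the standard `re` module. How parsnip itself invokes the pattern (flags, `match` versus `search`) is not stated here, so the call shown is an assumption.

```python
import re

# Copied from the 'key_value_general' entry of PATTERNS above.
KEY_VALUE_GENERAL = r"^(_[\w\.\-/\[\d\]]+)\s+([^#]+)"

line = "_symmetry_space_group_name_H-M 'Fm-3m' # One more key-value pair"
match = re.match(KEY_VALUE_GENERAL, line)

# Group 1 is the key; group 2 is everything up to the comment marker.
key = match.group(1)
value = match.group(2).strip()

# Comment-only lines do not start with an underscore, so they do not match.
comment_match = re.match(KEY_VALUE_GENERAL, "# Several key-value pairs")
```

The second group stops at the first `#`, which is how the trailing comment is excluded from the value.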