Code Documentation

Standard Functions

selfies.encoder(smiles, print_error=False)

Translates a SMILES into a SELFIES.

The SMILES to SELFIES translation occurs independently of the SELFIES alphabet and grammar. Thus, selfies.encoder() will work regardless of the alphabet and grammar rules that selfies is operating on, assuming the input is a valid SMILES. Additionally, selfies.encoder() preserves the atom and branch order of the input SMILES, so one could generate random SELFIES corresponding to the same molecule by generating random SMILES and then translating them.

However, encoding and then decoding a SMILES may not necessarily yield the original SMILES. Reasons include:

  1. SMILES with aromatic symbols are automatically Kekulized before being translated.

  2. SMILES that violate the bond constraints specified by selfies will be successfully encoded by selfies.encoder(), but then decoded into a new molecule that satisfies the constraints.

  3. The exact ring numbering order is lost in selfies.encoder(), and cannot be reconstructed by selfies.decoder().

Finally, note that selfies.encoder() does not check if the input SMILES is valid, and should not be expected to reject invalid inputs. It is recommended to use RDKit to first verify that the SMILES are valid.
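The validate-then-encode pattern recommended above can be sketched as follows. This is a minimal illustration, not part of selfies: encode_if_valid, toy_is_valid and toy_encode are hypothetical names, and the two toy functions are stand-ins so that the block runs without selfies or RDKit installed.

```python
from typing import Callable, Optional


def encode_if_valid(
    smiles: str,
    encode: Callable[[str], Optional[str]],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Call encode(smiles) only if is_valid(smiles) holds; otherwise return None."""
    if not is_valid(smiles):
        return None
    return encode(smiles)


def toy_is_valid(smiles: str) -> bool:
    # Hypothetical stand-in: checks balanced parentheses only.
    # This is NOT a real SMILES validator.
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def toy_encode(smiles: str) -> str:
    # Hypothetical stand-in for a real encoder.
    return "<encoded " + smiles + ">"


print(encode_if_valid("C(=O)O", toy_encode, toy_is_valid))
print(encode_if_valid("C(=O", toy_encode, toy_is_valid))
```

With both packages installed, one would presumably pass selfies.encoder as encode and an RDKit-based check such as `lambda s: Chem.MolFromSmiles(s) is not None` as is_valid, since Chem.MolFromSmiles returns None for SMILES it cannot parse.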

Parameters
  • smiles (str) – The SMILES to be translated.

  • print_error (bool) – If True, error messages will be printed to console. Defaults to False.

Return type

Optional[str]

Returns

The SELFIES translation of smiles. If an error occurs and smiles cannot be translated, None is returned instead.

Example

>>> import selfies
>>> selfies.encoder('C=CF')
'[C][=C][F]'

Note

Currently, selfies.encoder() does not support the following types of SMILES:

  • SMILES using ring numbering across a dot-bond symbol to specify bonds, e.g. C1.C2.C12 (propane) or c1cc([O-].[Na+])ccc1 (sodium phenoxide).

  • SMILES with ring numbering between atoms that are over 16 ** 3 = 4096 atoms apart.

  • SMILES using the wildcard symbol *.

  • SMILES using chiral specifications other than @ and @@.

selfies.decoder(selfies, print_error=False)

Translates a SELFIES into a SMILES.

The SELFIES to SMILES translation operates based on the selfies grammar rules, which can be configured using selfies.set_semantic_constraints(). Given the appropriate settings, the decoded SMILES will always be syntactically and semantically correct. That is, the output SMILES will satisfy the specified bond constraints. Additionally, selfies.decoder() will attempt to preserve the atom and branch order of the input SELFIES.

Parameters
  • selfies (str) – The SELFIES to be translated.

  • print_error (bool) – If True, error messages will be printed to console. Defaults to False.

Return type

Optional[str]

Returns

The SMILES translation of selfies. If an error occurs and selfies cannot be translated, None is returned instead.

Example

>>> import selfies
>>> selfies.decoder('[C][=C][F]')
'C=CF'

selfies.len_selfies(selfies)

Computes the symbol length of a SELFIES.

The symbol length is the number of symbols that make up the SELFIES, and not the length of the string itself (i.e. len(selfies)).
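To make the difference between symbol length and string length concrete, here is a minimal pure-Python sketch. It is not the library's implementation; symbol_length is a hypothetical helper that assumes a well-formed SELFIES in which every symbol is either bracketed or the '.' dot-bond symbol.

```python
def symbol_length(selfies_string: str) -> int:
    """Count symbols: one per opening bracket, plus one per dot-bond symbol."""
    return selfies_string.count("[") + selfies_string.count(".")


example = "[C][=C][F].[C]"
print(symbol_length(example))  # number of symbols
print(len(example))            # number of characters, which is larger
```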

Parameters

selfies (str) – A SELFIES.

Return type

int

Returns

The symbol length of selfies.

Example

>>> import selfies
>>> selfies.len_selfies('[C][O][C]')
3
>>> selfies.len_selfies('[C][=C][F].[C]')
5

selfies.split_selfies(selfies)

Splits a SELFIES into its symbols.

Returns an iterable that yields the symbols of a SELFIES one-by-one in the order they appear in the string. SELFIES symbols are always either indicated by an open and closed square bracket, or are the '.' dot-bond symbol.
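The symbol rule described above (bracketed symbols or the '.' dot-bond symbol) can be illustrated with a short regular-expression tokenizer. This is only a sketch under that assumption, not the library's code; iter_symbols is a hypothetical name.

```python
import re
from typing import Iterator

# A symbol is either a bracketed token such as "[=C]" or the dot-bond symbol ".".
_TOKEN = re.compile(r"\[[^\]]*\]|\.")


def iter_symbols(selfies_string: str) -> Iterator[str]:
    """Lazily yield the symbols of a SELFIES in order of appearance."""
    for match in _TOKEN.finditer(selfies_string):
        yield match.group()


print(list(iter_symbols("[C][=C][F].[C]")))
```

Writing the helper as a generator mirrors the documented Iterable[str] return type: symbols are produced one at a time, and callers wrap the result in list() when they need all of them at once.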

Parameters

selfies (str) – The SELFIES to be read.

Return type

Iterable[str]

Returns

An iterable of the symbols of selfies in the same order they appear in the string.

Example

>>> import selfies
>>> list(selfies.split_selfies('[C][O][C]'))
['[C]', '[O]', '[C]']
>>> list(selfies.split_selfies('[C][=C][F].[C]'))
['[C]', '[=C]', '[F]', '.', '[C]']

selfies.get_alphabet_from_selfies(selfies_iter)

Constructs an alphabet from an iterable of SELFIES.

From an iterable of SELFIES, constructs the minimum-sized set of SELFIES symbols such that every SELFIES in the iterable can be constructed from symbols from that set. Then, the set is returned. Note that the symbol '.' will not be added as a member of the returned set, even if it appears in the input.
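The construction described above amounts to taking the union of the bracketed symbols across all inputs. The following sketch shows that idea in pure Python; alphabet_from is a hypothetical helper, not the library function, and it assumes well-formed SELFIES.

```python
import re
from typing import Iterable, Set

# Matches bracketed symbols only, so the dot-bond symbol "." is never collected.
_BRACKETED = re.compile(r"\[[^\]]*\]")


def alphabet_from(selfies_iter: Iterable[str]) -> Set[str]:
    """Return the set of bracketed symbols that occur in any of the inputs."""
    alphabet: Set[str] = set()
    for selfies_string in selfies_iter:
        alphabet.update(_BRACKETED.findall(selfies_string))
    return alphabet


print(sorted(alphabet_from(["[C][F][O]", "[C].[O]", "[F][F]"])))
```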

Parameters

selfies_iter (Iterable[str]) – An iterable of SELFIES.

Return type

Set[str]

Returns

The SELFIES alphabet built from the SELFIES in selfies_iter.

Example

>>> import selfies
>>> selfies_list = ['[C][F][O]', '[C].[O]', '[F][F]']
>>> alphabet = selfies.get_alphabet_from_selfies(selfies_list)
>>> sorted(list(alphabet))
['[C]', '[F]', '[O]']

selfies.get_semantic_robust_alphabet()

Returns a subset of all symbols that are semantically constrained by selfies.

These semantic constraints can be configured with selfies.set_semantic_constraints().

Return type

Set[str]

Returns

A subset of all symbols that are semantically constrained.

Advanced Functions

By default, selfies operates under the following semantic constraints:

  Max Bonds   Atom(s)
  1           F, Cl, Br, I
  2           O
  3           N
  4           C
  5           P
  6           S
  8           All other atoms

However, the default constraints are inadequate for molecules whose SMILES violate them. For example, nitrobenzene O=N(=O)C1=CC=CC=C1 has a nitrogen with 5 bonds, and the chlorate anion O=Cl(=O)[O-] has a chlorine with 5 bonds; these SMILES cannot be represented as SELFIES under the default constraints. Additionally, users may want to specify their own custom constraints. Thus, we provide the following methods for configuring the semantic constraints of selfies.
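As an illustration, a custom constraint dictionary that would accommodate the two molecules above could be built as sketched here. The names DEFAULT_CONSTRAINTS and custom_constraints are hypothetical; the dictionary is transcribed from the default constraints listed above and is not read from the library, and the '?' key for "all other atoms" follows the convention described under selfies.set_semantic_constraints().

```python
# Default caps transcribed from the constraints listed above (an assumption,
# not the library's internal dictionary). "?" stands for all other atoms.
DEFAULT_CONSTRAINTS = {
    "F": 1, "Cl": 1, "Br": 1, "I": 1,
    "O": 2,
    "N": 3,
    "C": 4,
    "P": 5,
    "S": 6,
    "?": 8,
}

# Copy first so the defaults stay untouched, then raise the two caps.
custom_constraints = dict(DEFAULT_CONSTRAINTS)
custom_constraints["N"] = 5   # nitro N: two double bonds + one single bond
custom_constraints["Cl"] = 5  # chlorate Cl: two double bonds + one single bond

print(custom_constraints["N"], custom_constraints["Cl"])
```

A dictionary of this shape is the kind of argument that selfies.set_semantic_constraints() accepts.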

Warning

SELFIES may be translated differently under different semantic constraints. Therefore, if custom semantic constraints are used, it is recommended to report them for reproducibility reasons.

selfies.get_semantic_constraints()

Returns the semantic bond constraints that selfies is currently operating on.

Returns a copy of the argument passed to the most recent call of selfies.set_semantic_constraints(), or a copy of the default bond constraints if that function has not been called yet. See selfies.set_semantic_constraints() for further explanation.

Return type

Dict[str, int]

Returns

The bond constraints selfies is currently operating on.

selfies.set_semantic_constraints(bond_constraints=None)

Configures the semantic constraints of selfies.

The SELFIES grammar is enforced dynamically from a dictionary bond_constraints. The keys of the dictionary are atoms and/or ions (e.g. I, Fe+2). To denote an ion, use the format E+C or E-C, where E is an element and C is a positive integer. The corresponding value is the maximum number of bonds that atom or ion can make, between 1 and 8 inclusive. For example, one may have:

  • bond_constraints['I'] = 1

  • bond_constraints['C'] = 4

selfies.decoder() will only generate SMILES that respect the bond constraints specified by the dictionary. In the example above, '[C][=I]' and '[I][=C]' will be translated to 'CI' and 'IC' respectively, because I has been configured to make at most one bond.

If an atom or ion is not specified in bond_constraints, it will by default be constrained to 8 bonds. To change the default setting for unrecognized atoms or ions, set bond_constraints['?'] to the desired integer (between 1 and 8 inclusive).
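The lookup and capping behaviour described above can be sketched in a few lines. This is a simplified illustration rather than the decoder's actual algorithm: max_bonds and cap_bond_order are hypothetical helpers, and the capping rule (take the smallest of the requested order, the previous atom's free capacity, and the new atom's limit) is an assumption made for the example.

```python
from typing import Dict


def max_bonds(atom: str, bond_constraints: Dict[str, int]) -> int:
    """Bond limit for an atom or ion, falling back to '?' and then to 8."""
    return bond_constraints.get(atom, bond_constraints.get("?", 8))


def cap_bond_order(
    requested: int,
    prev_atom: str,
    prev_used: int,
    new_atom: str,
    bond_constraints: Dict[str, int],
) -> int:
    """Largest bond order not exceeding the request or either atom's capacity."""
    prev_free = max_bonds(prev_atom, bond_constraints) - prev_used
    new_limit = max_bonds(new_atom, bond_constraints)
    return max(0, min(requested, prev_free, new_limit))


constraints = {"I": 1, "C": 4}

# '[C][=I]': the requested double bond is reduced to a single bond.
print(cap_bond_order(2, "C", 0, "I", constraints))
# '[I][=C]': I can only offer one bond, so the result is again a single bond.
print(cap_bond_order(2, "I", 0, "C", constraints))
# An unlisted ion falls back to 8 unless '?' is set.
print(max_bonds("Fe+2", constraints), max_bonds("Fe+2", {"?": 6}))
```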

Parameters

bond_constraints (Optional[Dict[str, int]]) – A dictionary representing the semantic constraints that selfies will operate on after the update. Defaults to None, in which case a default dictionary is used.

Return type

None

Returns

None.