Code Documentation
Standard Functions
selfies.encoder(smiles, print_error=False)
Translates a SMILES into a SELFIES.
The SMILES to SELFIES translation occurs independently of the SELFIES alphabet and grammar. Thus, selfies.encoder() will work regardless of the alphabet and grammar rules that selfies is operating on, assuming the input is a valid SMILES. Additionally, selfies.encoder() preserves the atom and branch order of the input SMILES; thus, one could generate random SELFIES corresponding to the same molecule by generating random SMILES and then translating them.
However, encoding and then decoding a SMILES may not necessarily yield the original SMILES. Reasons include:
* SMILES with aromatic symbols are automatically Kekulized before being translated.
* SMILES that violate the bond constraints specified by selfies will be successfully encoded by selfies.encoder(), but then decoded into a new molecule that satisfies the constraints.
* The exact ring numbering order is lost in selfies.encoder(), and cannot be reconstructed by selfies.decoder().
Finally, note that selfies.encoder() does not check if the input SMILES is valid, and should not be expected to reject invalid inputs. It is recommended to use RDKit to first verify that the SMILES are valid.
- Parameters
smiles (str) – The SMILES to be translated.
print_error (bool) – If True, error messages will be printed to console. Defaults to False.
- Return type
Optional[str]
- Returns
The SELFIES translation of smiles. If an error occurs and smiles cannot be translated, None is returned instead.
- Example
>>> import selfies
>>> selfies.encoder('C=CF')
'[C][=C][F]'
Note
Currently, selfies.encoder() does not support the following types of SMILES:
* SMILES using ring numbering across a dot-bond symbol to specify bonds, e.g. C1.C2.C12 (propane) or c1cc([O-].[Na+])ccc1 (sodium phenoxide).
* SMILES with ring numbering between atoms that are over 16 ** 3 = 4096 atoms apart.
* SMILES using the wildcard symbol *.
* SMILES using chiral specifications other than @ and @@.
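Because selfies.encoder() does not reject invalid SMILES, validation has to happen before the call. The sketch below shows one way to structure that check. It is an illustration only: encode_if_valid is a hypothetical helper, not part of selfies, and the encoder and validator are passed in as callables so that the example runs without selfies or RDKit installed. The stubs at the bottom stand in for the real functions and only know the single example from this page.

```python
from typing import Callable, Optional


def encode_if_valid(
    smiles: str,
    encoder: Callable[[str], Optional[str]],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Hypothetical helper: encode `smiles` only if `is_valid` accepts it.

    Returns None both when validation fails and when the encoder itself
    returns None, mirroring the Optional[str] return type documented above.
    """
    if not is_valid(smiles):
        return None
    return encoder(smiles)


# Stand-ins so the sketch is self-contained. They are NOT real chemistry
# checks; they only know the one translation shown in the example above.
_KNOWN = {"C=CF": "[C][=C][F]"}


def stub_is_valid(smiles: str) -> bool:
    return smiles in _KNOWN


def stub_encoder(smiles: str) -> Optional[str]:
    return _KNOWN.get(smiles)


print(encode_if_valid("C=CF", stub_encoder, stub_is_valid))
print(encode_if_valid("C(((", stub_encoder, stub_is_valid))
```

With the real libraries, one would presumably pass selfies.encoder as the encoder and an RDKit-based check such as `lambda s: Chem.MolFromSmiles(s) is not None` as the validator, since RDKit's MolFromSmiles returns None for SMILES it cannot parse.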
selfies.decoder(selfies, print_error=False)
Translates a SELFIES into a SMILES.
The SELFIES to SMILES translation operates based on the selfies grammar rules, which can be configured using selfies.set_semantic_constraints(). Given the appropriate settings, the decoded SMILES will always be syntactically and semantically correct. That is, the output SMILES will satisfy the specified bond constraints. Additionally, selfies.decoder() will attempt to preserve the atom and branch order of the input SELFIES.
- Parameters
selfies (str) – The SELFIES to be translated.
print_error (bool) – If True, error messages will be printed to console. Defaults to False.
- Return type
Optional[str]
- Returns
The SMILES translation of selfies. If an error occurs and selfies cannot be translated, None is returned instead.
- Example
>>> import selfies
>>> selfies.decoder('[C][=C][F]')
'C=CF'
selfies.len_selfies(selfies)
Computes the symbol length of a SELFIES.
The symbol length is the number of symbols that make up the SELFIES, and not the length of the string itself (i.e. len(selfies)).
- Parameters
selfies (str) – A SELFIES.
- Return type
int
- Returns
The symbol length of selfies.
- Example
>>> import selfies
>>> selfies.len_selfies('[C][O][C]')
3
>>> selfies.len_selfies('[C][=C][F].[C]')
5
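To make the difference between symbol length and string length concrete, here is a minimal pure-Python sketch. The function name symbol_length is hypothetical, and the counting rule is an assumption about well-formed input (every symbol is either one bracketed group or a '.' outside brackets); it is not presented as the library's implementation.

```python
def symbol_length(selfies_string: str) -> int:
    """Count SELFIES symbols: one per '[...]' group plus one per '.'.

    Assumes a well-formed SELFIES string in which '.' never appears
    inside square brackets.
    """
    return selfies_string.count("[") + selfies_string.count(".")


example = "[C][=C][F].[C]"
print(symbol_length(example))  # number of symbols
print(len(example))            # number of characters, which is larger
```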
selfies.split_selfies(selfies)
Splits a SELFIES into its symbols.
Returns an iterable that yields the symbols of a SELFIES one-by-one in the order they appear in the string. SELFIES symbols are always either indicated by an open and closed square bracket, or are the '.' dot-bond symbol.
- Parameters
selfies (str) – The SELFIES to be read.
- Return type
Iterable[str]
- Returns
An iterable of the symbols of selfies in the same order they appear in the string.
- Example
>>> import selfies
>>> list(selfies.split_selfies('[C][O][C]'))
['[C]', '[O]', '[C]']
>>> list(selfies.split_selfies('[C][=C][F].[C]'))
['[C]', '[=C]', '[F]', '.', '[C]']
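The splitting rule described above (a symbol is either a bracketed group or the '.' dot-bond symbol) can be sketched as a small generator. This is an illustrative re-implementation under that rule, not the library's code; iter_symbols is a hypothetical name, and raising ValueError on malformed input is a choice made for this sketch, since the behavior of selfies.split_selfies() on such input is not documented here.

```python
from typing import Iterator


def iter_symbols(selfies_string: str) -> Iterator[str]:
    """Yield SELFIES symbols one by one, in order of appearance."""
    i = 0
    n = len(selfies_string)
    while i < n:
        ch = selfies_string[i]
        if ch == ".":
            # The dot-bond symbol is a symbol on its own.
            yield "."
            i += 1
        elif ch == "[":
            # A bracketed symbol runs up to the next closing bracket.
            end = selfies_string.find("]", i)
            if end == -1:
                raise ValueError("unclosed '[' in SELFIES string")
            yield selfies_string[i:end + 1]
            i = end + 1
        else:
            raise ValueError(f"unexpected character {ch!r} at index {i}")


print(list(iter_symbols("[C][=C][F].[C]")))
```

A generator keeps memory use flat for long strings, which matches the documented Iterable[str] return type.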
selfies.get_alphabet_from_selfies(selfies_iter)
Constructs an alphabet from an iterable of SELFIES.
From an iterable of SELFIES, constructs the minimum-sized set of SELFIES symbols such that every SELFIES in the iterable can be constructed from symbols from that set. Then, the set is returned. Note that the symbol '.' will not be added as a member of the returned set, even if it appears in the input.
- Parameters
selfies_iter (Iterable[str]) – An iterable of SELFIES.
- Return type
Set[str]
- Returns
The SELFIES alphabet built from the SELFIES in selfies_iter.
- Example
>>> import selfies
>>> selfies_list = ['[C][F][O]', '[C].[O]', '[F][F]']
>>> alphabet = selfies.get_alphabet_from_selfies(selfies_list)
>>> sorted(list(alphabet))
['[C]', '[F]', '[O]']
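The alphabet construction amounts to taking the union of all bracketed symbols across the inputs while leaving out '.'. A minimal sketch of that idea follows; build_alphabet and the regular expression are assumptions made for illustration and are not taken from the selfies source.

```python
import re
from typing import Iterable, Set

# Matches one bracketed symbol such as '[C]' or '[=C]'. The '.' dot-bond
# symbol is never matched, so it is excluded from the alphabet.
_SYMBOL = re.compile(r"\[[^\[\]]*\]")


def build_alphabet(selfies_iter: Iterable[str]) -> Set[str]:
    """Collect the set of bracketed symbols used by any input string."""
    alphabet: Set[str] = set()
    for selfies_string in selfies_iter:
        alphabet.update(_SYMBOL.findall(selfies_string))
    return alphabet


print(sorted(build_alphabet(["[C][F][O]", "[C].[O]", "[F][F]"])))
```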
selfies.get_semantic_robust_alphabet()
Returns a subset of all symbols that are semantically constrained by selfies.
These semantic constraints can be configured with selfies.set_semantic_constraints().
- Return type
Set[str]
- Returns
A subset of all symbols that are semantically constrained.
Advanced Functions
By default, selfies operates under the following semantic constraints:
Max Bonds | Atom(s) |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
8 | All other atoms |
However, the default constraints are inadequate for SMILES that violate them. For example, nitrobenzene O=N(=O)C1=CC=CC=C1 has a nitrogen with 5 bonds and the chlorate anion O=Cl(=O)[O-] has a chlorine with 5 bonds; these SMILES cannot be represented as SELFIES under the default constraints. Additionally, users may want to specify their own custom constraints. Thus, we provide the following methods for configuring the semantic constraints of selfies.
Warning
SELFIES may be translated differently under different semantic constraints. Therefore, if custom semantic constraints are used, it is recommended to report them for reproducibility reasons.
selfies.get_semantic_constraints()
Returns the semantic bond constraints that selfies is currently operating on.
Returned is the argument of the most recent call of selfies.set_semantic_constraints(), or the default bond constraints if the function has not been called yet. Once retrieved, it is copied and then returned. See selfies.set_semantic_constraints() for further explanation.
- Return type
Dict[str, int]
- Returns
The bond constraints selfies is currently operating on.
selfies.set_semantic_constraints(bond_constraints=None)
Configures the semantic constraints of selfies.
The SELFIES grammar is enforced dynamically from a dictionary bond_constraints. The keys of the dictionary are atoms and/or ions (e.g. I, Fe+2). To denote an ion, use the format E+C or E-C, where E is an element and C is a positive integer. The corresponding value is the maximum number of bonds that atom or ion can make, between 1 and 8 inclusive. For example, one may have:
* bond_constraints['I'] = 1
* bond_constraints['C'] = 4
selfies.decoder() will only generate SMILES that respect the bond constraints specified by the dictionary. In the example above, '[C][=I]' and '[I][=C]' will be translated to 'CI' and 'IC', respectively, because I has been configured to make at most one bond.
If an atom or ion is not specified in bond_constraints, it will by default be constrained to 8 bonds. To change the default setting for unrecognized atoms or ions, set bond_constraints['?'] to the desired integer (between 1 and 8 inclusive).
- Parameters
bond_constraints (Optional[Dict[str, int]]) – A dictionary representing the semantic constraints the updated SELFIES will operate upon. Defaults to None; in this case, a default dictionary will be used.
- Return type
None
- Returns
None.
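The rules for the bond_constraints dictionary (key format, the 1 to 8 value range, and the '?' fallback) can be sketched in plain Python. Everything below is a hypothetical illustration rather than the library's code: validate_constraints, max_bonds_for, and cap_bond_order are invented names, the key pattern assumes element symbols consist of one uppercase letter and an optional lowercase letter, and cap_bond_order is a deliberate simplification that only covers the two-atom example from the text.

```python
import re
from typing import Dict

# Assumed key format: '?', an element symbol ('I', 'Fe'), or an ion
# written as E+C / E-C with C a positive integer ('Fe+2', 'O-1').
_KEY = re.compile(r"\?|[A-Z][a-z]?(?:[+-][1-9][0-9]*)?")


def validate_constraints(bond_constraints: Dict[str, int]) -> None:
    """Raise ValueError if a key or value breaks the documented rules."""
    for key, max_bonds in bond_constraints.items():
        if not _KEY.fullmatch(key):
            raise ValueError(f"unrecognized atom or ion key: {key!r}")
        if not isinstance(max_bonds, int) or not 1 <= max_bonds <= 8:
            raise ValueError(f"max bonds for {key!r} must be an int in 1..8")


def max_bonds_for(atom: str, bond_constraints: Dict[str, int]) -> int:
    """Look up an atom's limit, falling back to '?' and then to 8."""
    if atom in bond_constraints:
        return bond_constraints[atom]
    return bond_constraints.get("?", 8)


def cap_bond_order(requested: int, left: str, right: str,
                   bond_constraints: Dict[str, int]) -> int:
    """Simplified clamp for a two-atom molecule only.

    The bond between `left` and `right` cannot exceed either atom's limit.
    """
    return min(requested,
               max_bonds_for(left, bond_constraints),
               max_bonds_for(right, bond_constraints))


constraints = {"I": 1, "C": 4}
validate_constraints(constraints)
# '[C][=I]' requests a double bond, but I allows one bond, giving 'CI'.
print(cap_bond_order(2, "C", "I", constraints))
```

In real use, the documented pattern is to obtain a copy of the current dictionary from selfies.get_semantic_constraints(), modify it, and pass it to selfies.set_semantic_constraints().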