Preliminary Definitions¶
Genetic Alphabet¶
Definition
A genetic alphabet \(\langle \Sigma,\sim \rangle\) is an algebraic structure on an alphabet \(\Sigma\) with a unary operation \(\sim\) verifying the following properties:
\(\sim: \Sigma^\star \to \Sigma^\star\) is a bijection
\(\forall x \in \Sigma^\star, \lvert \widetilde{x} \rvert = \lvert x \rvert\)
\(\forall (x, y) \in (\Sigma^\star)^2, \quad \widetilde{x \cdot y} = \widetilde{\,y\,} \cdot \widetilde{\,x\,}\)
Note
To stay consistent with the biology lexicon, we will be referring to a word over a genetic alphabet as a sequence, only explicitly naming a mathematical sequence when needed to.
Examples
\((\{A, T, G, C\}, \sim)\) is the standard genetic alphabet, with \(\sim\) defined as \(\widetilde{A \cdot G} = C \cdot T\).
\((\{A, T, G, C, d5SICS, dNaM\}, \sim)\) is the genetic alphabet using the unnatural base pairs from Malyshev et al., Nature 2014, with \(\sim\) defined as \(\widetilde{A \cdot G \cdot d5ICS} = dNaM \cdot C \cdot T\)
Circular Sequences¶
Definition
A circular word over an alphabet \(\Sigma\) is a finite word with no end. It can be noted \(w^{(c)}\), where \(w\) is a finite word of \(\Sigma^\star\).
Definition: Cardinality
Given a circular sequence \(s^{(c)}\), the cardinal of \(s^{(c)}\), noted \(\lvert s^{(c)} \rvert\), is defined as:
Definition: Equality
Given two sequences \(a^{(c)}\) and \(b^{(c)}\) with
let the \(=\) relation be defined as:
where \(\sigma\) is the circular shift defined as:
Property
\(=\) is a relation of equivalence over \(\Sigma^{(c)}\)
Demonstration
Given the set of circular sequences \(\Sigma^{(c)}\) using an alphabet \(\Sigma\):
Reflexivity:
\[s^{(c)} \in \Sigma^{(c)} \implies s = Id(s) = \sigma^{0}(s) \implies s^{(c)} = s^{(c)}\]Symetry: \(\forall s_1^{(c)}, s_2^{(c)} \in \Sigma^{(c)} \times \Sigma^{(c)}\):
\[\begin{split}\begin{array}{lll} s_1^{(c)} = s_2^{(c)} &\iff& \exists k \in \mathbb{N}, s_1 = \sigma^k(s_2) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{-k}(s_1) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{\lvert s_1 \rvert - k}(s_1) \\ &\iff& s_2^{(c)} = s_1^{(c)} \end{array}\end{split}\]Transitivity: \(\forall s_1, s_2, s_3 \in \Sigma^{(c)} \times \Sigma^{(c)} \times \Sigma^{(c)}\)
\[\begin{split}\begin{array}{lll} \begin{cases} s_1^{(c)} = s_2^{(c)} \\ s_2^{(c)} = s_3^{(c)} \end{cases} &\implies& \begin{cases} \exists k_1 \in \mathbb{N}, s_1 = \sigma^{k_1}(s_2) \\ \exists k_2 \in \mathbb{N}, s_2 = \sigma^{k_2}(s_3) \end{cases} \\ &\implies& \exists k_1, k_2 \in \mathbb{N}^2, s_1 = \sigma^{k_1} \circ \sigma^{k_2}(s_3) \\ &\implies& \exists k_1, k_2 \in \mathbb{N}^2, s_1 = \sigma^{k_1 + k_2}(s_3) \\ &\implies& s_1^{(c)} = s_3^{(c)} \end{array}\end{split}\]
Definition: Automaton acception
Given a finite automaton \(A\) over an alphabet \(\Sigma\), and \(u^{(c)}\) a sequence of \(\Sigma^{(c)}\), \(A\) accepts \(u^{(c)}\) iff there exist a sequence \(v\) of \(\Sigma^\star\) such that:
\(v^{(c)} = u^{(c)}\)
\(A\) accepts \(v\)
Restriction Enzymes¶
Definition
Given a genetic alphabet \(\langle \Sigma, \sim \rangle\), a restriction enzyme \(e\) can be defined as a tuple \((S, n, k)\) where:
\(S \subseteq \Sigma^\star\) is the finite set of recognition sites that \(e\) binds to
\(\forall (s, s\prime) \in S^2, \lvert s \rvert = \lvert s\prime \rvert\)
\(n \in \mathbb{Z}\) is the cutting offset between the last nucleotides of the site and the first nucleotide of the restriction cut
\(k \in \mathbb{Z}\) is the overhang length:
\(k = 0\) if the enzyme produces blunt cuts
\(k > 0\) if the enzyme produces \(5\prime\) overhangs
\(k < 0\) if the enzyme produce \(3\prime\) overhangs
\(\forall (s, s\prime) \in S^2, \lvert s \rvert = \lvert s\prime \rvert\)
\(n \ge - \lvert s \rvert, s \in S\)
Note
This definition only covers single-cut restriction enzymes found in vivo, but we don’t need to cover the case of double-cut restriction enzymes since they are not used in modular cloning.
Definition: Enzyme types
A restriction enzyme \((S, n, k)\) is:
a blunt cutter is \(k = 0\)
an asymmetric cutter if \(k \ne 0\)
a Type IIS enzyme if:
\(n \ge 0\)
\(\forall s \in S, s \ne \overline{s}\)
Golden Gate Assembly¶
Definition
An assembly is a function of \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)}) \times \mathcal{P}(E)\) to \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)})\), which to a set of distinct sequences \(\{d_1, \dots, d_m\}\) and a set of restriction enzymes \(\{e_1, \dots, e_n\}\) associates the set of digested/ligated sequences \(A = \{a_1, \dots a_k\}\).
The notation for an assembly is: