Preliminary Definitions

Genetic Alphabet

Definition

A genetic alphabet \(\langle \Sigma,\sim \rangle\) is an algebraic structure on an alphabet \(\Sigma\) with a unary operation \(\sim\) verifying the following properties:

  • \(\sim: \Sigma^\star \to \Sigma^\star\) is a bijection

  • \(\forall x \in \Sigma^\star, \lvert \widetilde{x} \rvert = \lvert x \rvert\)

  • \(\forall (x, y) \in (\Sigma^\star)^2, \quad \widetilde{x \cdot y} = \widetilde{\,y\,} \cdot \widetilde{\,x\,}\)

Note

To stay consistent with the biology lexicon, we will be referring to a word over a genetic alphabet as a sequence, only explicitly naming a mathematical sequence when needed to.

Examples

  • \((\{A, T, G, C\}, \sim)\) is the standard genetic alphabet, with \(\sim\) defined as \(\widetilde{A \cdot G} = C \cdot T\).

  • \((\{A, T, G, C, d5SICS, dNaM\}, \sim)\) is the genetic alphabet using the unnatural base pairs from Malyshev et al., Nature 2014, with \(\sim\) defined as \(\widetilde{A \cdot G \cdot d5ICS} = dNaM \cdot C \cdot T\)

Circular Sequences

Definition

A circular word over an alphabet \(\Sigma\) is a finite word with no end. It can be noted \(w^{(c)}\), where \(w\) is a finite word of \(\Sigma^\star\).

Definition: Cardinality

Given a circular sequence \(s^{(c)}\), the cardinal of \(s^{(c)}\), noted \(\lvert s^{(c)} \rvert\), is defined as:

\[\lvert s^{(c)} \rvert = \lvert s \rvert\]

Definition: Equality

Given two sequences \(a^{(c)}\) and \(b^{(c)}\) with

\[\begin{split}\begin{array}{lllll} a &=& a_0 \cdot a_1 \cdot \, \dots \, \cdot a_m & \in \Sigma^{(m)}, & m \in \mathbb{N} \\ b &=& b_0 \cdot b_1 \cdot \, \dots \, \cdot b_n & \in \Sigma^{(n)}, & n \in \mathbb{N} \end{array}\end{split}\]

let the \(=\) relation be defined as:

\[a^{(c)} = b^{(c)} \iff \exists k \in \mathbb{N}, a = \sigma^{k}(b)\]

where \(\sigma\) is the circular shift defined as:

\[\begin{split}\begin{array}l \forall u = u_1 \cdot u_2 \cdot\,\dots\,\cdot u_k \in \Sigma^k, \\ \quad \quad \sigma(u_1 \cdot u_2 \cdot\,\dots\,\cdot u_k) = u_k \cdot u_1 \cdot u_2 \cdot \, \dots \, \cdot u_{k-1} \end{array}\end{split}\]

Property

\(=\) is a relation of equivalence over \(\Sigma^{(c)}\)

Demonstration

Given the set of circular sequences \(\Sigma^{(c)}\) using an alphabet \(\Sigma\):

  • Reflexivity:

    \[s^{(c)} \in \Sigma^{(c)} \implies s = Id(s) = \sigma^{0}(s) \implies s^{(c)} = s^{(c)}\]
  • Symetry: \(\forall s_1^{(c)}, s_2^{(c)} \in \Sigma^{(c)} \times \Sigma^{(c)}\):

    \[\begin{split}\begin{array}{lll} s_1^{(c)} = s_2^{(c)} &\iff& \exists k \in \mathbb{N}, s_1 = \sigma^k(s_2) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{-k}(s_1) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{\lvert s_1 \rvert - k}(s_1) \\ &\iff& s_2^{(c)} = s_1^{(c)} \end{array}\end{split}\]
  • Transitivity: \(\forall s_1, s_2, s_3 \in \Sigma^{(c)} \times \Sigma^{(c)} \times \Sigma^{(c)}\)

    \[\begin{split}\begin{array}{lll} \begin{cases} s_1^{(c)} = s_2^{(c)} \\ s_2^{(c)} = s_3^{(c)} \end{cases} &\implies& \begin{cases} \exists k_1 \in \mathbb{N}, s_1 = \sigma^{k_1}(s_2) \\ \exists k_2 \in \mathbb{N}, s_2 = \sigma^{k_2}(s_3) \end{cases} \\ &\implies& \exists k_1, k_2 \in \mathbb{N}^2, s_1 = \sigma^{k_1} \circ \sigma^{k_2}(s_3) \\ &\implies& \exists k_1, k_2 \in \mathbb{N}^2, s_1 = \sigma^{k_1 + k_2}(s_3) \\ &\implies& s_1^{(c)} = s_3^{(c)} \end{array}\end{split}\]

Definition: Automaton acception

Given a finite automaton \(A\) over an alphabet \(\Sigma\), and \(u^{(c)}\) a sequence of \(\Sigma^{(c)}\), \(A\) accepts \(u^{(c)}\) iff there exist a sequence \(v\) of \(\Sigma^\star\) such that:

  • \(v^{(c)} = u^{(c)}\)

  • \(A\) accepts \(v\)

Restriction Enzymes

Definition

Given a genetic alphabet \(\langle \Sigma, \sim \rangle\), a restriction enzyme \(e\) can be defined as a tuple \((S, n, k)\) where:

  • \(S \subseteq \Sigma^\star\) is the finite set of recognition sites that \(e\) binds to

  • \(\forall (s, s\prime) \in S^2, \lvert s \rvert = \lvert s\prime \rvert\)

  • \(n \in \mathbb{Z}\) is the cutting offset between the last nucleotides of the site and the first nucleotide of the restriction cut

  • \(k \in \mathbb{Z}\) is the overhang length:

    • \(k = 0\) if the enzyme produces blunt cuts

    • \(k > 0\) if the enzyme produces \(5\prime\) overhangs

    • \(k < 0\) if the enzyme produce \(3\prime\) overhangs

  • \(\forall (s, s\prime) \in S^2, \lvert s \rvert = \lvert s\prime \rvert\)

  • \(n \ge - \lvert s \rvert, s \in S\)

Note

This definition only covers single-cut restriction enzymes found in vivo, but we don’t need to cover the case of double-cut restriction enzymes since they are not used in modular cloning.

Definition: Enzyme types

A restriction enzyme \((S, n, k)\) is:

  • a blunt cutter is \(k = 0\)

  • an asymmetric cutter if \(k \ne 0\)

  • a Type IIS enzyme if:

    • \(n \ge 0\)

    • \(\forall s \in S, s \ne \overline{s}\)

Golden Gate Assembly

Definition

An assembly is a function of \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)}) \times \mathcal{P}(E)\) to \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)})\), which to a set of distinct sequences \(\{d_1, \dots, d_m\}\) and a set of restriction enzymes \(\{e_1, \dots, e_n\}\) associates the set of digested/ligated sequences \(A = \{a_1, \dots a_k\}\).

The notation for an assembly is:

\[d_1 + \dots + d_m \xrightarrow{\quad e_1, \dots, e_n \quad} a_1 + \dots + a_k\]