MoClo

The MoClo system is a standard for molecular cloning that relies on the Golden Gate Assembly technique.

Concepts and Definitions

Concepts

Introduction

The MoClo standard was first presented in the Weber et al., 2011 [21364738] paper, as an attempt to standardize the process of assembling complex DNA molecules from smaller genetic elements. It is inspired by two previous standards:

  • NOMAD [8855278], which proposed generic notions of modules and vectors, as well as assembly using Type IIS enzymes. Modules can be combined in any order, but are cloned sequentially, one module at a time.
  • BioBrick [18410688], which defines parts with a stable structure: assembling two parts together always gives a part with the same flanking restriction sites.

The MoClo standard enhances both of these assembly standards by relying on the Golden Gate Assembly, which allows single-step assembly of an arbitrary number of modules into a vector. Furthermore, MoClo parts are flanked by stereotypical overhangs that enforce a particular assembly order, so that only the desired construct can be obtained.

Type II-S enzymes

Restriction enzymes are enzymes that cut DNA at or near specific recognition sites. Among them, Type IIS enzymes cut DNA outside of the sequence they recognize, at a defined distance from it. The cut can produce either cohesive ends, which can then anneal to other sequences carrying complementary cohesive ends, or blunt ends, which do not pair in this sequence-specific way. The design of the cohesive ends is of great importance when using Type IIS enzymes for molecular cloning.

Golden Gate Assembly

The Golden Gate Assembly relies on Type IIS enzymes to assemble several DNA sequences. The sequences are first cut by the restriction enzyme, and then ligated together using a T4 DNA ligase. These two steps can be repeated in a single reaction tube using a thermocycler, as the two enzymes typically do not work at the same temperature. Since standard Type IIS enzymes, such as BsaI or BsmBI, create a 4-base-long cohesive end when cutting the DNA, there are 4^4 = 256 possible overhangs, so as many as 256 fragments could in theory be combined in a deterministic way in a single assembly, although in practice the chemical properties of the nucleotides will most likely prevent assemblies that large from succeeding.

Example GoldenGate assembly of two modules in a vector using BsaI.
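
The bound of 256 comes from counting the 4-nucleotide words over {A, C, G, T}. The following is only a rough sketch (the helper names are illustrative and not part of any library): it reproduces that count, and also counts the overhangs that are their own reverse complement, which could anneal to themselves and are therefore usually poor choices for fusion sites.

```python
import itertools

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# All possible 4-nucleotide overhangs.
overhangs = ["".join(p) for p in itertools.product("ACGT", repeat=4)]

# Overhangs equal to their own reverse complement (e.g. "AATT").
palindromic = [o for o in overhangs if o == reverse_complement(o)]

print(len(overhangs))    # 256
print(len(palindromic))  # 16
```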

The MoClo system

The MoClo system combines the idea of a standard part format from the BioBrick standard, with the Golden Gate assembly protocol, allowing several modules to be assembled in a vector at the same time.

Hierarchy

MoClo modules and vectors are divided into several levels, describing their structural and transcriptional features:

  • Level -1 modules are sequences that are not yet in a standardized backbone, but can be assembled in a dedicated vector to form a level 0 module. They are most often obtained via oligonucleotide synthesis or PCR.
  • Level 0 modules are standardized genetic elements: promoter, 5’ UTR, signal sequence, CDS, terminator.
  • Level 1 modules are transcription units, formed by a combination of Level 0 modules, and are able to express proteins.
  • Level 2 modules are multigenic units, containing several transcription units, and are able to express many genes at once.

Furthermore, the enzyme used during the Golden Gate Assembly depends on the assembly level. Alternating between the two enzymes makes it possible for an infinite number of genes to be inserted in the same plasmid, although biological limits are reached in vivo.

Types definition

Although transcription units can be assembled in any possible order in their destination vectors, level 0 modules must be assembled in a specific order to obtain a functional genetic construct. In order to enforce the assembly order, parts are flanked by fusion sites with standard sequences, which are unique to the type of the part. A valid level 1 module is obtained by assembling a part of each type into the destination vector.

Assembly markers

Once the Golden Gate Assembly is finished, the obtained constructs can be amplified using a bacterial host. After transformation, bacteria are selected using two different factors:

  • An antibiotic for which a resistance cassette is only available on the vector, but not on any module: this allows selecting all the bacteria that received the vector plasmid.
  • A marker for a dropout reporter gene that can only be found in the vector but not in the final construct (such as the gfp or lacZ genes).

This double screening makes it possible to select only the bacteria that contain the expected construct, discarding the others, and to retrieve the assembled plasmid through a miniprep.

References

[8855278] Rebatchouk, D, N Daraselia, and J O Narita. ‘NOMAD: A Versatile Strategy for in Vitro DNA Manipulation Applied to Promoter Analysis and Vector Design’. Proceedings of the National Academy of Sciences of the United States of America 93, no. 20 (1 October 1996): 10891–96. pmid:8855278
[18410688] Shetty, Reshma P, Drew Endy, and Thomas F Knight. ‘Engineering BioBrick Vectors from BioBrick Parts’. Journal of Biological Engineering 2 (14 April 2008): 5. doi:10.1186/1754-1611-2-5
[21364738] Weber, Ernst, Carola Engler, Ramona Gruetzner, Stefan Werner, and Sylvestre Marillonnet. ‘A Modular Cloning System for Standardized Assembly of Multigene Constructs’. PLOS ONE 6, no. 2 (18 February 2011): e16765. doi:10.1371/journal.pone.0016765

Definitions

Molecular Cloning
Molecular cloning is the process of assembling fragments of DNA to obtain a more complex molecule, often presenting genetic features of interest. It describes a process, not a technique.
GoldenGate
GoldenGate is a molecular cloning technique that uses Type IIS restriction enzymes to cut and assemble DNA sequences into recombinant DNA molecules. It describes a technique.
Modular Cloning
A Modular Cloning system uses the GoldenGate technique to assemble several genetic modules of a given level into a vector of the same level. It can also define types: collections of modules or vectors that share specific overhangs and are therefore functionally and structurally equivalent to each other.
MoClo
MoClo is originally the name of a modular cloning system published by the Marillonnet Lab, which defines a set of vectors and modules to be used to assemble multigenic expression devices for plants. An extension was later provided by the same team, proposing potentially infinite assemblies of multigenic expression devices with the addition of two levels. Other modular cloning systems, inspired by them, were published under the name of MoClo (such as MoClo YTK, MoClo CIDAR, MoClo EcoFlex, etc.). In this work, the original toolkit is named MoClo IG, and MoClo is used as an abbreviation of modular cloning as defined above.

Descriptive Theory

This section introduces the theory that was developed to support the software implementation of the modular cloning logic. It introduces mathematical definitions of biological concepts, relying in particular on formal language theory.

Preliminary Definitions

Genetic Alphabet

Definition

A genetic alphabet \(\langle \Sigma,\sim \rangle\) is an algebraic structure on an alphabet \(\Sigma\) with a unary operation \(\sim\) verifying the following properties:

  • \(\sim: \Sigma^\star \to \Sigma^\star\) is a bijection
  • \(\forall x \in \Sigma^\star, \lvert \widetilde{x} \rvert = \lvert x \rvert\)
  • \(\forall (x, y) \in (\Sigma^\star)^2, \quad \widetilde{x \cdot y} = \widetilde{\,y\,} \cdot \widetilde{\,x\,}\)

Note

To stay consistent with the biology lexicon, we will refer to a word over a genetic alphabet as a sequence, and only explicitly name a mathematical sequence when needed.

Examples

  • \((\{A, T, G, C\}, \sim)\) is the standard genetic alphabet, with \(\sim\) defined as \(\widetilde{A \cdot G} = C \cdot T\).
  • \((\{A, T, G, C, d5SICS, dNaM\}, \sim)\) is the genetic alphabet using the unnatural base pairs from Malyshev et al., Nature 2014, with \(\sim\) defined as \(\widetilde{A \cdot G \cdot d5SICS} = dNaM \cdot C \cdot T\).
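
As an illustration only, the three properties can be checked on the standard alphabet with a plain Python function. The name `tilde` is an assumption made for this sketch; it simply implements the usual reverse complement.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def tilde(seq):
    """Reverse complement of a word over {A, T, G, C}."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

x, y = "AG", "GGTCTC"

assert tilde("AG") == "CT"                   # the example given in the text
assert len(tilde(x)) == len(x)               # length is preserved
assert tilde(x + y) == tilde(y) + tilde(x)   # concatenation order is reversed
assert tilde(tilde(x)) == x                  # an involution, hence a bijection
```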
Circular Sequences

Definition

A circular word over an alphabet \(\Sigma\) is a finite word with no end. It can be noted \(w^{(c)}\), where \(w\) is a finite word of \(\Sigma^\star\).

Definition: Cardinality

Given a circular sequence \(s^{(c)}\), the cardinal of \(s^{(c)}\), noted \(\lvert s^{(c)} \rvert\), is defined as:

\[\lvert s^{(c)} \rvert = \lvert s \rvert\]

Definition: Equality

Given two sequences \(a^{(c)}\) and \(b^{(c)}\) with

\[\begin{split}\begin{array}{lllll} a &=& a_1 \cdot a_2 \cdot \, \dots \, \cdot a_m & \in \Sigma^{m}, & m \in \mathbb{N} \\ b &=& b_1 \cdot b_2 \cdot \, \dots \, \cdot b_n & \in \Sigma^{n}, & n \in \mathbb{N} \end{array}\end{split}\]

let the \(=\) relation be defined as:

\[a^{(c)} = b^{(c)} \iff \exists k \in \mathbb{N}, a = \sigma^{k}(b)\]

where \(\sigma\) is the circular shift defined as:

\[\begin{split}\begin{array}l \forall u = u_1 \cdot u_2 \cdot\,\dots\,\cdot u_k \in \Sigma^k, \\ \quad \quad \sigma(u_1 \cdot u_2 \cdot\,\dots\,\cdot u_k) = u_k \cdot u_1 \cdot u_2 \cdot \, \dots \, \cdot u_{k-1} \end{array}\end{split}\]

Property

\(=\) is an equivalence relation over \(\Sigma^{(c)}\).

Demonstration

Given the set of circular sequences \(\Sigma^{(c)}\) using an alphabet \(\Sigma\):

  • Reflexivity:

    \[s^{(c)} \in \Sigma^{(c)} \implies s = Id(s) = \sigma^{0}(s) \implies s^{(c)} = s^{(c)}\]
  • Symmetry: \(\forall (s_1^{(c)}, s_2^{(c)}) \in \Sigma^{(c)} \times \Sigma^{(c)}\):

    \[\begin{split}\begin{array}{lll} s_1^{(c)} = s_2^{(c)} &\iff& \exists k \in \mathbb{N}, s_1 = \sigma^k(s_2) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{-k}(s_1) \\ &\iff& \exists k \in \mathbb{N}, s_2 = \sigma^{\lvert s_1 \rvert - k}(s_1) \\ &\iff& s_2^{(c)} = s_1^{(c)} \end{array}\end{split}\]
  • Transitivity: \(\forall (s_1^{(c)}, s_2^{(c)}, s_3^{(c)}) \in \Sigma^{(c)} \times \Sigma^{(c)} \times \Sigma^{(c)}\):

    \[\begin{split}\begin{array}{lll} \begin{cases} s_1^{(c)} = s_2^{(c)} \\ s_2^{(c)} = s_3^{(c)} \end{cases} &\implies& \begin{cases} \exists k_1 \in \mathbb{N}, s_1 = \sigma^{k_1}(s_2) \\ \exists k_2 \in \mathbb{N}, s_2 = \sigma^{k_2}(s_3) \end{cases} \\ &\implies& \exists (k_1, k_2) \in \mathbb{N}^2, s_1 = \sigma^{k_1} \circ \sigma^{k_2}(s_3) \\ &\implies& \exists (k_1, k_2) \in \mathbb{N}^2, s_1 = \sigma^{k_1 + k_2}(s_3) \\ &\implies& s_1^{(c)} = s_3^{(c)} \end{array}\end{split}\]
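
The equality of circular sequences can be sketched on plain strings as follows. The function names `shift` and `circular_equal` are hypothetical, and a circular word is represented here by any one of its linear representatives.

```python
def shift(word, k=1):
    """Apply the circular shift sigma k times (the last letter moves to the front)."""
    if not word:
        return word
    k %= len(word)
    return word[-k:] + word[:-k] if k else word

def circular_equal(a, b):
    """True if a and b are representatives of the same circular word."""
    return len(a) == len(b) and any(
        a == shift(b, k) for k in range(max(len(b), 1))
    )

print(shift("ATGC"))                   # CATG
print(circular_equal("ATGC", "GCAT"))  # True
print(circular_equal("ATGC", "ATCG"))  # False
```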

Definition: Automaton acceptance

Given a finite automaton \(A\) over an alphabet \(\Sigma\), and \(u^{(c)}\) a sequence of \(\Sigma^{(c)}\), \(A\) accepts \(u^{(c)}\) iff there exists a sequence \(v\) of \(\Sigma^\star\) such that:

  • \(v^{(c)} = u^{(c)}\)
  • \(A\) accepts \(v\)
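
In practice this definition amounts to trying every rotation of the circular sequence. A minimal sketch, assuming the automaton is given as a Python regular expression (the function name `accepts_circular` is made up for this example):

```python
import re

def accepts_circular(pattern, word):
    """True if the regex fully matches at least one rotation of `word`."""
    regex = re.compile(pattern)
    rotations = (word[i:] + word[:i] for i in range(max(len(word), 1)))
    return any(regex.fullmatch(rotation) for rotation in rotations)

# The site GGTCTC spans the origin of the first circular sequence, so the
# linear word itself does not match, but one of its rotations does.
print(accepts_circular(r"GGTCTC[ACGT]*", "CTCAAAAGGT"))  # True
print(accepts_circular(r"GGTCTC[ACGT]*", "CTCAAAAGGA"))  # False
```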
Restriction Enzymes

Definition

Given a genetic alphabet \(\langle \Sigma, \sim \rangle\), a restriction enzyme \(e\) can be defined as a tuple \((S, n, k)\) where:

  • \(S \subseteq \Sigma^\star\) is the finite set of recognition sites that \(e\) binds to
  • \(\forall (s, s\prime) \in S^2, \lvert s \rvert = \lvert s\prime \rvert\)
  • \(n \in \mathbb{Z}\) is the cutting offset between the last nucleotide of the site and the first nucleotide of the restriction cut
  • \(k \in \mathbb{Z}\) is the overhang length:
    • \(k = 0\) if the enzyme produces blunt cuts
    • \(k > 0\) if the enzyme produces \(5\prime\) overhangs
    • \(k < 0\) if the enzyme produces \(3\prime\) overhangs
  • \(n \ge - \lvert s \rvert, s \in S\)

Note

This definition only covers single-cut restriction enzymes found in vivo, but we don’t need to cover the case of double-cut restriction enzymes since they are not used in modular cloning.

Definition: Enzyme types

A restriction enzyme \((S, n, k)\) is:

  • a blunt cutter if \(k = 0\)
  • an asymmetric cutter if \(k \ne 0\)
  • a Type IIS enzyme if:
    • \(n \ge 0\)
    • \(\forall s \in S, s \ne \widetilde{s}\)
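
The tuple \((S, n, k)\) and the enzyme types above can be mirrored by a small data class. This is only a sketch: the `Enzyme` class is hypothetical, and the numeric values are assumptions based on the commonly documented cut positions of BsaI (GGTCTC, cutting 1 nucleotide downstream with a 4-nucleotide 5’ overhang) and EcoRV (GATATC, blunt cut in the middle of the site).

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))

@dataclass(frozen=True)
class Enzyme:
    sites: frozenset  # S: recognition sites, all of the same length
    n: int            # cutting offset after the last nucleotide of the site
    k: int            # overhang length (the sign gives the overhang side)

    def is_blunt(self):
        return self.k == 0

    def is_type_iis(self):
        # Cuts outside of a site that is not its own reverse complement.
        return self.n >= 0 and all(
            s != reverse_complement(s) for s in self.sites
        )

bsai = Enzyme(frozenset({"GGTCTC"}), n=1, k=4)    # assumed values, see above
ecorv = Enzyme(frozenset({"GATATC"}), n=-3, k=0)  # assumed values, see above

print(bsai.is_type_iis(), bsai.is_blunt())    # True False
print(ecorv.is_type_iis(), ecorv.is_blunt())  # False True
```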
Golden Gate Assembly

Definition

An assembly is a function of \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)}) \times \mathcal{P}(E)\) to \(\mathcal{P}(\Sigma^\star \cup \Sigma^{(c)})\), which to a set of distinct sequences \(\{d_1, \dots, d_m\}\) and a set of restriction enzymes \(\{e_1, \dots, e_n\}\) associates the set of digested/ligated sequences \(A = \{a_1, \dots, a_k\}\).

The notation for an assembly is:

\[d_1 + \dots + d_m \xrightarrow{\quad e_1, \dots, e_n \quad} a_1 + \dots + a_k\]

Standard Modular Cloning System

System Definition

Definition

Given a genetic alphabet \(\langle \Sigma, \sim \rangle\), a Modular Cloning System \(S\) is defined as a mathematical sequence

\[(M_l,\ V_l,\ e_l)_ {\ l\ \ge -1}\]

where:

  • \(M_l \subseteq \Sigma^\star \cup \Sigma^{(c)}\) is the set of modules of level \(l\)
  • \(V_l \subseteq \Sigma^{(c)}\) is the set of vectors of level \(l\)
  • \(e_l \subseteq E\) is the finite, non-empty set of asymmetric, Type IIS restriction enzymes of level \(l\)

Definition: \(k\)-cyclicity

A Modular Cloning System \((M_l, V_l, e_l)_ {l \ge -1}\) is said to be \(k\)-cyclic after a level \(\lambda\) if:

\[\begin{split}\begin{array}{ll} \exists k \in \mathbb{N}^\star, & \\ \forall l \ge \lambda, & \\ & \begin{cases} M_{l+k} \subseteq M_l \\ V_{l+k} \subseteq V_l \\ e_{l+k} \subseteq e_l \end{cases} \end{array}\end{split}\]
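
To give an intuition, here is a sketch restricted to the enzyme sets \(e_l\) only. It assumes a hypothetical system in which two enzymes alternate between levels; the enzyme names and the `enzymes` function are illustrative assumptions, and the check covers a finite range of levels, so it is not a proof.

```python
def enzymes(level):
    """Hypothetical enzyme assignment: two enzymes alternating between levels."""
    return {"BsmBI"} if level % 2 else {"BsaI"}

def looks_k_cyclic(k, start, stop=20):
    """Check that e_{l+k} is a subset of e_l for start <= l < stop."""
    return all(enzymes(l + k) <= enzymes(l) for l in range(start, stop))

print(looks_k_cyclic(2, start=-1))  # True
print(looks_k_cyclic(1, start=-1))  # False
```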

Definition: \(\lambda\)-limit

A Modular Cloning System \((M_l, V_l, e_l)_ {l \ge -1}\) is said to be \(\lambda\)-limited if:

\[\forall l \ge \lambda, M_l = \emptyset, V_l = \emptyset, e_l = \emptyset\]
Modules

Definition

For a given level \(l\), \(M_l\) is defined as the set of modules \(m \in \Sigma^\star \cup \Sigma^{(c)}\) for which:

\[\begin{split}\begin{array}{l} \exists ! (S, n, k) \in e_l, \\ \exists ! (S^\prime, n^\prime, k^\prime) \in e_l, \\ \exists ! (s, s^\prime) \in S \times S^\prime, \\ \exists ! (x, y, o_5, o_3) \in (\Sigma^\star)^4, \\ \\ \quad \exists ! t \in \Sigma^\star, \left\{ \begin{array}{lll} \exists ! b \in \Sigma^\star,\ & m = (s \cdot x \cdot o_5 \cdot t \cdot o_3 \cdot y \cdot \widetilde{s^\prime} \cdot b)^{(c)}, & \text{ if } m \in \Sigma^{(c)}\\ \exists ! u, v \in (\Sigma^\star)^2, & m = u \cdot s \cdot x \cdot o_5 \cdot t \cdot o_3 \cdot y \cdot \widetilde{s^\prime} \cdot v, & \text{ if } m \not \in \Sigma^{(c)} \end{array} \right. \end{array}\end{split}\]

with:

  • \(|x| = n\)
  • \(|y| = n^\prime\)
  • \(|o_5| = abs(k)\)
  • \(|o_3| = abs(k^\prime)\)

Note

This decomposition is called the canonic module decomposition, where:

  • \(t\) is the target sequence of the module \(m\)
  • \(b\) is the backbone of the module \(m\) (if \(m\) is circular)
  • \(u\) and \(v\) are called the prefix and suffix of the module \(m\) (if \(m\) is not circular)
  • \(o_5\) and \(o_3\) are the upstream and downstream overhangs respectively.
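
For a linear module, the canonic decomposition can be approximated with a regular expression. The sketch below assumes a single enzyme with the site GGTCTC, \(n = 1\) and \(k = 4\) (values commonly given for BsaI); the `MODULE` pattern and `decompose` function are illustrative only, and the pattern does not enforce the uniqueness required by the definition.

```python
import re

MODULE = re.compile(
    r"GGTCTC"        # s, the recognition site
    r"[ACGT]"        # x, with |x| = n = 1
    r"([ACGT]{4})"   # o5, the upstream overhang
    r"([ACGT]*?)"    # t, the target sequence
    r"([ACGT]{4})"   # o3, the downstream overhang
    r"[ACGT]"        # y, with |y| = n' = 1
    r"GAGACC"        # reverse complement of s'
)

def decompose(module):
    """Return (o5, t, o3) for a linear module, or None if it does not match."""
    match = MODULE.search(module)
    return match.groups() if match else None

# prefix + site + x + o5 + target + o3 + y + reversed site + suffix
module = "TT" + "GGTCTC" + "A" + "AACG" + "ATGAAATAA" + "GCTT" + "T" + "GAGACC" + "TT"
print(decompose(module))  # ('AACG', 'ATGAAATAA', 'GCTT')
```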

Property

\(\forall \langle \Sigma, \sim \rangle\), \(\forall l \ge -1\), \(\forall e_l \subset E\):

\[M_l \text{ is a rational language }\]

Demonstration

Let there be a genetic alphabet \(\langle \Sigma, \sim \rangle\) and a Modular Cloning System \((M_l, V_l, e_l)_ {l \ge -1}\) over it.

\(\forall l \ge -1\), the regular expression:

\[\begin{split}\begin{array}l \bigcup_{\begin{array}l(S, n, k) \in e_l \\ (S\prime, n\prime, k\prime) \in e_l\end{array}} \Sigma^\star \cdot S \cdot \Sigma^n \cdot \Sigma^{abs(k)} \cdot \Sigma^\star \cdot \overline{(S | S^\prime)} \cdot \Sigma^\star \cdot \Sigma^{abs(k\prime)} \cdot \Sigma^{n\prime} \cdot \widetilde{\,S\prime\,} \cdot \Sigma^\star \\ \end{array}\end{split}\]

matches a sequence \(m \in \Sigma^\star \cup \Sigma^{(c)}\) if and only if \(m \in M_l\).

\(M_l\) is regular, so given Kleene’s Theorem, \(M_l\) is rational.

Vectors

Definition

For a given level \(l\), \(V_l\) is defined as the set of vectors \(v \in \Sigma^{(c)}\) for which:

\[\begin{split}\begin{array}{l} \exists ! (S, n, k) \in e_l, \\ \exists ! (S^\prime, n^\prime, k^\prime) \in e_l, \\ \exists ! (s, s^\prime) \in S \times S^\prime, \\ \exists ! (x, y, o_5, o_3) \in (\Sigma^\star)^4, \\ \\ \quad \exists ! (b, p) \in (\Sigma^\star)^2,\ v = (o_3 \cdot b \cdot o_5 \cdot y \cdot \widetilde{s} \cdot p \cdot s^\prime \cdot x)^{(c)} \\ \end{array}\end{split}\]

with:

  • \(|x| = n\)
  • \(|y| = n^\prime\)
  • \(|o_5| = abs(k)\)
  • \(|o_3| = abs(k^\prime)\)
  • \(o_3 \ne o_5\)

Note

This decomposition is called the canonic vector decomposition, where:

  • \(p\) is the placeholder sequence of the vector \(v\)
  • \(b\) is the backbone of the vector \(v\)
  • \(o_3\) and \(o_5\) are the upstream and downstream overhangs respectively.
Overhangs

By definition, every valid level \(l\) module or vector has a single canonic decomposition, and therefore unique \(o_5\) and \(o_3\) overhangs. As such, let the function \(up\) (resp. \(down\)) be defined as the function which:

  • to a module \(m\) associates the word \(o_5\) (resp. \(o_3\)) from its canonic module decomposition
  • to a vector \(v\) associates the word \(o_3\) (resp. \(o_5\)) from its canonic vector decomposition.
Standard Assembly

Definition: Standard MoClo Assembly

Given an assembly of level \(l\), where \((m_1, \dots, m_k) \in M_l^k, v \in V_l\):

\[a:\quad m_1 + \dots + m_k \xrightarrow{\quad e_l \quad} A \subset (\Sigma^\star \cup \Sigma^{(c)})\]

and the partial order \(\le\) over \(S = \{m_1, \dots, m_k\}\) defined as:

\[\begin{split}\begin{array}{l} \forall x, y \in S^2, \\ \quad x \le y \iff \begin{cases} x = y & \\ down(x) = up(y) & \text{ if } x \ne y\\ \exists z \in S \backslash \{x, y\}, down(x) = up(z), \ z \le y & \text{ if } x \ne y \text{ and } down(x) \ne up(y) \end{cases} \end{array}\end{split}\]

then a chain \(\langle S\prime, \le \rangle \subset \langle S, \le \rangle\) is an insert if:

\[\begin{split}\begin{cases} v \le min(S^\prime) \\ max(S^\prime) \le v \end{cases} \iff \begin{cases} down(v) = up(min(S^\prime)) \\ up(v) = down(max(S^\prime)) \end{cases}\end{split}\]

\(a\) is:

  • invalid if \(\langle S, \le \rangle\) is an antichain or has no insert.
  • valid if \(\langle S, \le \rangle\) has at least one insert.
  • ambiguous if \(\langle S, \le \rangle\) has more than one insert.
  • unambiguous if \(\langle S, \le \rangle\) has exactly one insert.
  • complete if \(\langle S, \le \rangle\) is an insert.

Corollary

If an assembly \(a\) is complete, then there exists a permutation \(\pi\) of \([\![1, k]\!]\) such that:

\[m_{\pi(1)} \le m_{\pi(2)} \le \dots \le m_{\pi(k-1)} \le m_{\pi(k)}\]

and:

\[\begin{split}\begin{array}{lll} up(m_{\pi(1)}) &=& down(v) \\ down(m_{\pi(k)}) &=& up(v) \end{array}\end{split}\]
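
The permutation \(\pi\) can be found by following the overhangs from the vector. The following sketch is an assumption-laden illustration rather than the library's algorithm: the function `order_modules` is hypothetical, the vector is reduced to an (up, down) pair, and the modules to a mapping from a name to an (up, down) pair.

```python
def order_modules(vector, modules):
    """Chain modules so that each downstream overhang meets the next upstream one."""
    by_up = {}
    for name, (up, _) in modules.items():
        if up in by_up:
            raise ValueError(f"{name} and {by_up[up]} share the overhang {up}")
        by_up[up] = name

    v_up, v_down = vector
    chain, overhang = [], v_down  # the insert starts where the vector ends
    while overhang != v_up or not chain:
        if overhang not in by_up:
            raise ValueError(f"no module starts with the overhang {overhang}")
        name = by_up.pop(overhang)
        chain.append(name)
        overhang = modules[name][1]
    return chain

modules = {
    "terminator": ("ATCC", "GCTT"),
    "promoter": ("AACG", "TATG"),
    "cds": ("TATG", "ATCC"),
}
print(order_modules(("GCTT", "AACG"), modules))
# ['promoter', 'cds', 'terminator']
```

In this representation the assembly is complete when the returned chain contains every module that was provided.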

Property: Uniqueness of the cohesive ends

If an assembly

\[m_1 + \dots + m_k \xrightarrow{\quad e_l \quad} A \subset (\Sigma^\star \cup \Sigma^{(c)})\]

is unambiguous and complete, then \(\forall i \in [\![1, k]\!]\),

\[\begin{split}\left\{ \begin{array}{llll} up(m_i) &\ne& down(m_i)& \\ up(m_i) &\ne& up(m_j), & j \in [\![1, k]\!] \backslash \{i\} \\ down(m_i) &\ne& down(m_j), & j \in [\![1, k]\!] \backslash \{i\} \\ \end{array} \right .\end{split}\]

Demonstration

Let there be an unambiguous complete assembly

\[a:\quad m_1 + \dots + m_k \xrightarrow{\quad e_l \quad} A\]
  • \(up(m_i) \ne down(m_i)\)

    Let’s suppose that \(\exists i \in [\![1, k]\!]\) such that

    \[up(m_i) = down(m_i)\]

    then \(\langle \{m_1, \dots, m_k\} \backslash \{m_i\}, \le \rangle\) is also an insert, which cannot be since \(a\) is complete.

  • \(up(m_i) \ne up(m_j)\)

    Let’s suppose that \(\exists (i, j) \in [\![1, k]\!]^2\) such that

    \[up(m_i) = up(m_j)\]

    Since \(a\) is complete, there exists a permutation \(\pi\) such that

    \[m_{\pi(1)} \le m_{\pi(2)} \le \dots \le m_{\pi(k-1)} \le m_{\pi(k)}\]

    and since \(a\) is unambiguous, \(\langle \{m_1, \dots, m_k\}, \le \rangle\) is the only insert.

  • \(down(m_i) \ne down(m_j)\)

    TODO

Property: Uniqueness of the assembled plasmid

If an assembly

\[m_1 + \dots + m_k \xrightarrow{\quad e_l \quad} A \subset (\Sigma^\star \cup \Sigma^{(c)})\]

is unambiguous, then

\[A \cap \Sigma^{(c)} = \{p\}\]

with

\[p = \left( up(v) \cdot b \cdot up(m_{\pi(1)}) \cdot t_{\pi(1)} \cdot \, \dots \, \cdot up(m_{\pi(n)}) \cdot t_{\pi(n)} \right) ^{(c)}\]

(\(n \le k\), \(n = k\) if \(a\) is complete).
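
On plain strings, the expression of \(p\) reads as a simple concatenation. In the sketch below the function name `assemble_plasmid` and the input layout are assumptions: each insert is given as an (upstream overhang, target sequence) pair, already sorted in assembly order, and the result is one linear representative of the circular sequence.

```python
def assemble_plasmid(v_up, backbone, inserts):
    """Concatenate up(v) and b, then up(m) and t for each ordered insert."""
    return v_up + backbone + "".join(up + target for up, target in inserts)

plasmid = assemble_plasmid("GCTT", "TTTT", [("AACG", "AAA"), ("TATG", "CCC")])
print(plasmid)  # GCTTTTTTAACGAAATATGCCC
```

Each fusion site appears only once in the result, since the downstream overhang of a module is the upstream overhang of the next one.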

Demonstration

TODO

Typed Modular Cloning System

System Definition

Definition

Given a genetic alphabet \(\langle \Sigma, \sim \rangle\), a Typed Modular Cloning System \(S\) is defined as a mathematical sequence

\[(M_l,\ V_l,\ \mathcal{M}_l,\ \mathcal{V}_l,\ e_l)_ {\ l\ \ge -1}\]

where:

  • \((M_l, V_l, e_l)_{l \ge -1}\) is a standard Modular Cloning System
  • \(\mathcal{M}_l \subseteq \mathcal{P}(M_l) \to \mathcal{P}(M_l)\) is the set of module types of level \(l\)
  • \(\mathcal{V}_l \subseteq \mathcal{P}(V_l) \to \mathcal{P}(V_l)\) is the set of vector types of level \(l\)
Types

Definition

\(\forall l \ge -1\), we define types using their signatures (i.e. the sets of upstream and downstream overhangs of elements using this type):

\[\begin{split}\begin{array}{ll} \forall t \in \mathcal{M}_l,& \begin{cases} Up(t) &= \bigcup_{m \in t(M_l)} \{ up(m) \} \\ Down(t) &= \bigcup_{m \in t(M_l)} \{ down(m) \} \end{cases} \\ \forall t \in \mathcal{V}_l,& \begin{cases} Up(t) &= \bigcup_{v \in t(V_l)} \{ up(v) \} \\ Down(t) &= \bigcup_{v \in t(V_l)} \{ down(v) \} \end{cases} \end{array}\end{split}\]

Corollary

\(\forall l \ge -1\),

\[\begin{split}\begin{array}{lll} \forall t \in \mathcal{M}_l,&\ t(M_l) &= \{ m \in M_l\ |\ up(m) \in Up(t),\ down(m) \in Down(t) \} \\ \forall t \in \mathcal{V}_l,&\ t(V_l) &= \{ v \in V_l\ |\ up(v) \in Up(t),\ down(v) \in Down(t) \} \end{array}\end{split}\]
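
The corollary can be read as a filter over the elements of a level. A minimal sketch, assuming elements are described by their (up, down) overhang pair; the `members` function and the part names are hypothetical:

```python
def members(elements, up_set, down_set):
    """Select the elements whose overhangs fall within a type signature."""
    return {
        name
        for name, (up, down) in elements.items()
        if up in up_set and down in down_set
    }

parts = {
    "promoter_a": ("AACG", "TATG"),
    "promoter_b": ("AACG", "TATG"),
    "cds": ("TATG", "ATCC"),
}
print(sorted(members(parts, {"AACG"}, {"TATG"})))
# ['promoter_a', 'promoter_b']
```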

Property: Structural equivalence of module types

Given a valid (resp. unambiguous) (resp. complete) assembly

\[m_1 + \dots + m_k + v \xrightarrow{e_l} A \subset (\Sigma^\star \cup \Sigma^{(c)})\]

then if there exists \(t \in \mathcal{M}_l\) such that

\[\begin{split}\begin{cases} \lvert Up(t) \rvert = \lvert Down(t) \rvert = 1 \\ m_1 \in t(M_l) \end{cases}\end{split}\]

then \(\forall m_1\prime \in t(M_l)\),

\[m_1\prime + \dots + m_k + v \xrightarrow{e_l} A \subset (\Sigma^\star \cup \Sigma^{(c)})\]

is valid (resp. unambiguous) (resp. complete).

Library

Installation

The moclo module is designed to be modular, and as such, you only need to install the functionality you intend to use. Packages are distributed on PyPI, and it is advised to use pip to install them. See the pip documentation to get pip if it is not installed on your system.

Commands below use pip in user mode: the packages will be installed in a user-dependent location, and no additional permissions are needed. If for some reason you need a system-wide setup, remove the --user flag. Installing in user mode should be preferred to avoid dependency issues, in particular on an OS that provides its own package manager (such as aptitude on Debian, or even Homebrew on macOS).

PyPI + pip

To download the latest release from the Python Package Index:

$ pip install --user moclo moclo-ytk moclo-cidar

GitHub + pip

To download the development version from the source repository, you can specify a subfolder in the installation command and directly install it:

$ pip install --user git+https://github.com/althonos/moclo#subdirectory=moclo
$ pip install --user git+https://github.com/althonos/moclo#subdirectory=moclo-ytk
$ pip install --user git+https://github.com/althonos/moclo#subdirectory=moclo-cidar

Check that the CI build is passing, or else you may be installing a broken version of the library!
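
After either installation method, a quick way to see whether the package is visible to the interpreter is to look it up on the import path. This is a generic standard-library check, not something provided by moclo; the helper name `is_installed` is made up for this sketch.

```python
import importlib.util

def is_installed(package):
    """True if the top-level package can be found on the import path."""
    return importlib.util.find_spec(package) is not None

print("moclo", "installed" if is_installed("moclo") else "missing")
```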

Examples

This page contains examples in Python code, generated from Jupyter notebooks with nbsphinx.

YTK integration vector

In this example, we will be using the moclo library as well as the moclo-ytk extension kit to generate the pre-assembled YTK integration vector (pYTK096) from the available YTK parts, as described in the Lee et al. paper.

Structure

The list of parts, as well as the vector structure, can be found in the Supporting Table S1 from the Lee et al. supplementary materials:

Loading parts

We’ll be loading each of the desired parts from the moclo-ytk registry. It is generated from the GenBank files distributed with the YTK kits, which can be found on the AddGene YTK page.

from moclo.registry.ytk import YTKRegistry
registry = YTKRegistry()

vector = registry['pYTK090'].entity         # Part 8a
modules = [registry['pYTK008'].entity,      # Part 1
           registry['pYTK047'].entity,      # Part 234r
           registry['pYTK073'].entity,      # Part 5
           registry['pYTK074'].entity,      # Part 6
           registry['pYTK086'].entity,      # Part 7
           registry['pYTK092'].entity]      # Part 8b
Checking parts

We can use dna_features_viewer to visualize our records before proceeding (for readability purposes, we’ll show the records as linear although they are plasmids):

import itertools
import dna_features_viewer as dfv
import matplotlib.pyplot as plt

translator = dfv.BiopythonTranslator([lambda f: f.type != 'source'])
plt.figure(1, figsize=(24, 10))
for index, entity in enumerate(itertools.chain(modules, [vector])):
    ax = plt.subplot(2, 4, index + 1)
    translator.translate_record(entity.record).plot(ax)
    plt.title(entity.record.id)
plt.show()
Figure: Linear maps of the selected modules and of the vector.

Creating the assembly

We use the Part 8a as our base assembly vector, and then assemble all the other parts into that vector:

assembly = vector.assemble(*modules)
Rendering the assembly sequence map

When creating an assembly, corresponding regions of the obtained sequence will be annotated with the ID of the sequence they come from.

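# NOTE: IntegrationVectorTranslator is not defined in the cells shown on this
# page; it is assumed to be a dfv.BiopythonTranslator subclass defined in a
# notebook cell that is not reproduced here.
vec_translator = IntegrationVectorTranslator([lambda f: f.type == 'source'])
vec_translator.translate_record(assembly, dfv.CircularGraphicRecord).plot(figure_width=8)
plt.show()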
Figure: Circular sequence map of the assembly.

Comparing the assembly to the expected vector

The obtained assembly should look like the pYTK096 plasmid, distributed with the official YTK parts:

plt.figure(3, figsize=(24, 10))

ax = plt.subplot(2, 1, 1)
translator.translate_record(assembly).plot(ax)
plt.title('Assembly')

ax = plt.subplot(2, 1, 2)
translator.translate_record(registry['pYTK096'].entity.record).plot(ax)
plt.title('Expected')

plt.show()
Figure: Linear maps of the assembly and of the expected pYTK096 plasmid.

Library Reference

Record

class moclo.record.CircularRecord(SeqRecord)[source]

A derived SeqRecord that contains a circular DNA sequence.

It handles the in operator as expected, and removes the implementation of the + operator, since circular DNA sequences do not have an end to append more nucleotides to. In addition, it overloads the >> and << operators to allow rotating the sequence and its annotations, effectively changing the 0 position.

See also

Bio.SeqRecord.SeqRecord documentation on the Biopython wiki.

__add__(other)[source]

Add another sequence or string to this sequence.

Since adding an arbitrary sequence to a plasmid is ambiguous (there is no sequence end), trying to add a sequence to a CircularRecord will raise a TypeError.

__contains__(char)[source]

Implement the in keyword, searches the sequence.

__getitem__(index)[source]

Return a sub-sequence or an individual letter.

The sub-sequence is always returned as a SeqRecord, since it is probably not circular anymore.

__init__(seq, id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=None, features=None, annotations=None, letter_annotations=None)[source]

Create a new CircularRecord instance.

If given a SeqRecord as the first argument, it will simply copy all attributes of the record. This allows using Bio.SeqIO.read to open records, then loading them into a CircularRecord.

__lshift__(index)[source]

Rotate the sequence counter-clockwise, preserving annotations.

__radd__(other)[source]

Add another sequence or string to this sequence (from the left).

Since adding an arbitrary sequence to a plasmid is ambiguous (there is no sequence end), trying to add a sequence to a CircularRecord will raise a TypeError.

__rshift__(index)[source]

Rotate the sequence clockwise, preserving annotations.

reverse_complement(id=False, name=False, description=False, features=True, annotations=False, letter_annotations=True, dbxrefs=False)[source]

Return a new CircularRecord with reverse complement sequence.

Registry

Base class
class moclo.registry.base.AbstractRegistry[source]

An abstract registry holding MoClo plasmids.

Implementations
class moclo.registry.base.CombinedRegistry[source]

A registry combining several registries into a single collection.

__init__()[source]

Initialize self. See help(type(self)) for accurate signature.

class moclo.registry.base.EmbeddedRegistry[source]

An embedded registry, distributed with the library source code.

Records are stored within a BZ2 compressed JSON file, using standard annotations to allow retrieving features easily.

Modules

Moclo module classes.

A module is a sequence of DNA that contains a sequence of interest, such as a promoter, a CDS, a protein binding site, etc., organised so that it can be combined with other modules to create an assembly. This involves flanking that target sequence with Type IIS restriction sites, which depend on the level of the module, as well as the chosen MoClo protocol.

Abstract
class moclo.core.modules.AbstractModule(object)[source]

An abstract modular cloning module.

cutter

the enzyme used to cut the target sequence from the backbone plasmid during Golden Gate assembly.

Type: RestrictionType
__init__(record)

Initialize self. See help(type(self)) for accurate signature.

is_valid()

Check if the wrapped record follows the required class structure.

Returns: True if the record is valid, False otherwise.
Return type: bool
overhang_end()[source]

Get the downstream overhang of the target sequence.

Returns: the downstream overhang.
Return type: Seq
overhang_start()[source]

Get the upstream overhang of the target sequence.

Returns: the upstream overhang.
Return type: Seq
classmethod structure()[source]

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()[source]

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns: the target sequence with annotations.
Return type: SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Level -1
class moclo.core.modules.Product(AbstractModule)[source]

A level -1 module, often obtained as a PCR product.

Modules of this level are the lowest components of the MoClo system, but are not practical to work with until they are assembled in a standard vector to obtain entries.

Level 0
class moclo.core.modules.Entry(AbstractModule)[source]

A level 0 module, often obtained from the official toolkit plasmids.

Entries are assembled from products into a standard vector suitable for selection and storage.

Level 1
class moclo.core.modules.Cassette(AbstractModule)[source]

A level 1 module, also referred to as a Transcriptional Unit.

Cassettes can either express genes in their target organism, or be assembled into multigene modules for expressing many genes at once, depending on the chosen cassette vector during level 0 assembly.

Level 2
class moclo.core.modules.Device(AbstractModule)[source]

A level 2 module, also referred to as a Multigene plasmid.

Modules of this level are assembled from several transcriptional units so that they contain several genes that can be expressed all at once. Most of the MoClo implementations are designed so that multiple devices can be assembled into a module that is also a valid level 1 module, as does the Golden Braid system with its α and Ω plasmids.

Vectors

MoClo vector classes.

A vector is a plasmidic DNA sequence that can hold a combination of modules of the same level to create a single module of the following level. Vectors contain a placeholder sequence that is replaced by the concatenation of the modules during the Golden Gate assembly.

Abstract
class moclo.core.vectors.AbstractVector(object)[source]

An abstract modular cloning vector.

assemble(module, *modules, **kwargs)[source]

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
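
The ordering and error conditions listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: modules are reduced to hypothetical (start overhang, target, end overhang) tuples, the exception classes are local stand-ins named after those in moclo.errors, and the overhang values are toy data.

```python
class DuplicateModules(RuntimeError):
    """Local stand-in for moclo.errors.DuplicateModules."""


class MissingModule(RuntimeError):
    """Local stand-in for moclo.errors.MissingModule."""


def chain_modules(first_overhang, last_overhang, modules):
    """Concatenate modules by walking from one overhang to the next.

    Returns the insert and the start overhangs of the modules left unused.
    """
    by_start = {}
    for start, target, end in modules:
        if start in by_start:
            # Two modules competing for the same junction.
            raise DuplicateModules(start)
        by_start[start] = (target, end)

    insert, overhang = "", first_overhang
    while overhang != last_overhang:
        if overhang not in by_start:
            # No module continues from this overhang: partial construct.
            raise MissingModule(overhang)
        target, end = by_start.pop(overhang)
        insert += overhang + target
        overhang = end
    return insert + last_overhang, sorted(by_start)


# Argument order does not matter: modules are looked up by start overhang.
modules = [
    ("TACT", "AAAGAGG", "AATG"),
    ("GGAG", "TTGACA", "TACT"),
    ("AGGT", "CCAGGC", "GCTT"),
    ("AATG", "GCTAAATAA", "AGGT"),
]
insert, unused = chain_modules("GGAG", "GCTT", modules)
```

Any module still in the lookup table after the walk corresponds to the UnusedModules case, which the sketch reports as a list instead of a warning.
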
overhang_end()[source]

Get the downstream overhang of the vector sequence.

overhang_start()[source]

Get the upstream overhang of the vector sequence.

placeholder_sequence()[source]

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

classmethod structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()[source]

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Level -1
class moclo.core.vectors.EntryVector(AbstractVector)[source]

Level 0 vector.

Level 0
class moclo.core.vectors.CassetteVector(AbstractVector)[source]

Level 1 vector.

Level 1
class moclo.core.vectors.DeviceVector(AbstractVector)[source]

Level 2 vector.

Parts

MoClo part classes.

Abstract
class moclo.core.parts.AbstractPart(object)[source]

An abstract modular cloning part.

Parts can be either modules or vectors, and are defined by their flanking overhang sequences, declared in the signature class attribute. The part structure is derived from the part class (module or vector), signature, and restriction enzyme.

Example

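>>> from Bio.Restriction import BsaI
>>> from moclo.core.modules import Entry
>>> from moclo.core.parts import AbstractPart
>>> class ExamplePart(AbstractPart, Entry):
...     cutter = BsaI
...     signature = ('ATGC', 'ATTC')
...
>>> ExamplePart.structure()
'GGTCTCN(ATGC)(NN*N)(ATTC)NGAGACC'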
__init__(record)

Initialize self. See help(type(self)) for accurate signature.

classmethod characterize(record)[source]

Load the record in a concrete subclass of this type.

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
classmethod structure()[source]

Get the part structure, as a DNA regex pattern.

The structure of most parts can be obtained automatically from the part signature and the restriction enzyme used in the Golden Gate assembly.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The vector placeholder sequence
  3. The downstream (3’) overhang sequence
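
The three capture groups can be illustrated with the standard re module. The sketch below is an assumption-laden illustration: it takes the structure string from the ExamplePart example, expands the wildcard N with a hypothetical helper (dna_pattern_to_regex, not a moclo function), and matches a toy sequence. How the library itself compiles and applies the pattern is not shown here.

```python
import re


def dna_pattern_to_regex(pattern):
    """Hypothetical helper: expand the wildcard N into a character class."""
    return re.compile(pattern.replace("N", "[ACGT]"))


# Structure string as returned by ExamplePart.structure() in the example.
structure = "GGTCTCN(ATGC)(NN*N)(ATTC)NGAGACC"
regex = dna_pattern_to_regex(structure)

# Toy sequence: site, spacer, upstream overhang, target,
# downstream overhang, spacer, reverse site.
sequence = "GGTCTCA" + "ATGC" + "TTGACAGCTAGC" + "ATTC" + "TGAGACC"
match = regex.search(sequence)

# Groups 1 to 3 are the upstream overhang, the middle sequence,
# and the downstream overhang.
upstream, middle, downstream = match.groups()
```

A pattern that lacked one of the groups would break this unpacking, which is one way to see why an overloaded structure() has to keep all three.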

Errors

Base classes
class moclo.errors.MocloError(Exception)[source]

Base class for all MoClo-related exceptions.

class moclo.errors.AssemblyError(MocloError, RuntimeError)[source]

Assembly-specific run-time error.

class moclo.errors.AssemblyWarning(MocloError, Warning)[source]

Assembly-specific run-time warning.

Warnings can be turned into errors using the warnings.catch_warnings context manager combined with warnings.simplefilter, with the action set to "error".
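
A minimal sketch of that pattern, using only the standard library. The warning class and the assemble_stub function are local stand-ins introduced for illustration, so that the example runs without moclo installed; with the library, the category would be moclo.errors.UnusedModules and the call would be a real assembly.

```python
import warnings


class UnusedModulesStandIn(Warning):
    """Local stand-in for moclo.errors.UnusedModules."""


def assemble_stub():
    """Hypothetical function that warns the way an assembly might."""
    warnings.warn("2 modules were not used", UnusedModulesStandIn)
    return "assembled"


with warnings.catch_warnings():
    # Inside this block, every warning is raised as an exception.
    warnings.simplefilter("error")
    try:
        assemble_stub()
        outcome = "no error"
    except UnusedModulesStandIn as exc:
        outcome = "raised: {}".format(exc)
```

The previous filter configuration is restored when the with block exits, so the stricter behaviour stays local to the assembly call.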

Errors
class moclo.errors.DuplicateModules(AssemblyError)[source]

Several modules share the same overhangs.

class moclo.errors.InvalidSequence(MocloError, ValueError)[source]

Invalid sequence provided.

class moclo.errors.IllegalSite(InvalidSequence)[source]

Sequence with illegal site provided.

class moclo.errors.MissingModule(AssemblyError)[source]

A module is missing in the assembly.

Warnings
class moclo.errors.UnusedModules(AssemblyWarning)[source]

Not all modules were used during assembly.

Record (moclo.record)

CircularRecord A derived SeqRecord that contains a circular DNA sequence.

Registry (moclo.registry.base)

Item A uniquely identified record in a registry.
AbstractRegistry An abstract registry holding MoClo plasmids.
CombinedRegistry A registry combining several registries into a single collection.
EmbeddedRegistry An embedded registry, distributed with the library source code.

Modules (moclo.core.modules)

AbstractModule An abstract modular cloning module.
Entry A level 0 module, often obtained from the official toolkit plasmids.
Cassette A level 1 module, also referred to as a Transcriptional Unit.
Device A level 2 module, also referred to as a Multigene plasmid.

Vectors (moclo.core.vectors)

AbstractVector An abstract modular cloning vector.
EntryVector Level 0 vector.
CassetteVector Level 1 vector.
DeviceVector Level 2 vector.

Parts (moclo.core.parts)

AbstractPart An abstract modular cloning part.

Errors (moclo.errors)

Base classes

MocloError Base class for all MoClo-related exceptions.
AssemblyError Assembly-specific run-time error.
AssemblyWarning Assembly-specific run-time warning.

Errors

DuplicateModules Several modules share the same overhangs.
InvalidSequence Invalid sequence provided.
IllegalSite Sequence with illegal site provided.
MissingModule A module is missing in the assembly.

Warnings

UnusedModules Not all modules were used during assembly.

Changelogs

moclo

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

v0.4.5 - 2019-02-22
Fixed
  • Support all fs versions under 3.0.
v0.4.4 - 2019-02-11
Changed
  • Add 2.3.0 to the supported fs versions.
v0.4.3 - 2019-01-06
Changed
  • Add 2.2.0 to the supported fs versions.
Added
  • Add Item.record shortcut to Item.entity.record in moclo.registry.
  • Make moclo.core abstract classes check for illegal sites in sequence to be identified as valid.
  • This CHANGELOG file.
Documented
  • Fix typos.
v0.4.2 - 2018-08-16
Fixed
  • Some registries not loading CircularRecord instances.
v0.4.1 - 2018-08-16
Changed
  • Bump required fs version to 2.1.0.
v0.4.0 - 2018-08-10
Added
  • AbstractPart.characterize to load a record into a part instance.
  • Option to include / exclude ELabFTWRegistry items using tags.
v0.3.0 - 2018-08-07
Added
  • Annotate assembled vectors as circular in AbstractVector.assemble.
  • eLabFTW registry connector in moclo.registry.elabftw.
Changed
  • Move Item._find_type to public function moclo.registry.utils.find_type.
  • Improve annotation generated in AbstractVector.assemble.
Fixed
  • AbstractPart subclasses not being recognized as abstract.
v0.2.1 - 2018-07-27
Added
  • moclo.registry.utils module with resistance identification function.
  • Make AbstractVector.assemble add an alphabet to the generated sequence.
Documented
  • Improved README.rst file.
v0.2.0 - 2018-07-24
Added
  • Use AbstractModule.cutter and AbstractVector.cutter to deduce the required structure for modules and vectors.
  • AbstractPart class to generate sequence structure based on part signature.
  • Add registry API in moclo.registry module.
Changed
  • Make StructuredRecord convert SeqRecord to CircularRecord on instantiation if needed.
  • Use target_sequence method in AbstractVector.assemble.
  • Make modules and vectors add sources to their target sequences when assembled.
  • Patch CircularRecord.reverse_complement to return a CircularRecord.
Documented
  • Add moclo.base.parts to documentation.
  • Add example in AbstractPart docstring.
  • Fix documentation of moclo.base.
Fixed
  • Fix AbstractModule.target_sequence and AbstractVector.target_sequence to take into account cutter overhang position.
v0.1.0 - 2018-07-12

Initial public release.

moclo-cidar

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased
Added
  • This CHANGELOG file.
Changed
  • Update CIDAR sequences to latest AddGene data update (1.6.2).
v0.4.0 - 2018-08-16
Changed
  • Bumped moclo minimal required version to v0.4.0.
Documented
  • Add SVG images illustrating CIDAR parts to the API documentation.
  • Fixed class hierarchy in API documentation.
v0.3.0 - 2018-08-07
Changed
  • Bumped moclo minimal required version to v0.3.0.
Removed
  • Location attribute handler from CIDARRegistry.
  • DVA and DVK sequences from the registry as they are not MoClo elements.
v0.2.0 - 2018-07-25
Added
  • Partial reference CIDAR sequences in moclo.registry.cidar.CIDARRegistry.
Changed
  • Use signature and cutter to generate structures of moclo.kits.cidar.CIDARPart subclasses.
  • Bumped moclo minimal required version to v0.2.0.
Documented
  • Fixed link to documentation in README.rst.
v0.1.0 - 2018-07-12

Initial public release.

moclo-ecoflex

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased
Fixed
  • Annotations of CmR cassette in pBP-BBa_B0034.
  • Add missing sequences to the EcoFlex registry:
    • Promoters: pBP-SJM9** series.
v0.3.1 - 2018-11-19
Added
  • This CHANGELOG file.
Fixed
  • Wheel distribution not embedding the moclo.registry.ecoflex module.
  • Add missing sequences to the EcoFlex registry:
    • Promoters: pBP-BBa_B0012, pBP-BBa_B0015, pBP-BBa_B0034
    • Tags: pBP-HexHis
    • CDS: pBP-eCFP, pBP-eGFP
    • Promoter + RBS: pBP-T7-RBS-His6
    • Device Vectors: pTU2-a-RFP, pTU2-b-RFP
v0.3.0 - 2018-08-16
Changed
  • Bumped moclo minimal required version to v0.4.0.
Documented
  • Fixed class hierarchy in API documentation.
v0.2.0 - 2018-08-07
Added
  • Partial reference EcoFlex sequences in moclo.registry.ecoflex.EcoFlexRegistry.
Changed
  • Use signature and cutter to generate structures of moclo.kits.ecoflex.EcoFlexPart subclasses.
  • Bumped moclo minimal required version to v0.3.0.
v0.1.0 - 2018-07-12

Initial public release.

moclo-gb3

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

v0.1.0 - 2018-07-12

Initial public release.

moclo-ig

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

v0.1.0 - 2018-07-12

Initial public release.

moclo-ytk

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased
Changed
  • Update Pichia ToolKit sequences to latest AddGene data update (1.6.2).
Added
  • This CHANGELOG file.
v0.4.0 - 2018-08-16
Changed
  • Bumped moclo minimal required version to v0.4.0.
Documented
  • Fixed class hierarchy in API documentation.
v0.3.0 - 2018-08-07
Changed
  • Bumped moclo minimal required version to v0.3.0.
Documented
  • Fix links to documentation in README.rst.
  • Add YTK specific notebook in a Docker image.
v0.2.0 - 2018-07-24
Added
  • Reference Yeast ToolKit sequences in moclo.registry.ytk.YTKRegistry.
  • Reference Pichia ToolKit sequences in moclo.registry.ytk.PTKRegistry.
Changed
  • Redefined YTKProduct._structure as a public static method.
v0.1.0 - 2018-07-12

Initial public release.

About

Authors

moclo is developed and maintained by:

Martin Larralde
Graduate student, Biology department
École Normale Supérieure Paris Saclay

This library was developed during a summer internship at Institut Pasteur, under the supervision of:

François Bertaux
Research Engineer, InBio Unit
Inria / Institut Pasteur
Grégory Batt
Senior Scientist, Head of InBio Unit
Inria / Institut Pasteur

License

This project is licensed under the MIT License.

Kits

CIDAR Kit

An implementation of the CIDAR ToolKit for the Python MoClo library.

Level -1

Module
class moclo.kits.cidar.CIDARProduct(Product)[source]

A CIDAR MoClo product.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BbsI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.cidar.CIDAREntryVector(EntryVector)[source]

A CIDAR MoClo entry vector.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BbsI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

static structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Level 0

Module
class moclo.kits.cidar.CIDAREntry(Entry)[source]

A CIDAR MoClo entry.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.cidar.CIDARCassetteVector(CassetteVector)[source]

A CIDAR MoClo cassette vector.

References

Iverson et al., Figure 1.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

static structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Parts
class moclo.kits.cidar.CIDARPromoter(CIDARPart, CIDAREntry)[source]

A CIDAR Promoter part.

_images/promoter.svg

Parts of this type contain a promoter. The upstream overhangs can be changed to amend the order of assembly of a circuit from different cassettes.

Note

The CIDAR toolkit parts provide 4 different upstream overhangs: GGAG, GCTT, CGCT, and TGCC. These are not enforced in this module, and any upstream sequence will be accepted. The downstream sequence however is always TACT.

class moclo.kits.cidar.CIDARRibosomeBindingSite(CIDARPart, CIDAREntry)[source]

A CIDAR ribosome binding site.

_images/rbs.svg

Parts of this type contain a ribosome binding site (RBS). The downstream overhang doubles as the start codon for the subsequent coding sequence.

class moclo.kits.cidar.CIDARCodingSequence(CIDARPart, CIDAREntry)[source]

A CIDAR coding sequence.

_images/cds.svg

Parts of this type contain a coding sequence, with the start codon located on the upstream overhang.

Caution

Although the start codon is located on the upstream overhang, a STOP codon is expected to be found within this part's target sequence, before the downstream overhang.

class moclo.kits.cidar.CIDARTerminator(CIDARPart, CIDAREntry)[source]

A CIDAR terminator.

_images/terminator.svg

Parts of this type contain a terminator. The upstream overhang is always the same for the terminator to directly follow the coding sequence, but the downstream overhang can vary to specify an order for a following multigenic assembly within a device.

Note

The CIDAR toolkit parts provide 4 different downstream overhangs: GCTT, CGCT, TGCC, and ACTA. These are not enforced in this module, and any downstream sequence will be accepted. The upstream sequence however is always AGGT.
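
The two overhang lists above share three values (GCTT, CGCT and TGCC), which suggests how cassettes can be ordered within a device. The sketch below rests on that assumption: two cassettes are taken to be adjacent when the terminator downstream overhang of the first equals the promoter upstream overhang of the second. The function cassettes_chain is hypothetical, and its membership check is stricter than the module itself, which accepts any overhang on the variable side.

```python
# Overhangs quoted from the notes on CIDARPromoter and CIDARTerminator.
PROMOTER_UPSTREAM = {"GGAG", "GCTT", "CGCT", "TGCC"}
TERMINATOR_DOWNSTREAM = {"GCTT", "CGCT", "TGCC", "ACTA"}


def cassettes_chain(cassettes):
    """Check an ordered list of (promoter upstream, terminator downstream) pairs.

    Returns True when every overhang belongs to the toolkit sets and each
    cassette ends on the overhang the next one starts with.
    """
    for upstream, downstream in cassettes:
        if upstream not in PROMOTER_UPSTREAM:
            return False
        if downstream not in TERMINATOR_DOWNSTREAM:
            return False
    return all(
        previous[1] == following[0]
        for previous, following in zip(cassettes, cassettes[1:])
    )


compatible = cassettes_chain([("GGAG", "GCTT"), ("GCTT", "CGCT")])
incompatible = cassettes_chain([("GGAG", "GCTT"), ("CGCT", "TGCC")])
```

Here the first pair of cassettes shares the GCTT junction, while the second pair leaves a gap between GCTT and CGCT.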

Level 1

Module
class moclo.kits.cidar.CIDARCassette(Cassette)[source]

A CIDAR MoClo cassette.

cutter

alias of Bio.Restriction.Restriction.BbsI

Vector
class moclo.kits.cidar.CIDARDeviceVector(DeviceVector)[source]

A CIDAR MoClo device vector.

References

Iverson et al., Figure 1.

cutter

alias of Bio.Restriction.Restriction.BbsI

static structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence

Level 2

Module
class moclo.kits.cidar.CIDARDevice(Device)[source]

A CIDAR MoClo device.

cutter

alias of Bio.Restriction.Restriction.BsaI

EcoFlex Kit

An implementation of the EcoFlex ToolKit for the Python MoClo library.

Level 0

Module
class moclo.kits.ecoflex.EcoFlexEntry(Entry)[source]

An EcoFlex MoClo entry.

EcoFlex entries are stored and shared as plasmids in which the target sequence is flanked by BsaI recognition sites at both ends.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.ecoflex.EcoFlexCassetteVector(CassetteVector)[source]

An EcoFlex MoClo cassette vector.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

static structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Parts
class moclo.kits.ecoflex.EcoFlexPromoter(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo promoter.

class moclo.kits.ecoflex.EcoFlexRBS(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo ribosome binding site.

kits/ecoflex/rbs.svg

Parts of this type contain a ribosome binding site (RBS). The last adenosine serves as the beginning of the start codon of the following CDS.

class moclo.kits.ecoflex.EcoFlexTagLinker(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo tag linker.

kits/ecoflex/linker.svg

Parts of this type also contain an RBS, but they allow adding an N-terminal tag sequence before the CDS.

class moclo.kits.ecoflex.EcoFlexTag(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo N-terminal tag.

kits/ecoflex/tag.svg

Parts of this type typically contain tags that are added to the N-terminus of the translated protein, such as a hexahistidine or a Strep(II) tag.

class moclo.kits.ecoflex.EcoFlexCodingSequence(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo coding sequence.

kits/ecoflex/cds.svg

Parts of this type contain a coding sequence (CDS), with the start codon beginning on the upstream overhang.

Caution

Although the start codon is located on the upstream overhang, a STOP codon is expected to be found within this part's target sequence, before the downstream overhang.

class moclo.kits.ecoflex.EcoFlexTerminator(EcoFlexPart, EcoFlexEntry)[source]

An EcoFlex MoClo terminator.

kits/ecoflex/terminator.svg

Level 1

Module
class moclo.kits.ecoflex.EcoFlexCassette(Cassette)[source]

An EcoFlex MoClo cassette.

cutter

alias of Bio.Restriction.Restriction.BsmBI

Vector
class moclo.kits.ecoflex.EcoFlexDeviceVector(DeviceVector)[source]

An EcoFlex MoClo device vector.

cutter

alias of Bio.Restriction.Restriction.BsmBI

static structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence

Level 2

Module
class moclo.kits.ecoflex.EcoFlexDevice(Device)[source]

An EcoFlex MoClo device.

cutter

alias of Bio.Restriction.Restriction.BsaI

Icon Genetics Kit

An implementation of the Icon Genetics ToolKit for the Python MoClo library.

Level -1

Module
class moclo.kits.ig.IGProduct(Product)[source]

An Icon Genetics MoClo product.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BpiI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.ig.IGEntryVector(EntryVector)[source]

An Icon Genetics entry vector.

References

Weber et al., Figure 2A.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
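The ordering and error conditions listed above can be sketched without the library. In this simplified model, each module is reduced to a `(name, start_overhang, end_overhang)` tuple. The function `order_modules` and the two exception classes are defined locally for illustration: they only mimic the documented names, and the real implementation may work differently.

```python
class DuplicateModules(Exception):
    """Local stand-in: two modules share the same start overhang."""


class MissingModule(Exception):
    """Local stand-in: no module starts with the required overhang."""


def order_modules(vector_end, vector_start, modules):
    """Chain modules from the vector's end overhang to its start overhang.

    Returns the ordered module names and the names left unused.
    """
    by_start = {}
    for name, start, end in modules:
        if start in by_start:
            other = by_start[start][0]
            raise DuplicateModules(f"{other} and {name} both start with {start}")
        by_start[start] = (name, end)

    ordered, overhang = [], vector_end
    while True:
        if overhang not in by_start:
            raise MissingModule(f"no module starts with {overhang}")
        name, overhang = by_start.pop(overhang)
        ordered.append(name)
        if overhang == vector_start:
            break
    return ordered, sorted(name for name, _ in by_start.values())


# Argument order does not matter: modules are looked up by start overhang.
parts = [
    ("terminator", "GCTT", "CGCT"),
    ("promoter", "GGAG", "AATG"),
    ("cds", "AATG", "GCTT"),
]
ordered, unused = order_modules("GGAG", "CGCT", parts)
```

In this sketch unused modules are returned rather than raised, which keeps the example short; the documented method reports them through `UnusedModules` instead.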
cutter

alias of Bio.Restriction.Restriction.BpiI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

classmethod structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Level 0

Module
class moclo.kits.ig.IGEntry(Entry)[source]

An Icon Genetics MoClo entry.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.ig.IGCassetteVector(CassetteVector)[source]

An Icon Genetics cassette vector.

References

Weber et al., Figure 4A.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

classmethod structure()[source]

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Parts
class moclo.kits.ig.IGPromoter(IGPart, IGEntry)[source]

An Icon Genetics promoter part.

class moclo.kits.ig.IGUntranslatedRegion(IGPart, IGEntry)[source]

An Icon Genetics 5’ UTR part.

class moclo.kits.ig.IGSignalPeptide(IGPart, IGEntry)[source]

An Icon Genetics signal peptide part.

class moclo.kits.ig.IGCodingSequence(IGPart, IGEntry)[source]

An Icon Genetics CDS part.

class moclo.kits.ig.IGTerminator(IGPart, IGEntry)[source]

An Icon Genetics terminator part.

Level 1

Module
class moclo.kits.ig.IGCassette(Cassette)[source]

An Icon Genetics MoClo cassette.

cutter

alias of Bio.Restriction.Restriction.BpiI

Vector
class moclo.kits.ig.IGDeviceVector(DeviceVector)[source]

An Icon Genetics device vector.

References

Weber et al., Figure 4A.

cutter

alias of Bio.Restriction.Restriction.BpiI

Parts
class moclo.kits.ig.IGEndLinker(IGPart, IGCassette)[source]

An Icon Genetics end linker part.

References

Weber et al., Figure 5.

Level M

Parts
class moclo.kits.ig.IGLevelMVector(IGPart, IGDeviceVector)[source]
cutter

alias of Bio.Restriction.Restriction.BpiI

classmethod structure()[source]

Get the part structure, as a DNA regex pattern.

The structure of most parts can be obtained automatically from the part signature and the restriction enzyme used in the Golden Gate assembly.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The vector placeholder sequence
  3. The downstream (3’) overhang sequence
class moclo.kits.ig.IGLevelMEndLinker(IGPart, IGCassette)[source]
classmethod structure()[source]

Get the part structure, as a DNA regex pattern.

The structure of most parts can be obtained automatically from the part signature and the restriction enzyme used in the Golden Gate assembly.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The vector placeholder sequence
  3. The downstream (3’) overhang sequence

Level P

Parts
class moclo.kits.ig.IGLevelPVector(IGPart, IGCassetteVector)[source]
cutter

alias of Bio.Restriction.Restriction.BsaI

classmethod structure()[source]

Get the part structure, as a DNA regex pattern.

The structure of most parts can be obtained automatically from the part signature and the restriction enzyme used in the Golden Gate assembly.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The vector placeholder sequence
  3. The downstream (3’) overhang sequence
class moclo.kits.ig.IGLevelPEndLinker(IGPart, IGEntry)[source]
cutter

alias of Bio.Restriction.Restriction.BsaI

classmethod structure()[source]

Get the part structure, as a DNA regex pattern.

The structure of most parts can be obtained automatically from the part signature and the restriction enzyme used in the Golden Gate assembly.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The vector placeholder sequence
  3. The downstream (3’) overhang sequence

Yeast ToolKit (YTK) / Pichia ToolKit (PTK)

An implementation of the Yeast ToolKit for the Python MoClo library.

This module is tested against the official parts available in the Yeast ToolKit (YTK), and also against the Pichia ToolKit (PTK) parts, since the two kits were designed to be compatible with each other.

The documentation of this module is mostly adapted from the Lee et al. supplementary data. Each item also has specific sections that are organized as follows:

Note
this section describes a behaviour that is not part of the YTK standard, but that is implemented in all official YTK parts and that the YTK authors encourage you to follow.
Caution
this section describes a behaviour that goes against the MoClo standard, but which you are required to follow for your parts to be valid YTK parts.
Danger
this section describes a quirk specific to the moclo-ytk library.

Level -1

Module
class moclo.kits.ytk.YTKProduct(Product)[source]

A MoClo Yeast ToolKit product.

As the YTK entry vector does not contain the required BsaI restriction site, the site must be contained in the product sequence.

Caution

The standard construction described in the Lee et al. paper directly inserts the beginning of the BsaI recognition site inside the two BsmBI overhangs at both ends of the product. Constructs that do not follow this design are not considered valid products, even if they contain the required BsaI site.

References

Lee et al., Supplementary Figure S19.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BsmBI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
static structure()[source]

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.ytk.YTKEntryVector(EntryVector)[source]

A MoClo Yeast ToolKit entry vector.

Any plasmid with two BsmBI restriction sites can be used to create a YTK entry, although the toolkit-provided entry vector (pYTK001) is probably the most appropriate plasmid to use.

Caution

Unlike the usual MoClo entry vectors described in the Weber et al. paper, the YTK entry vectors do not provide another BsaI restriction site enclosing the placeholder sequence. As such, YTK Level -1 modules must embed the BsaI binding site.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BsmBI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

classmethod structure()

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
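For a vector, the capture groups come in the opposite order because the enzyme sites face outward and are removed together with the placeholder. The sketch below hand-writes such a pattern for BsmBI (recognition site CGTCTC, cutting 1 nt downstream and leaving a 4 nt overhang). `VECTOR_PATTERN` and the toy backbone are assumptions made for this example, not the pattern produced by the library.

```python
import re

# Hypothetical pattern for a BsmBI-based vector (not taken from the library):
#   [overhang] N GAGACG ...dropout... CGTCTC N [overhang]
VECTOR_PATTERN = re.compile(
    r"([ACGT]{4})"                          # group 1: downstream (3') overhang
    r"([ACGT]GAGACG[ACGT]*?CGTCTC[ACGT])"   # group 2: placeholder with both sites
    r"([ACGT]{4})"                          # group 3: upstream (5') overhang
)

# Toy linearised vector assembled piece by piece for readability.
toy = (
    "ACGTAC" + "TCGG"
    + "T" + "GAGACG" + "AAATTTCCC" + "CGTCTC" + "A"
    + "GACC" + "TTGCA"
)

match = VECTOR_PATTERN.search(toy)
downstream, placeholder, upstream = match.groups()
```

Everything captured in `placeholder` is discarded during the assembly, while the two overhangs stay on the backbone.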
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Level 0

Module
class moclo.kits.ytk.YTKEntry(Entry)[source]

A MoClo Yeast ToolKit entry.

YTK entries are stored and shared as plasmids flanked by BsaI binding sites at both ends of the target sequence.

Danger

Although the BsaI binding sites are not located within the target sequence for almost all the standard toolkit parts, special Type 234r parts have these sites reversed, because these parts are used to assemble cassette vectors and require the final construct to contain a BsaI site to allow assembly with other parts. Those parts will not match the default YTKEntry, and must be used as YTKPart234r instances for the assembly logic to work as expected.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the target sequence.

Returns:the downstream overhang.
Return type:Seq
overhang_start()

Get the upstream overhang of the target sequence.

Returns:the upstream overhang.
Return type:Seq
classmethod structure()

Get the module structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The upstream (5’) overhang sequence
  2. The module target sequence
  3. The downstream (3’) overhang sequence
target_sequence()

Get the target sequence of the module.

Modules are often stored in a standardized way, and contain more than the sequence of interest: for instance they can contain an antibiotic marker, that will not be part of the assembly when that module is assembled into a vector; only the target sequence is inserted.

Returns:the target sequence with annotations.
Return type:SeqRecord

Note

Depending on the cutting direction of the restriction enzyme used during assembly, the overhang will be left at the beginning or at the end, so the obtained record is exactly the sequence the enzyme created during restriction.

Vector
class moclo.kits.ytk.YTKCassetteVector(CassetteVector)[source]

A MoClo Yeast ToolKit cassette vector.

The YTK provides a canonical integration plasmid, preassembled from several other parts, that can be used as a cassette vector for an assembly of Type 2, 3 and 4 parts. Type 8, 8a and 678 parts are also considered as cassette vectors.

References

Lee et al., Figure 2.

__init__(record)

Initialize self. See help(type(self)) for accurate signature.

assemble(module, *modules, **kwargs)

Assemble the provided modules into the vector.

Parameters:
  • module (AbstractModule) – a module to insert in the vector.
  • modules (AbstractModule, optional) – additional modules to insert in the vector. The order of the parameters is not important, since modules will be sorted by their start overhang in the function.
Returns:

the assembled sequence with sequence annotations inherited from the vector and the modules.

Return type:

SeqRecord

Raises:
  • DuplicateModules – when two different modules share the same start overhang, possibly leading to non-deterministic constructs.
  • MissingModule – when a module has an end overhang that is not shared by any other module, leading to a partial construct only.
  • InvalidSequence – when one of the modules does not match the required module structure (missing site, wrong overhang, etc.).
  • UnusedModules – when some modules were not used during the assembly (mostly caused by duplicate parts).
cutter

alias of Bio.Restriction.Restriction.BsaI

is_valid()

Check if the wrapped record follows the required class structure.

Returns:True if the record is valid, False otherwise.
Return type:bool
overhang_end()

Get the downstream overhang of the vector sequence.

overhang_start()

Get the upstream overhang of the vector sequence.

placeholder_sequence()

Get the placeholder sequence in the vector.

The placeholder sequence is replaced by the concatenation of modules during the assembly. It often contains a dropout sequence, such as a GFP expression cassette that can be used to measure the progress of the assembly.

classmethod structure()

Get the vector structure, as a DNA regex pattern.

Warning

If overloading this method, the returned pattern must include 3 capture groups to capture the following features:

  1. The downstream (3’) overhang sequence
  2. The vector placeholder sequence
  3. The upstream (5’) overhang sequence
target_sequence()

Get the target sequence in the vector.

The target sequence is the part of the plasmid that is not discarded during the assembly (everything except the placeholder sequence).

Parts
Base Parts
class moclo.kits.ytk.YTKPart1(YTKPart, YTKEntry)[source]

A YTK Type 1 part (Upstream assembly connector).

_images/type1.svg

Parts of this type contain non-coding and non-regulatory sequences that are used to direct assembly of multigene plasmids, such as ligation sites for other Type IIS endonucleases (e.g. BsmBI).

Note

Official toolkit Type 1 parts also include an EcoRI and an XbaI site just after the upstream overhang for BioBrick compatibility of the assembled cassettes and multi-gene plasmids.

class moclo.kits.ytk.YTKPart2(YTKPart, YTKEntry)[source]

A YTK Type 2 part (Promoter).

_images/type2.svg

Parts of this type contain a promoter. The downstream overhang doubles as the start codon for the subsequent Type 3 or Type 3a coding sequence.

Note

Official toolkit Type 2 parts also include a BglII site immediately preceding the start codon (overlapping the downstream overhang) for BglBrick compatibility.

class moclo.kits.ytk.YTKPart3(YTKPart, YTKEntry)[source]

A YTK Type 3 part (Coding sequence).

_images/type3.svg

Parts of this type contain a coding sequence, with the start codon located on the upstream overhang. If a stop codon is omitted from the part, and two bases are added before the downstream overhang, the resulting site can be used as a two amino acid linker to a Type 4 or 4a C-terminal fusion.

Note

Official toolkit Type 3 parts also include a BamHI recognition site at the end of the included CDS (overlapping the downstream overhang) for BglBrick compatibility.
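The two-amino-acid linker can be worked through by hand. The sketch assumes that the downstream overhang of a Type 3 part is ATCC; this value is not stated in this section, although it is consistent with a BamHI site (GGATCC) overlapping the overhang. Adding the two bases GG before the overhang then gives GGATCC, which reads in frame as two codons. The tiny codon table and the variable names exist only for this illustration.

```python
# Minimal codon table covering only the two codons used in this example.
CODONS = {"GGA": "G", "TCC": "S"}  # glycine, serine

overhang = "ATCC"    # assumed Type 3 downstream overhang
extra_bases = "GG"   # the two bases added before the overhang
linker = extra_bases + overhang

# Split the 6 nt linker into codons and translate each one.
codons = [linker[i:i + 3] for i in range(0, len(linker), 3)]
peptide = "".join(CODONS[codon] for codon in codons)
```

Under these assumptions the linker is the BamHI site itself and encodes a Gly-Ser dipeptide between the coding sequence and the C-terminal fusion.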

class moclo.kits.ytk.YTKPart3a(YTKPart, YTKEntry)[source]

A YTK Type 3a part (N-terminal coding sequence).

_images/type3a.svg
class moclo.kits.ytk.YTKPart3b(YTKPart, YTKEntry)[source]

A YTK Type 3b part (C-terminal coding sequence).

_images/type3b.svg

Note

As with Type 3 parts, official toolkit Type 3b parts also include a BamHI recognition site at the end of the included CDS (overlapping the downstream overhang) for BglBrick compatibility.

class moclo.kits.ytk.YTKPart4(YTKPart, YTKEntry)[source]

A YTK Type 4 part (Transcriptional terminator).

_images/type4.svg

As Type 3 parts do not include a stop codon, parts of this type should encode an in-frame stop codon before the transcriptional terminator. Commonly used C-terminal fusions, such as purification or epitope tags, can also be included in parts of this type, but it is recommended to use the YTKPart4a and YTKPart4b subtypes instead.

Note

Official toolkit Type 4 parts all start with a stop codon directly after the upstream overhang, followed by an XhoI recognition site which enables BglBrick compatibility, and then by the terminator sequence itself.

class moclo.kits.ytk.YTKPart4a(YTKPart, YTKEntry)[source]

A YTK Type 4a part (C-terminal tag sequence).

_images/type4a.svg

Type 4a parts contain additional coding sequences that will be fused to the C-terminal extremity of the protein. These parts include, but are not limited to: localisation tags, purification tags, fluorescent proteins.

Caution

In contrast to the Type 3 and 3b parts, the convention for 4a parts is to include the stop codon rather than enable read-through of the downstream overhang, although that convention is not enforced.

Note

Official toolkit Type 4a parts contain a stop codon after the CDS, itself immediately followed by an XhoI recognition site just before the downstream overhang, for BglBrick compatibility.

class moclo.kits.ytk.YTKPart4b(YTKPart, YTKEntry)[source]

A YTK Type 4b part (Terminator sequence).

_images/type4b.svg

Type 4b parts contain transcriptional terminators, but are not required to encode an in-frame stop codon, as it should be located in the Type 4a part that precedes them.

class moclo.kits.ytk.YTKPart5(YTKPart, YTKEntry)[source]

A YTK Type 5 part (Downstream assembly connector).

_images/type5.svg

As with Type 1 parts, parts of this type provide sequences such as restriction enzyme recognition sites, for instance to direct the assembly of multigene expression plasmids.

Note

Official toolkit parts also include a SpeI and PstI site at the end of the part sequence for BioBrick compatibility of the assembled cassettes and multi-gene plasmids.

class moclo.kits.ytk.YTKPart6(YTKPart, YTKEntry)[source]

A YTK Type 6 part (Yeast marker).

_images/type6.svg

Parts of this type contain a selectable marker for S. cerevisiae, as a full expression cassette (promoter, ORF, and terminator) for conferring the selectable phenotype (such as drug-resistance or bioluminescence).

class moclo.kits.ytk.YTKPart7(YTKPart, YTKEntry)[source]

A YTK Type 7 part (Yeast origin / 3’ homology).

_images/type7.svg

Depending on the expression organism (E. coli or S. cerevisiae), this sequence will either hold a yeast origin of replication, or a 3’ homology sequence for integration in the bacterial genome.

class moclo.kits.ytk.YTKPart8(YTKPart, YTKCassetteVector)[source]

A YTK Type 8 part (Bacterial origin & marker).

_images/type8.svg

Parts of this type contain a bacterial origin of replication, as well as an antibiotic resistance marker. They act as the Golden Gate Assembly vector when assembling a cassette, and as such should also embed a dropout sequence, such as a fluorescent protein expression cassette.

Note

Official toolkit parts use an mRFP coding sequence as the dropout, and also include a NotI restriction site at each end of the part to allow the verification of new assemblies.

class moclo.kits.ytk.YTKPart8a(YTKPart, YTKCassetteVector)[source]

A YTK Type 8a part (Bacterial origin & marker).

_images/type8a.svg

Parts of this type, like Type 8 parts, include a bacterial origin of replication and an antibiotic resistance marker, and act as assembly vectors.

Note

Official toolkit parts use an mRFP coding sequence as the dropout, and also include a NotI restriction site at each end of the part so the integration plasmid can be linearized prior to transformation into yeast.

class moclo.kits.ytk.YTKPart8b(YTKPart, YTKEntry)[source]

A YTK Type 8b part (5’ homology).

_images/type8b.svg

As with certain Type 7 parts, parts of this type contain long sequences of homology to the genome that is upstream of the target locus.

Composite
class moclo.kits.ytk.YTKPart234(YTKPart, YTKEntry)[source]

A YTK Type 234 part (Composite 2, 3, 4).

_images/type234.svg

Type 234 parts are composed of a complete expression cassette (promoter, coding sequence, and terminator) fused into a single part, instead of separate Type 2, 3 and 4 parts.

class moclo.kits.ytk.YTKPart234r(YTKPart, YTKEntry)[source]

A YTK Type 234r part (Composite 2, 3, 4) with reversed BsaI sites.

_images/type234r.svg

Type 234r parts are designed so that the BsaI sites are kept within the final cassette. They are used to assemble canonical integration vectors, where the Type 234r part acts as a placeholder until replaced by actual Type 2, 3 and 4 parts in the final construct.

class moclo.kits.ytk.YTKPart678(YTKPart, YTKCassetteVector)[source]

A YTK Type 678 part (Composite 6, 7, 8).

_images/type678.svg

Type 678 parts are used when there is no requirement for yeast markers and origins to be included in the final assembly, for instance when assembling an intermediary plasmid acting as a vector for a multi-gene construct.

Level 1

Module
class moclo.kits.ytk.YTKCassette(Cassette)[source]

A MoClo Yeast ToolKit cassette.

cutter

alias of Bio.Restriction.Restriction.BsmBI

Vector
class moclo.kits.ytk.YTKDeviceVector(DeviceVector)[source]

A MoClo Yeast ToolKit multigene vector.

Parts of Type 1 and 5 are used to order the cassette plasmids within the multigene assembly. The vector always contains a ConLS and a ConRE part.
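How connectors fix the order of cassettes can be sketched with connector names alone. The example assumes the naming convention from Lee et al. as understood here, where the first cassette starts with ConLS, the last one ends with ConRE, and a cassette ending with ConR1 is followed by the one starting with ConL1 (and so on). The function `order_cassettes` and the cassette names are hypothetical and are not part of the library.

```python
def order_cassettes(cassettes):
    """Order (name, left_connector, right_connector) tuples from ConLS to ConRE."""
    by_left = {left: (name, right) for name, left, right in cassettes}
    ordered, left = [], "ConLS"
    while True:
        # A KeyError here means no cassette carries the expected left connector.
        name, right = by_left[left]
        ordered.append(name)
        if right == "ConRE":
            return ordered
        # Assumed pairing: ConR1 is followed by ConL1, ConR2 by ConL2, etc.
        left = "ConL" + right[len("ConR"):]


cassettes = [
    ("gene3", "ConL2", "ConRE"),
    ("gene1", "ConLS", "ConR1"),
    ("gene2", "ConL1", "ConR2"),
]
ordered = order_cassettes(cassettes)
```

The input order is irrelevant in this sketch, because each cassette is looked up by its left connector.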

References

Lee et al., Supplementary Figure S21.

cutter

alias of Bio.Restriction.Restriction.BsmBI

Indices and tables