Getting Started with pySBOL3

This beginner’s guide introduces the basic principles of pySBOL3 for new users. The examples discussed in this guide are excerpted from the Jupyter notebook (pySBOL3/examples/getting_started.ipynb). The objective of this documentation is to familiarize users with the basic patterns of the API. For more comprehensive documentation about the API, refer to documentation about specific classes and methods.

The class structure and data model for the API are based on the Synthetic Biology Open Language (SBOL). For more detail about the SBOL standard, visit sbolstandard.org or refer to the specification document, which provides diagrams and descriptions of all the standard classes and properties that comprise SBOL.

Creating an SBOL Document

In a previous era, engineers might sit at a drafting board and draft a design by hand. The engineer’s drafting sheet in pySBOL3 is called a Document. The Document serves as a container, initially empty, for SBOL data objects which represent elements of a biological design. Usually the first step is to construct a Document in which to put your objects. All file I/O operations are performed on the Document. The Document read and write methods are used for reading and writing files in SBOL format.

>>> import sbol3
>>> doc = sbol3.Document()
>>> doc.read('simple_library.nt')
>>> doc.write('simple_library_out.nt')

Reading a Document will wipe any existing contents clean before import.

A Document may contain different types of SBOL objects, including Components, Sequences, Collections, and CombinatorialDerivations. These objects are collectively referred to as TopLevel objects because they can be referenced directly from a Document. The total number of objects contained in a Document can be obtained with the len function. To view an inventory of objects contained in the Document, simply print it.

>>> len(doc)
67
>>> print(doc)
Collection....................4
CombinatorialDerivation.......6
Component.....................33
Sequence......................24
---
Total: .........................67

Each SBOL object in a Document is uniquely identified by a special string of characters called a Uniform Resource Identifier (URI). A URI is used as a key to retrieve objects from the Document. To see the identities of objects in a Document, iterate over them using a Python iterator.

>>> for obj in doc.objects:
...     print(obj.identity)
...
http://sbolstandard.org/testfiles/All_FPs
http://sbolstandard.org/testfiles/FPs_small
http://sbolstandard.org/testfiles/FPs_small_ins
.
.

These URIs are said to be SBOL-compliant. An SBOL-compliant URI consists of a namespace, an optional local path, and a display ID (display_id). In this tutorial, we use URIs of the type http://sbolstandard.org/testfiles/my_obj, where the namespace is http://sbolstandard.org/testfiles, and the display ID is my_obj.
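
As a rough illustration of that decomposition, the following sketch splits a URI into its three parts when the namespace is already known. The helper split_compliant_uri and the local path parts are hypothetical and are not part of pySBOL3; the sketch only assumes the namespace, local path, display ID layout described above.

```python
def split_compliant_uri(uri, namespace):
    """Split a URI into (namespace, local_path, display_id).

    Hypothetical helper for illustration only; not a pySBOL3 function.
    """
    namespace = namespace.rstrip('/')
    if not uri.startswith(namespace + '/'):
        raise ValueError(f'{uri!r} is not in namespace {namespace!r}')
    # Everything after the namespace is "[local path/]display_id".
    remainder = uri[len(namespace) + 1:]
    # The display ID is the last path segment; the rest is the local path.
    local_path, _, display_id = remainder.rpartition('/')
    return namespace, local_path, display_id


ns = 'http://sbolstandard.org/testfiles'
print(split_compliant_uri(ns + '/my_obj', ns))
print(split_compliant_uri(ns + '/parts/my_obj', ns))
```

When there is no local path, the middle element of the returned tuple is an empty string.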

Based on our inspection of objects contained in the Document above, we can see that these objects were all created in the namespace http://sbolstandard.org/testfiles. Thus, in order to take advantage of SBOL-compliant URIs, we configure this namespace as the default.

>>> sbol3.set_namespace('http://sbolstandard.org/testfiles')

Setting the namespace has several advantages. It simplifies object creation and retrieval from Documents. In addition, it serves as a way for a user to claim ownership of new objects. Generally users will want to specify a namespace that corresponds to their organization’s web domain.

Creating SBOL Data Objects

Biological designs can be described with SBOL data objects, including both structural and functional features. The principal classes for describing the structure and primary sequence of a design are Component, Sequence, and Feature. The principal classes for describing the function of a design are Component, Feature, Interaction, and Participation.

In the SBOL specification document, classes and their properties are represented as box diagrams. Each box represents an SBOL class and its attributes. Following is an example of the diagram for the Component class which will be referred to in later sections. These class diagrams follow conventions of the Unified Modeling Language.

[Figure: UML diagram of the Component class]

As introduced in the previous section, SBOL objects are identified by a uniform resource identifier (URI). When a new object is constructed, the user must assign a unique identity. The identity is ALWAYS the first argument supplied to the constructor of an SBOL object.

Constructors for SBOL objects follow a predictable pattern. The first argument is an identifier, which can be either a full URI, a universally unique identifier (UUID), or a display ID (possibly with a local path). If the first argument to the constructor is a valid URI or UUID, the object is created with the URI or UUID as its identity. Otherwise, the object is created with an identity composed of the first argument appended to the configured namespace (set using sbol3.set_namespace()). Constructors can take additional arguments, depending on whether the SBOL class has required attributes. Attributes are required if the specification says they are. In a UML diagram, required attributes are indicated as properties with a cardinality of 1 or more. For example, a Component (see the UML diagram above) has only one required attribute, types, which specifies one or more molecular types for a component. Required attributes MUST be specified when calling a constructor.

The following code creates a protein component (types set to SBO_PROTEIN).

>>> cas9 = sbol3.Component('Cas9', sbol3.SBO_PROTEIN)

The following code creates a DNA component (types set to SBO_DNA).

>>> target_promoter = sbol3.Component('target_promoter', sbol3.SBO_DNA, roles=[sbol3.SO_PROMOTER])

The following code creates a DNA component with a local path (/promoters/), and another DNA component with a different namespace.

>>> # Include a local path in addition to a display_id
>>> second_promoter = sbol3.Component('promoters/second_promoter', sbol3.SBO_DNA)
>>>
>>> # Use a namespace different from the configured default namespace
>>> third_promoter = sbol3.Component('http://sbolstandard.org/other_namespace/third_promoter', sbol3.SBO_DNA)

For examples of how the first argument of the SBOL object constructor is used to assign the object’s identity and display_id, compare the following:

>>> target_promoter.identity
'http://sbolstandard.org/testfiles/target_promoter'
>>> target_promoter.display_id
'target_promoter'
>>> second_promoter.identity
'http://sbolstandard.org/testfiles/promoters/second_promoter'
>>> second_promoter.display_id
'second_promoter'
>>> third_promoter.identity
'http://sbolstandard.org/other_namespace/third_promoter'
>>> third_promoter.display_id
'third_promoter'
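
The rule for turning the first constructor argument into an identity can be sketched in plain Python. This is only an approximation under stated assumptions: resolve_identity is a hypothetical helper, not a pySBOL3 function, and the checks pySBOL3 actually applies to decide what counts as a valid URI or UUID may be stricter or differ in detail.

```python
import posixpath
import uuid
from urllib.parse import urlparse


def resolve_identity(name, namespace):
    """Approximate the documented rule for the first constructor argument.

    Hypothetical helper for illustration only; not part of pySBOL3.
    """
    # A full URI (here: anything with a scheme and a host) is used verbatim.
    parsed = urlparse(name)
    if parsed.scheme and parsed.netloc:
        return name
    # A UUID is also used as given.
    try:
        uuid.UUID(name)
        return name
    except ValueError:
        pass
    # Otherwise treat it as "[local path/]display_id" under the namespace.
    return posixpath.join(namespace, name.lstrip('/'))


ns = 'http://sbolstandard.org/testfiles'
for arg in ('target_promoter',
            'promoters/second_promoter',
            'http://sbolstandard.org/other_namespace/third_promoter'):
    print(resolve_identity(arg, ns))
```

For the three arguments used above, the sketch yields the same identities that the constructors produced.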

Using Ontology Terms for Attribute Values

Notice the Component.types attribute is specified using predefined constants (sbol3.SBO_PROTEIN and sbol3.SBO_DNA in the examples above). The Component.types property is one of many SBOL attributes that uses ontology terms as property values. The Component.types property uses the Systems Biology Ontology (SBO) to be specific. Ontologies are standardized, machine-readable vocabularies that categorize concepts within a domain of scientific study. The SBOL 3.0 standard unifies many different ontologies into a high-level, object-oriented model.

Ontology terms also take the form of Uniform Resource Identifiers. Many commonly used ontological terms are built into pySBOL3 as predefined constants. If an ontology term is not provided as a built-in constant, its URI can often be found by using an online ontology browser; such browsers are available for both the Sequence Ontology and the Systems Biology Ontology. While the SBOL specification often recommends particular ontologies and terms to be used for certain attributes, in many cases these are not rigid requirements. The advantage of using a recommended term is that it ensures your data can be interpreted or visualized by other applications that support SBOL. However, in many cases an application developer may want to develop their own ontologies to support custom applications within their domain.

The following example illustrates how the URIs for ontology terms can be easily constructed, assuming they are not already part of pySBOL3’s built-in ontology constants. It uses the tyto package, which is imported explicitly.

>>> import tyto
>>> SO_ENGINEERED_FUSION_GENE = tyto.SO.engineered_fusion_gene
>>> SO_ENGINEERED_FUSION_GENE
'https://identifiers.org/SO:0000288'
>>> SBO_DNA_REPLICATION = tyto.SBO.DNA_replication
>>> SBO_DNA_REPLICATION
'https://identifiers.org/SBO:0000204'
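
When only the accession of a term is known (for example, after finding it in an ontology browser), the URI can also be assembled by hand. The following sketch assumes the identifiers.org pattern visible in the outputs above; ontology_term_uri is a hypothetical helper and is not part of pySBOL3 or tyto.

```python
def ontology_term_uri(compact_id):
    """Build an identifiers.org URI from a compact ID such as 'SO:0000288'.

    Hypothetical helper for illustration only; it assumes the
    'https://identifiers.org/<PREFIX>:<ACCESSION>' pattern.
    """
    prefix, sep, accession = compact_id.partition(':')
    if not (prefix and sep and accession):
        raise ValueError(f'expected "<PREFIX>:<ACCESSION>", got {compact_id!r}')
    return f'https://identifiers.org/{prefix}:{accession}'


print(ontology_term_uri('SO:0000288'))
print(ontology_term_uri('SBO:0000204'))
```

A hand-built URI is only as good as the accession it is built from, so the built-in constants remain the safer choice whenever one exists for the term.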

For more information on using ontology terms with pySBOL3, see: Using Ontology Terms.

Adding, Finding, and Getting Objects from a Document

In some cases a developer may want to use SBOL objects as intermediate data structures in a computational biology workflow. In this case, the user is free to manipulate objects independently of a Document. However, if the user wishes to write out a file with all the information contained in their object, they must first add it to the Document. This is done using the add method.

>>> doc.add(target_promoter)
>>> doc.add(cas9)

Objects can be found and retrieved from a Document by using the find method. This method can take either the object’s identity (i.e., full URI) or display_id (local identifier) as an argument.

>>> cas9.identity
'http://sbolstandard.org/testfiles/Cas9'
>>> found_obj = doc.find('http://sbolstandard.org/testfiles/Cas9')
>>> found_obj.identity
'http://sbolstandard.org/testfiles/Cas9'
>>> cas9.display_id
'Cas9'
>>> found_obj = doc.find('Cas9')
>>> found_obj.identity
'http://sbolstandard.org/testfiles/Cas9'

It is possible to have multiple SBOL objects with the same display_id (but different identity) in the same document. In that case, if the find method is called with the display_id as the argument, it will return the matching object that was added to the document first.

>>> cas9a = sbol3.Component('http://sbolstandard.org/other_namespace/Cas9', sbol3.SBO_PROTEIN)
>>> cas9a.identity
'http://sbolstandard.org/other_namespace/Cas9'
>>> cas9a.display_id
'Cas9'
>>> doc.add(cas9a)
>>> found_obj = doc.find('Cas9')
>>> found_obj.identity
'http://sbolstandard.org/testfiles/Cas9'
>>> found_obj = doc.find('http://sbolstandard.org/other_namespace/Cas9')
>>> found_obj.identity
'http://sbolstandard.org/other_namespace/Cas9'

Getting, Setting, and Editing Attributes

The attributes of an SBOL object can be accessed like other Python class objects, with a few special considerations. For example, to get the values of the display_id and identity properties of any object:

>>> print(cas9.display_id)
Cas9
>>> print(cas9.identity)
http://sbolstandard.org/testfiles/Cas9

Note that display_id gives only the shorthand, local identifier for the object, while the identity property gives the full URI.

The attributes above return singleton values. Some attributes, like Component.roles and Component.types, support multiple values. Generally these attributes have plural names. If an attribute supports multiple values, then it will return a list. If the attribute has not been assigned any values, it will return an empty list.

>>> cas9.types
['https://identifiers.org/SBO:0000252']
>>> cas9.roles
[]

Setting an attribute follows the ordinary convention for assigning attribute values:

>>> cas9.description = 'This is a Cas9 protein'

To set multiple values:

>>> plasmid = sbol3.Component('pBB1', sbol3.SBO_DNA)
>>> plasmid.roles = [ sbol3.SO_DOUBLE_STRANDED, sbol3.SO_CIRCULAR ]

Properties such as types and roles behave like Python lists, and list operations like append and extend will work directly on these kinds of attributes:

>>> plasmid.roles = [ sbol3.SO_DOUBLE_STRANDED ]
>>> plasmid.roles.append( sbol3.SO_CIRCULAR )

>>> plasmid.roles = []
>>> plasmid.roles.extend( [sbol3.SO_DOUBLE_STRANDED, sbol3.SO_CIRCULAR] )

>>> plasmid.roles = [ sbol3.SO_DOUBLE_STRANDED ]
>>> plasmid.roles += [ sbol3.SO_CIRCULAR ]

To clear all values from an attribute, set it to an empty list:

>>> plasmid.roles = []

Creating and Adding Child Objects

Some SBOL objects can be composed into hierarchical parent-child relationships. In the specification diagrams, these relationships are indicated by black diamond arrows. In the UML diagram above, the black diamond indicates that Components are parents of Features. In pySBOL3, child objects of this kind are constructed first and then added to the appropriate list attribute of the parent object. SubComponent is one such Feature: the constructor for the SubComponent class takes a Component as its only required argument. In this usage, the Component is “… analogous to a blueprint or specification sheet for a biological part…”, whereas the SubComponent “… represents the specific occurrence of a part…” within a larger design (SBOL version 3.0.0 specification document). For example, to add a promoter to a circuit design, first define the promoter and circuit as SBOL Component objects, then define a SubComponent as an instance of the promoter and add that SubComponent to the circuit’s features attribute:

>>> ptet = sbol3.Component('pTetR', sbol3.SBO_DNA, roles=[sbol3.SO_PROMOTER])

>>> circuit = sbol3.Component('circuit', sbol3.SBO_DNA, roles=[sbol3.SO_ENGINEERED_REGION])

>>> ptet_sc = sbol3.SubComponent(ptet)
>>> circuit.features += [ptet_sc]

Creating and Editing Reference Properties

Some SBOL objects point to other objects by way of URI references. For example, Components point to their corresponding Sequences by way of a URI reference. These kinds of properties correspond to white diamond arrows in UML diagrams, as shown in the figure above. Attributes of this type contain the URI of the related object.

>>> gfp = sbol3.Component('GFP', sbol3.SBO_DNA)
>>> doc.add(gfp)
>>> gfp_seq = sbol3.Sequence('GFPSequence', elements='atgnnntaa', encoding=sbol3.IUPAC_DNA_ENCODING)
>>> doc.add(gfp_seq)
>>> gfp.sequences = [ gfp_seq ]
>>> print(gfp.sequences)
['http://sbolstandard.org/testfiles/GFPSequence']
>>> # Look up the sequence via the document
>>> seq2 = gfp.sequences[0].lookup()
>>> seq2 == gfp_seq
True

Note that assigning the gfp_seq object to gfp.sequences actually results in assignment of the object’s URI. An equivalent assignment is as follows:

>>> gfp.sequences = [ gfp_seq.identity ]
>>> print(gfp.sequences)
['http://sbolstandard.org/testfiles/GFPSequence']
>>> seq2 = gfp.sequences[0].lookup()
>>> seq2 == gfp_seq
True

Also note that the DNA sequence information is saved as the elements attribute of the Sequence object, as per the SBOL 3 specification:

>>> gfp_seq.elements
'atgnnntaa'

Iterating and Indexing List Properties

Some SBOL object properties can contain multiple values or objects. You may iterate over those list properties as with normal Python lists:

>>> # Iterate through objects (black diamond properties in UML)
>>> for feat in circuit.features:
...     print(f'{feat.display_id}, {feat.identity}, {feat.instance_of}')
...
SubComponent1, http://sbolstandard.org/testfiles/circuit/SubComponent1, http://sbolstandard.org/testfiles/pTetR
SubComponent2, http://sbolstandard.org/testfiles/circuit/SubComponent2, http://sbolstandard.org/testfiles/op1
SubComponent3, http://sbolstandard.org/testfiles/circuit/SubComponent3, http://sbolstandard.org/testfiles/RBS1
.
.
>>> # Iterate through references (white diamond properties in UML)
>>> for seq in gfp.sequences:
...     print(seq)
...
http://sbolstandard.org/testfiles/GFPSequence

Numerical indexing of list properties works as well:

>>> for n in range(0, len(circuit.features)):
...     print(circuit.features[n].display_id)
...
SubComponent1
SubComponent2
SubComponent3
.
.

Copying Documents and Objects

Copying a Document can serve several purposes, depending on the user’s goal. The first option is to create a simple copy of the original Document with Document.copy. After copying, each object in the cloned Document has the same identity as the corresponding object in the original Document.

>>> import sbol3
>>> sbol3.set_namespace('https://example.org/pysbol3')
>>> doc = sbol3.Document()
>>> cd1 = sbol3.Component('cd1', types=[sbol3.SBO_DNA])
>>> doc.add(cd1)
<sbol3.component.Component object at 0x7fb7d805b9a0>
>>> for o in doc:
...     print(o)
...
<Component https://example.org/pysbol3/cd1>
>>> doc2 = doc.copy()
>>> for o in doc2:
...     print(o)
...
<Component https://example.org/pysbol3/cd1>
>>> cd1a = doc2.find('cd1')
>>>
>>> # The two objects live at different locations in memory: they are distinct objects with the same identity.
>>>
>>> cd1a
<sbol3.component.Component object at 0x7fb7c83e7c40>
>>> cd1
<sbol3.component.Component object at 0x7fb7d805b9a0>

The sbol3.copy function is a more powerful way to copy or clone objects. Document.copy is built on sbol3.copy. The sbol3.copy function lets a user copy objects as above. It also lets the user change object namespaces and add the new objects to an existing Document.

For example, to copy objects and change their namespace, pass the into_namespace argument to sbol3.copy. Following on from the example above:

>>> objects = sbol3.copy(doc, into_namespace='https://example.org/foo')
>>> len(objects)
1
>>> for o in objects:
...     print(o)
...
<Component https://example.org/foo/cd1>
>>>

Finally, to construct a new set of objects and add them to an existing Document, pass the into_document argument to sbol3.copy. Again, following on from the example above:

>>> doc3 = sbol3.Document()
>>> len(doc3)
0
>>> # Any iterable of TopLevel can be passed to sbol3.copy:
>>> sbol3.copy([cd1], into_namespace='https://example.org/bar', into_document=doc3)
[<sbol3.component.Component object at 0x7fb7d844aa60>]
>>> len(doc3)
1
>>> for o in doc3:
...     print(o)
...
<Component https://example.org/bar/cd1>
>>>