exonize_analysis module tutorial

This guide will walk you through the basics of using the exonize_analysis module for parsing and analyzing the output data generated by exonize.

Example: Human Y chromosome

Data representation

Step 1: Import the exonize_analysis module

from exonize import exonize_analysis as exonize

Step 2: Set the path to the exonize results database

For this example, we’ll use the results database generated for the Y chromosome in the usage example.

db_path = 'Homo_sapiens_chrom_Y_exonize/Homo_sapiens_chrom_Y_results.db'

Step 3: Create a GenomeExpansions object

Initialize the GenomeExpansions object using the specified database.

ychrom_expansions = exonize.GenomeExpansions(exonize_db_path=db_path)
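
If the object fails to initialize, it can help to confirm that the path points to a readable database. The sketch below assumes the results file is a SQLite database (suggested by the .db extension but not stated here); list_tables is a hypothetical helper, not part of exonize, and it makes no assumption about which tables exonize creates.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names of a SQLite database file.

    Hypothetical helper for illustration; not part of exonize.
    """
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"no such database file: {path}")
    # Open in read-only mode so that inspecting the file cannot modify it.
    uri = path.resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables(db_path) should return a non-empty list for a valid results database; a FileNotFoundError usually means the path is relative to a different working directory.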

Now let’s check the number of genes identified with duplication events:

>>> len(ychrom_expansions)
15

Step 4: List genes with exon duplications

Display some of the gene IDs where exon duplications were found:

>>> print(ychrom_expansions.genes[-4:])
['gene:ENSG00000188120', 'gene:ENSG00000165246', 'gene:ENSG00000205916', 'gene:ENSG00000205944']
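
The IDs above carry a gene: prefix. When cross-referencing them with other resources, the bare Ensembl accession is often more convenient. A minimal sketch, where bare_ensembl_id is a hypothetical helper and not part of exonize:

```python
def bare_ensembl_id(gene_id, prefix="gene:"):
    """Drop a leading prefix such as 'gene:' from an identifier, if present."""
    if gene_id.startswith(prefix):
        return gene_id[len(prefix):]
    return gene_id


# The four IDs printed in this step.
gene_ids = [
    "gene:ENSG00000188120",
    "gene:ENSG00000165246",
    "gene:ENSG00000205916",
    "gene:ENSG00000205944",
]
print([bare_ensembl_id(gene_id) for gene_id in gene_ids])
# ['ENSG00000188120', 'ENSG00000165246', 'ENSG00000205916', 'ENSG00000205944']
```

Keep the prefixed form when indexing the GenomeExpansions object, since that is the form it reports.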

Step 5: Examine the DAZ1 gene

Let’s take a closer look at the DAZ1 gene (Ensembl ID gene:ENSG00000205916). Note that a Gene object is an iterable of Expansion objects.

>>> ychrom_expansions['gene:ENSG00000205916']
<Gene gene:ENSG00000205916 with 6 expansions (iterable of expansion graphs)>

DAZ1 has 6 expansions:

>>> len(ychrom_expansions['gene:ENSG00000205916'])
6
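
To survey several genes in one pass, the same len calls can be wrapped in a loop. The sketch below is not part of exonize: largest_expansion_per_gene is a hypothetical helper, and it assumes that iterating over a Gene yields its expansions and that each expansion supports len. A plain dictionary of lists stands in for the real objects so that the example runs on its own.

```python
def largest_expansion_per_gene(genome, gene_ids):
    """Map each gene ID to the size of its largest expansion (0 if none)."""
    summary = {}
    for gene_id in gene_ids:
        # Assumption: iterating over genome[gene_id] yields objects with len().
        sizes = [len(expansion) for expansion in genome[gene_id]]
        summary[gene_id] = max(sizes, default=0)
    return summary


# Toy stand-in: each inner list plays the role of one expansion graph.
toy_genome = {
    "gene:A": [["n1", "n2", "n3"], ["n4", "n5"]],
    "gene:B": [["n6"]],
}
print(largest_expansion_per_gene(toy_genome, ["gene:A", "gene:B"]))
# {'gene:A': 3, 'gene:B': 1}
```

With the objects from this tutorial, the call would be largest_expansion_per_gene(ychrom_expansions, ychrom_expansions.genes), provided the assumptions above hold.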

Step 6: Inspect expansion #1

Check the size of expansion #1:

>>> len(ychrom_expansions['gene:ENSG00000205916'][1])
18

This output shows that expansion #1 of DAZ1 comprises 18 nodes, that is, 18 genomic regions linked by matches. Now, let’s look at the types of these regions:

>>> types = [attrib.get('type') for node, attrib in ychrom_expansions['gene:ENSG00000205916'][1].nodes(data=True)]
>>> type_counts = {t: types.count(t) for t in set(types)}
>>> type_counts
{'FULL': 15, 'INTRONIC': 3}

This tells us that the matches in this expansion involve 15 exons and 3 intronic regions.
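
The same tally can be written with collections.Counter from the standard library, which counts in a single pass. A minimal sketch, assuming only that the input is an iterable of (node, attribute dictionary) pairs such as the one returned by nodes(data=True); count_node_types and the toy nodes are hypothetical and not part of exonize.

```python
from collections import Counter


def count_node_types(nodes_with_data):
    """Count the 'type' attribute over (node, attributes) pairs."""
    # Nodes without a 'type' attribute are counted under None.
    return Counter(attrib.get("type") for _node, attrib in nodes_with_data)


# Toy stand-in for the pairs returned by nodes(data=True).
toy_nodes = [
    ("n1", {"type": "FULL"}),
    ("n2", {"type": "FULL"}),
    ("n3", {"type": "INTRONIC"}),
]
print(dict(count_node_types(toy_nodes)))
# {'FULL': 2, 'INTRONIC': 1}
```

With the tutorial objects, the input would be ychrom_expansions['gene:ENSG00000205916'][1].nodes(data=True).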

Plotting

Step 1: Visualize expansion #1

Now, let’s visualize the expansion graph for the DAZ1 gene. Nodes represent coordinates and edges indicate matches found between them:

>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(expansion_id=1)

Step 2: Visualize the gene structure

The draw_gene_structure method uses the [dna_features_viewer](https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/) library to illustrate the positions of expansion events within the gene, along with the coding exons it comprises.

First, we need to parse the gene hierarchy dictionary (found in the exonize output directory) to obtain the gene structure:

>>> ychrom_expansions.parse_gene_hierarchy_dictionary(
    gene_hierarchy_dictionary_path='Homo_sapiens_chrom_Y_exonize/Homo_sapiens_chrom_Y_gene_hierarchy.pkl'
)

Let's now plot the gene's features:

>>> ychrom_expansions['gene:ENSG00000205916'].draw_gene_structure(
    expansion_id=1
)

The dark bars indicate the locations of the gene's coding sequences, while the light bars highlight the locations of the expansion events, colored according to the event mode.

Step 3: Visualize the full expansion #1

To visualize the full expansion and the matches between tandem exon pairs, set the full_expansion and color_tandem_pair_edges parameters to True:

>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(
    expansion_id=1,
    full_expansion=True,
    color_tandem_pair_edges=True
)

We can see that the expansion graph is composed of 15 tandemly duplicated exons.

Step 4: Visualize the match graph for DAZ1

You can also display all expansions associated with DAZ1 by omitting the expansion_id parameter:

>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(
    full_expansion=True,
    color_tandem_pair_edges=True
)

We can see that all 6 expansions found in Step 5 of the previous section correspond to full exon duplications, and that only the first expansion is composed of tandem pairs.