exonize_analysis module tutorial
This guide will walk you through the basics of using the exonize_analysis module for parsing and analyzing the output data generated by exonize.
Example: Human Y chromosome
Data representation
Step 1: Import the exonize_analysis module
from exonize import exonize_analysis as exonize
For this example, we’ll use the results database generated for the Y chromosome in the usage example.
db_path='Homo_sapiens_chrom_Y_exonize/Homo_sapiens_chrom_Y_results.db'
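Before loading the results, it can help to confirm that the file exists and to see which tables it holds. The sketch below assumes the results file is an ordinary SQLite database, which the .db extension suggests but this tutorial does not state. list_tables is a hypothetical helper for illustration and is not part of exonize.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the table names in a SQLite file (hypothetical helper, not part of exonize)."""
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"results database not found: {path}")
    # Open read-only so that inspecting the file cannot modify the results.
    uri = path.resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables(db_path) with the path defined above would return the table names in alphabetical order. A FileNotFoundError at this point usually means the path does not match the exonize output directory.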
GenomeExpansions object
Initialize the GenomeExpansions object using the specified database.
ychrom_expansions = exonize.GenomeExpansions(exonize_db_path=db_path)
Now let’s check the number of genes identified with duplication events:
>>> len(ychrom_expansions)
15
Step 4: List genes with exon duplications
Display some of the gene IDs where exon duplications were found:
>>> print(ychrom_expansions.genes[-4:])
['gene:ENSG00000188120', 'gene:ENSG00000165246', 'gene:ENSG00000205916', 'gene:ENSG00000205944']
Step 5: Examine the DAZ1 gene
Let’s take a closer look at the DAZ1 gene (Ensembl ID gene:ENSG00000205916). Note that the Gene object is an iterable of Expansion objects.
>>> ychrom_expansions['gene:ENSG00000205916']
<Gene gene:ENSG00000205916 with 6 expansions (iterable of expansion graphs)>
DAZ1 has 6 expansions:
>>> len(ychrom_expansions['gene:ENSG00000205916'])
6
Step 6: Inspect expansion #1
Check the size of expansion #1:
>>> len(ychrom_expansions['gene:ENSG00000205916'][1])
18
This output shows that an exon in DAZ1 duplicated 18 times. Now, let’s look at the types of these duplications:
>>> types = [attrib.get('type') for node, attrib in ychrom_expansions['gene:ENSG00000205916'][1].nodes(data=True)]
>>> type_counts = {t: types.count(t) for t in set(types)}
>>> type_counts
{'FULL': 15, 'INTRONIC': 3}
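The tally above rescans the list once per distinct type. A single-pass alternative uses collections.Counter from the standard library. The sketch below runs on made-up (node, attributes) pairs shaped like the output of nodes(data=True); the coordinates are illustrative, not real exonize results, and count_node_types is a hypothetical helper.

```python
from collections import Counter


def count_node_types(nodes_with_data):
    """Tally the 'type' attribute over (node, attributes) pairs."""
    return Counter(attrib.get("type") for _node, attrib in nodes_with_data)


# Made-up pairs in the shape returned by .nodes(data=True); not real exonize output.
example_nodes = [
    ((100, 200), {"type": "FULL"}),
    ((300, 400), {"type": "FULL"}),
    ((500, 600), {"type": "INTRONIC"}),
]
type_counts = count_node_types(example_nodes)
print(dict(type_counts))  # {'FULL': 2, 'INTRONIC': 1}
```

With real data, passing ychrom_expansions['gene:ENSG00000205916'][1].nodes(data=True) to count_node_types should give the same counts as the dictionary above.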
Plotting
Step 1: Visualize expansion #1
Now, let’s visualize the expansion graph for the DAZ1 gene. Nodes represent coordinates and edges indicate matches found between them:
>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(expansion_id=1)
Step 2: Visualize the gene structure
The draw_gene_structure method uses the dna_features_viewer library (https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/) to illustrate the positions of expansion events within the gene, along with the coding exons it comprises.
First, we need to parse the gene hierarchy dictionary (found in the exonize output directory) to obtain the gene structure:
>>> ychrom_expansions.parse_gene_hierarchy_dictionary(
...     gene_hierarchy_dictionary_path='Homo_sapiens_chrom_Y_exonize/Homo_sapiens_chrom_Y_gene_hierarchy.pkl'
... )
>>> ychrom_expansions['gene:ENSG00000205916'].draw_gene_structure(
...     expansion_id=1
... )
The dark bars indicate the locations of the gene's coding sequences, while the light bars highlight the locations of the expansion events, colored according to the event mode.
Step 3: Visualize the full expansion #1
To visualize the full expansion and the matches between tandem exon pairs, set the full_expansion and color_tandem_pair_edges parameters to True.
>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(
...     expansion_id=1,
...     full_expansion=True,
...     color_tandem_pair_edges=True
... )
We can see that the expansion graph is composed of 15 tandemly duplicated exons.
Step 4: Visualize the match graph for DAZ1
You can also display all expansions associated with DAZ1 by omitting the expansion_id parameter:
>>> ychrom_expansions['gene:ENSG00000205916'].draw_expansions_multigraph(
...     full_expansion=True,
...     color_tandem_pair_edges=True
... )
We can see that all 6 events found in step 5 correspond to full exon duplications, and that only the first expansion is composed of tandem pairs.