First name
Pranami
Last name
Bora
Year of Study
Research Center
Thesis Title
An integrative approach to identify binding partners of Myc using (epi)genomics data in the 3T9MycER, Eμ-myc and tet-MYC
Thesis Abstract
The c-MYC oncogene encodes the transcription factor Myc, which regulates many
biological processes and is overexpressed in many cancers. When overexpressed, Myc
binds to almost all open promoters but regulates only specific subsets of genes. We investigated this
issue in three systems where Myc is overexpressed: 3T9MycER fibroblasts, Eμ-myc B cells and
tet-MYC liver cells, through an approach integrating different types of next-generation sequencing data,
such as DNase-seq footprinting, ChIP-seq and RNA-seq, with motif analysis and machine learning
methods (random forest). To analyse the DNase-seq footprinting data in our systems, we
developed a novel pipeline that carries out step-by-step analysis of the raw DNase-seq data and
outputs DNase I hypersensitive sites (DHSs) and transcription factor (TF) footprints. We overlapped, genome-wide, the footprints identified by the pipeline
with matches of a position weight matrix (PWM) library, obtaining a list of footprinted PWMs. We first applied a
single-feature classifier that assessed the performance of each PWM individually, and found that
single PWMs provided only a limited classification of the gene subsets. We then turned to a random
forest classifier that considers combinations of PWMs as features. This strategy provided a good
separation of the data sets (AUC>0.7) and identified some candidates, such as Nrf1/Nrf2 (Eμ-myc T
up), Tead factors (Eμ-myc T and tet-MYC up), E2f4 (Eμ-myc T up) and E2f1 (Eμ-myc T and tet-MYC
up), that could potentially act with Myc in regulating specific subsets of genes.
Student representatives
Off
Curricula Term