|Abstract:||The G-quadruplex (GQ) is a repetitive, guanine-rich DNA sequence that occurs throughout the genome. This DNA motif can form an alternative secondary structure resembling a rectangular prism. The most common GQ motif consists of four sets of three guanine bases separated by intervening loops, typically 1 to 9 bases in length. The four guanine triplets stabilize the GQ structure through Hoogsteen base pairing with the assistance of monovalent cations. Two predominant folding motifs have been identified within a single-stranded (ss) DNA context: parallel and antiparallel. Further work is needed to establish whether these folding trends hold in the double-stranded (ds) DNA context. Few studies have been carried out in this area because of the lack of methodologies for characterizing GQ structures within a duplex DNA context (i.e., a GQ structure with its complementary sequence embedded within dsDNA).
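The motif definition above (four guanine triplets separated by three loops of 1 to 9 bases) can be written as a simple pattern search. The following Python sketch is illustrative only and is not the assay or the analysis pipeline used in this work; the function names `find_putative_gq` and `reverse_complement` are hypothetical, and the pattern assumes exactly three guanines per run and the four standard DNA letters.

```python
import re

# Putative GQ motif as described in the text: four GGG runs separated by
# three loops of 1 to 9 bases each. The loop groups are captured so their
# lengths can be reported. (Assumption: exactly three G per run.)
GQ_PATTERN = re.compile(
    r"GGG([ACGT]{1,9})GGG([ACGT]{1,9})GGG([ACGT]{1,9})GGG"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_putative_gq(seq: str):
    """Return (start, end, loop_lengths) for each non-overlapping match.

    Coordinates are 0-based and half-open, relative to the strand passed in.
    """
    hits = []
    for m in GQ_PATTERN.finditer(seq.upper()):
        loop_lengths = tuple(len(loop) for loop in m.groups())
        hits.append((m.start(), m.end(), loop_lengths))
    return hits


if __name__ == "__main__":
    g_rich = "TTGGGAGGGTGGGAGGGTT"
    print(find_putative_gq(g_rich))
    # In dsDNA the motif may sit on either strand, so the complementary
    # strand is scanned by searching the reverse complement.
    c_rich = "AACCCTCCCACCCTCCCAA"
    print(find_putative_gq(reverse_complement(c_rich)))
```

A sequence match of this kind only flags a candidate; whether the candidate actually folds, particularly within duplex DNA, is the experimental question addressed here.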
The biological importance of these structures can be traced to GQ involvement in the regulation of replication and transcription, genome rearrangements, translation, and telomere processing. In support of experimental findings, computational studies have revealed a 230-fold enrichment of GQ sequences upstream of promoters relative to the genomic average, with nearly half of all human promoters containing putative GQ sequences. Moreover, GQ sequences are highly likely to be found in oncogenes and regulatory genes. Complementing these observations are findings that GQ sequences are less likely to be located within the template strand, coding regions, tumor suppressor genes, and housekeeping genes. The highly selective positions of GQs imply that they may regulate a particular set of biological processes and suggest that stabilization of the structure may serve as a novel pharmaceutical target.
Despite the plethora of reports on telomeric DNA, relatively few studies have looked into GQs located within genomic regions. Furthermore, most studies have focused on a few well-characterized sequences, such as c-Myc, TERT, and BCL2, formed in the context of ssDNA. Although ssDNA may be relevant for studying the telomeric overhang (which is naturally single-stranded), it is not an appropriate platform for investigating the ~800,000 putative GQ-forming sequences in dsDNA found throughout the human genome. Previously accepted methodologies for GQ structural investigation have proven cumbersome, at best, when applied within the dsDNA context. Additionally, current, widely available GQ investigational tools provide only qualitative insight into GQ formation and structure. Circular dichroism, the standard methodology for identifying the folding properties of putative GQ sequences, fails to provide characteristic signals for GQs when probing dsDNA. The limitations of current techniques provide an opportunity for the development of more quantitative, biologically applicable analysis tools.
We have developed a bulk-phase induced fluorescence-based assay that can distinguish between folded and unfolded GQs, as well as identify predominant folding motifs (parallel or antiparallel). Our central objective is to elucidate the rules governing GQ folding within dsDNA and to identify potential GQs that can affect crucial biological processes such as replication, transcription, and translation. This thesis describes four key findings. First, GQ formation is much less robust within the duplex setting than in single-stranded contexts. Second, stable GQ folding within a dsDNA context is driven by both sequence composition and loop length. Third, strong GQ folding within genomic DNA is underrepresented near genetic regulatory elements. Fourth, GQ folding imposes barrier effects on gene expression in E. coli. Based on these findings, a comprehensive GQ folding atlas was developed that highlights potentially important GQ structures and their function in gene expression.