Mechanism of Alu integration into the human genome
© Springer Science+Business Media B.V. 2007
Received: 31 January 2007
Accepted: 6 March 2007
Published: 28 March 2007
LINE-1 (L1) has driven the generation of at least 10% of the human genome by mobilising Alu sequences. Although there is no doubt that Alu insertion is initiated by L1-dependent target site-primed reverse transcription, the mechanism by which the newly synthesised 3′ end of a given Alu cDNA attaches to the target genomic DNA is less well understood. Intrigued by observations made on 28 pathological simple Alu insertions, we have sought to ascertain whether microhomologies could have played a role in the integration of shorter Alu sequences into the human genome. A meta-analysis of the 1624 Alu insertion polymorphisms deposited in the Database of Retrotransposon Insertion Polymorphisms in Humans (dbRIP), considered together with a re-evaluation of how the three previously annotated large deletion-associated short pathological Alu inserts were generated, enabled us to present a unifying model for Alu insertion into the human genome. Since Alu elements are comparatively short, L1 RT is usually able to complete nascent Alu cDNA strand synthesis, leading to the generation of full-length Alu inserts. However, the synthesis of the nascent Alu cDNA strand may be terminated prematurely if its 3′ end anneals to the 3′ terminus of the top strand’s 5′ overhang by means of microhomology-mediated mispairing, an event which would often lead to the formation of significantly truncated Alu inserts. Furthermore, the nascent Alu cDNA strand may be ‘hijacked’ to patch existing double strand breaks located in the top strand’s upstream regions, leading to the generation of large genomic deletions.
Keywords: Alu insertion polymorphisms; Human genetic disease; Human genome evolution; L1; LINE-1; Retrotransposition
LINE-1 (long interspersed element-1) or L1-mediated retrotransposition has significantly impacted upon human genome evolution (for recent reviews, see Deininger et al. 2003; Kazazian 2004; Han and Boeke 2005; Hedges and Batzer 2005) but has also given rise to human genetic disease (Chen et al. 2005, 2006). Intriguingly, L1 elements have driven the generation of some 10% of the human genome mass by mobilising Alu sequences (Lander et al. 2001; Batzer and Deininger 2002). Although there is no doubt that Alu insertion is initiated by L1 endonuclease and reverse transcriptase (RT)-dependent target site-primed reverse transcription (TPRT; Dewannieux et al. 2003; Hagan et al. 2003), the mechanism by which the newly synthesised 3′ end of a given Alu cDNA attaches to the target genomic DNA is less well understood. In this regard, the integration of full-length L1 elements has recently been proposed to occur via a template-jumping model whereas the integration of 5′-truncated L1 elements is thought to result predominantly from a microhomology-mediated end-joining (MMEJ) model (Zingler et al. 2005; Babushok et al. 2006). The integration of full-length Alu elements can also be explained, at least in principle, by the template-jumping model. However, unlike 5′-truncated L1 elements, 5′-truncated Alu elements appear by and large not to be integrated via the MMEJ model (Zingler et al. 2005).
Identification of microhomology existing between the top strand’s 5′ overhang and the sequence that lies 5′ to the truncation position in the Alu consensus sequence
The sub-family of each selected Alu insert was checked and annotated using RepeatMasker (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker; as of December 6, 2006). Although the annotations differed in some cases from those previously reported in Chen et al. (2005, 2006) and dbRIP, this did not affect the conclusions of the study in any way. Consensus sequences of the AluYa5, AluYa8, AluYb8, AluYb9, AluY, AluSq, AluYg6, AluYd8 and AluSp sub-families were taken from Repbase (http://www.girinst.org/repbase/update/browse.php; Jurka et al. 2005). Sequence alignments were performed with ClustalW (http://www.ebi.ac.uk/clustalw/#).
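The comparison named in the heading of this section can be illustrated with a short sketch. It assumes that both sequences are written 5′ to 3′ on the top strand, that the truncation position is given as the 1-based consensus coordinate of the first base retained in the insert, and that microhomology is counted as the run of identical bases reading leftwards from the 5′ junction. The function name, its parameters and the toy sequences are hypothetical and are not taken from the study; the default cap of 7 bp simply mirrors the 1–7 bp range reported in the text.

```python
def microhomology_length(upstream_flank, consensus, start_pos, max_len=7):
    """Return the number of consecutive matching bases at the 5' junction.

    upstream_flank: genomic sequence immediately 5' of the insert (5'->3').
    consensus:      sub-family consensus sequence (5'->3').
    start_pos:      1-based consensus position of the first retained base.
    max_len:        cap on the reported length (assumed here to be 7 bp).
    """
    flank = upstream_flank.upper()
    # Consensus bases lying 5' to the truncation position (absent from the insert).
    lost = consensus.upper()[: max(start_pos - 1, 0)]
    n = 0
    # Walk leftwards from the junction for as long as both sequences agree.
    while n < min(len(flank), len(lost), max_len) and flank[-1 - n] == lost[-1 - n]:
        n += 1
    return n


# Toy sequences for illustration only; they are not real Alu or genomic sequences.
toy_consensus = "ACGTTGCAAGGCT"
toy_flank = "TTATTGC"
print(microhomology_length(toy_flank, toy_consensus, start_pos=8))
```

A full-length insert (start position 1) has no consensus sequence 5′ to the truncation position, so the sketch returns 0 for it by construction.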
A trimodal length distribution of simple Alu inserts and the role of microhomology in generating shorter Alu inserts
Studies of recently inserted genomic L1 elements in the human genome (Myers et al. 2002; Pavlicek et al. 2002; Szak et al. 2002; Boissinot et al. 2004), pathological L1 direct insertions (Chen et al. 2005), and de novo L1 insertions in cultured human cells (Gilbert et al. 2002; 2005) as well as in a transgenic mouse model (Babushok et al. 2006) have consistently shown that simple L1 inserts display a bimodal length distribution with a large peak of short (<2 kb) and a smaller peak of longer (∼6 kb) integrations. Although the exact mechanism underlying this bimodal distribution remains controversial (e.g. Farley et al. 2004; Gilbert et al. 2005), the generation of the abundant short L1 inserts would appear to be facilitated by the presence of microhomologies frequently found between the top strand’s 5′ overhang in the target genomic sequence and the 3′ end of the nascent L1 RT-transcribed cDNA strand (Zingler et al. 2005; Babushok et al. 2006).
Table 1. Correlation between the presence of microhomology (1–7 bp) and the length of the 5′ truncation of Alu insertion polymorphismsᵃ
Column headings: Number of entries manifesting microhomology (A); Total number of entries (B)
Cell values (microhomology length in parentheses): 23 (1 bp), 17 (≥2 bp); 10 (1 bp), 5 (≥2 bp); 17 (1 bp), 12 (≥2 bp)
As mentioned above, only 34.8% of the Group II Alu inserts were found to exhibit microhomology. By contrast, microhomology was found in some 50% (44/89) of the Group III Alu inserts. Indeed, among the 5′ truncated Alu insertion polymorphisms (i.e. starting positions 8–271), the presence of microhomology correlates positively with the length of the 5′ truncation (Table 1), suggesting an important role for the MMEJ mechanism in generating shorter Alu inserts. Under this model, the generation of most of the shorter Alu inserts could have been promoted by inadvertent annealing, through microhomology, between the 3′ end of the nascent Alu cDNA strand and the 3′ end of the top strand’s 5′ overhang. This would then be followed by the premature termination of nascent cDNA strand synthesis, with concomitant initiation of second Alu cDNA strand synthesis by either a second L1 RT or a host DNA repair enzyme. Our finding differs from that of a recent genome-wide analysis, which concluded that 5′ truncated Alu elements show no (or only a weak) tendency to exhibit microhomology (Zingler et al. 2005). The discrepancy may be due to one or both of the following reasons. Firstly, Zingler et al. (2005) did not address the microhomology issue in relation to the different lengths of 5′ truncation. Secondly, these authors relied solely on computer-generated data in analysing the 5′ truncated Alu insertions; in other words, they did not analyse the relevant data manually. As shown in Supplementary Tables S3–S6, our manual evaluation led to the re-annotation of a significant fraction of the dbRIP entries.
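The tabulation behind such a comparison can be sketched as follows: each insert is assigned to a truncation-length bin according to its starting position, and the fraction of entries manifesting microhomology (A/B) is computed per bin. The bin boundaries and the records below are hypothetical placeholders chosen purely for illustration; they are not the groupings or the data used in this study. Only the final line uses figures quoted in the text (44/89).

```python
from collections import defaultdict


def fraction_with_microhomology(records, bins):
    """Tabulate microhomology frequency per truncation bin.

    records: iterable of (start_pos, microhomology_len) pairs.
    bins:    list of inclusive (low, high) starting-position ranges.
    Returns {bin: (A, B, A / B)}, where A counts entries with at least
    1 bp of microhomology and B counts all entries falling in that bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [A, B]
    for start_pos, mh_len in records:
        for low, high in bins:
            if low <= start_pos <= high:
                counts[(low, high)][1] += 1
                if mh_len >= 1:
                    counts[(low, high)][0] += 1
                break  # each record belongs to at most one bin
    return {b: (a, n, a / n) for b, (a, n) in counts.items()}


# Hypothetical bins and records, for illustration only.
toy_bins = [(8, 100), (101, 271)]
toy_records = [(10, 0), (50, 1), (90, 0), (120, 2), (200, 1), (271, 0), (300, 3)]
for bin_range, (a, b, frac) in sorted(fraction_with_microhomology(toy_records, toy_bins).items()):
    print(bin_range, a, b, round(frac, 3))

# The proportion quoted in the text for Group III inserts.
print(round(100 * 44 / 89, 1))  # 49.4
```

Records whose starting position falls outside every bin (the last toy record here) are simply left out of the tabulation, which seemed the least surprising choice for a sketch.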
Near full-length Alu insertion polymorphisms (i.e. starting positions 2–5 in accordance with their respective consensus sequences) that can be alternatively interpreted as full-length insertionsᵃ
Column headings: Number of entries that can be alternatively interpreted as full-length insertions; Total number of entries
Large deletion-associated short Alu inserts appear to be integrated through qualitatively different mechanisms
The generation of the three disease-causing large genomic deletions associated with Alu insertions can in principle be accounted for by the model illustrated in Fig. 6B of Gilbert et al. (2002): each event was putatively initiated by L1 endonuclease cleavage on the bottom strand but, unlike the typical process of TPRT leading to a simple insertional event, the L1 RT-transcribed Alu cDNA strand appears to have invaded a double strand break located far upstream of the bottom strand nick/break (Chen et al. 2005). This model can be further refined in the light of new developments in the field. In a genome-wide analysis of both human and chimpanzee data sets, Han et al. (2005) observed a significant positive correlation between the size of the L1 direct insertion and the size of the associated deletion. Han et al. (2005) surmised that the longer the newly synthesised L1 cDNA strand, the higher the probability of forming sufficient complementarity between the end of the L1 cDNA and the region flanking the 5′ end of the L1 insertion in the ancestral sequence. This is indeed a plausible explanation for the generation of large genomic deletions created upon L1 insertion. This model cannot, however, be readily extrapolated to large genomic deletions caused by insertions of Alu elements, simply because the Alu inserts in the three disease-causing events are significantly 5′ truncated (see Fig. 1). This notwithstanding, the model of Han et al. (2005) stimulated us to propose a refined model for the generation of large genomic deletions caused by Alu insertions: the significant sequence similarity existing between the regions spanning the top strand’s upstream deletion breakpoints and the newly synthesised Alu cDNA strands in all three cases (Fig. 4) suggests that the longer the stretch of complementarity, the higher the likelihood of a newly synthesised Alu cDNA strand annealing to a double strand break-containing far-upstream region. In this refined model, the position of the Alu truncation would be specified by the position of the double strand break in the top strand, whereas the synthesis of the Alu cDNA strand would not necessarily need to be completed in order to obtain sufficient complementarity for strand annealing/invasion.
One additional point warrants discussion. It is possible that the top strand’s upstream double strand break may be attributable to the activity of L1 endonuclease (Gasior et al. 2006). Were this to be the case, it would imply an active role for L1-mediated retrotransposition in creating large genomic deletions. It should however be emphasised that the L1 endonuclease used to generate the top strand’s upstream double strand break may not necessarily be the same as that used to create the bottom strand’s first nick (Mine et al. 2007), by analogy to the proposition that two different L1 RT molecules may be used for twin-priming, leading to L1 inversion (Ostertag and Kazazian 2001b). It is equally possible that the top strand’s upstream double strand break was created independently of L1 endonuclease. Were this to be the case, “a fascinating scenario would present itself: the organism could have ‘hijacked’ the L1 machinery to repair an existing double strand break through a mechanism akin to single strand annealing.” (Chen et al. 2005). In this particular context, L1 integration may represent a ‘host/parasite battleground’, as it has been termed by Gilbert et al. (2005), in which L1 integration finds itself in a ‘race’ to complete cDNA synthesis before being ‘hijacked’ to patch an upstream double strand break.
A unified model for Alu insertion into the human genome
Based upon the above observations, we propose a unified model for Alu insertion into the human genome. Since Alu elements are comparatively short, L1 RT is usually able to complete nascent Alu cDNA strand synthesis before jumping to the 3′ end of the top strand’s 5′ overhang, resulting in the generation of either full-length (i.e. Group I events) or 5′ truncated (i.e. Group II events) Alu inserts. Alternatively, the synthesis of the nascent Alu cDNA strand may be terminated prematurely if its 3′ end anneals to the 3′ terminus of the top strand’s 5′ overhang by means of microhomology-mediated mispairing, an event which would often lead to the formation of significantly truncated (Group III) Alu inserts. Furthermore, the nascent Alu cDNA strand may be ‘hijacked’ to patch existing double strand breaks located in the top strand’s upstream regions (which should usually comprise Alu-rich sequences), leading to the generation of large genomic deletions. Clearly, the unified model proposed here is likely to be modified or revised as new studies emerge.
Abbreviations
- dbRIP: Database of Retrotransposon Insertion Polymorphisms in Humans
- LINE-1 or L1: Long interspersed element-1
- TPRT: Target site-primed reverse transcription
- TSDs: Target site duplications
This work was supported by the INSERM (Institut National de la Santé et de la Recherche Médicale), France.
- Babushok DV, Ostertag EM, Courtney CE, Choi JM, Kazazian HH Jr (2006) L1 integration in a transgenic mouse model. Genome Res 16:240–250
- Batzer MA, Deininger PL (2002) Alu repeats and human genomic diversity. Nat Rev Genet 3:370–379
- Beauchamp NJ, Makris M, Preston FE, Peake IR, Daly ME (2000) Major structural defects in the antithrombin gene in four families with type I antithrombin deficiency–partial/complete deletions and rearrangement of the antithrombin gene. Thromb Haemost 83:715–721
- Bibillo A, Eickbush TH (2002) High processivity of the reverse transcriptase from a non-long terminal repeat retrotransposon. J Biol Chem 277:34836–34845
- Bibillo A, Eickbush TH (2004) End-to-end template jumping by the reverse transcriptase encoded by the R2 retrotransposon. J Biol Chem 279:14945–14953
- Boissinot S, Entezam A, Young L, Munson PJ, Furano AV (2004) The insertional history of an active family of L1 retrotransposons in humans. Genome Res 14:1221–1231
- Callinan PA, Wang J, Herke SW, Garber RK, Liang P, Batzer MA (2005) Alu retrotransposition-mediated deletion. J Mol Biol 348:791–800
- Chen JM, Stenson PD, Cooper DN, Férec C (2005) A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet 117:411–427
- Chen JM, Férec C, Cooper DN (2006) LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease: mutation detection bias and multiple mechanisms of target gene disruption. J Biomed Biotechnol 2006:56182
- Deininger PL, Moran JV, Batzer MA, Kazazian HH Jr (2003) Mobile elements and mammalian genome evolution. Curr Opin Genet Dev 13:651–658
- Dewannieux M, Esnault C, Heidmann T (2003) LINE-mediated retrotransposition of marked Alu sequences. Nat Genet 35:41–48
- Farley AH, Luning Prak ET, Kazazian HH Jr (2004) More active human L1 retrotransposons produce longer insertions. Nucleic Acids Res 32:502–510
- Gasior SL, Wakeman TP, Xu B, Deininger PL (2006) The human LINE-1 retrotransposon creates DNA double-strand breaks. J Mol Biol 357:1383–1393
- Gilbert N, Lutz-Prigge S, Moran JV (2002) Genomic deletions created upon LINE-1 retrotransposition. Cell 110:315–325
- Gilbert N, Lutz S, Morrish TA, Moran JV (2005) Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol Cell Biol 25:7780–7795
- Hagan CR, Sheffield RF, Rudin CM (2003) Human Alu element retrotransposition induced by genotoxic stress. Nat Genet 35:219–220
- Han JS, Boeke JD (2005) LINE-1 retrotransposons: modulators of quantity and quality of mammalian gene expression? Bioessays 27:775–784
- Han K, Sen SK, Wang J, Callinan PA, Lee J, Cordaux R, Liang P, Batzer MA (2005) Genomic rearrangements by LINE-1 insertion-mediated deletion in the human and chimpanzee lineages. Nucleic Acids Res 33:4040–4052
- Hedges DJ, Batzer MA (2005) From the margins of the genome: mobile elements shape primate evolution. Bioessays 27:785–794
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110:462–467
- Kazazian HH Jr (2004) Mobile elements: drivers of genome evolution. Science 303:1626–1632
- Kutsche K, Ressler B, Katzera HG, Orth U, Gillessen-Kaesbach G, Morlot S, Schwinger E, Gal A (2002) Characterization of breakpoint sequences of five rearrangements in L1CAM and ABCD1 (ALD) genes. Hum Mutat 19:526–535
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, 
Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, Szustakowki J, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, International human genome sequencing consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921
- Mine M, Chen JM, Brivet M, Desguerre I, Marchant D, de Lonlay P, Bernard A, Férec C, Abitbol M, Ricquier D, Marsac C (2007) A large genomic deletion in the PDHX gene caused by the retrotranspositional insertion of a full-length LINE-1 element. Hum Mutat 28:137–142
- Murphy MH, Baralle FE (1983) Directed semisynthetic point mutational analysis of an RNA polymerase III promoter. Nucleic Acids Res 11:7695–7700
- Myers JS, Vincent BJ, Udall H, Watkins WS, Morrish TA, Kilroy GE, Swergold GD, Henke J, Henke L, Moran JV, Jorde LB, Batzer MA (2002) A comprehensive analysis of recently integrated human Ta L1 elements. Am J Hum Genet 71:312–326
- Ostertag EM, Kazazian HH Jr (2001a) Biology of mammalian L1 retrotransposons. Annu Rev Genet 35:501–538
- Ostertag EM, Kazazian HH Jr (2001b) Twin priming: a proposed mechanism for the creation of inversions in L1 retrotransposition. Genome Res 11:2059–2065
- Pavlicek A, Paces J, Zika R, Hejnar J (2002) Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection. Gene 300:189–194
- Shankar R, Grover D, Brahmachari SK, Mukerji M (2004) Evolution and distribution of RNA polymerase II regulatory sites from RNA polymerase III dependant mobile Alu elements. BMC Evol Biol 4:37
- Su LK, Steinbach G, Sawyer JC, Hindi M, Ward PA, Lynch PM (2000) Genomic rearrangements of the APC tumor-suppressor gene in familial adenomatous polyposis. Hum Genet 106:101–107
- Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD (2002) Human L1 retrotransposition is associated with genetic instability in vivo. Cell 110:327–338
- Szak ST, Pickeral OK, Makalowski W, Boguski MS, Landsman D, Boeke JD (2002) Molecular archeology of L1 insertions in the human genome. Genome Biol 3(10):research0052
- Wang J, Song L, Grover D, Azrak S, Batzer MA, Liang P (2006) dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum Mutat 27:323–329
- Zingler N, Willhoeft U, Brose HP, Schoder V, Jahns T, Hanschmann KM, Morrish TA, Lower J, Schumann GG (2005) Analysis of 5′ junctions of human LINE-1 and Alu retrotransposons suggests an alternative model for 5′-end attachment requiring microhomology-mediated end-joining. Genome Res 15:780–789