A whole genome analyses of genetic variants in two Kelantan Malay individuals
- Wan Khairunnisa Wan Juhari1,
- Nur Aida Md Tamrin2,
- Mohd Hanif Ridzuan Mat Daud2,
- Hatin Wan Isa9,
- Nurfazreen Mohd Nasir9,
- Sathiya Maran9,
- Nur Shafawati Abdul Rajab9,
- Khairul Bariah Ahmad Amin Noordin3,
- Nik Norliza Nik Hassan4,
- Rick Tearle5,
- Rozaimi Razali6,
- Amir Feisal Merican7, 8 and
- Bin Alwi Zilfalil1
© Wan Juhari et al.; licensee Springer. 2014
Received: 4 May 2014
Accepted: 19 September 2014
Published: 21 October 2014
The sequencing of the genomes of two members of the Royal Kelantan Malay family will provide insights into Kelantan Malay whole genome sequences. The two Kelantan Malay genomes were analyzed for SNP markers associated with thalassemia and Helicobacter pylori infection. Helicobacter pylori infection has been reported to have a low prevalence in the north-east compared with the west coast of Peninsular Malaysia, and beta-thalassemia is known to be one of the most common inherited genetic disorders in Malaysia.
By combining SNP information from the literature, GWAS studies and NCBI ClinVar, 18 unique SNPs were selected for further analysis. Of these 18 SNPs, 10 came from a previous study of Helicobacter pylori infection among Malay patients, 6 were from NCBI ClinVar and 2 were from GWAS studies. The analysis reveals that both Royal Kelantan Malay genomes shared all 10 SNPs identified by Maran (Single Nucleotide Polymorphims (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan, 2011) and one SNP from a GWAS study. In addition, the analysis reveals that both Royal Kelantan Malay genomes shared 3 SNP markers, HBG1 (rs1061234), HBB (rs1609812) and BCL11A (rs766432), all of which are associated with beta-thalassemia.
Our findings suggest that the Royal Kelantan Malays carry SNPs that are associated with protection against Helicobacter pylori infection. In addition, they carry SNPs that are associated with beta-thalassemia. These findings are in line with those of other researchers who studied thalassemia and Helicobacter pylori infection in the non-royal Malay population.
Keywords: Malays; Single nucleotide polymorphisms; Helicobacter pylori; Beta-thalassemia; Whole genome analysis
The Malays (Melayu) are found across wide areas of the world, including Peninsular Malaysia, Borneo (Sabah and Sarawak), Indonesia, Singapore, Brunei, Thailand, Sri Lanka and some of the Cape Malay community in Cape Town, South Africa. Among these countries, Malaysia and Brunei account for the majority of the Malay population. In Peninsular Malaysia, the Malays consist of several sub-ethnic groups originating from different ancestral lineages, reflecting their migrations many years ago (Paul, 1961). According to their settlements, the Malays in Peninsular Malaysia comprise the western Malays (Melayu Minang), the southern Malays (Melayu Jawa and Melayu Bugis) and the northern Malays (Melayu Kelantan and Melayu Kedah).
SNP genotyping data previously reported by Hatin et al. (2011) placed the Kelantan Malay (Melayu Kelantan) as an outlier among the Malay sub-ethnic groups in Peninsular Malaysia. The Kelantan Malay have an ancestry that is more divergent than that of other Malay populations, owing to their historical links and their geographical location in the northern part of Peninsular Malaysia. Malays from the western and southern regions of Peninsular Malaysia have more historical and cultural links with the respective populations of the Indonesian archipelago, whereas the Kelantan Malay show limited links with these populations (Hatin et al., 2011). This uniqueness has sparked interest in understanding the population of Malaysia, in particular the Malay sub-ethnic groups.
The sequencing of the genomes of two members of the Royal Kelantan family will provide insights into Kelantan Malay (Melayu Kelantan) whole genome sequences. The two Royal family members are descendants of Sultan Muhammad IV, the ruler of the state of Kelantan from 1911 to 1920. The sequences will allow the construction of a Kelantan Malay reference genome, the identification of variants specific to the Kelantan Malay ethnic group and the positioning of the Kelantan Malay in the broad prehistory of both the Malay Peninsula and Southeast Asia in general.
The whole genome sequences of the Royal Kelantan Malay individuals should reveal the genetic variants associated with Helicobacter pylori infection and thalassemia in the Kelantan Malay (Melayu Kelantan) population. Although the Malays are the predominant ethnic group in Malaysia, a study has reported a lower prevalence of Helicobacter pylori infection in the Malays than in the Chinese and the Indians (Goh, 2009). The high prevalence of Helicobacter pylori infection in the Chinese and the Indians might reflect the high prevalence in the regions from which they originated, namely Southern China and Southern India (Goh, 2009). Helicobacter pylori prevalence also varied from a low of 26.4% in Kota Bharu, in the north-east of Peninsular Malaysia, to a substantially higher 55.0% in Kota Kinabalu, in the state of Sabah in Borneo (Goh & Parasakthi, 2001). These findings are supported by reports that the prevalence of Helicobacter pylori infection is lower in the north-east than on the west coast of Peninsular Malaysia (Goh & Parasakthi, 2001; Uyub et al., 1994). Because of the history of immigration thousands of years ago, there are a few implications for Helicobacter pylori prevalence in the Malays, as the Malay isolates share the same origin as the Indian isolates (Tay et al., 2009). A previous genome wide association study conducted by our group (Maran, 2011) revealed the presence of protective SNPs that contributed to the low prevalence of Helicobacter pylori infection in Kelantan.
The sequence data were aligned to the NCBI human reference genome (build 37) with an average coverage depth of 40-fold. As the two Royal Kelantan Malay individuals are first cousins, they are expected to share 1/32 of their genomes. We found that about 1/30 (93.6 Mb of 2,830 Mb) of the sequence data generated showed similarity between the two genomes, which is consistent with their relationship as first cousins.
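The comparison between the observed and the expected shared fraction can be checked with a few lines of arithmetic. The sketch below uses only the figures reported above (93.6 Mb of 2,830 Mb, against the stated expectation of 1/32); the variable names are illustrative and are not taken from the study.

```python
from fractions import Fraction

# Figures as reported in the text (megabases).
shared_mb = 93.6
total_mb = 2830.0

# Expected shared fraction as stated in the text.
expected = Fraction(1, 32)

observed = shared_mb / total_mb              # observed shared fraction
one_in = total_mb / shared_mb                # the same ratio in "1 in N" form
ratio_to_expected = observed / float(expected)

print(f"observed fraction : {observed:.4f} (about 1/{one_in:.1f})")
print(f"expected fraction : 1/{expected.denominator}")
print(f"observed/expected : {ratio_to_expected:.2f}")
```

The observed ratio works out to roughly 1 in 30, which is the value quoted in the text.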
Table: Summary statistics of SNPs (categories: known homozygous, novel homozygous, known heterozygous and novel heterozygous SNPs)
Individual genome comparison
We compared the SNPs present in both the K1 and K2 genomes against the NCBI dbSNP database (build 132). The whole genome sequence data of K1 and K2 were transformed into a virtual Affymetrix GeneChip Human Mapping 50k Xba 1 Array to allow comparison of the SNP calls with the genotype data of four Malay sub-ethnic groups in Peninsular Malaysia (Melayu Kelantan, Melayu Minang, Melayu Jawa and Melayu Bugis), as reported by Hatin et al. (2011). Based on the genotyping calls, 54,794 autosomal SNPs were found to be similar between the Royal Kelantan Malay genomes and the Kelantan Malay (Melayu Kelantan).
We compared the Royal Kelantan Malay genomes with two other sequenced individual genomes, the Han Chinese genome and the South Asian Indian female genome (Gupta et al., 2012; Wang et al., 2008). The purpose of this comparison was to look for admixture between the genomes, as a previous study suggested that the distinct genetic difference of the Malays was possibly due to admixture between the Kelantan Malay (Melayu Kelantan) and Indian populations (Hatin et al., 2011). The admixture could possibly have arisen through Indians who migrated from India in the second century AD (Hatin et al., 2011).
The Royal Kelantan genomes were compared with two published personal genomes, the Han Chinese and the South Asian Indian Female (SAIF). The SNP-level comparison using dbSNP (build 132) showed that the Royal Kelantan individuals shared 98% of SNPs with the Han Chinese genome and 95% with the SAIF genome. This most probably indicates that the Royal Kelantan subjects have ancestral links with the Han Chinese and SAIF.
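The idea of a virtual array can be illustrated with a small sketch: whole genome genotype calls are restricted to the markers present on the array, and the remaining calls are compared with array genotypes. This is an illustration under stated assumptions and not the pipeline used in the study; the function names, marker IDs and genotypes are invented placeholders.

```python
def virtual_array(wgs_calls, array_markers):
    """Keep only the WGS genotype calls at markers present on the array."""
    return {snp: gt for snp, gt in wgs_calls.items() if snp in array_markers}


def normalise(genotype):
    """Treat genotypes as unordered allele pairs, so 'AG' equals 'GA'."""
    return "".join(sorted(genotype))


def concordant_markers(calls_a, calls_b):
    """Return markers called in both sets with the same unordered genotype."""
    common = calls_a.keys() & calls_b.keys()
    return {s for s in common if normalise(calls_a[s]) == normalise(calls_b[s])}


# Toy data: marker IDs and genotypes are invented placeholders.
wgs_k1 = {"m1": "AA", "m2": "AG", "m3": "CC", "m4": "TT"}
array_marker_ids = {"m1", "m2", "m3"}
array_genotypes = {"m1": "AA", "m2": "GA", "m3": "CT"}

k1_virtual = virtual_array(wgs_k1, array_marker_ids)
matches = concordant_markers(k1_virtual, array_genotypes)
print(sorted(matches))
```

Sorting the alleles before comparison avoids counting a heterozygous call as discordant only because the two platforms report the alleles in a different order.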
Whole genome sequencing identified an average of over 3 million SNPs per individual. Studies of other whole genome samples from the Han Chinese (Wang et al., 2008), the South Asian Indian Female (Gupta et al., 2012), the SJK Korean (Ahn et al., 2009) and the Southeast Asian Malays in Singapore (Wong et al., 2013) have reported a consistent number of SNPs. The analysis of the K1 and K2 genomes also revealed a level of well-attested variation similar to that of other non-African populations. Compared with the human reference genome, K1 carried 3,946,306 such variants and K2 carried 3,906,477, against an average of 3,956,074 ± 39,778 variants in a control group of Caucasian and Asian genomes from the Complete Genomics public data. The two Royal Kelantan Malay individuals shared 2,542,089 variants (19.3%), while 1,404,217 variants (10.7%) were unique to K1 and 1,364,388 (10.4%) were unique to K2.
The variants shared by both Royal Kelantan genomes are also shared with the Asian and the African genomes. With the Han Chinese and the South Asian Indian genomes available for comparison, the position of the Kelantan Malay (Melayu Kelantan) genomes could be determined.
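The shared and unique variant counts are related by simple set arithmetic, which the following sketch verifies using only the totals reported above. The variable names are illustrative.

```python
# Variant counts as reported in the text.
k1_total = 3_946_306
k2_total = 3_906_477
shared = 2_542_089

# Variants unique to each genome are that genome's total minus the shared set.
k1_unique = k1_total - shared
k2_unique = k2_total - shared

# Distinct variants across both genomes (the union of the two sets).
union = shared + k1_unique + k2_unique

print(f"unique to K1 : {k1_unique:,}")
print(f"unique to K2 : {k2_unique:,}")
print(f"union        : {union:,}")
```

The unique counts obtained this way match the figures of 1,404,217 and 1,364,388 given in the text.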
Royal Kelantan Malay genomes associated with Helicobacter pylori infection
The two Royal Kelantan genomes were analyzed for SNP markers associated with Helicobacter pylori infection. By combining SNP information from the literature, GWAS studies and NCBI's ClinVar database, 18 unique SNPs were selected for further analysis. Of these 18 SNPs, 10 came from the previous study of Helicobacter pylori infection among Malay patients (Maran, 2011), 6 were from NCBI's ClinVar database (Landrum et al., 2014) and 2 were from genome wide association studies (GWAS).
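Assembling a panel of unique SNPs from several sources, as described above, amounts to merging lists while dropping duplicates. The sketch below is a minimal illustration and not the authors' procedure: the helper name and the identifiers are hypothetical placeholders, and only the per-source counts (10, 6 and 2) come from the text.

```python
from collections import Counter


def build_panel(sources):
    """Merge per-source SNP lists into one panel, recording each SNP's first source."""
    panel = {}
    for source_name, snp_ids in sources.items():
        for snp in snp_ids:
            panel.setdefault(snp, source_name)  # a duplicate keeps its first source
    return panel


# Placeholder identifiers standing in for real dbSNP IDs.
sources = {
    "Maran study": [f"maran_{i}" for i in range(1, 11)],
    "ClinVar": [f"clinvar_{i}" for i in range(1, 7)],
    "GWAS": [f"gwas_{i}" for i in range(1, 3)],
}

panel = build_panel(sources)
per_source = Counter(panel.values())
print(len(panel), dict(per_source))
```

Keying the panel on the SNP identifier means that a SNP reported by more than one source is counted once.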
Table: List of all SNPs associated with H. pylori infection in the Royal Kelantan Malay genome (sources listed: ten SNPs from the study by Maran, 2011, and one SNP from GWAS, Mayerle et al., 2013)
Interestingly, using the whole genome sequencing approach, we were able to identify in the two uninfected Royal Kelantan Malay individuals SNPs similar to those observed to be protective against Helicobacter pylori infection in non-royal Kelantan Malay individuals. The genetic variants previously studied by Lee et al. (2012) and Maran et al. (2013) were most probably responsible for the protection against Helicobacter pylori infection in the two Royal Kelantan Malay individuals. The finding of these SNPs in the two individuals lends credence to our proposal that the genomes of these two Royal Kelantan Malay individuals serve as the reference genome sequence for the Kelantan Malay (Melayu Kelantan).
Royal Melayu Kelantan genome associated with thalassemia
The two Royal Melayu Kelantan genomes were also analyzed for SNP markers associated with thalassemia, an inherited disease that is a public health problem. It is common in the Malays, with a carrier rate of 5% (George, 2013). A total of 231 SNPs were selected for the analysis: 228 came from NCBI's ClinVar database (Landrum et al., 2014) and 3 came from GWAS studies.
The analysis revealed that both Royal Kelantan Malay (Melayu Kelantan) genomes shared 3 SNP markers, all of which were associated with beta-thalassemia. The SNPs implicated in the disease, rs1061234, rs1609812 and rs766432, were identified in the HBG1, HBB and BCL11A genes, respectively.
BCL11A functions as a myeloid and B-cell proto-oncogene and plays important roles in leukemogenesis and hematopoiesis. It is an essential factor in lymphopoiesis, is required for B-cell formation in the fetal liver and may function as a modulator of the transcriptional repression activity of ARP1. It is expressed at high levels in the brain, spleen, thymus, bone marrow and testis. In addition, it is expressed in CD34-positive myeloid precursor cells, B-cells, monocytes and megakaryocytes. Its expression is tightly regulated during B-cell development.
HBB is involved in oxygen transport from the lung to the various peripheral tissues. It also acts as an endogenous inhibitor of enkephalin-degrading enzymes such as DPP3 and as a selective antagonist of the P2RX3 receptor, which is involved in pain signaling; these properties implicate it as a regulator of pain and inflammation. HBB is known as a gene associated with beta-thalassemia. The absence of the beta chain causes beta(0)-thalassemia, while a reduced amount of detectable beta globin causes beta(+)-thalassemia. In the severe forms of beta-thalassemia, excess alpha globin chains accumulate in the developing erythroid precursors in the marrow. Their deposition leads to a vast increase in erythroid apoptosis, which in turn causes ineffective erythropoiesis and severe microcytic hypochromic anemia. Clinically, beta-thalassemia is divided into thalassemia major, which is transfusion dependent, thalassemia intermedia (of intermediate severity) and thalassemia minor, which is asymptomatic.
Lastly, HBG1 is normally expressed in the fetal liver, spleen and bone marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin (HbF), which is normally replaced by adult hemoglobin (HbA) at birth. In some beta-thalassemias and related conditions, gamma chain production continues into adulthood. The two types of gamma chain differ at residue 136, where glycine is found in the G-gamma product (HBG2) and alanine in the A-gamma product (HBG1). The former is predominant at birth.
Royal Melayu Kelantan genome and pharmacogenomics
The two Royal Melayu Kelantan genomes were further analyzed for their pharmacogenomic properties. Associations between SNPs and drugs were identified using the Pharmacogenomics Knowledge Base (PharmGKB) database (http://pharmgkb.org) (Whirl-Carrillo et al., 2012). Of the more than 3.5 million SNPs identified, over 1,200 variants were found to affect either the toxicity or the efficacy of numerous drugs available on the market. For example, variation in IGF2BP2 (rs4402960) has been reported to enhance the effect of repaglinide treatment in type 2 diabetes patients in China (Huang et al., 2010).
Table: Disease-associated SNPs with their respective drugs (disease categories listed: osteoarthritis and acute pain; purine-pyrimidine metabolism, inborn errors)
Our findings suggest that the Royal Kelantan Malays carry SNPs that are associated with protection against Helicobacter pylori infection. In addition, they carry SNPs that are associated with beta-thalassemia. These findings are in line with those of other researchers who studied thalassemia and Helicobacter pylori infection in the non-royal Malay population. The whole genome sequences of the Royal Kelantan Malays provide a reference genome for the Kelantan Malay sub-ethnic group and will be useful to those conducting comparative and evolutionary population studies.
Sample collection, library construction and sequencing
Approval was obtained from the Universiti Sains Malaysia (USM) Research Ethics Committee, and written informed consent was obtained from the two subjects. A total of 3 ml of blood was taken from each of the selected male subjects of the Royal Kelantan family. The two royal individuals were denoted K1 and K2. Neither individual was known to have a hereditary disease, based on the interviews conducted with them. Genomic DNA was extracted using the QIAGEN (Germany) Blood Mini Kit, with a final genomic DNA concentration of more than 100 ng/µl.
The genomic DNA was used to prepare whole genome libraries according to the Complete Genomics library preparation protocols (Complete Genomics Inc., USA). The sequence data were then obtained using the two primary components of the sequencing technology developed by Complete Genomics Inc.: DNA nanoball (DNB™) arrays and combinatorial probe-anchor ligation (cPAL™) reads (Complete Genomics Inc., USA).
The SNPs associated with Helicobacter pylori were searched for in the variant lists of both individual genomes, using the list of dbSNP IDs corresponding to the SNPs previously reported by Maran (2011). The functional impact of each SNP was then evaluated using the F-SNP database (http://compbio.cs.queensu.ca/F-SNP/). Searches of the NCBI Single Nucleotide Polymorphism database (dbSNP) (Sherry et al., 1999), Online Mendelian Inheritance in Man (OMIM) (Amberger et al., 2009) and Genatlas (http://genatlas.medecine.univ-paris5.fr/) were performed to obtain information on the sequence variants.
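Searching a genome's variant list for a set of dbSNP IDs can be sketched as a set intersection. The example below is a minimal illustration and not the authors' code. It uses the three beta-thalassemia rsIDs named in the text as the panel, because the Helicobacter pylori IDs are not listed here; the variant lists are toy data and the function name is hypothetical.

```python
def find_markers(variant_ids, markers):
    """Return the panel markers present in one genome's variant list."""
    return set(markers) & set(variant_ids)


# rsIDs and genes taken from the text (beta-thalassemia markers).
panel = {"rs1061234": "HBG1", "rs1609812": "HBB", "rs766432": "BCL11A"}

# Toy variant lists: the extra IDs are invented placeholders. Real lists
# would hold millions of entries parsed from each genome's variant file.
k1_variants = ["rs1061234", "rs1609812", "rs766432", "placeholder_a"]
k2_variants = ["rs766432", "rs1609812", "placeholder_b", "rs1061234"]

in_k1 = find_markers(k1_variants, panel)
in_k2 = find_markers(k2_variants, panel)
shared = in_k1 & in_k2  # markers carried by both genomes

for rsid in sorted(shared):
    print(rsid, panel[rsid])
```

Converting each variant list to a set keeps the lookup fast even when the lists contain several million identifiers.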
Grant support: We would like to acknowledge the USM APEX Delivering Excellence 2012 grant (1002/PPSP/910343) and the grant from the Ministry of Science and Technology (MOSTI) (304/PPSP/6150113/K105). This research was also supported by the USM Short Term grant (DNA Profiling of the Kelantan, Kedah and Pattani Malays using SNPs Microarray; 304/PPSP/61311034) and the USM Research University Team grant (1002/PPSP/853003).
- Ahn S-M, Kim T-H, Lee S, Kim D, Ghang H, Kim D-S, Kim B-C, Kim S-Y, Kim W-Y, Kim C: The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group. Genome Res 2009, 19: 1622–1629. 10.1101/gr.092197.109
- Amberger J, Bocchini CA, Scott AF, Hamosh A: McKusick's Online Mendelian Inheritance in Man (OMIM®). Nucleic Acids Res 2009, 37(Suppl 1):D793-D796. 10.1093/nar/gkn665
- George E: HbE β-Thalassaemia in Malaysia: Revisited. J Hematol Thromb Dis 2013, 1: 101.
- Goh K: Epidemiology of Helicobacter pylori infection in Malaysia-observations in a multiracial Asian population. Med J Malaysia 2009, 64(3):187.
- Goh K-L, Parasakthi N: The racial cohort phenomenon: seroepidemiology of Helicobacter pylori infection in a multiracial South-East Asian country. Eur J Gastroen Hepat 2001, 13: 177–183. 10.1097/00042737-200102000-00014
- Gupta R, Ratan A, Rajesh C, Chen R, Kim HL, Burhans R, Miller W, Santhosh S, Davuluri RV, Butte AJ: Sequencing and analysis of a South Asian-Indian personal genome. BMC Genomics 2012, 13: 440. 10.1186/1471-2164-13-440
- Hatin WI, Zahri M-K, Xu S, Jin L, Tan S-G, Rizman-Idid M, Zilfalil BA: Population genetic structure of peninsular Malaysia Malay sub-ethnic groups. PLoS One 2011, 6. 10.1371/journal.pone.0018312
- Huang Q, Yin JY, Dai XP, Pei Q, Dong M, Zhou ZG, Liu ZQ: IGF2BP2 variations influence repaglinide response and risk of type 2 diabetes in Chinese population. Acta Pharm Sin 2010, 31(6):709–717. 10.1038/aps.2010.47
- Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR: ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 2014, 42(D1):D980-D985. 10.1093/nar/gkt1113
- Lee YY, Ismail AW, Mustaffa N, Musa KI, Majid NA, Choo KE, Mahendra Raj S, Derakhshan MH, Malaty HM, Graham DY: Sociocultural and dietary practices among Malay subjects in the north-eastern region of Peninsular Malaysia: a region of low prevalence of Helicobacter pylori infection. Helicobacter 2012, 17: 54–61. 10.1111/j.1523-5378.2011.00917.x
- Maran S: Single Nucleotide Polymorphims (SNPs) genotypic profiling of Malay patients with and without Helicobacter pylori infection in Kelantan. Dissertation, Universiti Sains Malaysia; 2011.
- Maran S, Lee YY, Xu SH, Raj MS, Abdul Majid N, Choo KE, Zilfalil BA, Graham DY: Towards understanding the low prevalence of Helicobacter pylori in Malays: Genetic variants among Helicobacter pylori-negative ethnic Malays in the north-eastern region of Peninsular Malaysia and Han Chinese and South Indians. J Dig Dis 2013, 14: 196–202. 10.1111/1751-2980.12023
- Mayerle J, Den Hoed CM, Schurmann C, Stolk L, Homuth G, Peters MJ, Kuipers E: Identification of genetic loci associated with Helicobacter pylori serologic status: GWAS of H. pylori infection susceptibility. JAMA 2013, 309(18):1912–1920. 10.1001/jama.2013.4350
- Myles S, Tang K, Somel M, Green RE, Kelso J, Stoneking M: Identification and analysis of genomic regions with large between-population differentiation in humans. Ann Hum Genet 2008, 72(1):99–110.
- Obase Y, Matsuse H, Shimoda T, Haahtela T, Kohno S: Pathogenesis and management of aspirin-intolerant asthma. Treat Respir Med 2005, 4(5):325–336. 10.2165/00151829-200504050-00004
- Palikhe NS, Kim SH, Cho BY, Ye YM, Hur GY, Park HS: Association of three sets of high-affinity IgE receptor (FcepsilonR1) polymorphisms with aspirin-intolerant asthma. Resp Med 2008, 102(8):1132–1139. 10.1016/j.rmed.2008.03.017
- Paul W: The Golden Khersonese: Studies in the Historical Geography of the Malay Peninsula Before AD 1500. Kuala Lumpur: University of Malaya Press; 1961.
- Sherry ST, Ward M, Sirotkin K: dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 1999, 9(8):677–679.
- Tay CY, Mitchell H, Dong Q, Goh K-L, Dawes IW, Lan R: Population structure of Helicobacter pylori among ethnic groups in Malaysia: recent acquisition of the bacterium by the Malay population. BMC Microbiol 2009, 9: 126. 10.1186/1471-2180-9-126
- Uyub A, Raj S, Visvanathan R, Nazim M, Aiyar S, Anuar A, Mansur M: Helicobacter pylori infection in North-eastern Peninsular Malaysia: evidence for an unusually low prevalence. Scand J Gastroentero 1994, 29: 209–213. 10.3109/00365529409090465
- Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, Fan W, Zhang J, Li J, Zhang J: The diploid genome sequence of an Asian individual. Nature 2008, 456: 60–65. 10.1038/nature07484
- Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Klein TE: Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther 2012, 92(4):414–417. 10.1038/clpt.2012.96
- Wong L-P, Ong RT-H, Poh W-T, Liu X, Chen P, Li R, Lam KK-Y, Pillai NE, Sim K-S, Xu H: Deep whole-genome sequencing of 100 southeast Asian Malays. Am J Hum Genet 2013, 92(1):52–66. 10.1016/j.ajhg.2012.12.005
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.