This work was supported by the National Natural Science Foundation of China (31822052 and 31572381) to Y.J. and the Science & Technology Support Program of Sichuan (2016NYZ0042 and 2017NZDZX0002) to M.Z.L. We thank the High Performance Computing platform of Northwest A&F University for its assistance with computing.
The author(s) declare that they have no conflict of interest.
SUPPORTING INFORMATION The supporting information is available online at
Figure 1
Construction of the pig pan-genome and the characterization of pan-sequences. A,
Figure 2
Pan-sequence validation and population-specific pattern. A, Homologue identification of pan-sequences in 10 mammalian genomes. Only the best hit was retained for each pan-sequence. B, An 87×87 matrix in which each cell gives the number of pan-sequences shared by a pair of individuals. See Table S3 in Supporting Information for the classification of each group. C, Genes contained in the pan-sequences. One pan-sequence of
Figure 3
The 3D spatial structure of the pan-genome. A, The distributions of the A/B compartment, TAD and anchored pan-sequences. B, The relative length-proportion of the A compartment over the B compartment in the pig genome (left) and the relative length-proportion of pan-sequences located in the A compartment over those located in the B compartment (right). C, The relative length-proportion of TAD boundary regions over TAD interior regions in Sscrofa11.1 (left) and the relative length-proportion of pan-sequences located in TAD boundary regions over TAD interior regions (right). D, An example of improving a 3D spatial structure after replacing the weakly interacting sequences with the non-reference pan-sequences. The interaction of pan-sequences with flanking sequences was supported by more read contacts than the original interaction of the counterparts in the genome with the flanking sequences.
Figure 4
Improvements of genomic analyses by using the pan-genome. A, Comparison of the mapping ratio of resequencing data using the pan-genome versus Sscrofa11.1. B, Comparison of read-mapping quality using the pan-genome versus Sscrofa11.1. C, Comparison of corrected read-mapping depth using the pan-genome versus Sscrofa11.1. D, Improved read mapping using the pan-genome versus Sscrofa11.1 as viewed with IGV.
Figure 5
The processing pipeline used to construct the PIGPAN database. PIGPAN integrates genomic, transcriptomic and regulatory data. Users can search for a gene symbol or a genomic region to obtain results in the form of an interactive table and graph. A, The system diagram of PIGPAN. B, The 17 tracks released against the pig pan-genome in our local UCSC Genome Browser server. C, One case showing the copy number difference of the