The 1000 Genomes Project (1000GP) represents the most comprehensive worldwide nucleotide variation data set in humans to date, providing the sequencing and analysis of 2504 genomes from 26 populations and reporting 84 million variants. The availability of these sequence data provides the human lineage with an invaluable resource for population genomics studies, allowing the testing of molecular population genetics hypotheses and, eventually, the understanding of the evolutionary dynamics of genetic variation in human [...] might overlap non-accessible nucleotides, although these positions were discarded for the population genomics analyses) to focus on broader scale patterns of diversity across the genome.

Efficient and reliable parameter estimates have been computed using a novel pipeline that addresses the unique features and limitations of the 1000GP data. They include a battery of nucleotide variation measures, divergence and linkage disequilibrium parameters, and several tests of neutrality, estimated in non-overlapping windows along the chromosomes and in annotated genes for all 26 populations of the 1000GP.

Soon after the elucidation of the entire human genome (1–3), the description of genetic variation in human populations and the identification of those variants that affect health and disease became the next challenges of genomics research (4).

To compute divergence metrics and neutrality tests based on the comparison of polymorphism and divergence, we added differences between humans and chimpanzees to the VCF files, as identified from a precomputed hg19-panTro4 alignment obtained from the VISTA browser (23) in multi-FASTA format (MFA). Specifically, the pairwise alignment was converted to VCF using custom scripts and merged with the 1000GP VCF files using [...] (24), based on a total of 104 246 informative meioses from six recent studies of human pedigrees. The signature of long-range haplotypes persists for a relatively short period of time ( [...] details the differences between the two databases.
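The conversion of a pairwise human-chimpanzee alignment into VCF records, mentioned above, can be illustrated with a minimal sketch. The custom scripts used in the pipeline are not shown in the text, so everything here is an assumption for illustration: the function names `alignment_to_records` and `format_vcf_line` are hypothetical, the input is taken to be two already-aligned sequences of equal length with `-` for gaps, and only single-nucleotide differences between unambiguous bases are reported.

```python
def alignment_to_records(chrom, start, human, chimp):
    """Return (chrom, pos, ref, alt) tuples for single-nucleotide differences.

    Assumptions (not from the source): `human` and `chimp` are aligned
    strings of equal length, `start` is the 1-based human coordinate of the
    first human base, and gaps are written as '-'.
    """
    if len(human) != len(chimp):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    pos = start - 1  # coordinate of the last human base consumed
    records = []
    for h, c in zip(human.upper(), chimp.upper()):
        if h == "-":
            continue  # chimp-only column: no human coordinate to advance
        pos += 1
        # Keep only substitutions between unambiguous bases; gaps and Ns are skipped.
        if h in valid and c in valid and h != c:
            records.append((chrom, pos, h, c))
    return records


def format_vcf_line(record):
    """Format one record as a tab-separated VCF body line with empty fields."""
    chrom, pos, ref, alt = record
    return "\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."])


if __name__ == "__main__":
    # Toy alignment: the gap in the human row does not advance the coordinate.
    for rec in alignment_to_records("chr1", 100, "ACG-TNA", "ATGCT-C"):
        print(format_vcf_line(rec))
```

Tracking the coordinate on the human row only keeps the output positions in the reference (hg19) frame, which is what a later merge with the 1000GP VCF files would require.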

We have designed and implemented a custom pipeline (Figure 1) that addresses the unique features and limitations of the 1000GP Phase III data (15). Note that four of the analyzed populations are admixed (those of the Admixed American metapopulation), so special care should be taken when interpreting PopHuman results for them.


