gargammel
sequence simulator for ancient DNA
About
gargammel is a pipeline to simulate ancient DNA (aDNA) from a set of known references. gargammel seeks to emulate the in vivo process that leads to the sequencing of aDNA fragments:
- First, fragments are collected from a set of reference sequences. These reference sequences represent the endogenous DNA, contamination from the same species (e.g. present-day humans), and/or microbial contamination.
- The patterns of mis-incorporations of ancient DNA fragments can also be accounted for.
- Second, aDNA damage is added to those fragments. Sequencing adapters are appended to form reads of a specific length.
- Finally, sequencing errors with corresponding quality scores are applied to the reads.
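The first step above can be illustrated with a minimal Python sketch. It is not gargammel's code: the function name `sample_fragments`, the proportion dictionary and the exponential length model are all assumptions made for illustration. It only shows the idea of drawing fragments from endogenous, contaminant and microbial references according to a chosen mixture.

```python
import random


def sample_fragments(sources, proportions, n_fragments,
                     mean_length=50, min_length=20, seed=0):
    """Draw fragments from several reference sequences (toy model).

    sources:     dict mapping a source name to its reference sequence
    proportions: dict mapping the same names to mixture weights
    Returns a list of (source_name, start, fragment_sequence) tuples.
    """
    rng = random.Random(seed)
    names = list(sources)
    weights = [proportions[name] for name in names]
    fragments = []
    for _ in range(n_fragments):
        # Pick which source this fragment comes from.
        name = rng.choices(names, weights=weights, k=1)[0]
        ref = sources[name]
        # Hypothetical length model: minimum length plus an exponential tail,
        # capped at the length of the reference.
        extra = int(rng.expovariate(1.0 / (mean_length - min_length)))
        length = min(len(ref), min_length + extra)
        # Uniform start position such that the fragment fits in the reference.
        start = rng.randrange(0, len(ref) - length + 1)
        fragments.append((name, start, ref[start:start + length]))
    return fragments
```

For example, passing `proportions = {"endogenous": 0.1, "contaminant": 0.1, "microbial": 0.8}` would mimic a sample dominated by microbial DNA. The real fragment length distribution and base composition would come from empirical data rather than from this toy model.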
The resulting data is a set of Illumina reads that can be used to test certain hypotheses about aDNA. Here is a potential set of questions that could be answered using our pipeline:
- Impact of present-day human contamination on various statistics used in population genetics
- Influence of high levels of deamination on mapping
- Impact of a high number of microbial sequences on alignment to a specific reference
- Ability to infer the metagenomic profile of a sample depending on the aDNA fragment length distribution
- Accuracy of contamination estimates
To simulate Illumina sequencing errors, we use the ART package.
gargammel has been developed in Ludovic Orlando's and Eske Willerslev's research groups at the Center for GeoGenetics at the University of Copenhagen. The code was implemented by Gabriel Renaud in collaboration with Kristian Hanghoej.
News
- February 28, 2021: We can now handle circular references.
- November 21, 2016: gargammel is published in Bioinformatics!
Download
You can either type:
git clone --depth 1 https://github.com/grenaud/gargammel.git
Or click on the tar/zip links at the top of the page.
Submodules
gargammel is composed of the following subprograms:
- fragSim: simulation of DNA fragmentation due to degradation. It can also simulate the base composition at the 5'/3' ends of the fragments
- deamSim: program to add in silico damage (or deamination) to the DNA fragments
- adptSim: module to transform the fragments (damaged or not) into raw Illumina reads
The 3 programs were written in C++. There is a driver script in Perl that automates the process: it simulates the in vivo process that generates aDNA fragments and calls ART to add sequencing errors.
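To make the damage and adapter steps more concrete, here is a toy Python sketch. It does not reproduce the behaviour of deamSim or the adapter module: the function names, the `p_end` and `decay` parameters, the exponentially decaying damage rate and the example adapter string are all assumptions chosen for illustration. The sketch applies C-to-T substitutions that are most frequent near the 5' end and G-to-A substitutions that are most frequent near the 3' end, a pattern commonly reported for double-stranded aDNA libraries, then appends an adapter and truncates to a fixed read length.

```python
import random


def deaminate(fragment, p_end=0.3, decay=0.5, rng=None):
    """Add toy deamination damage to a fragment.

    p_end: substitution probability at the terminal base (assumed value)
    decay: factor by which the probability shrinks per base away from the end
    """
    if rng is None:
        rng = random.Random(0)
    n = len(fragment)
    damaged = []
    for i, base in enumerate(fragment):
        p5 = p_end * decay ** i            # i bases away from the 5' end
        p3 = p_end * decay ** (n - 1 - i)  # n-1-i bases away from the 3' end
        if base == "C" and rng.random() < p5:
            damaged.append("T")            # C-to-T near the 5' end
        elif base == "G" and rng.random() < p3:
            damaged.append("A")            # G-to-A near the 3' end
        else:
            damaged.append(base)
    return "".join(damaged)


def to_raw_read(fragment, adapter, read_length):
    """Append the adapter and truncate to read_length.

    If fragment + adapter is shorter than read_length, this sketch returns
    the shorter string unchanged.
    """
    return (fragment + adapter)[:read_length]


# Example adapter value, used purely for illustration.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"
```

A fragment shorter than the read length yields a read that runs into the adapter, which is why adapter trimming is usually needed before mapping aDNA reads. Sequencing errors and quality scores are not modelled here; in the pipeline that step is delegated to ART.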
Documentation, requirements and examples of usage
For installation, usage and other questions please refer to the README.
Citing
Please cite our paper:
Renaud, G., Hanghoej, K., Willerslev, E. & Orlando, L. (2016). gargammel: a sequence simulator for ancient DNA. Bioinformatics, btw670.
Acknowledgments
This work was supported by the Danish Council for Independent Research, Natural Sciences (4002-00152B); the Danish National Research Foundation (DNRF94); the Villum Fonden (miGENEPI); and the Initiative d'Excellence Chaires d'attractivité, Université de Toulouse (OURASI).
Support/Feature requests/Pull requests
Please contact Gabriel Renaud (@grenaud) for further information.