BioBox: Collection of Tools for Sequence Analysis

Bernhard Haubold

This is a collection of programs for efficiently carrying out routine sequence analysis tasks on the UNIX command line. Each tool comes with its own documentation. If you find an error in this software, please drop me a line at last_name at evolbio.mpg.de.

  1. cchar, v. 1.6: Count characters in sequence data.
  2. cpg, v. 0.7: Compute the CpG content of DNA sequences.
  3. cutSeq, v. 0.11: Cut regions from molecular sequences.
  4. generateQuerySbjct, v. 0.4: Generate pairs of homologous DNA sequences.
  5. gd, v. 0.12: Calculate genetic diversity (pi, S, and Tajima's D) from aligned DNA sequences, with or without a sliding window.
  6. getSeq, v. 0.4: Get specific sequences from a FASTA file containing multiple entries.
  7. ms2dna, v. 1.16: Generate samples of homologous DNA sequences evolved under defined evolutionary scenarios by converting the output of Richard Hudson's coalescent simulation program ms. As of version 1.11, it can also deal with output generated by Gary Chen's fast coalescent simulator MaCS using the pipeline macs [options] | msformatter | ms2dna -a.
  8. randomizeSeq, v. 0.8: Randomize sequences.
  9. sequencer, v. 1.14: Simulate shotgun sequencing with paired (as of version 1.11) or unpaired reads and a user-defined error rate.
  10. simK, v. 0.4: Simulate a pair of sequences with a given number of substitutions per site (K).
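
To illustrate the statistics named under gd (item 5), here is a minimal Python sketch of pi, S, and Tajima's D for a small alignment. It is not the gd implementation: the function name diversity is hypothetical, the alignment is assumed to be gap-free, and no sliding window is applied. D follows the standard formula from Tajima (1989) and is returned as None when it is undefined (fewer than four sequences or no segregating sites).

```python
from itertools import combinations
from math import sqrt


def diversity(seqs):
    """Return (pi, S, D) for a list of equal-length, gap-free DNA strings."""
    n = len(seqs)
    if n < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two sequences of equal length")
    length = len(seqs[0])

    # S: number of alignment columns with more than one distinct character.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # k: mean number of pairwise differences over all n(n-1)/2 pairs.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    pi = k / length  # nucleotide diversity per site

    # The variance of D is zero for n < 4, and D is undefined for S = 0.
    if n < 4 or s == 0:
        return pi, s, None

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, s, d


if __name__ == "__main__":
    alignment = ["ACGT", "ACGA", "ACTA", "ACGT"]
    print(diversity(alignment))
```

A sliding-window variant would call the same function on successive column slices of the alignment.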

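The kind of counting done by cchar and cpg (items 1 and 2) can be sketched in a few lines of Python. This is an illustration only: the function names char_counts and cpg_stats are hypothetical, and the observed/expected ratio used here, (number of CpG x sequence length) / (number of C x number of G), is one common definition of CpG content that may differ from what cpg actually reports.

```python
from collections import Counter


def char_counts(seq):
    """Count each character in a sequence, ignoring case."""
    return Counter(seq.upper())


def cpg_stats(seq):
    """Return (CpG count, observed/expected CpG ratio) for one DNA sequence."""
    seq = seq.upper()
    counts = char_counts(seq)
    # str.count is exact here because "CG" cannot overlap with itself.
    cpg = seq.count("CG")
    # Expected CpG count if C and G occurred independently along the sequence.
    expected = counts["C"] * counts["G"] / len(seq) if seq else 0.0
    ratio = cpg / expected if expected > 0 else None
    return cpg, ratio


if __name__ == "__main__":
    print(char_counts("ACGCGT"))
    print(cpg_stats("ACGCGT"))
```

The ratio is returned as None when the sequence contains no C or no G, since the expected count is then zero.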


Bernhard Haubold 2015-06-03