Tutorial

Global Ir

To test ir, download the example genome of Mycoplasma genitalium into the newly created directory Ir_xxx and uncompress it:
gunzip l43967.fasta.gz
To calculate the Ir for this sequence, enter
./ir -i l43967.fasta
to get
# Len   I_r
580076  0.1388
We can compare this to the Ir value of the shuffled version of the genome by typing
shuffleseq -filter l43967.fasta | ./ir
where shuffleseq is part of the EMBOSS software package [3]. The Ir of the shuffled sequence should be close to zero.
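
If EMBOSS is not installed, a shuffled control can be approximated with standard tools. The following is a minimal sketch, not the shuffleseq algorithm: shuffle_fasta is a hypothetical helper that is not part of ir, it assumes a FASTA file with a single sequence, and it relies on awk splitting a string into characters when given an empty separator (gawk and mawk do this; POSIX leaves it undefined).

```shell
# shuffle_fasta: hypothetical stand-in for shuffleseq, not part of ir or EMBOSS.
# Reads a single-sequence FASTA file on stdin and writes the same residues
# in random order. The optional first argument is the random seed (default 1).
shuffle_fasta() {
    awk -v seed="${1:-1}" '
        BEGIN { srand(seed) }
        /^>/  { header = $0; next }      # keep the header line unchanged
              { seq = seq $0 }           # join all sequence lines
        END {
            n = split(seq, a, "")        # one array element per residue
            for (i = n; i > 1; i--) {    # Fisher-Yates shuffle
                j = int(rand() * i) + 1
                tmp = a[i]; a[i] = a[j]; a[j] = tmp
            }
            print header
            for (i = 1; i <= n; i++) {   # write 60 residues per line
                printf "%s", a[i]
                if (i % 60 == 0 || i == n) printf "\n"
            }
        }'
}
```

With this function defined in the current shell, shuffle_fasta 42 < l43967.fasta | ./ir plays the same role as the shuffleseq pipeline above. Because the shuffle keeps the base composition and destroys the order, the resulting Ir should again be close to zero.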

Window Analysis

Instead of just computing an Ir for an entire sequence, it is often more informative to look at local variations in Ir. This can be done by carrying out a window analysis using the -w option. In order to visualize the result of such a window analysis, we use the program graph, which is part of the plotutils package. Given a functioning installation of plotutils, try
./ir -w 1000 < l43967.fasta \
| graph -y 0  -X Ir -Y Position -T X
where \ indicates line continuation. Notice the very sharp peaks in the resulting plot corresponding to regions with exceptional repetitiveness. We can smooth the plot by increasing the window size to, say, 10,000:
./ir -w 10000 < l43967.fasta \
| graph -y 0  -X Ir -Y Position -T X
To print the plot, first generate it in xfig format, for example:
./ir -w 10000 < l43967.fasta \
| graph -y 0  -X Ir -Y Position -T fig > mg.fig
The resulting file can now be manipulated using the program xfig. Alternatively, we can convert it to PostScript by typing
fig2dev -L ps mg.fig > mg.ps
Similarly, encapsulated PostScript for inclusion in, e.g., LaTeX documents can be generated by
fig2dev -L eps mg.fig > mg.eps

Multiple Sequences

Multiple sequences can be analyzed either separately or combined. To see the difference between these two modes, first compute the Ir of the random sequence supplied with the source files:
cat ranSeq.fasta | ./ir
# Len   I_r
10000   0.0012
If we add an exact copy of this sequence to the analysis, we expect a large increase in Ir:
cat ranSeq.fasta ranSeq.fasta | ./ir
# Len   I_r
20000   6.3702
However, if we switch on the separate mode of analysis, we return to the low Ir value of the original isolated sequence:
cat ranSeq.fasta ranSeq.fasta | ./ir -s
# Seq   Len     I_r
>Random Sequence #1; G/C=0.50   10000   0.0012
>Random Sequence #1; G/C=0.50   10000   0.0012
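
Before comparing the two modes it can help to check how many sequences the input stream contains and how long each one is. The sketch below is independent of ir and does not compute Ir; fasta_lengths is a hypothetical helper name, and it simply prints each header line followed by the number of characters on the sequence lines beneath it.

```shell
# fasta_lengths: hypothetical helper, not part of ir.
# Prints "header<TAB>length" for every sequence in a FASTA stream on stdin.
fasta_lengths() {
    awk '
        /^>/ {                                   # a new sequence starts
            if (name != "") print name "\t" len  # report the previous one
            name = $0; len = 0; next
        }
             { len += length($0) }               # add up the sequence lines
        END  { if (name != "") print name "\t" len }'
}
```

Running cat ranSeq.fasta ranSeq.fasta | fasta_lengths should then list two entries whose lengths agree with the Len column reported by ./ir -s.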

When carrying out a window analysis on multiple sequences it might be useful to write the Ir values of the different sequences into distinct files. This can be done using the UNIX tool csplit. For our biologically rather meaningless example we can execute

cat ranSeq.fasta ranSeq.fasta | ./ir -w 100 | csplit -f dataset -z - /\>/ {1}
to find the Ir values for the first sequence in dataset01 and those for the second sequence in dataset02.
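
To see how csplit distributes the pieces without running ir, the sketch below feeds it a mock window output. The layout of that mock (one # header line, then a > line followed by position and Ir pairs for each sequence) and all values in it are assumptions made for illustration, not actual ir results. The -z option is specific to GNU csplit, and -s is added here only to suppress the byte counts csplit otherwise prints.

```shell
# Work in a scratch directory so the dataset* files do not clutter the current one.
workdir=$(mktemp -d)
cd "$workdir"

# Mock window output: layout and numbers are assumptions, not real ir results.
printf '%s\n' \
    '# Pos   I_r' \
    '>Sequence 1' '50      0.0010' '150     0.0020' \
    '>Sequence 2' '50      0.0010' '150     0.0020' \
| csplit -s -f dataset -z - '/>/' '{1}'

# dataset00 holds the header line; dataset01 and dataset02 hold one sequence each.
ls dataset*
```

The pattern />/ is applied twice (once, plus the single repetition requested by {1}), so the stream is cut before each of the two > lines. Under the assumed layout, the text preceding the first > line ends up in dataset00, which is why the per-sequence values start at dataset01.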