    gunzip l43967.fasta.gz

To calculate the Ir for this sequence, enter

    ./ir -i l43967.fasta

to get

    # Len    I_r
    580076   0.1388

We can compare this to the Ir value of the shuffled version of the genome by typing

    shuffleseq -filter l43967.fasta | ./ir

where shuffleseq is part of the EMBOSS software package [3]. The Ir of the shuffled sequence should be close to zero.
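
Where EMBOSS is not installed, the shuffling step can be approximated with standard tools. The following is a minimal sketch and not the shuffleseq implementation: the function name shuffle_fasta is hypothetical, the input is assumed to be a FASTA file holding a single sequence, and shuf from GNU coreutils is assumed to be available. It keeps the residue composition and randomizes the order.

```sh
# Hypothetical helper: shuffle the residues of a single-sequence FASTA file.
# Assumes GNU coreutils (shuf, fold); this is not part of ir or EMBOSS.
shuffle_fasta() {
    echo ">shuffled"
    grep -v '^>' "$1" |   # drop the header line
        tr -d '\n' |      # join the sequence into one line
        fold -w 1 |       # one residue per line
        shuf |            # random permutation of the residues
        tr -d '\n' |      # join again
        fold -w 60        # wrap at 60 residues per line
    echo                  # terminate the last line
}
```

Under these assumptions the result can be piped into ir in the same way as the shuffleseq output:

    shuffle_fasta l43967.fasta | ./ir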

To carry out a window analysis with a window size of 1000 and plot the result, enter

    ./ir -w 1000 < l43967.fasta \
    | graph -y 0 -X Position -Y Ir -T X

where \ indicates line continuation. Notice the very sharp peaks in the resulting plot, which correspond to regions of exceptional repetitiveness. We can smooth the plot by increasing the window size to, say, 10,000:

    ./ir -w 10000 < l43967.fasta \
    | graph -y 0 -X Position -Y Ir -T X

To print the plot, first generate it in xfig format, for example:

    ./ir -w 10000 < l43967.fasta \
    | graph -y 0 -X Position -Y Ir -T fig > mg.fig

The resulting file can now be manipulated using the program xfig. Alternatively, we can convert it to PostScript by typing

    fig2dev -L ps mg.fig > mg.ps

Similarly, encapsulated PostScript for inclusion in, e.g., LaTeX documents can be generated by

    fig2dev -L eps mg.fig > mg.eps
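
The peaks seen in the window plot can also be located numerically. The following is a minimal sketch under an assumption not confirmed here: that the window output consists of two whitespace-separated columns, position first and Ir second, possibly preceded by lines starting with # or >. The function name top_window is hypothetical.

```sh
# Hypothetical helper: report the window with the highest Ir value.
# Assumes two columns on stdin: position, then Ir.
top_window() {
    awk '
        /^[#>]/ || NF < 2 { next }     # skip comments, headers, short lines
        !seen || $2 + 0 > max {        # first data line or a new maximum
            max = $2 + 0; pos = $1; seen = 1
        }
        END { if (seen) print pos, max }
    '
}
```

If the assumption about the columns holds, it could be used as

    ./ir -w 1000 < l43967.fasta | top_window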

For the random sequence in ranSeq.fasta we get

    cat ranSeq.fasta | ./ir
    # Len    I_r
    10000    0.0012

If we add another exact copy of this sequence to the analysis, we expect to see a large increase in Ir:

    cat ranSeq.fasta ranSeq.fasta | ./ir
    # Len    I_r
    20000    6.3702

However, if we switch on the separate mode of analysis, we return to the low Ir value of the original isolated sequence:

    cat ranSeq.fasta ranSeq.fasta | ./ir -s
    # Seq                             Len      I_r
    >Random Sequence #1; G/C=0.50     10000    0.0012
    >Random Sequence #1; G/C=0.50     10000    0.0012

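
The examples above take the file ranSeq.fasta as given. If a comparable test sequence is needed, one way to produce it is sketched below. This is an illustration only and not the program used to create ranSeq.fasta: the function name random_fasta and its two parameters (length and seed) are hypothetical, and the four nucleotides are drawn with equal probability, so the expected G/C content is 0.5.

```sh
# Hypothetical helper: write a random DNA sequence in FASTA format.
# $1 = sequence length, $2 = random seed (defaults to 1).
random_fasta() {
    awk -v n="$1" -v seed="${2:-1}" 'BEGIN {
        srand(seed)
        print ">random sequence"
        for (i = 1; i <= n; i++) {
            # pick one of A, C, G, T with equal probability
            printf "%s", substr("ACGT", int(rand() * 4) + 1, 1)
            # wrap at 60 residues and terminate the last line
            if (i % 60 == 0 || i == n) print ""
        }
    }'
}
```

A 10,000 bp test file could then be written with

    random_fasta 10000 1 > myRanSeq.fasta

where myRanSeq.fasta is a placeholder name.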
When carrying out a window analysis on multiple sequences it might be useful to write the Ir values of the different sequences into distinct files. This can be done using the UNIX tool csplit. For our biologically rather meaningless example we can execute

    cat ranSeq.fasta ranSeq.fasta | ./ir -w 100 | csplit -f dataset -z - /\>/ {1}

to find the Ir values for the first sequence in dataset01 and those for the second sequence in dataset02.
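
With csplit the number of repetitions ({1} above) has to match the number of sequences. As an alternative, the splitting can be sketched in awk so that it works for any number of sequences. The function name split_by_header and its prefix argument are hypothetical, and the sketch assumes, as the csplit pattern above does, that each sequence's block in the window output starts with a line beginning with >.

```sh
# Hypothetical helper: split window output on stdin into one file per
# sequence, named <prefix>01, <prefix>02, ...
# Lines before the first ">" header (e.g. comments) are discarded.
split_by_header() {
    awk -v prefix="$1" '
        /^>/ {
            if (file != "") close(file)          # finish the previous file
            n++
            file = sprintf("%s%02d", prefix, n)  # next output file name
        }
        n > 0 { print > file }                   # header and its data lines
    '
}
```

Under this assumption, the example above becomes

    cat ranSeq.fasta ranSeq.fasta | ./ir -w 100 | split_by_header dataset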