Pileup format was first used by Tony Cox and Zemin Ning at the Sanger Institute. It describes the base-pair information at each chromosomal position, showing how every read covering a region aligns to the reference. This format facilitates SNP/indel calling and quick inspection of alignments by eye.
The pileup format has several variants. The default output of SAMtools looks like this:
- seq1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
- seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
- seq1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
- seq1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
- seq1 276 G 22 ...T,,.,.,...,,,.,.... 33;+<<7=7<<7<&<<1;<<6<
- seq1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
- seq1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
- seq1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
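where each line consists of the chromosome name, 1-based coordinate, reference base, number of reads covering the site, read bases, and base qualities.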
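As a minimal sketch, the six whitespace-separated columns seen in the example (sequence name, 1-based position, reference base, depth, read bases, base qualities) could be split as follows. The names `PileupLine`, `parse_pileup_line`, and `base_qualities` are hypothetical, and the Phred+33 encoding of the base quality column is an assumption of this sketch.

```python
from typing import List, NamedTuple


class PileupLine(NamedTuple):
    """One record of the six-column pileup layout (hypothetical helper type)."""
    chrom: str
    pos: int      # 1-based coordinate
    ref: str      # reference base
    depth: int    # number of reads covering the site
    bases: str    # read bases column
    quals: str    # base qualities column


def parse_pileup_line(line: str) -> PileupLine:
    """Split one pileup line into its six columns."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, pos, ref, depth, bases, quals = fields
    return PileupLine(chrom, int(pos), ref, int(depth), bases, quals)


def base_qualities(quals: str) -> List[int]:
    """Decode base qualities, assuming Phred+33 (ASCII code minus 33)."""
    return [ord(ch) - 33 for ch in quals]


record = parse_pileup_line(
    "seq1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+"
)
print(record.chrom, record.pos, record.ref, record.depth)
print(base_qualities(record.quals))
```

For the line at position 273, the number of decoded qualities equals the depth column (23), which is a quick sanity check when reading pileup output.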
In the read bases column, each symbol has the following meaning:
.
stands for a match to the reference base on the forward strand
,
stands for a match on the reverse strand
ACGTN
stands for a mismatch on the forward strand; the uppercase letter is the base actually observed in the read
acgtn
stands for a mismatch on the reverse strand; the lowercase letter is the base actually observed in the read
+[0-9]+[ACGTNacgtn]+
indicates there is an insertion between this reference position and the next reference position. The length of the insertion is given by the integer in the pattern, followed by the inserted sequence.
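For example, in the line
seq2 156 A 11 .$......+2AG.+2AG.+2AGGG <975;:<<<<<
the pattern +2AG occurs three times, meaning that three reads carry a 2 bp insertion of AG.
Similarly, a pattern of the form -[0-9]+[ACGTNacgtn]+ indicates a deletion. In the line
seq3 200 A 20 ,,,,,..,.-4CACC.-4CACC....,.,,.^~. ==<<<<<<<<<<<::<;2<<
the pattern -4CACC occurs twice, meaning that two reads carry a 4 bp deletion of CACC.
^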
marks the start of a read segment, which is a contiguous subsequence of the read separated by N/S/H CIGAR operations. The ASCII code of the character following ^ minus 33 gives the mapping quality.
$
marks the end of a read segment.
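The symbols described above can be tallied with a small scanner over the read bases column. This is only a sketch under stated assumptions: the function name `summarize_bases` and the keys of the returned dictionary are hypothetical, and any character not covered in this section is skipped without being counted.

```python
import re
from typing import Dict

_NUMBER = re.compile(r"\d+")


def summarize_bases(bases: str) -> Dict[str, object]:
    """Tally the symbols of a pileup read bases column (illustrative helper)."""
    summary = {
        "match_fwd": 0, "match_rev": 0,
        "mismatch_fwd": 0, "mismatch_rev": 0,
        "insertions": [], "deletions": [],
        "read_start_mapq": [], "read_ends": 0,
    }
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # The character after '^' encodes mapping quality (ASCII minus 33).
            # It is consumed here so it is never mistaken for a base symbol.
            summary["read_start_mapq"].append(ord(bases[i + 1]) - 33)
            i += 2
            continue
        if ch in "+-":
            # +<len><seq> is an insertion, -<len><seq> a deletion.
            match = _NUMBER.match(bases, i + 1)
            if match is None:
                raise ValueError(f"malformed indel at offset {i}")
            length = int(match.group())
            seq = bases[match.end():match.end() + length]
            key = "insertions" if ch == "+" else "deletions"
            summary[key].append(seq)
            i = match.end() + length
            continue
        if ch == "$":
            summary["read_ends"] += 1
        elif ch == ".":
            summary["match_fwd"] += 1
        elif ch == ",":
            summary["match_rev"] += 1
        elif ch in "ACGTN":
            summary["mismatch_fwd"] += 1
        elif ch in "acgtn":
            summary["mismatch_rev"] += 1
        # Any other character is outside this section's scope and is skipped.
        i += 1
    return summary


print(summarize_bases(".$......+2AG.+2AG.+2AGGG"))
print(summarize_bases(",,,,,..,.-4CACC.-4CACC....,.,,.^~."))
```

Reading the length first and then slicing exactly that many characters is what keeps the trailing GG in +2AGGG from being absorbed into the insertion: those two characters are counted as forward-strand mismatches.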