
cellranger atac notes 3: interpreting the count output files (2-2)


3. Per-barcode QC metrics: singlecell.csv



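In singlecell.csv each column is one field and each row is one barcode. The columns are barcode followed by 18 metrics: total, duplicate, chimeric, unmapped, lowmapq, mitochondrial, nonprimary, passed_filters, is__cell_barcode, excluded_reason, TSS_fragments, DNase_sensitive_region_fragments, enhancer_region_fragments, promoter_region_fragments, on_target_fragments, blacklist_region_fragments, peak_region_fragments and peak_region_cutsites. The rows cover every barcode sequence observed, including barcodes that did not pass cell calling, so in this run the file has more than 400,000 rows. Note that the set of metrics in this file differs somewhat between pipelines.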

```
# File contents
head singlecell.csv
barcode,total,duplicate,chimeric,unmapped,lowmapq,mitochondrial,nonprimary,passed_filters,is__cell_barcode,excluded_reason,TSS_fragments,DNase_sensitive_region_fragments,enhancer_region_fragments,promoter_region_fragments,on_target_fragments,blacklist_region_fragments,peak_region_fragments,peak_region_cutsites
NO_BARCODE,8692958,1673643,1625,1220361,1114210,0,14527,4668592,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAAAGCAG-1,4,1,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,4
AAACGAAAGAAAGGGT-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,1,2
AAACGAAAGAAATACC-1,3,0,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,3
AAACGAAAGAAATGGG-1,1200,644,0,109,238,0,6,203,0,0,15,0,0,0,15,0,39,76
AAACGAAAGAAATTCG-1,316,184,0,21,50,0,0,61,0,0,4,0,0,0,4,0,13,25
AAACGAAAGAACAGGA-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,0,0
AAACGAAAGAACCCGA-1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAACGACC-1,3,0,0,1,0,0,0,2,0,0,1,0,0,0,1,0,2,4
```



| Column | Category | Description |
|---|---|---|
| barcode | key | barcodes present in input data |
| total | sequencing | total read-pairs |
| duplicate | mapping | number of duplicate read-pairs |
| chimeric | mapping | number of chimerically mapped read-pairs |
| unmapped | mapping | number of read-pairs with at least one end not mapped |
| lowmapq | mapping | number of read-pairs with <30 mapq on at least one end |
| mitochondrial | mapping | number of read-pairs mapping to mitochondria and non-nuclear contigs |
| nonprimary | mapping | the number of reads that map to non-primary contigs |
| passed_filters | mapping | number of non-duplicate, usable read-pairs, i.e. "fragments" |
| is_cell_barcode (written is__cell_barcode in the file header) | cell calling | binary indicator of whether barcode is associated with a cell |
| excluded_reason | cell calling | 0: barcode was not excluded; 1: barcode was excluded because it is a gel bead doublet; 2: barcode was excluded because it is low-targeting; 3: barcode was excluded because it is a barcode multiplet |
| TSS_fragments | targeting | number of fragments overlapping with TSS regions |
| DNase_sensitive_region_fragments | targeting | number of fragments overlapping with DNase sensitive regions |
| enhancer_region_fragments | targeting | number of fragments overlapping enhancer regions |
| promoter_region_fragments | targeting | number of fragments overlapping promoter regions |
| on_target_fragments | targeting | number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) |
| blacklist_region_fragments | targeting | number of fragments overlapping blacklisted regions |
| peak_region_fragments | denovo targeting | number of fragments overlapping peaks |
| peak_region_cutsites | denovo targeting | number of ends of fragments in peak regions |
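
The columns above can be turned into per-barcode QC with a few lines of code. Below is a minimal sketch using only the Python standard library. The article itself does not prescribe a tool, and all function names here are hypothetical. The sketch reads singlecell.csv, skips the NO_BARCODE row, decodes excluded_reason, and computes peak_region_fragments / passed_filters, the fraction of usable fragments that fall in peaks, which is a commonly used per-cell QC ratio. It assumes every metric column holds an integer, as in the sample rows. It also falls back from is__cell_barcode to is_cell_barcode in case another pipeline spells the header differently, which is an assumption.

```python
import csv

# Meaning of excluded_reason, as listed in the metrics table
EXCLUDED_REASON = {
    0: "not excluded",
    1: "gel bead doublet",
    2: "low-targeting",
    3: "barcode multiplet",
}


def load_singlecell(handle):
    """Parse singlecell.csv into {barcode: {metric: int}}, skipping NO_BARCODE."""
    table = {}
    for record in csv.DictReader(handle):
        barcode = record.pop("barcode")
        if barcode == "NO_BARCODE":
            continue  # summary row for reads without a valid barcode
        # Assumption: every remaining column is an integer metric
        table[barcode] = {name: int(value) for name, value in record.items()}
    return table


def is_cell(metrics):
    """True if the barcode was called as a cell."""
    # Header spells it "is__cell_barcode"; the fallback name is an assumption
    flag = metrics.get("is__cell_barcode", metrics.get("is_cell_barcode", 0))
    return flag == 1


def fraction_in_peaks(metrics):
    """peak_region_fragments / passed_filters (0.0 if no usable fragments)."""
    usable = metrics["passed_filters"]
    return metrics["peak_region_fragments"] / usable if usable else 0.0


def summarize(table):
    """Count cell barcodes and tally excluded_reason labels over all barcodes."""
    n_cells = sum(1 for m in table.values() if is_cell(m))
    reasons = {}
    for m in table.values():
        label = EXCLUDED_REASON.get(m["excluded_reason"], "unknown")
        reasons[label] = reasons.get(label, 0) + 1
    return n_cells, reasons
```

Typical use: `with open("singlecell.csv", newline="") as fh: table = load_singlecell(fh)`. You can then keep only the barcodes where `is_cell(...)` is true before looking at `fraction_in_peaks`.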

4. BAM and .bam.bai files


For details on these files, see the CSDN blog post by 韩建刚 (CAAS-UCD), "Cell Ranger count (gene expression) output files explained".

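5. Fragment file: fragments.tsv.gz and fragments.tsv.gz.tbi

In fragments.tsv.gz each row is one fragment, described by the five columns below. fragments.tsv.gz.tbi is the tabix index of the compressed file, which allows fragments to be looked up by genomic region.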

| Column | Description |
|---|---|
| chrom | Reference genome chromosome of fragment. |
| chromStart | Adjusted start position of fragment on chromosome. |
| chromEnd | Adjusted end position of fragment on chromosome. The end position is exclusive, so represents the position immediately following the fragment interval. |
| barcode | The 10x cell barcode of this fragment. This corresponds to the CB tag attached to the corresponding BAM file records for this fragment. |
| readSupport | The total number of read pairs associated with this fragment. This includes the read pair marked unique and all duplicate read pairs. |

```
# The file is compressed, so decompress it before tail (tail on the .gz prints binary data)
zcat fragments.tsv.gz | tail
19 55423327 55423435 GATTAGCTCAAGAGAT-1 1
19 55423327 55423435 TCAAGACCATGCGCTG-1 11
19 55423327 55423448 ACGTGGCCAGGTTATC-1 1
19 55423327 55423448 TCAGGTACAGGGCTTC-1 5
19 55423327 55423453 CCTGCTAAGCGTCTGC-1 13
19 55423327 55423453 GGTCATATCAGTGGTT-1 5
19 55423327 55423457 ACCGGGTTCGGGACAA-1 4
19 55423327 55423457 AGTTACGTCGCAACTA-1 2
19 55423327 55423458 GACCGACGTTACGGAG-1 6
```
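
Per-barcode summaries can be computed directly from the fragment file. Below is a minimal standard-library sketch; the function names are hypothetical. Python's `gzip` module can read the bgzip-compressed file. Lines starting with `#` are skipped in case a file version carries header comments, which is an assumption because the output above shows only data rows. Because chromEnd is exclusive, fragment length is simply end - start. Because readSupport counts the unique read pair plus all duplicates, summing it per barcode counts read pairs, not fragments.

```python
import gzip
from collections import defaultdict


def iter_fragments(lines):
    """Yield (chrom, start, end, barcode, read_support) from fragment-file lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and possible '#' header lines (assumption)
        chrom, start, end, barcode, support = line.rstrip("\n").split("\t")[:5]
        yield chrom, int(start), int(end), barcode, int(support)


def per_barcode_stats(lines):
    """Per barcode: fragment count, summed readSupport, and fragment lengths."""
    stats = defaultdict(lambda: {"fragments": 0, "read_pairs": 0, "lengths": []})
    for _chrom, start, end, barcode, support in iter_fragments(lines):
        s = stats[barcode]
        s["fragments"] += 1
        s["read_pairs"] += support
        # chromEnd is exclusive, so the length is end - start
        s["lengths"].append(end - start)
    return dict(stats)


def open_fragments(path):
    """Open fragments.tsv.gz as text; gzip reads bgzip (multi-member) files."""
    return gzip.open(path, "rt")
```

Usage: `with open_fragments("fragments.tsv.gz") as fh: stats = per_barcode_stats(fh)`. This keeps every fragment length in memory. For a full-size file, a length histogram per barcode would be lighter.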

6. Peaks file: peaks.bed


| Column number | Name | Description |
|---|---|---|
| 1 | chrom | Reference genome chromosome of peak. |
| 2 | chromStart | Start position of peak on chromosome. |
| 3 | chromEnd | End position of peak on chromosome. The end position is exclusive, so represents the position immediately following the peak interval. |
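
Both peaks.bed and the fragment file use an exclusive end, so a fragment [fs, fe) overlaps a peak [ps, pe) exactly when ps < fe and fs < pe. Below is a minimal sketch that uses this rule to count, per barcode, the fragments overlapping any peak, in the spirit of peak_region_fragments. It is an illustration only: Cell Ranger's own counting may differ in which fragments it considers, so exact agreement with singlecell.csv should not be expected. Function names are hypothetical. Peaks are sorted per chromosome and a running maximum of end positions is kept, so binary search stays correct even if peaks overlap.

```python
import bisect
from collections import defaultdict
from itertools import accumulate


def load_peaks(lines):
    """Build {chrom: (starts, running_max_end)} from peaks.bed lines."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    index = {}
    for chrom, peaks in by_chrom.items():
        peaks.sort()
        starts = [s for s, _ in peaks]
        # Running max of ends: max_end[i] = largest end among peaks[0..i]
        max_end = list(accumulate((e for _, e in peaks), max))
        index[chrom] = (starts, max_end)
    return index


def overlaps_peak(index, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any peak."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    i = bisect.bisect_left(starts, end)  # peaks[:i] all start before `end`
    return i > 0 and max_end[i - 1] > start


def peak_fragments_per_barcode(index, fragments):
    """Count fragments (chrom, start, end, barcode) that overlap any peak."""
    counts = defaultdict(int)
    for chrom, start, end, barcode in fragments:
        if overlaps_peak(index, chrom, start, end):
            counts[barcode] += 1
    return dict(counts)
```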

7. Peak annotation

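7.1 Annotation rules: (1) a peak can be assigned to more than one gene; (2) a peak is either a promoter peak or a distal peak, never both; (3) only protein-coding genes are used for annotation.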

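7.2 Annotation procedure: (1) if a peak lies in a promoter region (from 1000 bp upstream to 100 bp downstream of a TSS), it is annotated as a promoter peak; (2) if it lies within 200 kb of a TSS but was not annotated as a promoter peak, it is annotated as a distal peak; (3) if a peak lies within a transcript (inside the gene body) and is neither a promoter peak nor a distal peak, it is still annotated as a distal peak, but with its distance set to 0; (4) a peak that was not assigned to any gene by the three steps above is annotated as an intergenic peak.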
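
The four steps can be sketched as a function that classifies one peak against one gene. This is a minimal illustration, not Cell Ranger's code, and it rests on several assumptions:

* The gene is on the + strand. For - strand genes, the promoter window and the sign of the distance would be mirrored.
* Coordinates are compared as closed intervals, so results may differ from Cell Ranger output by 1 bp.
* "In the promoter region" is read as "overlaps the promoter window".
* How Cell Ranger combines results across several candidate genes is not modeled.

The distance follows the sign convention of peak_annotation.tsv. Names and parameters are hypothetical.

```python
PROMOTER_UPSTREAM = 1_000    # bp upstream of the TSS
PROMOTER_DOWNSTREAM = 100    # bp downstream of the TSS
DISTAL_MAX = 200_000         # bp from the TSS


def tss_distance(peak_start, peak_end, tss):
    """Signed peak-to-TSS distance for a + strand gene."""
    if peak_start > tss:
        return peak_start - tss  # positive: peak start is downstream of the TSS
    if peak_end < tss:
        return peak_end - tss    # negative: peak end is upstream of the TSS
    return 0                     # the peak overlaps the TSS


def annotate_peak(peak_start, peak_end, tss, tx_start, tx_end):
    """Classify a peak against one + strand gene; returns (peak_type, distance)."""
    distance = tss_distance(peak_start, peak_end, tss)
    # (1) overlaps the promoter window [TSS - 1000, TSS + 100]
    if peak_start <= tss + PROMOTER_DOWNSTREAM and peak_end >= tss - PROMOTER_UPSTREAM:
        return "promoter", distance
    # (2) within 200 kb of the TSS but not a promoter peak
    if abs(distance) <= DISTAL_MAX:
        return "distal", distance
    # (3) inside the transcript body but neither promoter nor distal: distance 0
    if peak_start <= tx_end and peak_end >= tx_start:
        return "distal", 0
    # (4) not assigned to this gene
    return "intergenic", None
```

For example, take a hypothetical gene with its TSS at 26098 and a transcript spanning 26098-40000. Then `annotate_peak(12116, 12985, 26098, 26098, 40000)` returns `("distal", -13113)`.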

7.3 peak_annotation.tsv format

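The file has six columns. The first three give the peak's genomic position (chromosome, start, end), the fourth is the name of the annotated gene, and the fifth is the distance from the peak to that gene's TSS. A positive distance means the peak start lies downstream of the TSS, a negative distance means the peak end lies upstream of the TSS, and 0 means the peak overlaps the TSS or lies within the gene's transcript body. The sixth column gives the peak type (promoter, distal or intergenic).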

| Column | Description |
|---|---|
| chrom | Contig that contains the peak |
| start | Peak start location |
| end | Peak end location |
| gene | Gene symbol based on the gene annotation in the reference. |
| distance | Distance of peak from TSS of gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene. |
| peak_type | Can be "promoter", "distal" or "intergenic". |

```
head peak_annotation.tsv
chrom start end gene distance peak_type
1 12116 12985 ENSOARG00020000038 -13113 distal
1 29918 30788 ENSOARG00020000038 3821 distal
1 34037 35000 ENSOARG00020000038 7940 distal
1 36768 37257 ENSOARG00020000038 10671 distal
1 37368 38200 ENSOARG00020000038 11271 distal
1 46853 47720 ENSOARG00020000038 20756 distal
1 49556 50414 ENSOARG00020000038 23459 distal
1 59206 60122 FAM240C -27312 distal
1 63961 64830 FAM240C -22604 distal
```
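
To get an overview of the annotation, you can count peaks per peak_type and group peaks by gene. Below is a minimal standard-library sketch with hypothetical function names. It assumes one gene and one distance per row, as in the output above. It also assumes that the gene and distance fields may be empty for intergenic peaks. If your version packs several genes into a single field, the parser would need adjusting.

```python
import csv
from collections import Counter, defaultdict


def read_peak_annotation(lines):
    """Parse peak_annotation.tsv (header: chrom start end gene distance peak_type)."""
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        rows.append({
            "chrom": rec["chrom"],
            "start": int(rec["start"]),
            "end": int(rec["end"]),
            # Assumption: empty gene/distance fields for intergenic peaks
            "gene": rec["gene"] or None,
            "distance": int(rec["distance"]) if rec["distance"] else None,
            "peak_type": rec["peak_type"],
        })
    return rows


def summarize_annotation(rows):
    """Count peaks per peak_type and group peak coordinates by annotated gene."""
    type_counts = Counter(r["peak_type"] for r in rows)
    peaks_by_gene = defaultdict(list)
    for r in rows:
        if r["gene"]:
            peaks_by_gene[r["gene"]].append((r["chrom"], r["start"], r["end"]))
    return type_counts, dict(peaks_by_gene)
```

Usage: `with open("peak_annotation.tsv", newline="") as fh: types, by_gene = summarize_annotation(read_peak_annotation(fh))`.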

