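singlecell.csv holds per-barcode QC information: fragment-level QC counts, signal per barcode, the number of fragments overlapping TSSs, and other metrics.
Each column of singlecell.csv is a metric and each row is a barcode. The columns are barcode followed by 18 metrics: total, duplicate, chimeric, unmapped, lowmapq, mitochondrial, nonprimary, passed_filters, is__cell_barcode, excluded_reason, TSS_fragments, DNase_sensitive_region_fragments, enhancer_region_fragments, promoter_region_fragments, on_target_fragments, blacklist_region_fragments, peak_region_fragments and peak_region_cutsites. The rows cover every barcode sequence observed, including barcodes that have not been filtered by QC, so the file has more than 400,000 rows. Note that outputs produced by different pipelines contain somewhat different sets of metrics.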
```
# File contents
head singlecell.csv
barcode,total,duplicate,chimeric,unmapped,lowmapq,mitochondrial,nonprimary,passed_filters,is__cell_barcode,excluded_reason,TSS_fragments,DNase_sensitive_region_fragments,enhancer_region_fragments,promoter_region_fragments,on_target_fragments,blacklist_region_fragments,peak_region_fragments,peak_region_cutsites
NO_BARCODE,8692958,1673643,1625,1220361,1114210,0,14527,4668592,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAAAGCAG-1,4,1,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,4
AAACGAAAGAAAGGGT-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,1,2
AAACGAAAGAAATACC-1,3,0,0,0,0,0,0,3,0,3,0,0,0,0,0,0,2,3
AAACGAAAGAAATGGG-1,1200,644,0,109,238,0,6,203,0,0,15,0,0,0,15,0,39,76
AAACGAAAGAAATTCG-1,316,184,0,21,50,0,0,61,0,0,4,0,0,0,4,0,13,25
AAACGAAAGAACAGGA-1,1,0,0,0,0,0,0,1,0,3,0,0,0,0,0,0,0,0
AAACGAAAGAACCCGA-1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
AAACGAAAGAACGACC-1,3,0,0,1,0,0,0,2,0,0,1,0,0,0,1,0,2,4
```
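The file lists every observed barcode, so a useful first check is how many rows are actually called cells. Below is a minimal Python sketch using only the standard library. The function name `summarize_barcodes` is hypothetical. It assumes the header shown above, including the double-underscore `is__cell_barcode` column. It skips the `NO_BARCODE` row, which appears to aggregate read-pairs without a valid barcode.

```python
import csv


def summarize_barcodes(lines):
    """Return (number of barcode rows, number of called cells) for singlecell.csv.

    `lines` can be an open file handle or any iterable of CSV text lines.
    The NO_BARCODE row is not counted as a barcode.
    """
    n_barcodes = n_cells = 0
    for row in csv.DictReader(lines):
        if row["barcode"] == "NO_BARCODE":
            continue
        n_barcodes += 1
        # The header above spells the flag is__cell_barcode; fall back to a
        # single-underscore name in case another pipeline uses it (assumption).
        flag = row.get("is__cell_barcode", row.get("is_cell_barcode", "0"))
        n_cells += flag == "1"
    return n_barcodes, n_cells


# Usage (the file path is an assumption):
# with open("singlecell.csv", newline="") as fh:
#     print(summarize_barcodes(fh))
```

On the full file, the first number should match the row count minus the header and the `NO_BARCODE` line. The second number is the number of called cells.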
Column definitions:
Column | Type | Description |
---|---|---|
barcode | key | barcodes present in input data |
total | sequencing | total read-pairs |
duplicate | mapping | number of duplicate read-pairs |
chimeric | mapping | number of chimerically mapped read-pairs |
unmapped | mapping | number of read-pairs with at least one end not mapped |
lowmapq | mapping | number of read-pairs with <30 mapq on at least one end |
mitochondrial | mapping | number of read-pairs mapping to mitochondria and non-nuclear contigs |
nonprimary | mapping | the number of reads that map to non-primary contigs |
passed_filters | mapping | number of non-duplicate, usable read-pairs i.e. "fragments" |
is__cell_barcode | cell calling | binary indicator of whether barcode is associated with a cell |
excluded_reason | cell calling | 0: barcode was not excluded; 1: barcode was excluded because it is a gel bead doublet; 2: barcode was excluded because it is low-targeting; 3: barcode was excluded because it is a barcode multiplet |
TSS_fragments | targeting | number of fragments overlapping with TSS regions |
DNase_sensitive_region_fragments | targeting | number of fragments overlapping with DNase sensitive regions |
enhancer_region_fragments | targeting | number of fragments overlapping enhancer regions |
promoter_region_fragments | targeting | number of fragments overlapping promoter regions |
on_target_fragments | targeting | number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) |
blacklist_region_fragments | targeting | number of fragments overlapping blacklisted regions |
peak_region_fragments | denovo targeting | number of fragments overlapping peaks |
peak_region_cutsites | denovo targeting | number of ends of fragments in peak regions |
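Several per-cell ratios follow directly from these columns. The Python sketch below does two things. For cell barcodes, it computes the fraction of `passed_filters` fragments that overlap peaks and TSSs, plus the duplicate rate. For all barcodes, it counts them by `excluded_reason`, using the codes listed in the table. These ratios are common QC conventions rather than columns of the file. The function names are hypothetical.

```python
import csv
from collections import Counter

# Codes from the excluded_reason row of the table above.
EXCLUDED_REASON = {
    0: "not excluded",
    1: "gel bead doublet",
    2: "low-targeting",
    3: "barcode multiplet",
}


def _is_cell(row):
    # Accept both spellings, since column names can differ between pipelines.
    return row.get("is__cell_barcode", row.get("is_cell_barcode", "0")) == "1"


def _ratio(num, den):
    return num / den if den else 0.0


def cell_qc(lines):
    """Yield derived QC ratios for each cell barcode in singlecell.csv."""
    for row in csv.DictReader(lines):
        if not _is_cell(row):
            continue
        fragments = int(row["passed_filters"])
        yield {
            "barcode": row["barcode"],
            # share of usable fragments that overlap called peaks
            "frac_in_peaks": _ratio(int(row["peak_region_fragments"]), fragments),
            # share of usable fragments that overlap TSS regions
            "frac_tss": _ratio(int(row["TSS_fragments"]), fragments),
            # duplicate read-pairs over all read-pairs
            "dup_rate": _ratio(int(row["duplicate"]), int(row["total"])),
        }


def excluded_reason_counts(lines):
    """Count barcodes per excluded_reason label (NO_BARCODE row skipped)."""
    counts = Counter()
    for row in csv.DictReader(lines):
        if row["barcode"] == "NO_BARCODE":
            continue
        code = int(row["excluded_reason"])
        counts[EXCLUDED_REASON.get(code, f"unknown ({code})")] += 1
    return counts
```

Zero denominators return 0.0 instead of raising. Barcodes with no usable fragments therefore stay in the output and are easy to filter afterwards.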
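The BAM file sorted by alignment position. Its format differs slightly from a standard SAM/BAM file; see the earlier article for details:
Cell Ranger count (gene expression) output files explained, 韩建刚 (CAAS-UCD), CSDN blog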
In fragments.tsv.gz, each row is one fragment and the columns are the following five attributes:
Name | Description |
---|---|
chrom | Reference genome chromosome of fragment |
chromStart | Adjusted start position of fragment on chromosome. |
chromEnd | Adjusted end position of fragment on chromosome. The end position is exclusive, so represents the position immediately following the fragment interval. |
barcode | The 10x cell barcode of this fragment. This corresponds to the CB tag attached to the corresponding BAM file records for this fragment. |
readSupport | The total number of read pairs associated with this fragment. This includes the read pair marked unique and all duplicate read pairs. |
```
zcat fragments.tsv.gz | tail
19 55423327 55423435 GATTAGCTCAAGAGAT-1 1
19 55423327 55423435 TCAAGACCATGCGCTG-1 11
19 55423327 55423448 ACGTGGCCAGGTTATC-1 1
19 55423327 55423448 TCAGGTACAGGGCTTC-1 5
19 55423327 55423453 CCTGCTAAGCGTCTGC-1 13
19 55423327 55423453 GGTCATATCAGTGGTT-1 5
19 55423327 55423457 ACCGGGTTCGGGACAA-1 4
19 55423327 55423457 AGTTACGTCGCAACTA-1 2
19 55423327 55423458 GACCGACGTTACGGAG-1 6
```
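Below is a Python sketch for streaming this file and summarizing it per barcode. It relies on the table above: chromEnd is exclusive, so a fragment's length is simply chromEnd - chromStart, and readSupport is summed as the number of read pairs. Lines starting with `#` are skipped as a precaution against header lines that some Cell Ranger ATAC versions may write. The function names are hypothetical.

```python
import gzip
from collections import Counter, defaultdict


def parse_fragment_lines(lines):
    """Yield (chrom, start, end, barcode, read_support) tuples from text lines."""
    for line in lines:
        # Skip blank lines and '#' header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, barcode, support = line.rstrip("\n").split("\t")[:5]
        yield chrom, int(start), int(end), barcode, int(support)


def read_fragments(path):
    """Stream fragments from a gzip-compressed fragments.tsv.gz."""
    with gzip.open(path, "rt") as fh:
        yield from parse_fragment_lines(fh)


def per_barcode_stats(fragments):
    """Per barcode: number of fragments, summed read support, fragment lengths."""
    n_fragments = Counter()
    n_read_pairs = Counter()
    lengths = defaultdict(list)
    for chrom, start, end, barcode, support in fragments:
        n_fragments[barcode] += 1
        n_read_pairs[barcode] += support
        # chromEnd is exclusive, so the length is end - start (no +1).
        lengths[barcode].append(end - start)
    return n_fragments, n_read_pairs, lengths


# Usage (the file path is an assumption):
# n_frag, n_reads, lengths = per_barcode_stats(read_fragments("fragments.tsv.gz"))
```

Take the first row of the output above as a worked case: 55423435 - 55423327 = 108 bp.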
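In the peak file (BED format), each peak is a genomic interval in which transposition (Tn5 cut) events are significantly enriched, i.e. a region of open chromatin. Individual cut events correspond to the start and end of each fragment, not to the start and end of a peak.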
Column Number | Name | Description |
---|---|---|
1 | chrom | Reference genome chromosome of peak |
2 | chromStart | Start position of peak on chromosome. |
3 | chromEnd | End position of peak on chromosome. The end position is exclusive, so represents the position immediately following the peak interval. |
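Combining these peak intervals with the fragments file lets you approximate, per barcode, the peak_region_fragments and peak_region_cutsites columns of singlecell.csv. This hedged Python sketch makes several assumptions:

- Peaks do not overlap one another.
- Both files use half-open intervals.
- A fragment's two cut sites are its first base (chromStart) and its last base (chromEnd - 1).

Cell Ranger's own counting may use a different cut-site convention, so treat the result as an approximation. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter, defaultdict


def load_peaks(lines):
    """Index BED peak lines by chromosome as sorted (starts, ends) lists.

    Assumes peaks do not overlap each other, so the ends are sorted as well.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def _position_in_peak(index, chrom, pos):
    """True if base pos falls inside a half-open peak [start, end)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
    return i >= 0 and pos < ends[i]


def _fragment_hits_peak(index, chrom, start, end):
    """True if the half-open fragment [start, end) overlaps any peak."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_left(starts, end) - 1  # last peak starting before the fragment ends
    return i >= 0 and ends[i] > start


def peak_counts(fragments, index):
    """Per barcode: fragments overlapping peaks, and fragment ends inside peaks."""
    fragments_in_peaks = Counter()
    cutsites_in_peaks = Counter()
    for chrom, start, end, barcode, *_ in fragments:
        if _fragment_hits_peak(index, chrom, start, end):
            fragments_in_peaks[barcode] += 1
        # Two cut sites per fragment: first base and last base (end is exclusive).
        cutsites_in_peaks[barcode] += _position_in_peak(index, chrom, start)
        cutsites_in_peaks[barcode] += _position_in_peak(index, chrom, end - 1)
    return fragments_in_peaks, cutsites_in_peaks
```

Binary search over sorted starts keeps each lookup at O(log n) per chromosome. That matters because a fragments file typically holds tens of millions of rows.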
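7.1 Annotation strategy: (1) a single peak can be associated with more than one gene; (2) a peak is either a promoter peak or a distal peak, never both (peaks that match no gene become intergenic peaks, see 7.2); (3) only protein-coding genes are used for annotation.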
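7.2 Annotation procedure: (1) if a peak overlaps a promoter region (-1000 bp to +100 bp relative to a TSS), it is annotated as a promoter peak; (2) if it lies within 200 kb of a TSS but was not annotated as a promoter peak, it is annotated as a distal peak; (3) if a peak lies within a transcript (inside a gene body) and is neither a promoter peak nor a distal peak, it is annotated as a distal peak with its distance set to 0; (4) if none of the three steps above assigns the peak to any gene, it is annotated as an intergenic peak.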
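The four rules can be expressed as a small Python function. This is a simplified sketch, not Cell Ranger's implementation, and it makes these assumptions:

- Gene records (`Gene`) are hypothetical inputs on the same chromosome as the peak.
- The promoter window is strand-aware (1000 bp upstream, 100 bp downstream of the TSS).
- The 200 kb rule is applied to the closest TSS only.
- A peak with any promoter hit is reported only as a promoter peak (one type per peak, as in 7.1).
- Distances are signed on the genomic coordinate axis.

```python
from dataclasses import dataclass

PROMOTER_UPSTREAM = 1_000    # bp upstream of the TSS
PROMOTER_DOWNSTREAM = 100    # bp downstream of the TSS
DISTAL_MAX = 200_000         # max |distance| to the closest TSS for a distal peak


@dataclass
class Gene:
    """Hypothetical protein-coding gene record on the peak's chromosome."""
    name: str
    tss: int         # TSS coordinate
    strand: str      # "+" or "-"
    body_start: int  # transcript body as half-open [body_start, body_end)
    body_end: int


def signed_distance(start, end, tss):
    """Peak [start, end) to TSS: >0 start downstream, <0 end upstream, 0 overlap."""
    if start > tss:
        return start - tss
    if end - 1 < tss:
        return (end - 1) - tss
    return 0


def promoter_window(gene):
    """Closed interval covering -1000/+100 bp around the TSS, respecting strand."""
    if gene.strand == "+":
        return gene.tss - PROMOTER_UPSTREAM, gene.tss + PROMOTER_DOWNSTREAM
    return gene.tss - PROMOTER_DOWNSTREAM, gene.tss + PROMOTER_UPSTREAM


def annotate_peak(start, end, genes):
    """Return (peak_type, [(gene_name, distance), ...]) for one peak."""
    # Rule 1: promoter peak of every gene whose promoter window it overlaps.
    promoter_hits = []
    for g in genes:
        lo, hi = promoter_window(g)
        if start <= hi and end - 1 >= lo:
            promoter_hits.append((g.name, signed_distance(start, end, g.tss)))
    if promoter_hits:
        return "promoter", promoter_hits

    distal_hits = []
    # Rule 2: distal peak of the closest TSS if it lies within 200 kb.
    if genes:
        closest = min(genes, key=lambda g: abs(signed_distance(start, end, g.tss)))
        d = signed_distance(start, end, closest.tss)
        if abs(d) <= DISTAL_MAX:
            distal_hits.append((closest.name, d))
    # Rule 3: overlapped gene bodies not assigned yet become distal with distance 0.
    assigned = {name for name, _ in distal_hits}
    for g in genes:
        if g.name not in assigned and start < g.body_end and end > g.body_start:
            distal_hits.append((g.name, 0))
    if distal_hits:
        return "distal", distal_hits

    # Rule 4: no gene matched.
    return "intergenic", []
```

Rule 3 matters mainly for long genes. A peak deep inside a gene body can be more than 200 kb from every TSS and would otherwise be called intergenic.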
7.3 peak_annotation.tsv format
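The file has 6 columns: the first three give the peak's genomic position, the fourth is the annotated gene name, the fifth is the distance from the peak to the gene's TSS, and the sixth is the peak type (promoter, distal or intergenic). For the distance, a positive value means the peak start lies downstream of the TSS, a negative value means the peak end lies upstream of the TSS, and 0 means the peak overlaps the TSS or lies within the gene's transcript body.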
Name | Description |
---|---|
chrom | Contig that contains the peak |
start | Peak start location |
end | Peak end location |
gene | Gene symbol based on the gene annotation in the reference. |
distance | Distance of peak from TSS of gene. Positive distance means the start of the peak is downstream of the position of the TSS, whereas negative distance means the end of the peak is upstream of the TSS. Zero distance means the peak overlaps with the TSS or the peak overlaps with the transcript body of the gene. |
peak_type | Can be "promoter", "distal" or "intergenic". |
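The distance rule described above can be written as a small Python function. This sketch makes two assumptions. First, peaks are half-open intervals, so the last base of a peak is end - 1, matching the exclusive chromEnd of the peak BED file. Second, "downstream" means a larger genomic coordinate; how minus-strand genes are treated is not covered by the description here. As a consistency check only, take TSS positions of 26097 for ENSOARG00020000038 and 87433 for FAM240C. These values were back-calculated from the `head peak_annotation.tsv` output in this section, not read from any annotation. With them, the function reproduces every distance in that output. The function name is hypothetical.

```python
def peak_tss_distance(start, end, tss):
    """Signed peak-to-TSS distance for a half-open peak [start, end).

    > 0: the peak start is downstream of (greater than) the TSS
    < 0: the peak's last base (end - 1) is upstream of (less than) the TSS
      0: the peak covers the TSS
    """
    if start > tss:
        return start - tss
    if end - 1 < tss:
        return (end - 1) - tss
    return 0


# Back-calculated TSS (assumption): 26097 for ENSOARG00020000038.
# peak_tss_distance(12116, 12985, 26097) gives -13113, as in the first row.
```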
```
head peak_annotation.tsv
chrom start end gene distance peak_type
1 12116 12985 ENSOARG00020000038 -13113 distal
1 29918 30788 ENSOARG00020000038 3821 distal
1 34037 35000 ENSOARG00020000038 7940 distal
1 36768 37257 ENSOARG00020000038 10671 distal
1 37368 38200 ENSOARG00020000038 11271 distal
1 46853 47720 ENSOARG00020000038 20756 distal
1 49556 50414 ENSOARG00020000038 23459 distal
1 59206 60122 FAM240C -27312 distal
1 63961 64830 FAM240C -22604 distal
```
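A quick way to see how the peaks split across the three types is to count them. Below is a Python sketch that assumes a tab-separated file with the header shown above. A peak can be associated with several genes (rule (1) in 7.1), so the sketch counts rows and unique peak intervals separately. The function name is hypothetical.

```python
import csv
from collections import Counter, defaultdict


def summarize_annotation(lines):
    """Summarize peak_annotation.tsv by peak_type.

    Returns (rows per type, unique (chrom, start, end) peaks per type).
    `lines` can be an open file handle or any iterable of text lines.
    """
    rows_per_type = Counter()
    peaks_per_type = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        peak_type = row["peak_type"]
        rows_per_type[peak_type] += 1
        peaks_per_type[peak_type].add((row["chrom"], row["start"], row["end"]))
    return rows_per_type, {t: len(p) for t, p in peaks_per_type.items()}


# Usage (the file path is an assumption):
# with open("peak_annotation.tsv", newline="") as fh:
#     rows, peaks = summarize_annotation(fh)
```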