Bedtools Cheat Sheet (DRAFT)

This is a draft cheat sheet. It is a work in progress and is not finished yet.

BED file format

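Column       e.g.       Definition
chrom        Sc112.1    <STR> name of chromosome/scaffold
start        2134       <INT> start position of feature
end          2565       <INT> end position of feature
name         gene123    <STR> name of feature
score        544        <NUM> score for the feature, e.g. bit score
strand       +          <+/-/.> strand on which feature is located
thickStart   2235       <INT> position where thick drawing starts, e.g. start codon
thickEnd     2489       <INT> position where thick drawing ends, e.g. stop codon
itemRgb      255,0,0    <R,G,B> display color of feature
blockCount   2          <INT> number of blocks, e.g. exons
blockSizes   150,80     <INT,...> comma-separated list of block sizes
blockStarts  0,351      <INT,...> comma-separated list of block starts, relative to start

blockStarts are offsets from start, not genome coordinates. The first must be 0, and the last blockStart plus the last blockSize must equal end - start (here 351 + 80 = 2565 - 2134).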
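To make the block columns concrete, here is a minimal Python sketch that turns a BED12 line into absolute block coordinates. The function name `bed12_blocks` is made up for this sheet and is not part of bedtools. The example line uses blockStarts of 0,351, the offsets that make the two blocks end exactly at the feature end for the start, end, and blockSizes listed above.

```python
def bed12_blocks(line):
    """Return absolute (start, end) pairs for each block of a BED12 line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated columns")
    start, end = int(fields[1]), int(fields[2])
    count = int(fields[9])
    # Lists may carry a trailing comma, so drop empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != count or len(offsets) != count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are offsets from the feature start, not genome coordinates.
    blocks = [(start + off, start + off + size)
              for off, size in zip(offsets, sizes)]
    if blocks[0][0] != start or blocks[-1][1] != end:
        raise ValueError("blocks must span the feature from start to end")
    return blocks


# Hypothetical example line built from the values in the table above.
example = "\t".join(["Sc112.1", "2134", "2565", "gene123", "544", "+",
                     "2235", "2489", "255,0,0", "2", "150,80", "0,351"])
print(bed12_blocks(example))  # [(2134, 2284), (2485, 2565)]
```

A blockStarts value that does not bring the last block to the feature end (for instance an absolute coordinate in place of an offset) is rejected with a ValueError.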

GFF vs BED indexing

GFF    ┌─1   2   3─┐ 4   ...
         G---A---T   C   ...
BED    └─0   1   2 └─3   ...
gff > bed:
bed_start = gff_start - 1,
bed_end = gff_end
bed > gff:
gff_start = bed_start + 1,
gff_end = bed_end
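
The two conversions can be sketched in Python as below. The function names are illustrative only. The sketch relies on BED intervals being 0-based and half-open, which is the same convention as Python slicing, while GFF intervals are 1-based and include the end position.

```python
def gff_to_bed(gff_start, gff_end):
    """1-based, end-inclusive (GFF) -> 0-based, half-open (BED)."""
    return gff_start - 1, gff_end


def bed_to_gff(bed_start, bed_end):
    """0-based, half-open (BED) -> 1-based, end-inclusive (GFF)."""
    return bed_start + 1, bed_end


seq = "GATC"                            # the four bases from the diagram
bed_start, bed_end = gff_to_bed(1, 3)   # GFF feature covering G, A, T
print(bed_start, bed_end)               # 0 3
print(seq[bed_start:bed_end])           # GAT
print(bed_to_gff(bed_start, bed_end))   # (1, 3)
```

The feature length is gff_end - gff_start + 1 in GFF coordinates and bed_end - bed_start in BED coordinates.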
 

getfasta

$ bedtools getfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF>
OPTIONS
-fo      Specify an output file name. By default, output goes to stdout.
-name    Use the “name” column in the BED file for the FASTA headers in the output FASTA file.
-tab     Report extracted sequences in a tab-delimited format instead of in FASTA format.
-bedOut  Report extracted sequences in a tab-delimited BED format instead of in FASTA format.
-s       Force strandedness. If the feature occupies the antisense strand, the sequence will be reverse complemented. Default: strand information is ignored.
-split   Given BED12 input, extract and concatenate the sequences from the BED “blocks” (e.g., exons).
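
As a rough illustration of what the extraction options describe, the following Python sketch mimics them on an in-memory string. It is not how bedtools is implemented; the helper names, the toy sequence, and the intervals are all assumptions made for this example.

```python
# Complement table for DNA bases, including N and lower-case letters.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract(seq, start, end, strand=".", stranded=False):
    """Slice a 0-based, half-open interval; stranded=True imitates -s."""
    sub = seq[start:end]
    if stranded and strand == "-":
        sub = reverse_complement(sub)
    return sub


def extract_blocks(seq, start, sizes, offsets):
    """Concatenate BED12 blocks, in the spirit of the -split description."""
    return "".join(seq[start + off:start + off + size]
                   for off, size in zip(offsets, sizes))


toy = "ACTGATCATGATACATGATACCATTAGGATACAATA"   # toy sequence, not real data
print(extract(toy, 0, 6))                       # ACTGAT
print(extract(toy, 0, 6, "-", stranded=True))   # ATCAGT
print(extract_blocks(toy, 0, [3, 4], [0, 6]))   # ACTCATG
```

Without stranded=True the strand is ignored, matching the documented default.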

maskfasta

$ bedtools maskfasta [OPTIONS] -fi <input FASTA> -bed <BED/GFF/VCF> -fo <output FASTA>
OPTIONS
-soft  Soft-mask (that is, convert to lower-case bases) the FASTA sequence. By default, hard-masking (that is, conversion to Ns) is performed.
-mc    Replace masking character. That is, instead of masking with Ns, use another character.

FASTA    ACTGATCATGATACATGATACCATTAGGATACAATA
BED            ████       █████      ████
FASTA'   ACTGATNNNNATACATGNNNNNATTAGGNNNNAATA
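
The masking shown in the diagram can be reproduced with a short Python sketch. This only imitates the documented behavior and is not the bedtools implementation. The function name `mask` is hypothetical, and the three intervals are BED-style (0-based, half-open) coordinates read off the diagram.

```python
def mask(seq, intervals, soft=False, mask_char="N"):
    """Mask 0-based, half-open intervals of seq.

    soft=True lower-cases the bases (like -soft);
    otherwise each base is replaced by mask_char (like -mc, default N).
    """
    out = list(seq)
    for start, end in intervals:
        for i in range(start, min(end, len(out))):
            out[i] = out[i].lower() if soft else mask_char
    return "".join(out)


fasta = "ACTGATCATGATACATGATACCATTAGGATACAATA"
intervals = [(6, 10), (17, 22), (28, 32)]      # read off the diagram

print(mask(fasta, intervals))                  # ACTGATNNNNATACATGNNNNNATTAGGNNNNAATA
print(mask(fasta, intervals, soft=True))       # lower-case instead of N
print(mask(fasta, intervals, mask_char="X"))   # X instead of N
```

Masking never changes the sequence length, so coordinates in other annotation files remain valid for the masked FASTA.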