@DocumentedFeature @BetaFeature public class CollectIndependentReplicateMetrics extends CommandLineProgram
The estimation is based on duplicate-sets of size 2 and 3 and gives separate estimates from each. The assumption is that the duplication rate (biological or otherwise) is independent of the duplicate-set size. A significant difference between the two rates may be an indication that this assumption is incorrect.
The duplicate sets are found using the mate-cigar tag (MC), which is added by MergeBamAlignment or FixMateInformation.
This program will not work without the MC tag.
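Because the tool depends on the MC tag, it can help to confirm that a record carries one before running it. The following is a minimal sketch, not part of Picard: it assumes a plain-text SAM alignment line (11 mandatory tab-separated fields followed by optional `TAG:TYPE:VALUE` fields) and checks for an optional field that starts with `MC:Z:`. The class name, method name, and example record are hypothetical.

```java
class MateCigarTagSketch {
    /** A SAM alignment line has 11 mandatory tab-separated fields before the optional tags. */
    private static final int MANDATORY_FIELDS = 11;

    /**
     * Returns true if the given SAM text line carries a mate-cigar (MC) tag
     * among its optional fields. Mandatory fields are never inspected.
     */
    static boolean hasMateCigarTag(String samLine) {
        String[] fields = samLine.split("\t");
        for (int i = MANDATORY_FIELDS; i < fields.length; i++) {
            if (fields[i].startsWith("MC:Z:")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical record: SEQ and QUAL are left as "*" to keep the line short.
        String record = "r1\t99\tchr1\t100\t60\t76M\t=\t300\t276\t*\t*\tMC:Z:76M";
        System.out.println(hasMateCigarTag(record));
    }
}
```

If such a check fails for the reads in a BAM, running MergeBamAlignment or FixMateInformation first, as described above, is the documented way to add the tag.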
An explanation of the calculation behind the estimation can be found in the IndependentReplicateMetric class.
The calculation assumes a diploid organism (more accurately, it assumes that only two alleles can appear at a HET site and that these two alleles appear with equal probability). It requires as input a VCF with genotypes for the sample in question.
NOTE: This class is very much in alpha stage and still under heavy development (feel free to join!).
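To illustrate how the equal-allele-probability assumption can turn into a rate estimate, here is a simplified sketch for duplicate sets of size 2 only. It is not the calculation used by IndependentReplicateMetric; it assumes that each set is either a true duplicate (both reads show the same allele) or a pair of independent molecules (the reads show different alleles with probability 1/2), and it ignores sequencing errors. Under those assumptions the independent-replicate rate is twice the fraction of sets in which both alleles are observed. All names below are hypothetical.

```java
class BiDupRateSketch {
    /**
     * Simplified estimate of the independent-replicate rate from size-2 duplicate sets
     * observed at HET sites.
     *
     * Model (an assumption of this sketch): an independent pair shows two different
     * alleles with probability 1/2, while a true duplicate never does. Hence
     * P(different alleles) = rate / 2, so rate = 2 * (sets with both alleles) / (all sets).
     */
    static double estimateIndependentRate(long setsWithBothAlleles, long totalSets) {
        if (totalSets <= 0) {
            throw new IllegalArgumentException("totalSets must be positive");
        }
        if (setsWithBothAlleles < 0 || setsWithBothAlleles > totalSets) {
            throw new IllegalArgumentException("setsWithBothAlleles must be in [0, totalSets]");
        }
        return 2.0 * setsWithBothAlleles / totalSets;
    }

    public static void main(String[] args) {
        // Hypothetical counts: 50 of 1000 size-2 sets show both alleles.
        System.out.println(estimateIndependentRate(50, 1000));
    }
}
```

The estimate is not clamped, so a value above 1 would itself suggest that the model's assumptions do not hold for the data.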
| Modifier and Type | Field | Description |
|---|---|---|
| String | BARCODE_BQ | |
| String | BARCODE_TAG | |
| File | INPUT | |
| File | MATRIX_OUTPUT | |
| Integer | MINIMUM_BARCODE_BQ | |
| Integer | MINIMUM_BQ | |
| Integer | MINIMUM_GQ | |
| Integer | MINIMUM_MQ | |
| File | OUTPUT | |
| String | SAMPLE | |
| Integer | STOP_AFTER | |
| File | VCF | |

Fields inherited from class CommandLineProgram: COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY

| Constructor | Description |
|---|---|
| CollectIndependentReplicateMetrics() | |

| Modifier and Type | Method | Description |
|---|---|---|
| protected int | doWork() | Do the work after command line has been parsed. |

Methods inherited from class CommandLineProgram: customCommandLineValidation, getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser

@Argument(shortName="I",
doc="Input (indexed) BAM file.")
public File INPUT
@Argument(shortName="O",
doc="Write metrics to this file")
public File OUTPUT
@Argument(shortName="MO",
doc="Write the confusion matrix (of UMIs) to this file",
optional=true)
public File MATRIX_OUTPUT
@Argument(shortName="V",
doc="Input VCF file")
public File VCF
@Argument(shortName="GQ",
doc="minimal value for the GQ field in the VCF to use variant site.",
optional=true)
public Integer MINIMUM_GQ
@Argument(shortName="MQ",
doc="minimal value for the mapping quality of the reads to be used in the estimation.",
optional=true)
public Integer MINIMUM_MQ
@Argument(shortName="BQ",
doc="minimal value for the base quality of a base to be used in the estimation.",
optional=true)
public Integer MINIMUM_BQ
@Argument(shortName="ALIAS",
doc="Name of sample to look at in VCF. Can be omitted if VCF contains only one sample.",
optional=true)
public String SAMPLE
@Argument(doc="Number of sets to examine before stopping.",
optional=true)
public Integer STOP_AFTER
@Argument(doc="Barcode SAM tag.",
optional=true)
public String BARCODE_TAG
@Argument(doc="Barcode Quality SAM tag.",
optional=true)
public String BARCODE_BQ
@Argument(shortName="MBQ",
doc="minimal value for the base quality of all the bases in a molecular barcode, for it to be used.",
optional=true)
public Integer MINIMUM_BARCODE_BQ
protected int doWork()
Specified by: doWork in class CommandLineProgram
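
The arguments documented above can be assembled into a command-line array before they are handed to the program. The sketch below is an illustration only: it builds arguments in Picard's legacy `KEY=value` style for the three arguments that are not marked optional (INPUT, OUTPUT, VCF) plus an optional MINIMUM_GQ. The helper class, its method, and the file names are hypothetical, and the block deliberately avoids depending on the Picard jar so that it compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class ReplicateMetricsArgsSketch {
    /**
     * Builds a legacy-style (KEY=value) argument array for CollectIndependentReplicateMetrics.
     * minimumGq may be null, in which case MINIMUM_GQ is left out and the tool's default applies.
     */
    static String[] buildArgs(String input, String output, String vcf, Integer minimumGq) {
        List<String> args = new ArrayList<>();
        args.add("INPUT=" + input);   // indexed BAM file
        args.add("OUTPUT=" + output); // metrics file to write
        args.add("VCF=" + vcf);       // genotypes for the sample
        if (minimumGq != null) {
            args.add("MINIMUM_GQ=" + minimumGq);
        }
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        // Hypothetical file names, for illustration only.
        String[] args = buildArgs("sample.bam", "metrics.txt", "sample.vcf", 30);
        System.out.println(String.join(" ", args));
        // With Picard on the classpath, the array could then be passed to the inherited
        // instanceMain method, e.g. new CollectIndependentReplicateMetrics().instanceMain(args).
    }
}
```

Keeping argument assembly in a small helper makes it easy to check which optional arguments were actually set before the program parses them and doWork() runs.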