@DocumentedFeature public class CheckFingerprint extends CommandLineProgram
The metrics files are described by the classes FingerprintingSummaryMetrics and FingerprintingDetailMetrics.
The output files may be specified individually using the SUMMARY_OUTPUT and DETAIL_OUTPUT options.
Alternatively, the OUTPUT option may be used to give the base name of the two output
files, with the summary metrics having the file extension "fingerprinting_summary_metrics"
and the detail metrics having the file extension "fingerprinting_detail_metrics".
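The naming rule for the OUTPUT option can be illustrated with a small helper. This is a minimal sketch, not Picard code: the class and method names are hypothetical, and it assumes the extension is appended to the base prefix after a "." separator.

```java
import java.io.File;

/** Hypothetical helper: derives both metrics file names from an OUTPUT base prefix. */
final class FingerprintOutputNames {
    // Extensions as documented for CheckFingerprint; the "." separator is an assumption.
    static final String SUMMARY_SUFFIX = "fingerprinting_summary_metrics";
    static final String DETAIL_SUFFIX = "fingerprinting_detail_metrics";

    static File summaryFile(String outputBase) {
        return new File(outputBase + "." + SUMMARY_SUFFIX);
    }

    static File detailFile(String outputBase) {
        return new File(outputBase + "." + DETAIL_SUFFIX);
    }

    public static void main(String[] args) {
        String base = "sample_fingerprinting";
        System.out.println(summaryFile(base)); // sample_fingerprinting.fingerprinting_summary_metrics
        System.out.println(detailFile(base));  // sample_fingerprinting.fingerprinting_detail_metrics
    }
}
```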
Example:

```shell
java -jar picard.jar CheckFingerprint \
      INPUT=sample.bam \
      GENOTYPES=sample_genotypes.vcf \
      HAPLOTYPE_MAP=fingerprinting_haplotype_database.txt \
      OUTPUT=sample_fingerprinting
```
This tool calculates a single number that reports the LOD score for the identity check between the INPUT
and the GENOTYPES. A positive value indicates that the data appear to have come from the same individual
or, in other words, that the identity checks out. The scale is logarithmic (base 10), so a LOD of 6 indicates
that it is 1,000,000 times more likely that the data match the genotypes than not. A negative value indicates
that the data do not match. A score near zero is inconclusive and can result from low coverage
or non-informative genotypes.
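The interpretation of the score can be sketched as follows. This is an illustration under stated assumptions, not Picard's code: the class and method names are hypothetical, the LOD is taken as the base-10 log of the likelihood ratio (same individual vs. different individual), and the band of ±3 used to call a score "inconclusive" is an arbitrary example value, not a Picard default.

```java
/** Hypothetical sketch of interpreting a fingerprint LOD score. */
final class LodInterpretation {
    /** LOD = log10 P(data | same individual) - log10 P(data | different individual). */
    static double lod(double log10LikelihoodSame, double log10LikelihoodDifferent) {
        return log10LikelihoodSame - log10LikelihoodDifferent;
    }

    /** Converts a base-10 LOD into an odds ratio, e.g. LOD 6 means odds of 1,000,000 : 1. */
    static double odds(double lod) {
        return Math.pow(10.0, lod);
    }

    /** Assumed cutoff for illustration only: |LOD| below this is treated as inconclusive. */
    static final double INCONCLUSIVE_BAND = 3.0;

    static String interpret(double lod) {
        if (Math.abs(lod) < INCONCLUSIVE_BAND) {
            return "inconclusive";
        }
        return lod > 0 ? "match" : "mismatch";
    }

    public static void main(String[] args) {
        double score = lod(-12.0, -18.0); // 6.0
        System.out.printf("LOD=%.1f odds=%.0f verdict=%s%n", score, odds(score), interpret(score));
    }
}
```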
The identity check makes use of haplotype blocks defined in the HAPLOTYPE_MAP file to gain statistical power
for detecting identity or a sample swap, by aggregating data from several SNPs in each haplotype block. This
enables an identity check of samples with very low coverage (e.g. ~1x mean coverage).
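Why pooling SNPs within a block helps at low coverage can be shown with a deliberately simplified model. This sketch is not Picard's algorithm: it assumes perfect linkage disequilibrium within the block (so every observed allele maps to haplotype A or B), a fixed per-observation error rate, and independent observations; the class and method names are hypothetical.

```java
import java.util.List;

/**
 * Simplified sketch (not Picard's implementation): pools allele observations from several SNPs
 * in one haplotype block, assuming perfect LD so each observed allele supports haplotype A or B.
 */
final class HaplotypeBlockLikelihood {
    /** Returns log10 likelihoods for the block genotypes {AA, AB, BB}. */
    static double[] log10Likelihoods(List<Boolean> supportsA, double errorRate) {
        double[] ll = new double[3];
        for (boolean a : supportsA) {
            double pAA = a ? 1 - errorRate : errorRate; // homozygous for haplotype A
            double pAB = 0.5;                           // heterozygous: either haplotype equally likely
            double pBB = a ? errorRate : 1 - errorRate; // homozygous for haplotype B
            ll[0] += Math.log10(pAA);
            ll[1] += Math.log10(pAB);
            ll[2] += Math.log10(pBB);
        }
        return ll;
    }

    public static void main(String[] args) {
        // ~1x coverage: one read at each of four SNPs in the same block, all supporting haplotype A.
        double[] ll = log10Likelihoods(List.of(true, true, true, true), 0.01);
        // Evidence for AA over the next-best genotype (AB) adds up across the block.
        System.out.printf("AA vs AB LOD = %.2f%n", ll[0] - ll[1]);
    }
}
```

With a single read at one SNP the AA-vs-AB LOD is only about 0.3; under this model, four reads spread over four linked SNPs give roughly four times that evidence.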
When provided a VCF, the identity check looks at the PL, GL and GT fields (in that order) and uses the first one that it finds.
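The field-priority rule can be sketched as a lookup over a sample's FORMAT values. This is a hypothetical illustration, not Picard's code: a sample is modeled as a plain map from FORMAT key to its raw string value, and treating the VCF missing value "." as absent is an assumption of this sketch. The PL conversion uses the standard VCF definition (PL is phred-scaled, so log10 likelihood = -PL/10; GL is already log10-scaled).

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: choose the VCF FORMAT field to use, in the documented order PL, GL, GT. */
final class GenotypeFieldSelector {
    static final String[] PRIORITY = {"PL", "GL", "GT"};

    /** Returns the first FORMAT key that is present and not the missing value ".", if any. */
    static Optional<String> selectField(Map<String, String> formatFields) {
        for (String key : PRIORITY) {
            String value = formatFields.get(key);
            if (value != null && !value.equals(".")) {
                return Optional.of(key);
            }
        }
        return Optional.empty();
    }

    /** PL values are phred-scaled likelihoods: log10 likelihood = -PL / 10. */
    static double[] plToLog10(String pl) {
        String[] parts = pl.split(",");
        double[] out = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = -Integer.parseInt(parts[i].trim()) / 10.0;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> sample = Map.of("GT", "0/1", "PL", "35,0,40");
        System.out.println(selectField(sample).orElse("none")); // PL
        System.out.println(java.util.Arrays.toString(plToLog10("35,0,40")));
    }
}
```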
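
| Modifier and Type | Field | Description |
|---|---|---|
| File | DETAIL_OUTPUT | Text file to which detail metrics are written. |
| String | EXPECTED_SAMPLE_ALIAS | Sample whose genotypes are used from the GENOTYPES file. |
| static String | FINGERPRINT_DETAIL_FILE_SUFFIX | File extension of the detail metrics when OUTPUT is used. |
| static String | FINGERPRINT_SUMMARY_FILE_SUFFIX | File extension of the summary metrics when OUTPUT is used. |
| double | GENOTYPE_LOD_THRESHOLD | Minimum LOD for a haplotype to be counted as checked and matching. |
| String | GENOTYPES | VCF of expected genotypes. |
| File | HAPLOTYPE_MAP | SNPs, optionally arranged in high-LD blocks, used for fingerprinting. |
| boolean | IGNORE_READ_GROUPS | Treat a SAM/BAM input as a single read group. |
| String | INPUT | Input SAM/BAM or VCF. |
| String | OBSERVED_SAMPLE_ALIAS | Sample to use from a multi-sample VCF INPUT. |
| String | OUTPUT | Base prefix of both metrics files. |
| File | SUMMARY_OUTPUT | Text file to which summary metrics are written. |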

Fields inherited from class CommandLineProgram: COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY

| Constructor | Description |
|---|---|
| CheckFingerprint() | Default constructor. |

| Modifier and Type | Method | Description |
|---|---|---|
| protected String[] | customCommandLineValidation() | Put any custom command-line validation in an override of this method. |
| protected int | doWork() | Do the work after the command line has been parsed. |

Methods inherited from class CommandLineProgram: getCommandLine, getCommandLineParser, getDefaultHeaders, getFaqLink, getMetricsFile, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, makeReferenceArgumentCollection, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser

Field details:

```java
@Argument(shortName = "I",
        doc = "Input file SAM/BAM or VCF. If a VCF is used, it must have at least one sample. If there is more than one sample in the VCF, the parameter OBSERVED_SAMPLE_ALIAS must be provided in order to indicate which sample's data to use. If there are no samples in the VCF, an exception will be thrown.")
public String INPUT;

@Argument(optional = true,
        doc = "If the input is a VCF, this parameter is used to select which sample's data in the VCF to use.")
public String OBSERVED_SAMPLE_ALIAS;

@Argument(shortName = "O",
        doc = "The base prefix of output files to write. The summary metrics will have the file extension 'fingerprinting_summary_metrics' and the detail metrics will have the extension 'fingerprinting_detail_metrics'.",
        mutex = {"SUMMARY_OUTPUT", "DETAIL_OUTPUT"})
public String OUTPUT;

@Argument(shortName = "S",
        doc = "The text file to which to write summary metrics.",
        mutex = "OUTPUT")
public File SUMMARY_OUTPUT;

@Argument(shortName = "D",
        doc = "The text file to which to write detail metrics.",
        mutex = "OUTPUT")
public File DETAIL_OUTPUT;

@Argument(shortName = "G",
        doc = "File of genotypes (VCF) to be used in comparison. May contain any number of genotypes; CheckFingerprint will use only those that are usable for fingerprinting.")
public String GENOTYPES;

@Argument(shortName = "SAMPLE_ALIAS",
        optional = true,
        doc = "This parameter can be used to specify which sample's genotypes to use from the expected VCF file (the GENOTYPES file). If it is not supplied, the sample name from the input (VCF or BAM read group header) will be used.")
public String EXPECTED_SAMPLE_ALIAS;

@Argument(shortName = "H",
        doc = "The file lists a set of SNPs, optionally arranged in high-LD blocks, to be used for fingerprinting. See https://software.broadinstitute.org/gatk/documentation/article?id=9526 for details.")
public File HAPLOTYPE_MAP;

@Argument(shortName = "LOD",
        doc = "When counting haplotypes checked and matching, count only haplotypes where the most likely haplotype achieves at least this LOD.")
public double GENOTYPE_LOD_THRESHOLD;

@Argument(optional = true,
        shortName = "IGNORE_RG",
        doc = "If the input is a SAM/BAM, and this parameter is true, treat the entire input BAM as one single read group in the calculation, ignoring RG annotations, and producing a single fingerprint metric for the entire BAM.")
public boolean IGNORE_READ_GROUPS;

public static final String FINGERPRINT_SUMMARY_FILE_SUFFIX = "fingerprinting_summary_metrics";
public static final String FINGERPRINT_DETAIL_FILE_SUFFIX = "fingerprinting_detail_metrics";
```
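The sample-selection rules in the INPUT, OBSERVED_SAMPLE_ALIAS and EXPECTED_SAMPLE_ALIAS docs can be sketched for a VCF INPUT as below. This is a hypothetical illustration, not Picard's code: the class and method names are invented, the VCF's samples are passed as a plain list, rejecting an alias that is not in that list is an assumption, and the BAM read-group case is not modeled.

```java
import java.util.List;

/** Hypothetical sketch of the documented sample-selection rules for a VCF INPUT. */
final class SampleAliasResolver {
    /** Picks the observed sample from the INPUT VCF's sample list. */
    static String observedSample(List<String> vcfSamples, String observedAlias) {
        if (vcfSamples.isEmpty()) {
            throw new IllegalArgumentException("INPUT VCF contains no samples");
        }
        if (observedAlias != null) {
            // Assumption: an alias that is not in the VCF is rejected.
            if (!vcfSamples.contains(observedAlias)) {
                throw new IllegalArgumentException("OBSERVED_SAMPLE_ALIAS not found: " + observedAlias);
            }
            return observedAlias;
        }
        if (vcfSamples.size() > 1) {
            throw new IllegalArgumentException("Multiple samples in INPUT; OBSERVED_SAMPLE_ALIAS is required");
        }
        return vcfSamples.get(0);
    }

    /** The expected sample defaults to the observed sample name when EXPECTED_SAMPLE_ALIAS is unset. */
    static String expectedSample(String observedSample, String expectedAlias) {
        return expectedAlias != null ? expectedAlias : observedSample;
    }

    public static void main(String[] args) {
        String observed = observedSample(List.of("NA12878"), null);
        System.out.println(expectedSample(observed, null)); // NA12878
    }
}
```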
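protected int doWork()
Do the work after the command line has been parsed.
Specified by: doWork in class CommandLineProgram

protected String[] customCommandLineValidation()
Put any custom command-line validation in an override of this method.
Overrides: customCommandLineValidation in class CommandLineProgram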
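As an illustration of the kind of check a customCommandLineValidation() override might perform, the sketch below validates the output options. It is hypothetical, not the CheckFingerprint implementation: it follows the convention of returning null when the command line is valid and an array of error messages otherwise. The first check duplicates what the `mutex` declarations already enforce in the argument parser. Requiring both SUMMARY_OUTPUT and DETAIL_OUTPUT when OUTPUT is absent is an assumption.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of output-option checks for a customCommandLineValidation() override. */
final class OutputOptionValidation {
    /** Returns null when the options are consistent, otherwise an array of error messages. */
    static String[] validate(String output, File summaryOutput, File detailOutput) {
        List<String> errors = new ArrayList<>();
        // Mirrors the mutex declarations: OUTPUT excludes the two explicit file options.
        if (output != null && (summaryOutput != null || detailOutput != null)) {
            errors.add("OUTPUT cannot be combined with SUMMARY_OUTPUT or DETAIL_OUTPUT");
        }
        // Assumption: without OUTPUT, both explicit metrics files are required.
        if (output == null && (summaryOutput == null || detailOutput == null)) {
            errors.add("Provide OUTPUT, or both SUMMARY_OUTPUT and DETAIL_OUTPUT");
        }
        return errors.isEmpty() ? null : errors.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(validate("sample_fingerprinting", null, null) == null); // true
    }
}
```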