CCS 6.2.0
armintoepfer committed Sep 23, 2021
1 parent 4f56d31 commit db70caa
Showing 6 changed files with 182 additions and 11 deletions.
12 changes: 11 additions & 1 deletion docs/changelog.md
@@ -6,7 +6,17 @@ nav_order: 99

# Version changelog

**6.0.0**
**6.2.0**
* Upcoming SMRT Link release
* Improved low-complexity handling, better runtime and lower memory usage
* Improved BAM merge step, up to 5x faster
* Improved compute run time
* `INFO` logging if chemistry bundle is injected
* Enable strand splitting of ZMWs that contain large insertion heteroduplexes
* Use `TMPDIR` environment variable for storing temporary files
* New `INFO` log summary and `ccs_reports.txt`

**6.0.0**
* SMRT Link v10.1 release
* Increase number of HiFi reads
* Increase percentage of barcode yield
66 changes: 66 additions & 0 deletions docs/faq/mode-by-strand.md
@@ -21,3 +21,69 @@ How does `--by-strand` work? For each ZMW:
* Create a draft for each strand
* Polish each strand
* Write out each polished strand consensus

### Summary logs
At the end of each execution, _ccs_ logs a summary at `--log-level INFO`.
This summary contains combined and individual metrics for double- and single-strand reads.
With `--by-strand`, only single-strand reads are reported.

```
-------------------------------------------------
Summary stats abbreviations:
ZMW - A productive Zero-Mode Waveguide
DS - Double Strand
SS - Single Strand
DS-ZMW - All subreads were used from a single ZMW
SS-ZMW - ZMW is split into fwd and rev strands,
each strand is polished individually
DS-Read - CCS read of a DS-ZMW
SS-Read - CCS read of one strand of a SS-ZMW
HiFi - CCS reads with predicted accuracy >=Q20
UMY - Unique Molecular Yield of all reads passing filters
HiFi Yield - UMY of >=Q20 DS- and SS-ZMWs, longest read per ZMW
-------------------------------------------------
ZMWs Input : 5390
ZMWs Written : 1061
- DS / SS : 0 / 1061
UMY : 18.8 MBases (2.2 GBases/hr)
- DS / SS : 0 Bases / 18.8 MBases
HiFi Yield : 33.7 MBases (3.9 GBases/hr)
- DS / SS : 0 Bases / 33.7 MBases
HiFi Reads : 1909
- DS / SS : 0 / 1909
HiFi Avg Size : 17.7 KBases
HiFi Avg QV : 23.0
```

### By-strand `ccs_reports.txt`
Typical content of the by-strand `ccs_reports.txt` file. Unlike the
default output, this file counts single-strand reads rather than ZMWs.
Counting per single-strand ZMW is not possible, because one strand may fail
while the other succeeds.

```
Single-Strand Reads
Inputs : 9313 (86.38%)
Passed : 1909 (20.50%)
Failed : 7404 (79.50%)
Tandem repeats : 69 (0.932%)
Exclusive failed counts
Below SNR threshold : 101 (1.364%)
Median length filter : 0 (0.000%)
Shortcut filters : 0 (0.000%)
Lacking full passes : 4888 (66.02%)
Coverage drops : 6 (0.081%)
Insufficient draft cov : 30 (0.405%)
Draft too different : 0 (0.000%)
Draft generation error : 39 (0.527%)
Draft above --max-length : 0 (0.000%)
Draft below --min-length : 0 (0.000%)
Reads failed polishing : 0 (0.000%)
Empty coverage windows : 1 (0.014%)
CCS did not converge : 0 (0.000%)
CCS below minimum RQ : 2339 (31.59%)
Unknown error : 0 (0.000%)
```
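
The percentages in the report above can be reproduced from the raw counts. The following is a minimal sketch, not part of _ccs_: the helper names `parse_counts` and `percent` are hypothetical, and the rule that `Passed` and `Failed` are relative to `Inputs` while filter rows are relative to `Failed` is an assumption inferred from the example numbers, not something the report states.

```python
import re

# Matches rows such as "Passed : 1909 (20.50%)" and captures label and count.
ROW = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<count>\d+)\s*\(")


def parse_counts(report_text):
    """Return {label: count} for every 'label : count (pct%)' row."""
    counts = {}
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group("label")] = int(match.group("count"))
    return counts


def percent(part, whole):
    """Percentage of part in whole; 0.0 if whole is zero."""
    return 100.0 * part / whole if whole else 0.0


# A few rows copied from the example report above.
report = """\
Single-Strand Reads
Inputs : 9313 (86.38%)
Passed : 1909 (20.50%)
Failed : 7404 (79.50%)
Lacking full passes : 4888 (66.02%)
CCS below minimum RQ : 2339 (31.59%)
"""

counts = parse_counts(report)
# Assumption: pass/fail rates are relative to the number of input reads.
passed_pct = percent(counts["Passed"], counts["Inputs"])
failed_pct = percent(counts["Failed"], counts["Inputs"])
# Assumption: each filter row is relative to the number of failed reads.
lacking_pct = percent(counts["Lacking full passes"], counts["Failed"])
print(f"{passed_pct:.2f} {failed_pct:.2f} {lacking_pct:.2f}")
```

Under these assumptions the recomputed values agree with the percentages printed for the corresponding rows.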
92 changes: 92 additions & 0 deletions docs/faq/mode-heteroduplex-filtering.md
@@ -0,0 +1,92 @@
---
layout: default
parent: FAQ
title: Mode --split-heteroduplexes
---

# Attention: This is an early access feature!

## What is heteroduplex filtering?
Starting with _ccs_ v6.2.0, single-strand artifacts, such as insertions larger
than 20 bases, do not necessarily have to be filtered out. With `--split-heteroduplexes`,
_ccs_ can split a ZMW on the fly after detecting such a heteroduplex and
process each strand separately. As a consequence, _ccs_ has to distinguish between
double-stranded (DS) and single-stranded (SS) ZMWs and their consensus reads.
Implications:

* Single-strand reads are stored in an extra file
* Summary logs report double-strand and single-strand metrics
* `ccs_reports.txt` file contains two columns, double-strand and single-strand reads

We are currently investigating how reliably we can detect indel and SNV
heteroduplexes and might add those to strand-aware splitting in future versions.

### Additional `*.stranded.bam` file
The file `outputPrefix.stranded.bam` contains all single-strand reads. Read names
follow the by-strand scheme with `/fwd` and `/rev` suffixes. There are up to two
reads per split ZMW.
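
As an illustration of the naming scheme, the sketch below groups single-strand read names by their ZMW. It is a hedged example: the helper names are hypothetical, the sample names are made up, and it assumes only that the strand is the last `/`-separated field and that everything before it identifies the ZMW.

```python
def split_stranded_name(read_name):
    """Split a read name ending in /fwd or /rev into (zmw_key, strand)."""
    prefix, _, strand = read_name.rpartition("/")
    if not prefix or strand not in ("fwd", "rev"):
        raise ValueError(f"not a single-strand read name: {read_name!r}")
    return prefix, strand


def group_by_zmw(read_names):
    """Collect the strands seen for each ZMW key, in input order."""
    groups = {}
    for name in read_names:
        key, strand = split_stranded_name(name)
        groups.setdefault(key, []).append(strand)
    return groups


# Made-up names for illustration only; real names come from the BAM file.
names = ["movie/42/ccs/fwd", "movie/42/ccs/rev", "movie/77/ccs/rev"]
for zmw, strands in group_by_zmw(names).items():
    print(zmw, strands)
```

A ZMW with only one entry corresponds to the case where one strand failed and the other succeeded.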

### Summary logs
At the end of each execution, _ccs_ logs a summary at `--log-level INFO`.
This summary contains combined and individual metrics for DS and SS.

```
-------------------------------------------------
Summary stats abbreviations:
ZMW - A productive Zero-Mode Waveguide
DS - Double Strand
SS - Single Strand
DS-ZMW - All subreads were used from a single ZMW
SS-ZMW - ZMW is split into fwd and rev strands,
each strand is polished individually
DS-Read - CCS read of a DS-ZMW
SS-Read - CCS read of one strand of a SS-ZMW
HiFi - CCS reads with predicted accuracy >=Q20
UMY - Unique Molecular Yield of all reads passing filters
HiFi Yield - UMY of >=Q20 DS- and SS-ZMWs, longest read per ZMW
-------------------------------------------------
ZMWs Input : 53895
ZMWs Written : 22684
- DS / SS : 22644 / 40
UMY : 413.2 MBases (6.8 GBases/hr)
- DS / SS : 412.4 MBases / 733.7 KBases
HiFi Yield : 413.5 MBases (6.8 GBases/hr)
- DS / SS : 412.4 MBases / 1.0 MBases
HiFi Reads : 22701
- DS / SS : 22644 / 57
HiFi Avg Size : 18.2 KBases
HiFi Avg QV : 30.2
```
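
For the count rows, the combined value is the sum of the DS and SS values. The sketch below checks this on lines copied from the example above. It is an illustration only: `parse_summary_counts` is a hypothetical helper, and it assumes the summary lines are available without timestamps or other log prefixes.

```python
def parse_summary_counts(log_text):
    """Map each integer metric to (combined, ds, ss) using its '- DS / SS' row."""
    result = {}
    current = None  # (metric name, combined value) of the preceding row
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "- DS / SS":
            if current is None:
                continue  # breakdown of a non-integer metric, e.g. a yield
            ds, _, ss = value.partition("/")
            ds, ss = ds.strip(), ss.strip()
            if ds.isdigit() and ss.isdigit():
                result[current[0]] = (current[1], int(ds), int(ss))
            current = None
        elif value.isdigit():
            current = (key, int(value))
        else:
            current = None
    return result


# Lines copied from the example summary above.
summary = """\
ZMWs Input : 53895
ZMWs Written : 22684
- DS / SS : 22644 / 40
UMY : 413.2 MBases (6.8 GBases/hr)
- DS / SS : 412.4 MBases / 733.7 KBases
HiFi Reads : 22701
- DS / SS : 22644 / 57
"""

parsed = parse_summary_counts(summary)
for metric, (combined, ds, ss) in parsed.items():
    print(metric, combined, ds + ss)
```

Yield rows are skipped here because they mix units (MBases and KBases) and are rounded.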

### Strand-aware `ccs_reports.txt`
Typical content of the strand-aware `ccs_reports.txt` file. Contrary to the
default output, this file does not report numbers in ZMWs, but actual DS and SS
reads. Accounting in SS ZMWs is not possible, as one strand might fail and the
other succeed.

```
Double-Strand Reads Single-Strand Reads
Inputs : 53590 (99.43%) 609 (0.564%)
Passed : 22644 (42.25%) 57 (9.360%)
Failed : 30946 (57.75%) 552 (90.64%)
Tandem repeats : 461 (1.490%) 0 (0.000%)
Exclusive failed counts
Below SNR threshold : 870 (2.811%) 0 (0.000%)
Median length filter : 0 (0.000%) 0 (0.000%)
Shortcut filters : 0 (0.000%) 0 (0.000%)
Lacking full passes : 26226 (84.75%) 0 (0.000%)
Coverage drops : 30 (0.097%) 0 (0.000%)
Insufficient draft cov : 61 (0.197%) 310 (56.16%)
Draft too different : 0 (0.000%) 0 (0.000%)
Draft generation error : 173 (0.559%) 54 (9.783%)
Draft above --max-length : 0 (0.000%) 0 (0.000%)
Draft below --min-length : 0 (0.000%) 0 (0.000%)
Reads failed polishing : 0 (0.000%) 0 (0.000%)
Empty coverage windows : 3 (0.010%) 0 (0.000%)
CCS did not converge : 2 (0.006%) 0 (0.000%)
CCS below minimum RQ : 3581 (11.57%) 188 (34.06%)
Unknown error : 0 (0.000%) 0 (0.000%)
```
15 changes: 9 additions & 6 deletions docs/faq/performance.md
@@ -6,15 +6,15 @@ title: Performance

## How fast is _ccs_?
### Latest version
The latest _ccs_ v6 can process 200 GBases HiFi yield in 24 hours for a 25 KBases
_ccs_ v6.0 can process 200 GBases HiFi yield in 24 hours for a 25 KBases
library on 2x64 cores at 2.4 GHz.
To put this into perspective for actual sequencing collections:

| Sample | Insert size | HiFi Yield | Run Time |
| :------: | :---------: | :---------: | :------: |
| HG002 | 15 KBases | 41.1 GBases | 5h 52m |
| HG002 | 18 KBases | 34.0 GBases | 4h 36m |
| Readwood | 25 KBases | 32.4 GBases | 3h 46m |
| Sample | Insert size | HiFi Yield | Run Time |
| :-----: | :---------: | :---------: | :------: |
| HG002 | 15 KBases | 41.1 GBases | 5h 52m |
| HG002 | 18 KBases | 34.0 GBases | 4h 36m |
| Redwood | 25 KBases | 32.4 GBases | 3h 46m |
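
To compare the table rows with the 200 GBases in 24 hours figure (about 8.3 GBases per hour), the run times can be converted to throughput. This is a rough sketch with hypothetical helper names; it assumes the table runs used hardware comparable to the 2x64 cores mentioned above, which the table itself does not state.

```python
def hours(run_time):
    """Convert a string such as '5h 52m' to fractional hours."""
    h_part, m_part = run_time.split()
    return int(h_part.rstrip("h")) + int(m_part.rstrip("m")) / 60.0


def throughput(hifi_gbases, run_time):
    """HiFi yield in GBases per hour of wall-clock run time."""
    return hifi_gbases / hours(run_time)


# Rows copied from the table above: sample, insert size, HiFi yield, run time.
runs = [
    ("HG002", "15 KBases", 41.1, "5h 52m"),
    ("HG002", "18 KBases", 34.0, "4h 36m"),
    ("Redwood", "25 KBases", 32.4, "3h 46m"),
]

for sample, insert, gbases, run_time in runs:
    print(f"{sample} {insert}: {throughput(gbases, run_time):.1f} GBases/hr")
```

The 25 KBases Redwood row comes out close to the headline rate, while the shorter-insert libraries yield fewer HiFi bases per hour.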

### Relative performance v3.0 to v6.0
Current _ccs_ v6 achieves a >150x speed-up for 20 KBases inserts compared to
@@ -51,6 +51,7 @@ due to toolchain improvements for generating a more optimized binary.
| 4.2.0 | 2,806,886 | 10h 47m | 61d 9h | 72 GB | 18% |
| 5.0.0 | 2,807,317 | 6h 44m | 62d 22h | 27 GB | 37% |
| 6.0.0 | 2,831,192 | 5h 52m | 44d 17h | 20 GB | 13% |
| 6.2.0 | TBD | TBD | TBD | TBD | TBD |

#### **HG002 18kb SQII, 32 GBases HiFi yield**
Omitting v4.0.0, due to lack of chemistry support.
@@ -60,6 +61,7 @@ Omitting v4.0.0, due to lack of chemistry support.
| 4.2.0 | 1823016 | 8h 35m | 47d 13h | 80 GB | |
| 5.0.0 | 1824206 | 5h 29m | 50d 16h | 46 GB | 36% |
| 6.0.0 | 1855604 | 4h 36m | 30d 13h | 18 GB | 15% |
| 6.2.0 | TBD | TBD | TBD | TBD | TBD |

#### **Redwood 25kb SQII, 32 GBases HiFi yield**

@@ -69,6 +71,7 @@ Omitting v4.0.0, due to lack of chemistry support.
| 4.2.0 | 1,310,775 | 6h 37m | 43d 18h | 74 GB | 17% |
| 5.0.0 | 1,311,693 | 4h 36m | 41d 13h | 41 GB | 30% |
| 6.0.0 | 1,335,888 | 3h 56m | 25d 11h | 17 GB | 14% |
| 6.2.0 | TBD | TBD | TBD | TBD | TBD |

### How is CCS speed affected by raw base yield?
Raw base yield is the sum of all polymerase read lengths.
6 changes: 3 additions & 3 deletions docs/faq/reports-aux-files.md
@@ -21,7 +21,7 @@ The following comments refer to the filters that are explained in the FAQ above.

ZMWs with tandem repeats : 10 (0.60%) <- With repeats larger than --min-tandem-repeat-length

Exclusive counts for ZMWs failing filters:
Exclusive failed counts
Below SNR threshold : 30 (28.85%) <- SNR below --min-snr.
Median length filter : 0 (0.00%) <- All subreads are <50% or >200% of the median subread length
Lacking full passes : 0 (0.00%) <- Fewer than --min-passes full-length (FL) reads
@@ -38,8 +38,8 @@ The following comments refer to the filters that are explained in the FAQ above.
CCS below minimum RQ : 0 (0.00%) <- Predicted accuracy is below --min-rq
Unknown error : 0 (0.00%) <- Rare implementation errors

If run in `--by-strand` mode, rows may contain half ZMWs, as we account
each strand as half a ZMW.
If run in `--by-strand` mode, please have a look at [the by-strand FAQ](/faq/mode-by-strand).\
If run in `--split-heteroduplexes` mode, please have a look at [the strand-aware FAQ](/faq/mode-heteroduplex-filtering).

### Coverage drops
Example for a coverage drop in a single ZMW, subreads colored by strand orientation:
2 changes: 1 addition & 1 deletion docs/index.md
@@ -31,7 +31,7 @@ Please refer to our [official pbbioconda page](https://github.com/PacificBioscie
for information on Installation, Support, License, Copyright, and Disclaimer.

## Latest Version
Version **6.0.0**: [Full changelog here](/changelog)
Version **6.2.0**: [Full changelog here](/changelog)

## What's new!
_ccs_ is now running on the Sequel IIe instrument, transferring HiFi reads