1 Background
GDx at OUSAMG is planning to scale up WGS production to 192 samples per week (4 x 48, or 2 x 48 + 1 x 96). Do we have enough capacity in IT and the bioinformatics pipelines for this upscaling?
The capacity of IT and the bioinformatics pipelines can be evaluated from the following three aspects:
- Data transfer speed
- Data storage
- Pipeline capacity
This document will focus on the evaluation of pipeline capacity.
2 Pipeline Capacity
2.1 Available hardware
Machine | Units | DRAGEN type | DRAGENs per unit | Total DRAGENs |
---|---|---|---|---|
NovaSeq X Plus | 2a | Onboard | 4 | 8 |
NSC DRAGEN | 2b | External | 1 | 2 |
TSD DRAGEN | 1 | External | 1 | 1 |
- a: nox and nixie
- b: stormfly and toothless
2.2 Sequencing ➕ Demultiplexing ➕ Data delivery to bioinformatics pipeline
This section will evaluate the time needed for sequencing, demultiplexing and copying data from NovaSeq X to external storage (Boston).
2.2.1 Single 25B flowcell; 2nd Analysis = BCL Convert only
2.2.1.1 NovaSeq X reported time
Status | Started | Completed |
---|---|---|
Sequencing completed | 2024-11-18 13:02:04 | 2024-11-20 06:51:37 |
Sequencing file transfer to external storage completed | 2024-11-18 19:32:40 | 2024-11-20 08:57:45 |
Analysis completed | 2024-11-18 19:32:44 | 2024-11-20 13:58:15 |
Analysis file transfer to external storage completed | 2024-11-18 19:32:45 | 2024-11-20 13:58:15 |
2.2.1.2 Detailed steps time
The time taken by each step differs between NovaSeq X runs. The numbers listed here are close to the medians.
- Sequencing of a single 25B flowcell takes 44h.
- Sequencing file transfer to external storage starts 7h after sequencing starts and finishes 2h after sequencing finishes.
  - This includes the post-run wash, which takes 1h48m.
  - Some reports need to be copied to external storage, but they must wait for the post-run wash to complete.
- Secondary analysis (BCL Convert, FastQC, ORA compression) takes an extra 6h.
  - BCL Convert takes 2h after sequencing finishes.
  - FastQC and ORA compression take 4h after BCL Convert finishes.
- Copying secondary analysis data from the sequencer to external storage (boston) takes an extra 1h20m.
  - Copying fastq.ora files takes 1h.
  - Copying other files (e.g. reports, logs, metadata files) takes 20m.
- After the copy to boston is complete, delivering sequencing data to the nscDeliver folder, calculating md5sum and pushing samples to the lims-exporter queue takes an extra 15m.

From sequencing start to samples being ready for the pipeline takes 52h.
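As a rough cross-check of the 52h figure, the top-level steps above can be added up. This is a minimal sketch, assuming the sequencing file transfer overlaps with sequencing and secondary analysis and is therefore not on the critical path; the helper names `parse_duration` and `format_duration` are illustrative only.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '44h', '20m' or '1h20m' into a timedelta."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes)


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as e.g. '51h35m'."""
    total_minutes = int(delta.total_seconds()) // 60
    return f"{total_minutes // 60}h{total_minutes % 60:02d}m"


# Top-level steps for a single 25B flowcell, as listed above.
# Assumption: the steps run strictly one after another.
single_flowcell_steps = {
    "sequencing": "44h",
    "secondary analysis": "6h",
    "copy analysis data to boston": "1h20m",
    "deliver, md5sum, push to queue": "15m",
}

total = sum(
    (parse_duration(d) for d in single_flowcell_steps.values()),
    timedelta(),
)
print(format_duration(total))  # 51h35m
```

The sum of 51h35m is consistent with the rounded 52h quoted above.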
Status | Started | Completed |
---|---|---|
Sequencing completed | 2024-11-18 13:02:04 | 2024-11-20 06:51:37 |
bcl2fastq | 2024-11-19 21:12:28 | 2024-11-19 23:35:36 |
32 sample ora compress | 2024-11-19 23:35:36 | 2024-11-20 01:21:08 |
FastQC (map-align) msg_001 | 2024-11-19 23:35:36 | 2024-11-20 02:53:09 |
FastQC (map-align) msg_002 | 2024-11-19 23:35:36 | 2024-11-20 02:58:34 |
32 sample ora compress | 2024-11-20 01:21:08 | 2024-11-20 03:08:53 |
FastQC report generation | 2024-11-20 10:54:00 | 2024-11-20 11:11:00 |
Analysis file transfer to boston | 2024-11-20 13:40:00 | 2024-11-20 14:00:00 |
NSC automation (md5sum+deliver) | 2024-11-20 14:05:02 | 2024-11-20 14:15:05 |
This is a specific run on 2024-11-18.
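The durations behind the table rows can be recomputed from the Started and Completed timestamps. The sketch below does this for two rows of the table above; it measures from the sequencing start, since this table has no separate run start row, and the helper names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Time between two timestamps written in the table's format."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


def hm(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes, dropping seconds."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h{minutes % 60:02d}m"


# bcl2fastq row: Started -> Completed
bcl2fastq = elapsed("2024-11-19 21:12:28", "2024-11-19 23:35:36")

# Sequencing started -> NSC automation (md5sum+deliver) completed
end_to_end = elapsed("2024-11-18 13:02:04", "2024-11-20 14:15:05")

print(hm(bcl2fastq))   # 2h23m
print(hm(end_to_end))  # 49h13m
```

For this particular run the end-to-end time is about 49h, somewhat below the median-based 52h estimate.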
2.2.2 Dual 25B flowcell; 2nd Analysis = BCL Convert only
The time taken by each step differs between NovaSeq X runs. The numbers listed here are from a specific run on 2024-12-09.
2.2.2.1 NovaSeq X reported time
Status | Started | Completed | Duration |
---|---|---|---|
Sequencing completed | 2024-12-09 11:30:41 | 2024-12-11 07:09:03 | 43h37m |
Sequencing file transfer to external storage completed | 2024-12-09 18:01:57 | 2024-12-11 09:17:40 | 39h15m |
Analysis completed | 2024-12-09 18:02:06 | 2024-12-11 14:03:25 | 44h01m |
Analysis file transfer to external storage completed | 2024-12-09 18:02:07 | 2024-12-11 14:03:25 | 44h01m |
Status | Started | Completed | Duration |
---|---|---|---|
Sequencing completed | 2024-12-09 11:30:37 | 2024-12-11 07:12:55 | 43h42m |
Sequencing file transfer to external storage completed | 2024-12-09 18:02:33 | 2024-12-11 09:22:38 | 39h19m |
Analysis completed | 2024-12-09 18:02:38 | 2024-12-11 19:28:08 | 49h24m |
Analysis file transfer to external storage completed | 2024-12-09 18:02:40 | 2024-12-11 19:28:08 | 49h24m |
Both tables are from a specific run on 2024-12-09 (first table: flowcell B; second table: flowcell A).
2.2.2.2 Detailed steps time
- Sequencing of dual 25B flowcells takes 44h.
- Sequencing file transfer to external storage starts 7h after sequencing starts and finishes 2h after sequencing finishes.
  - This includes the post-run wash, which takes 1h48m.
  - Some reports need to be copied to external storage, but they must wait for the post-run wash to complete.
- Secondary analysis (BCL Convert, FastQC, ORA compression, FastQC reporting) takes an extra 10h.
  - Flowcell B: BCL Convert takes 2h (each flowcell is converted separately).
  - Flowcell B: FastQC and ORA compression run in parallel and finish 3h after BCL Convert finishes.
  - Flowcell A: BCL Convert takes 2h (each flowcell is converted separately).
  - Flowcell A: FastQC and ORA compression run in parallel and finish 3h after BCL Convert finishes.
  - Flowcell B: FastQC reporting takes 1h40m.
  - Flowcell A: FastQC reporting takes 1h40m.
- Copying secondary analysis data from the sequencer to external storage (boston) takes an extra 2h40m.
  - Copying fastq.ora files takes 1h.
  - Copying other files (e.g. reports, logs, metadata files) takes 20m.
  - The copies run sequentially: flowcell B first, then A.
- After the copy to boston is complete, delivering sequencing data to the nscDeliver folder, calculating md5sum and pushing samples to the lims-exporter queue takes an extra 15m for each flowcell.

From sequencing start to samples being ready for the pipeline takes 57h.
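The dual-flowcell total can be cross-checked in the same way. This sketch assumes that the 1h and 20m copy times are per flowcell (so that two sequential copies give the 2h40m above) and that the 15m delivery step is also run once per flowcell, one after the other.

```python
from datetime import timedelta

FLOWCELLS = 2

# Per-flowcell copy time: fastq.ora files plus other files.
copy_per_flowcell = timedelta(hours=1) + timedelta(minutes=20)
copy_total = copy_per_flowcell * FLOWCELLS  # sequential: B, then A

# (step, duration of one occurrence, number of sequential occurrences)
steps = [
    ("sequencing (both flowcells in parallel)", timedelta(hours=44), 1),
    ("secondary analysis", timedelta(hours=10), 1),
    ("copy analysis data to boston", copy_per_flowcell, FLOWCELLS),
    ("deliver, md5sum, push to queue", timedelta(minutes=15), FLOWCELLS),
]

total = sum((duration * count for _, duration, count in steps), timedelta())
total_minutes = int(total.total_seconds()) // 60

print(copy_total)                                         # 2:40:00
print(f"{total_minutes // 60}h{total_minutes % 60:02d}m")  # 57h10m
```

The result, 57h10m, agrees with the rounded 57h stated above.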
Job | Started | Completed |
---|---|---|
RunStartTime (B) | 2024-12-09 11:03:53 | |
Sequencing completed (B) | 2024-12-09 11:30:41 | 2024-12-11 07:09:03 |
Instrument Analytics Cycle 1 - 320 (B) | 2024-12-09 17:01:58 | 2024-12-11 06:09:02 |
BCL CONVERSION B | 2024-12-10 04:52:36 | 2024-12-10 21:33:20 |
bcl2fastq (B) | 2024-12-10 21:33:21 | 2024-12-10 23:39:08 |
16 sample ora compress (B) | 2024-12-10 23:39:09 | 2024-12-11 00:40:27 |
16 sample ora compress (B) | 2024-12-10 23:39:09 | 2024-12-11 00:36:40 |
16 sample ora compress (B) | 2024-12-10 23:39:09 | 2024-12-11 00:47:16 |
FastQC (map-align) (B) msg_001 | 2024-12-10 23:39:09 | 2024-12-11 01:16:52 |
FastQC (map-align) (B) msg_002 | 2024-12-10 23:39:09 | 2024-12-11 02:49:01 |
FastQC (map-align) (B) msg_003 | 2024-12-10 23:39:09 | 2024-12-11 01:21:56 |
FastQC (map-align) (B) msg_004 | 2024-12-10 23:39:09 | 2024-12-11 01:21:20 |
16 sample ora compress (B) | 2024-12-11 00:36:40 | 2024-12-11 01:33:13 |
fastq generation complete (B) | 2024-12-11 07:39:08 | |
Post Run (B) | 2024-12-11 07:47:48 | 2024-12-11 09:00:31 |
RunEndTime (B) | 2024-12-11 09:04:23 | |
FastQC report generation (B) | 2024-12-11 09:12:00 | 2024-12-11 10:50:00 |
Analysis file transfer to boston (B) | 2024-12-11 12:40:00 | 2024-12-11 14:02:00 |
NSC automation (md5sum+deliver) (B) | 2024-12-11 14:10:02 | 2024-12-11 14:16:49 |
Job | Started | Completed |
---|---|---|
RunStartTime (A) | 2024-12-09 11:03:52 | |
Sequencing completed (A) | 2024-12-09 11:30:37 | 2024-12-11 07:12:55 |
Instrument Analytics Cycle 1 - 320 (A) | 2024-12-09 17:02:34 | 2024-12-11 06:12:54 |
BCL CONVERSION A | 2024-12-10 04:49:52 | 2024-12-10 21:29:01 |
bcl2fastq (A) | 2024-12-11 02:51:04 | 2024-12-11 04:54:52 |
16 sample ora compress (A) | 2024-12-11 04:54:52 | 2024-12-11 06:03:44 |
16 sample ora compress (A) | 2024-12-11 04:54:52 | 2024-12-11 06:00:23 |
16 sample ora compress (A) | 2024-12-11 04:54:52 | 2024-12-11 06:04:37 |
FastQC (map-align) (A) msg_001 | 2024-12-11 04:54:52 | 2024-12-11 08:11:00 |
FastQC (map-align) (A) msg_002 | 2024-12-11 04:54:52 | 2024-12-11 06:32:46 |
FastQC (map-align) (A) msg_003 | 2024-12-11 04:54:52 | 2024-12-11 06:34:32 |
FastQC (map-align) (A) msg_004 | 2024-12-11 04:54:52 | 2024-12-11 06:35:42 |
16 sample ora compress (A) | 2024-12-11 06:00:23 | 2024-12-11 06:57:40 |
Post Run (A) | 2024-12-11 07:54:00 | 2024-12-11 09:04:23 |
RunEndTime (A) | 2024-12-11 09:04:23 | |
fastq generation complete (A) | 2024-12-11 12:54:52 | |
FastQC report generation (A) | 2024-12-11 14:33:00 | 2024-12-11 16:12:00 |
Analysis file transfer to boston (A) | 2024-12-11 18:00:00 | 2024-12-11 19:27:00 |
NSC automation (md5sum+deliver) (A) | 2024-12-11 19:35:01 | 2024-12-11 19:42:11 |
This is a specific run on 2024-12-09.
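Because the two flowcells are processed one after the other, flowcell A is delivered noticeably later than flowcell B. The sketch below derives the per-flowcell turnaround and the delivery gap from the RunStartTime and NSC automation rows of the two tables above; variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts: str) -> datetime:
    return datetime.strptime(ts, FMT)


# RunStartTime and completion of "NSC automation (md5sum+deliver)".
run_start = {"B": "2024-12-09 11:03:53", "A": "2024-12-09 11:03:52"}
delivered = {"B": "2024-12-11 14:16:49", "A": "2024-12-11 19:42:11"}

turnaround = {
    fc: parse(delivered[fc]) - parse(run_start[fc]) for fc in run_start
}
gap = parse(delivered["A"]) - parse(delivered["B"])

for fc, delta in turnaround.items():
    print(fc, f"{delta.total_seconds() / 3600:.1f}h")  # B 51.2h, then A 56.6h
print("A delivered after B by", gap)                    # 5:25:22
```

In this run the samples on flowcell B were ready for the pipeline about five and a half hours before those on flowcell A.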
2.2.3 Dual 25B flowcell; 2nd Analysis = WGS Germline
Dual 25B flowcell (128 samples). 2nd Analysis is WGS Germline (mapping and variant calling). The time taken by each step differs between NovaSeq X runs. The numbers listed here are from a specific run on 2024-12-13.
2.2.3.1 NovaSeq X reported time
2.2.3.2 Detailed steps time
- Sequencing of dual 25B flowcells takes 44h.
- Sequencing file transfer to external storage starts 7h after sequencing starts and finishes 2h after sequencing finishes.
  - This includes the post-run wash, which takes 1h48m.
  - Some reports need to be copied to external storage, but they must wait for the post-run wash to complete.
- Secondary analysis (BCL Convert, FastQC, ORA compression, mapping and all variant calling) takes an extra 16h.
- Copying secondary analysis data from the sequencer to external storage (boston) takes an extra 6h.
  - 3h for flowcell B
  - 3h for flowcell A
- After the copy to boston is complete, delivering sequencing data to the nscDeliver folder, calculating md5sum and pushing samples to the lims-exporter queue takes an extra 15m for each flowcell.

From sequencing start to samples being ready for the pipeline takes 68h.
Job | Started | Completed |
---|---|---|
RunStartTime (B) | 2024-12-13 15:22:38 | |
Instrument Analytics Cycle 1 - 320 (B) | 2024-12-13 21:32:07 | 2024-12-15 10:39:45 |
BCL CONVERSION (B) | 2024-12-14 09:20:26 | 2024-12-15 01:58:22 |
Post Run (B) | 2024-12-15 11:39:46 | 2024-12-15 13:31:14 |
RunEndTime (B) | 2024-12-15 13:35:06 | |
8 sample pipeline (msg_001) | 2024-12-15 02:14:40 | 2024-12-15 06:12:10 |
8 sample pipeline (msg_002) | 2024-12-15 02:16:03 | 2024-12-15 06:08:38 |
8 sample pipeline (msg_003) | 2024-12-15 02:21:34 | 2024-12-15 06:02:43 |
8 sample pipeline (msg_004) | 2024-12-15 02:30:31 | 2024-12-15 07:21:59 |
8 sample pipeline (msg_005) | 2024-12-15 06:02:54 | 2024-12-15 09:27:04 |
8 sample pipeline (msg_006) | 2024-12-15 06:09:01 | 2024-12-15 10:15:17 |
8 sample pipeline (msg_007) | 2024-12-15 06:12:43 | 2024-12-15 11:00:17 |
8 sample pipeline (msg_008) | 2024-12-15 07:22:41 | 2024-12-15 12:21:11 |
FastQC etc. report generation (B) | 2024-12-15 17:33:00 | 2024-12-15 20:27:00 |
Analysis file transfer to boston (B) | 2024-12-15 22:10:00 | 2024-12-16 01:10:00 |
NSC automation (md5sum+deliver) (B) | 2024-12-16 01:15:01 | 2024-12-16 01:22:02 |
Job | Started | Completed |
---|---|---|
RunStartTime (A) | 2024-12-13 15:22:39 | |
Instrument Analytics Cycle 1 - 320 (A) | 2024-12-13 21:32:38 | 2024-12-15 10:43:37 |
Post Run (A) | 2024-12-15 11:43:38 | 2024-12-15 13:35:06 |
RunEndTime (A) | 2024-12-15 13:35:06 | |
BCL CONVERSION (A) | 2024-12-14 09:23:44 | 2024-12-15 02:03:05 |
8 sample pipeline (msg_001) | 2024-12-15 12:35:29 | 2024-12-15 16:20:28 |
8 sample pipeline (msg_002) | 2024-12-15 12:41:54 | 2024-12-15 15:59:51 |
8 sample pipeline (msg_003) | 2024-12-15 12:46:44 | 2024-12-15 16:29:28 |
8 sample pipeline (msg_004) (waiting for board access ~3h) | 2024-12-15 12:57:58 | 2024-12-15 19:40:46 |
8 sample pipeline (msg_005) | 2024-12-15 15:59:59 | 2024-12-15 20:09:04 |
8 sample pipeline (msg_006) | 2024-12-15 16:21:01 | 2024-12-15 20:12:05 |
8 sample pipeline (msg_007) | 2024-12-15 16:29:32 | 2024-12-15 20:44:58 |
8 sample pipeline (msg_008) | 2024-12-15 19:40:54 | 2024-12-15 22:53:59 |
FastQC etc. report generation (A) | 2024-12-16 04:20:00 | 2024-12-16 07:00:00 |
Analysis file transfer to boston (A) | 2024-12-16 08:40:00 | 2024-12-16 11:40:00 |
NSC automation (md5sum+deliver) (A) | 2024-12-16 11:50:01 | 2024-12-16 11:56:58 |
This is a specific run on 2024-12-13.
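The "8 sample pipeline" rows show how the onboard analysis is scheduled. As an illustration, the sketch below takes the eight flowcell B batches from the tables above and computes the wall-clock time for all 64 samples, the longest batch, and the largest number of batches running at the same time. The variable names are illustrative; the timestamps are copied from the table.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (batch, started, completed) for flowcell B
batches_b = [
    ("msg_001", "2024-12-15 02:14:40", "2024-12-15 06:12:10"),
    ("msg_002", "2024-12-15 02:16:03", "2024-12-15 06:08:38"),
    ("msg_003", "2024-12-15 02:21:34", "2024-12-15 06:02:43"),
    ("msg_004", "2024-12-15 02:30:31", "2024-12-15 07:21:59"),
    ("msg_005", "2024-12-15 06:02:54", "2024-12-15 09:27:04"),
    ("msg_006", "2024-12-15 06:09:01", "2024-12-15 10:15:17"),
    ("msg_007", "2024-12-15 06:12:43", "2024-12-15 11:00:17"),
    ("msg_008", "2024-12-15 07:22:41", "2024-12-15 12:21:11"),
]

parsed = [
    (name, datetime.strptime(s, FMT), datetime.strptime(e, FMT))
    for name, s, e in batches_b
]

durations = {name: end - start for name, start, end in parsed}
wall_clock = max(e for _, _, e in parsed) - min(s for _, s, _ in parsed)
longest = max(durations, key=durations.get)

# Number of batches running at the moment each batch starts.
max_concurrent = max(
    sum(1 for _, s, e in parsed if s <= start < e)
    for _, start, _ in parsed
)

print(wall_clock)                   # 10:06:31
print(longest, durations[longest])  # msg_008 4:58:30
print(max_concurrent)               # 4
```

At most four batches run concurrently, which matches the four onboard DRAGENs per NovaSeq X Plus unit listed in section 2.1.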
2.3 Pipeline capacity evaluation
DRAGEN is “single-threaded”, so the pipeline capacity is determined by the number of DRAGENs available and the time needed to process a sample.
2.3.1 Pipeline real-time monitoring (1 or 2 NSC DRAGEN in use)
With 1 or 2 NSC DRAGENs in use, the pipeline can process up to 36 or 72 samples in 24 hours, respectively.
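To relate this throughput to the 192 samples per week target, the arithmetic can be sketched as follows. This assumes throughput scales linearly with the number of DRAGENs and that the DRAGENs run without idle time, which is an upper bound rather than a measured figure; the function names are illustrative.

```python
SAMPLES_PER_DRAGEN_PER_DAY = 36  # from the monitoring figures above
WEEKLY_TARGET = 192              # planned WGS samples per week


def minutes_per_sample(samples_per_day: int) -> float:
    """Average processing time per sample on one DRAGEN."""
    return 24 * 60 / samples_per_day


def days_needed(n_samples: int, n_dragens: int) -> float:
    """Days of continuous running needed to process n_samples.

    Assumption: capacity scales linearly with the number of DRAGENs.
    """
    return n_samples / (n_dragens * SAMPLES_PER_DRAGEN_PER_DAY)


print(minutes_per_sample(SAMPLES_PER_DRAGEN_PER_DAY))  # 40.0
for n in (1, 2):
    print(n, "DRAGEN(s):", round(days_needed(WEEKLY_TARGET, n), 2), "days")
# 1 DRAGEN(s): 5.33 days
# 2 DRAGEN(s): 2.67 days
```

Under these assumptions, one NSC DRAGEN would need a little over five full days per week for 192 samples, and two would need under three.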
2.3.1.1 2024-11-10 (Sun)
2.3.1.2 2024-11-11 (Mon)
2.3.1.3 2024-11-12 (Tue) idle
No DRAGEN run on 2024-11-12.
2.3.1.4 2024-11-13 (Wed)
2.3.1.5 2024-11-14 (Thu)
2.3.1.6 2024-11-15 (Fri) idle
Just 2 DRAGEN runs on 2024-11-15.
2.3.1.7 2024-11-16 (Sat)
2.3.1.8 2024-11-17 (Sun) full
A full day of DRAGEN runs on 2024-11-17.
2.3.1.9 2024-11-18 (Mon)
2.3.1.10 2024-11-19 (Tue) idle
No DRAGEN run on 2024-11-19.
2.3.1.11 2024-11-20 (Wed) idle
No DRAGEN run on 2024-11-20.