User Experience¶
We believe data digestion should be automated, and that it should be done in a user-friendly manner.
Easy to set up¶
Beyond installation, setting up a bioinformatics pipeline is another step that every user has to endure before starting the actual data analysis. For CEL-Seq2 data, setup means telling the pipeline where to find the aligner index files for aligning reads, where to find the genome annotation for quantification, the lengths of the UMI and the cell barcode, and so on. This information is required to perform data digestion.
Instead of having users specify these settings every time they launch the pipeline, we aimed for a user experience in which setup is done once and only once. This is the motivation behind the configuration file, which we believe should be reusable for a given species.
See "Configuration" for details.
In addition, there is a utility tool named MrY. It aims to download and manage genome FASTA files and annotations (GTF/GFF), and to create aligner indexes (Bowtie2 and STAR), in a painless way.
Easy to handle complex experiments¶
We aimed to design an intuitive way for users to specify an experiment, regardless of how complex its layout is.
Our solution is a space- or tab-separated file called the "Experiment table". Each row specifies one set of CEL-Seq2 data. Fill in each row by following one simple rule:
For input reads file X, claim that cells with barcode indexes from i to j come from experiment Y.
Take the experiment in "Quick Start" as an example. CEL-Seq-pipeline requires a "sample sheet" with 36 lines to define the experiment layout:
#id | flocell | series | lane | il_barcode | cel_barcode | project |
---|---|---|---|---|---|---|
1 | C5BW1ACXX | CE_TC | L005 | 4 | 1 | CE_1_1 |
2 | C5BW1ACXX | CE_TC | L005 | 4 | 2 | CE_1_2 |
3 | C5BW1ACXX | CE_TC | L005 | 4 | 3 | CE_1_3 |
4 | C5BW1ACXX | CE_TC | L005 | 4 | 4 | CE_1_4 |
5 | C5BW1ACXX | CE_TC | L005 | 4 | 5 | CE_1_5 |
6 | C5BW1ACXX | CE_TC | L005 | 4 | 6 | CE_1_6 |
7 | C5BW1ACXX | CE_TC | L005 | 4 | 7 | CE_1_7 |
8 | C5BW1ACXX | CE_TC | L005 | 4 | 8 | CE_1_8 |
9 | C5BW1ACXX | CE_TC | L005 | 4 | 9 | CE_1_9 |
1 | C5BW1ACXX | CE_TC | L008 | 4 | 1 | CE_1_1 |
2 | C5BW1ACXX | CE_TC | L008 | 4 | 2 | CE_1_2 |
3 | C5BW1ACXX | CE_TC | L008 | 4 | 3 | CE_1_3 |
4 | C5BW1ACXX | CE_TC | L008 | 4 | 4 | CE_1_4 |
5 | C5BW1ACXX | CE_TC | L008 | 4 | 5 | CE_1_5 |
6 | C5BW1ACXX | CE_TC | L008 | 4 | 6 | CE_1_6 |
7 | C5BW1ACXX | CE_TC | L008 | 4 | 7 | CE_1_7 |
8 | C5BW1ACXX | CE_TC | L008 | 4 | 8 | CE_1_8 |
9 | C5BW1ACXX | CE_TC | L008 | 4 | 9 | CE_1_9 |
10 | C5BW1ACXX | CE_TC | L005 | 4 | 10 | CE_2_1 |
11 | C5BW1ACXX | CE_TC | L005 | 4 | 11 | CE_2_2 |
12 | C5BW1ACXX | CE_TC | L005 | 4 | 12 | CE_2_3 |
13 | C5BW1ACXX | CE_TC | L005 | 4 | 13 | CE_2_4 |
14 | C5BW1ACXX | CE_TC | L005 | 4 | 14 | CE_2_5 |
15 | C5BW1ACXX | CE_TC | L005 | 4 | 15 | CE_2_6 |
16 | C5BW1ACXX | CE_TC | L005 | 4 | 16 | CE_2_7 |
17 | C5BW1ACXX | CE_TC | L005 | 4 | 17 | CE_2_8 |
18 | C5BW1ACXX | CE_TC | L005 | 4 | 18 | CE_2_9 |
10 | C5BW1ACXX | CE_TC | L008 | 4 | 10 | CE_2_1 |
11 | C5BW1ACXX | CE_TC | L008 | 4 | 11 | CE_2_2 |
12 | C5BW1ACXX | CE_TC | L008 | 4 | 12 | CE_2_3 |
13 | C5BW1ACXX | CE_TC | L008 | 4 | 13 | CE_2_4 |
14 | C5BW1ACXX | CE_TC | L008 | 4 | 14 | CE_2_5 |
15 | C5BW1ACXX | CE_TC | L008 | 4 | 15 | CE_2_6 |
16 | C5BW1ACXX | CE_TC | L008 | 4 | 16 | CE_2_7 |
17 | C5BW1ACXX | CE_TC | L008 | 4 | 17 | CE_2_8 |
18 | C5BW1ACXX | CE_TC | L008 | 4 | 18 | CE_2_9 |
In contrast, celseq2 takes only 4 lines to do the same, and does so in a much more intuitive manner:
SAMPLE_NAME | CELL_BARCODES_INDEX | R1 | R2 |
---|---|---|---|
CE_1 | 1-9 | path/to/lane5-R1.fastq.gz | path/to/lane5-R2.fastq.gz |
CE_2 | 10-18 | path/to/lane5-R1.fastq.gz | path/to/lane5-R2.fastq.gz |
CE_1 | 1-9 | path/to/lane8-R1.fastq.gz | path/to/lane8-R2.fastq.gz |
CE_2 | 10-18 | path/to/lane8-R1.fastq.gz | path/to/lane8-R2.fastq.gz |
See "Specify Experiment Table" for more instructions.
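To make the rule concrete, the sketch below expands the four-row experiment table into one assignment per cell per input file, which reproduces the 36 entries of the sample sheet. It is a minimal illustration, not celseq2's own parser: the helper names `expand_range` and `expand_table` are hypothetical, and it assumes every `CELL_BARCODES_INDEX` value is a single inclusive range written as `i-j`.

```python
from collections import defaultdict

# Rows copied from the experiment table above (whitespace-separated).
EXPERIMENT_TABLE = """\
SAMPLE_NAME CELL_BARCODES_INDEX R1 R2
CE_1 1-9 path/to/lane5-R1.fastq.gz path/to/lane5-R2.fastq.gz
CE_2 10-18 path/to/lane5-R1.fastq.gz path/to/lane5-R2.fastq.gz
CE_1 1-9 path/to/lane8-R1.fastq.gz path/to/lane8-R2.fastq.gz
CE_2 10-18 path/to/lane8-R1.fastq.gz path/to/lane8-R2.fastq.gz
"""


def expand_range(spec):
    """Expand an inclusive 'i-j' range such as '1-9' into [1, ..., 9].

    Hypothetical helper: only the 'i-j' form shown in the table is handled.
    """
    start, end = (int(part) for part in spec.split("-"))
    return list(range(start, end + 1))


def expand_table(text):
    """Yield one (sample, barcode_index, r1, r2) tuple per cell per input file."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        for index in expand_range(row["CELL_BARCODES_INDEX"]):
            yield row["SAMPLE_NAME"], index, row["R1"], row["R2"]


assignments = list(expand_table(EXPERIMENT_TABLE))
print(len(assignments))  # 4 rows x 9 barcode indexes each = 36

# Group barcode indexes by sample to see which cells belong to which experiment.
cells_per_sample = defaultdict(set)
for sample, index, _r1, _r2 in assignments:
    cells_per_sample[sample].add(index)
```

Each row of the experiment table stands for nine rows of the sample sheet, which is why 4 lines are enough to describe the same layout.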
Easy to request resources¶
Running the celseq2 pipeline by submitting jobs to a cluster is straightforward, because celseq2 is built on top of snakemake, a powerful workflow management framework.
For example, the following command submits jobs to computing nodes. It runs up to 10 jobs in parallel, each requesting 50G of memory.
    celseq2 --config-file /path/to/wonderful_CEL-Seq2_config.yaml \
        --experiment-table /path/to/wonderful_experiment_table.txt \
        --output-dir /path/to/result_dir \
        -j 10 \
        --cluster "qsub -cwd -j y -l h_vmem=50G" &
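When the pipeline is launched from a script rather than typed by hand, the same invocation can be assembled as an argument list. The sketch below is only an illustration under that assumption: `build_celseq2_command` is a hypothetical helper, not part of celseq2, and it uses only the options that appear in the command above.

```python
import shlex


def build_celseq2_command(config_file, experiment_table, output_dir,
                          jobs, cluster):
    """Return the celseq2 invocation as an argument list (hypothetical helper)."""
    return [
        "celseq2",
        "--config-file", config_file,
        "--experiment-table", experiment_table,
        "--output-dir", output_dir,
        "-j", str(jobs),       # number of jobs to run in parallel
        "--cluster", cluster,  # submission command, passed as one argument
    ]


cmd = build_celseq2_command(
    config_file="/path/to/wonderful_CEL-Seq2_config.yaml",
    experiment_table="/path/to/wonderful_experiment_table.txt",
    output_dir="/path/to/result_dir",
    jobs=10,
    cluster="qsub -cwd -j y -l h_vmem=50G",
)

# Print a shell-quoted version for inspection; nothing is executed here.
print(shlex.join(cmd))
```

Keeping the `qsub` string as a single list element preserves the quoting that the shell command expresses with double quotes. The list could be handed to `subprocess.run(cmd)` to start the pipeline; the trailing `&` of the shell version, which puts the process in the background, is not reproduced here.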