Skip to content

User Experience

We believe data digestion should be automated, and it should be done in an user-friendly manner 🔑.


Easy to setup

In addition to installation, the setup of bioinformatics pipeline is another stop that every user has to endure before launching the beautiful data analysis journey. In context of processing CEL-Seq2 data, the setup means telling pipeline where to find the aligner index files to align reads, where to find the genome annotation for quantification, and what is the length of UMI or cell barcodes, etc. These are required information to perform data digestion.

Instead of having users specify them over and over again when launching pipeline, we aimed to bring an user experience where the setup needs done for once and once only. This is the motivation behind configuration file, and we believe it should be reusable for fixed type of species.

See "Configuration" for details.

In addition, here is an utility tool named MrY. It aimed to download and manage all the genome FASTA files, annotations (GTF/GFF) and furthermore create aligner index (Bowtie2 and STAR) in a painless way.

Easy to handle complexed experiment

We aimed to design an intuitive way for users to specify the experiments despite of its complexed layout.

A space/tab separated file called "Experiment table" is our solution. Each row specifies a set of CEL-Seq2 data. Filling blanks of each row by following a simple rule stated as below:

For input reads file X, claim that cells with barcode indexes from i to j come from experiment Y.

Take the experiment in "Quick Start" for example, CEL-Seq-pipeline required a "sample sheet" with 36 lines to define the experiment layout.

#id flocell series lane il_barcode cel_barcode project
1 C5BW1ACXX CE_TC L005 4 1 CE_1_1
2 C5BW1ACXX CE_TC L005 4 2 CE_1_2
3 C5BW1ACXX CE_TC L005 4 3 CE_1_3
4 C5BW1ACXX CE_TC L005 4 4 CE_1_4
5 C5BW1ACXX CE_TC L005 4 5 CE_1_5
6 C5BW1ACXX CE_TC L005 4 6 CE_1_6
7 C5BW1ACXX CE_TC L005 4 7 CE_1_7
8 C5BW1ACXX CE_TC L005 4 8 CE_1_8
9 C5BW1ACXX CE_TC L005 4 9 CE_1_9
1 C5BW1ACXX CE_TC L008 4 1 CE_1_1
2 C5BW1ACXX CE_TC L008 4 2 CE_1_2
3 C5BW1ACXX CE_TC L008 4 3 CE_1_3
4 C5BW1ACXX CE_TC L008 4 4 CE_1_4
5 C5BW1ACXX CE_TC L008 4 5 CE_1_5
6 C5BW1ACXX CE_TC L008 4 6 CE_1_6
7 C5BW1ACXX CE_TC L008 4 7 CE_1_7
8 C5BW1ACXX CE_TC L008 4 8 CE_1_8
9 C5BW1ACXX CE_TC L008 4 9 CE_1_9
10 C5BW1ACXX CE_TC L005 4 10 CE_2_1
11 C5BW1ACXX CE_TC L005 4 11 CE_2_2
12 C5BW1ACXX CE_TC L005 4 12 CE_2_3
13 C5BW1ACXX CE_TC L005 4 13 CE_2_4
14 C5BW1ACXX CE_TC L005 4 14 CE_2_5
15 C5BW1ACXX CE_TC L005 4 15 CE_2_6
16 C5BW1ACXX CE_TC L005 4 16 CE_2_7
17 C5BW1ACXX CE_TC L005 4 17 CE_2_8
18 C5BW1ACXX CE_TC L005 4 18 CE_2_9
10 C5BW1ACXX CE_TC L008 4 10 CE_2_1
11 C5BW1ACXX CE_TC L008 4 11 CE_2_2
12 C5BW1ACXX CE_TC L008 4 12 CE_2_3
13 C5BW1ACXX CE_TC L008 4 13 CE_2_4
14 C5BW1ACXX CE_TC L008 4 14 CE_2_5
15 C5BW1ACXX CE_TC L008 4 15 CE_2_6
16 C5BW1ACXX CE_TC L008 4 16 CE_2_7
17 C5BW1ACXX CE_TC L008 4 17 CE_2_8
18 C5BW1ACXX CE_TC L008 4 18 CE_2_9

On the contrary, user will find it only takes celseq2 4 lines to do the same, and done in much more intuitive manner.

SAMPLE_NAME CELL_BARCODES_INDEX R1 R2
CE_1 1-9 path/to/lane5-R1.fastq.gz path/to/lane5-R2.fastq.gz
CE_2 10-18 path/to/lane5-R1.fastq.gz path/to/lane5-R2.fastq.gz
CE_1 1-9 path/to/lane8-R1.fastq.gz path/to/lane8-R2.fastq.gz
CE_2 10-18 path/to/lane8-R1.fastq.gz path/to/lane8-R2.fastq.gz

See "Specify Experiment Table" for more instructions.

Easy to request resources

It is straightforward to run the pipeline of celseq2 by submitting jobs to cluster, as celseq2 is built on top of snakemake which is a powerful workflow management framework.

For example, user could run the following command to submit jobs to computing nodes. Here it submits 10 jobs in parallel with 50G of memory requested by each.

celseq2 --config-file /path/to/wonderful_CEL-Seq2_config.yaml \
    --experiment-table /path/to/wonderful_experiment_table.txt \
    --output-dir /path/to/result_dir \
    -j 10 \
    --cluster "qsub -cwd -j y -l h_vmem=50G" &