Allow merge #78
Comments
This is not actually a merge, because you don't want to merge the outputs from [...]. To merge outputs, assemblerflow already has some solutions, like compiler channels. Although I am familiar with the issue you raise here, I am skeptical that the second pipeline string is more human-readable than the first.
I agree that it's not actually a merge per se, and that's something we should discuss implementing. Adding the behaviour of replicating the processes that follow a fork across all of its lanes would prevent users from having to copy big chunks of processes into each lane. This is especially important if there are lots of processes after the fork or if there are many lanes in the fork.
As @tiagofilipe12 said, the compiler channels already handle cases of merging. However, I agree with @cimendes in that it is cumbersome to repeat some processes after the fork. I've checked the pipeline parser code, and I think it would be simple to implement the following syntax:
EDIT: This could get tricky when there are lanes with different processes, but the rule could be to repeat the last lane's processes:
I find @ODiogoSilva's proposal a minimal simplification of what is implemented at the moment, and it would only be advantageous if you implement a very long chain of processes after the fork.
With this syntax, I find it very intuitive that the CDE processes are repeated in all the lanes of the fork. I don't find the proposed alternative as intuitive.
Plus, as @ODiogoSilva stated, lanes with different processes aren't as clear with this syntax. In that case, my proposal would be something like
I am keen on implementing @ODiogoSilva's syntax, because it doesn't break the current usage (it just extends it) and it can't be confused with merging processes from different lanes (within a fork). Merging is in fact a trickier question, because you may have 3 lanes but only want to merge two of them. I am still unsure how to express that in a simple pipeline-string notation. But maybe a possible implementation would be something like this:
The default behaviour would be if no list of processes is given to [...]. So, breaking this into tasks:
What do you think?
Closed in favor of #174
At the moment, assemblerflow lets you fork processes, but if you want to add another process after the fork, you have to add it after each process in the fork. Example:
assemblerflow.py build -t "integrity_coverage fastqc_trimmomatic remove_host (spades card_rgi | metaspades card_rgi | megahit card_rgi)"
It's much more intuitive to instead do:
assemblerflow.py build -t "integrity_coverage fastqc_trimmomatic remove_host (spades | metaspades | megahit) card_rgi "
This requires adding a new operator merge to assemblerflow.
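To make the requested behaviour concrete, here is a minimal sketch of how the second pipeline string could be rewritten into the first before parsing. This is not assemblerflow's parser: the function name `expand_trailing_processes` and the regular expression are hypothetical, and the sketch assumes a single, non-nested fork.

```python
import re

# Hypothetical pattern: "<prefix> (<lane> | <lane> | ...) <suffix>",
# with exactly one fork and no nested parentheses.
FORK_RE = re.compile(
    r"^(?P<prefix>[^()]*)\((?P<lanes>[^()]*)\)(?P<suffix>[^()]*)$"
)


def expand_trailing_processes(pipeline: str) -> str:
    """Append the processes that follow a fork to every lane of that fork.

    Strings without a fork, or with nested forks (not handled by this
    sketch), are returned with whitespace normalised but otherwise unchanged.
    """
    match = FORK_RE.match(pipeline.strip())
    if match is None:
        return " ".join(pipeline.split())

    prefix = match.group("prefix").split()
    lanes = [lane.split() for lane in match.group("lanes").split("|")]
    suffix = match.group("suffix").split()

    # Copy the trailing processes onto the end of each lane.
    expanded = " | ".join(" ".join(lane + suffix) for lane in lanes)
    return " ".join(prefix + [f"({expanded})"])


if __name__ == "__main__":
    short = ("integrity_coverage fastqc_trimmomatic remove_host "
             "(spades | metaspades | megahit) card_rgi ")
    print(expand_trailing_processes(short))
```

Because the rewrite happens on the string, a pipeline that already repeats `card_rgi` in every lane passes through unchanged, so the current usage would keep working.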