filter out irrelevant KEGG pathways #492

Open
Azurelan35 opened this issue Dec 13, 2023 · 4 comments

@Azurelan35

Hey everyone,
I am a beginner in bioinformatics, and I am currently working with the annotation results from eggNOG-mapper (emapper-2.1.9) to run an enrichment analysis, using the <KEGG_ko> column of the output table, for a Lepidoptera species. It turns out there are many irrelevant pathways in my results, even though I only kept the results assigned to <3SI2N@50557|Insecta> based on the <eggNOG_OGs> column. For example, I get some human- and plant-specific pathways...
Is there a way to filter out these irrelevant pathways?
Any suggestions would be appreciated, thanks!
Lan
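The row filtering Lan describes (keeping only hits that have an Insecta-level OG in eggNOG_OGs, then taking the KEGG_ko values for enrichment) could look roughly like the Python sketch below. It assumes the tab-separated .annotations layout written by emapper 2.1.x: comment lines starting with "##", a header line starting with "#query", and comma-separated "OGID@taxid|name" entries in eggNOG_OGs. The helper names and the tiny sample table (including its OG IDs) are made up for illustration.

```python
# Hypothetical helpers, not part of eggNOG-mapper: parse an emapper .annotations
# table, keep rows with an OG at one taxonomic level, and extract KO IDs.

def insecta_rows(lines, taxid="50557"):
    """Yield rows (as dicts keyed by column name) whose eggNOG_OGs column
    contains an OG at NCBI taxid `taxid` (default 50557, Insecta)."""
    header = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#query"):
            # Header line: "#query<TAB>...<TAB>eggNOG_OGs<TAB>...<TAB>KEGG_ko..."
            header = line.lstrip("#").split("\t")
            continue
        if not line or line.startswith("#") or header is None:
            continue  # skip "##" comment lines and anything before the header
        row = dict(zip(header, line.split("\t")))
        # Collect the taxid part of each "OGID@taxid|name" entry.
        levels = {
            og.split("@", 1)[1].split("|", 1)[0]
            for og in row.get("eggNOG_OGs", "").split(",")
            if "@" in og
        }
        if taxid in levels:
            yield row


def ko_ids(row):
    """Turn a KEGG_ko value like 'ko:K00001,ko:K00002' into ['K00001', 'K00002']."""
    value = row.get("KEGG_ko", "-")
    if value in ("", "-"):
        return []
    return [ko.split(":", 1)[-1] for ko in value.split(",")]


# Invented sample with only three of the real columns.
SAMPLE = "\n".join([
    "## emapper-2.1.9",
    "#query\teggNOG_OGs\tKEGG_ko",
    "g1\tKOG0001@2759|Eukaryota,3SI2N@50557|Insecta\tko:K00001",
    "g2\tKOG0002@2759|Eukaryota,3ABCD@33208|Metazoa\tko:K00002",
])

for row in insecta_rows(SAMPLE.splitlines()):
    print(row["query"], ko_ids(row))
```

On a real run you would pass an open handle to the <prefix>.emapper.annotations file instead of SAMPLE.splitlines(). Note that this only selects rows; it does not change which pathways are attached to each selected row, which is consistent with irrelevant maps surviving this kind of filter.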

@Cantalapiedra
Collaborator

Hi @Azurelan35,

Maybe not the answer you are looking for, but you may need to filter those pathways using information from the KEGG database, or you could re-annotate your sequences using --target_taxa or --excluded_taxa (check https://github.com/eggnogdb/eggnog-mapper/wiki/eggNOG-mapper-v2.1.5-to-v2.1.12#annotation-options). Also, --tax_scope could help here, I guess.

Best,
Carlos
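Filtering pathways "using information from the KEGG database" could be sketched as follows: keep only the KEGG_Pathway entries whose 5-digit map number also appears in KEGG's pathway list for a reference organism close to the species of interest. The reference organism (Bombyx mori, KEGG code bmor, assumed here as a stand-in for a Lepidoptera species), the helper names and the sample response lines are assumptions; the list/pathway/<org> operation is part of the KEGG REST API. The optional drop_prefixes heuristic removes map05xxx entries, which covers most, but not all, of KEGG's Human Diseases category. Organism-specific lists are derived from gene content, so this reduces but does not guarantee removal of every off-target map.

```python
import re
import urllib.request

MAP_NUMBER = re.compile(r"(\d{5})$")  # trailing 5-digit KEGG map number


def fetch_org_pathways(org_code):
    """Download KEGG's pathway list for one organism code, e.g. 'bmor' (needs network)."""
    url = f"https://rest.kegg.jp/list/pathway/{org_code}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_org_pathways(text):
    """Collect map numbers from 'bmor00010<TAB>name' or 'path:bmor00010<TAB>name' lines."""
    numbers = set()
    for line in text.splitlines():
        match = MAP_NUMBER.search(line.split("\t", 1)[0])
        if match:
            numbers.add(match.group(1))
    return numbers


def filter_pathway_field(field, allowed, drop_prefixes=()):
    """Filter one emapper KEGG_Pathway value, e.g. 'ko00010,map00010,ko05010,map05010'.

    Keeps entries whose map number is in `allowed` and does not start with any of
    `drop_prefixes` (e.g. ("05",) to drop most Human Diseases maps). Returns '-'
    when nothing is left, matching emapper's placeholder for an empty field.
    """
    if field in ("", "-"):
        return "-"
    kept = []
    for entry in field.split(","):
        entry = entry.strip()
        match = MAP_NUMBER.search(entry)
        if match and match.group(1) in allowed and not match.group(1).startswith(drop_prefixes):
            kept.append(entry)
    return ",".join(kept) if kept else "-"


# Illustrative lines in the shape of a KEGG response (not real downloaded output).
ORG_TEXT = (
    "bmor00010\tGlycolysis / Gluconeogenesis - Bombyx mori (domestic silkworm)\n"
    "path:bmor04141\tProtein processing in endoplasmic reticulum - Bombyx mori (domestic silkworm)\n"
)
allowed = parse_org_pathways(ORG_TEXT)  # in practice: parse_org_pathways(fetch_org_pathways("bmor"))
print(filter_pathway_field("ko00010,map00010,ko05010,map05010", allowed))
```

The parser accepts first columns with or without a "path:" prefix so it does not depend on one exact response format.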

@Azurelan35
Author

Hi Carlos,
Thanks for your reply. I will have a look at the link and try re-annotating.
Best,
Lan

@rwitkowski3

Hi @Cantalapiedra,

I have the same issue: I am annotating a de novo transcriptome for a plant in the Asterales. My gene annotations appear to come entirely from plants, but there are many KEGG pathway terms related to human disease and vertebrate biology. How can I prevent eggNOG-mapper from assigning such KEGG pathway terms? I've generated a DIAMOND database for the Viridiplantae and restricted my search and annotation to plant taxa only: 71274,35493,33090 (asterids, Streptophyta, Viridiplantae).

My script is below:

emapper.py --dbmem \
  -m diamond \
  --dmnd_db ./data/viridiplantae.dmnd \
  --sensmode more-sensitive \
  --cpu 0 -i iter2-nov25.Trinity.fasta.transdecoder.pep \
  --tax_scope 71274,35493,33090 \
  --tax_scope_mode inner_narrowest \
  --target_taxa 71274,35493,33090 \
  -o virid_taxscope-virid_iter2-diamonddb

  1. Am I misinterpreting the manual's explanation of --tax_scope and --target_taxa? Would different --target_taxa values limit my annotations so that vertebrate KEGG Orthology (KO) terms and human-disease KEGG pathways are excluded?
  2. Is this KEGG pathway annotation issue due to electronically predicted, homology-based pathway annotation? In other words, would "experimental-only" annotations improve this, at the cost of filtering out some true positives?

@Cantalapiedra
Collaborator

Hi @rwitkowski3,

Sorry for the delay in answering.
I am not sure whether you eventually tested this using tax_scope and target_taxa. Did you also get human/vertebrate terms using those parameters?

In my understanding, tax_scope should narrow the OGs selected to build the group of orthologs from which annotations are obtained, and target_taxa should directly restrict the annotation sources to the specified taxa.

Yes, this would limit the number of annotation terms that you may obtain, since you are reducing the chances to obtain annotations from more divergent proteins. But if those annotations do not make sense, I guess it is worth it.

I don't understand your second question very well. My guess is that it is due to how broad some eggNOG OGs are. If an OG comprises not only plant but also vertebrate proteins, and those proteins are identified as orthologs, the annotations will be transferred from all of them (unless this is limited with tax_scope and target_taxa, as already discussed, and hopefully as it should work).

Best,
Carlos
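The mechanism Carlos describes (annotations pooled over every ortholog in a broad OG, unless the sources are restricted to target taxa) can be illustrated with a toy model. This is a conceptual sketch only, not eggNOG-mapper's code: the Ortholog class, the abbreviated lineages and the pathway assignments are invented for illustration, although the taxids are real NCBI IDs (33090 Viridiplantae, 35493 Streptophyta, 7742 Vertebrata, 2759 Eukaryota). tax_scope, which acts earlier by choosing the OG level, is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ortholog:
    name: str
    lineage: frozenset   # NCBI taxids on the path to the root (abbreviated, illustrative)
    pathways: frozenset  # KEGG map numbers annotated on this protein (invented)


def transferred_pathways(orthologs, target_taxa=None):
    """Pool pathway annotations from orthologs; if target_taxa is given, only from
    orthologs whose lineage contains at least one of those taxids.
    Conceptual model of annotation transfer, not emapper's implementation."""
    pool = orthologs
    if target_taxa:
        wanted = set(target_taxa)
        pool = [o for o in orthologs if o.lineage & wanted]
    result = set()
    for o in pool:
        result |= o.pathways
    return result


# One broad OG containing a plant and a human ortholog (made-up annotations).
PLANT = Ortholog("plant_protein", frozenset({"3702", "35493", "33090", "2759"}), frozenset({"00010"}))
HUMAN = Ortholog("human_protein", frozenset({"9606", "7742", "2759"}), frozenset({"00010", "05010"}))
OG = [PLANT, HUMAN]

print(sorted(transferred_pathways(OG)))                         # ['00010', '05010']
print(sorted(transferred_pathways(OG, target_taxa=["33090"])))  # ['00010']
```

Restricting the pool drops the vertebrate-only map, but, as noted above, it also reduces how many annotations can be transferred overall.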
