Missing Genes in GeneSummaries File #1154
Comments
Thanks for the report! At first glance, this appears to be related to the rollout of our new Fusion data model. Spot-checking several of the Genes you listed, I see that they no longer have any variants associated with them. For instance, the Gene RASGRF1 no longer has any direct variants under it, but there are new Fusion Features (OCLN::RASGRF1, IQGAP1::RASGRF1, SLC4A4::RASGRF1) which have associated variants and evidence items. By default, the TSV exports include only Genes that have at least one variant, molecular profile, and evidence item associated with them. In these cases, it appears that all of the associated variants were in fact fusion variants that have since been moved to Fusion Features. We probably need to introduce a new FeatureSummaries.tsv file that covers all feature types (Genes, Fusions, and Factors) so that it can be comprehensive. We may also be able to introduce a heuristic so that GeneSummaries.tsv includes Genes that have curated summaries, sources, etc., or that participate in Fusion Features. We will get a fix out for this in the next release and I'll follow up here!
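To make the mechanism concrete, here is a minimal sketch of the export rule as described above. It is not the actual CIViC export code: the `Variant` and `Feature` classes, the function names, the counts, and the exact reading of the "at least one variant, molecular profile, and evidence item" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    # Hypothetical model: counts of linked molecular profiles and evidence items.
    name: str
    molecular_profiles: int = 0
    evidence_items: int = 0


@dataclass
class Feature:
    name: str
    feature_type: str  # "Gene", "Fusion", or "Factor"
    variants: list = field(default_factory=list)


def is_exported(feature):
    # Assumed rule: at least one variant that has both a molecular
    # profile and an evidence item attached.
    return any(
        v.molecular_profiles > 0 and v.evidence_items > 0
        for v in feature.variants
    )


def gene_summaries(features):
    # Gene-only export: a Gene left with no qualifying variants is omitted.
    return [f.name for f in features
            if f.feature_type == "Gene" and is_exported(f)]


def feature_summaries(features):
    # Hypothetical combined export covering Genes, Fusions, and Factors.
    return [f.name for f in features if is_exported(f)]


# Before the Fusion rollout: the fusion variant sits under the Gene.
before = [
    Feature("RASGRF1", "Gene", [Variant("OCLN::RASGRF1 Fusion", 1, 1)]),
]

# After the rollout: the same variant lives under a Fusion Feature instead.
after = [
    Feature("RASGRF1", "Gene", []),
    Feature("OCLN::RASGRF1", "Fusion", [Variant("Fusion", 1, 1)]),
]
```

With this toy data, `gene_summaries(before)` lists RASGRF1 while `gene_summaries(after)` is empty, which matches the symptom in the report. The combined listing still surfaces the relocated variant under its Fusion Feature.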
We have introduced a new FeatureSummaries TSV (GeneSummaries is aliased to it, so either link format will continue to work). This TSV contains not just Gene Features but all Feature types, including Fusions and Factors. This should restore the missing genes. The new format is already live in the nightly releases and will take effect for the stable releases when the next one is generated. Apologies for taking so long to follow up here; things really slowed down through the holidays.
Over the past few months many genes have been dropped from the GeneSummaries.tsv file on the CIViC Data Releases page.
The following genes were in the September release, but were missing from the October release:
BEND2, CBFA2T3, CBFB, CREB3L1, CREB3L2, DDIT3, DEK, DGKH, DUX4, FLI1, FUS, GLI1, HMGA2, IL2RB, MAML2, MAP3K8, MNX1, NCOA2, NFATC2, NUP214, NUP98, NUTM1, PDGFD, PRKACA, PTK2B, SH3PXD2A, SSX1, SSX2, SSX4, TLX3, WWTR1, ZFTA, ZNF384
Five additional genes were subsequently dropped in the November release:
KAT6A, RASGRF1, RBM15, VGLL2, YWHAE
These genes can still be looked up on the website, which leads me to suspect that they were removed erroneously.
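For anyone who wants to reproduce the comparison, a rough sketch of the diff between two releases follows. It assumes each GeneSummaries.tsv is tab-separated with a header row and that the gene symbol is in a column called `name`; the column name and the helper functions are assumptions, so adjust them to match the actual file.

```python
import csv
import io


def gene_names(tsv_text, column="name"):
    # Parse TSV text and collect the non-empty values of the symbol column.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def dropped_genes(older_tsv, newer_tsv, column="name"):
    # Genes present in the older release but absent from the newer one.
    older = gene_names(older_tsv, column)
    newer = gene_names(newer_tsv, column)
    return sorted(older - newer)
```

To use it, read the two downloaded release files into strings (for example with `open(path, encoding="utf-8").read()`) and pass the older release first.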