diff --git a/notebooks/getting_started/exploring_clinical_data.ipynb b/notebooks/getting_started/exploring_clinical_data.ipynb deleted file mode 100644 index a7856191..00000000 --- a/notebooks/getting_started/exploring_clinical_data.ipynb +++ /dev/null @@ -1,2171 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "" - ] - }, - { - "cell_type": "markdown", - "source": [ - "# Working with IDC clinical data without BigQuery\n", - "\n", - "In this notebook we cover the basics of how you can access and search IDC clinical data without depending on Google BigQuery.\n", - "\n", - "In addition to maintaining clinical data in Google BigQuery tables, we also export them in Parquet format to a public cloud-based storage bucket. Those files are free to download and rather small (as of IDC v18, less than 65MB altogether).\n", - "\n", - "Once downloaded, you can search the content using Pandas syntax or SQL.\n", - "\n", - "This brief notebook guides you through these steps.\n", - "\n", - "If you have never worked with IDC before, we recommend you first complete the getting started tutorial [here](https://github.com/ImagingDataCommons/IDC-Tutorials/blob/master/notebooks/getting_started/part2_searching_basics.ipynb).\n", - "\n", - "---\n", - "Initial version: Jul 2024\n", - "\n", - "Updated: Aug 2024" - ], - "metadata": { - "id": "RVHEoPZJgVbl" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Prerequisites\n", - "\n", - "The only prerequisite is [`idc-index`](https://github.com/ImagingDataCommons/idc-index), a Python package that contains various utilities to simplify access to IDC data. 
As part of this package installation, you will get several other packages that we will use later:\n", - "* `s5cmd` for very efficient download of data from cloud buckets using the S3 API\n", - "* `pandas` for dataframe operations\n", - "* `duckdb` for querying pandas dataframes using SQL syntax" - ], - "metadata": { - "id": "t-tzpK4DakEP" - } - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "id": "k8B1uiZkYlHu" - }, - "outputs": [], - "source": [ - "%%capture\n", - "!pip install --upgrade idc-index" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Fetch clinical data index\n", - "\n", - "`idc-index` packages various tables with the key metadata. We refer to those as _indices_. The main index that supports API calls related to download and search is installed by default. To search the clinical data accompanying IDC images, you will need the `clinical_index` table, which lists every column of every clinical data table across all available IDC collections." 
- ], - "metadata": { - "id": "neM2YGeQamIu" - } - }, - { - "cell_type": "code", - "source": [ - "from idc_index import index\n", - "\n", - "c = index.IDCClient()\n", - "\n", - "c.fetch_index('clinical_index')\n", - "\n", - "print('Columns available in clinical_index:\\n'+'\\n'.join(c.clinical_index.keys()))" - ], - "metadata": { - "id": "ehb4MeuPYy4c", - "outputId": "75fbb6b4-edb1-4598-8aa9-1753300ac193", - "colab": { - "base_uri": "https://localhost:8080/" - } - }, - "execution_count": 2, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Columns available in clinical_index:\n", - "collection_id\n", - "case_col\n", - "table_name\n", - "column\n", - "column_label\n", - "data_type\n", - "original_column_headers\n", - "values\n", - "values_source\n", - "files\n", - "sheet_names\n", - "batch\n", - "column_numbers\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Find all clinical metadata available for a specific collection\n", - "\n", - "A common use case is to find all clinical data available for a specific IDC collection.\n", - "\n", - "The key columns of the `clinical_index` dataframe are:\n", - "* `collection_id`: the collection that the given metadata attribute belongs to\n", - "* `table_name`: the name of the table where this metadata attribute is located\n", - "* `column`: the name of the column (attribute)\n", - "\n", - "Depending on the specific attribute and how it was provided/documented by the submitter, you may find more information about it in the `column_label` column.\n", - "\n", - "Let's assume we are interested in the clinical data accompanying the `rms_mutation_prediction` collection. We can select all clinical data attributes that are available for this collection as shown next." 
- ], - "metadata": { - "id": "DvIHHAg7ao8F" - } - }, - { - "cell_type": "code", - "source": [ - "# define the query that selects all rows where collection_id is 'rms_mutation_prediction'\n", - "# note that we can refer to clinical_index table in the query\n", - "query = \"\"\"\n", - "SELECT *\n", - "FROM clinical_index\n", - "WHERE collection_id = 'rms_mutation_prediction'\n", - "\"\"\"\n", - "\n", - "# execute the query\n", - "matching_items = c.sql_query(query)\n" - ], - "metadata": { - "id": "L6rMuMEHjiML" - }, - "execution_count": 5, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "matching_items" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 1000 - }, - "id": "X5jfORxlaQ-Q", - "outputId": "e10b214d-0550-413b-fdbb-991074482644" - }, - "execution_count": 6, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " collection_id case_col \\\n", - "0 rms_mutation_prediction False \n", - "1 rms_mutation_prediction False \n", - "2 rms_mutation_prediction True \n", - "3 rms_mutation_prediction False \n", - "4 rms_mutation_prediction False \n", - "5 rms_mutation_prediction False \n", - "6 rms_mutation_prediction False \n", - "7 rms_mutation_prediction False \n", - "8 rms_mutation_prediction False \n", - "9 rms_mutation_prediction False \n", - "10 rms_mutation_prediction True \n", - "11 rms_mutation_prediction True \n", - "12 rms_mutation_prediction False \n", - "13 rms_mutation_prediction False \n", - "14 rms_mutation_prediction False \n", - "15 rms_mutation_prediction False \n", - "16 rms_mutation_prediction False \n", - "17 rms_mutation_prediction False \n", - "18 rms_mutation_prediction False \n", - "19 rms_mutation_prediction False \n", - "20 rms_mutation_prediction False \n", - "21 rms_mutation_prediction False \n", - "22 rms_mutation_prediction False \n", - "23 rms_mutation_prediction False \n", - "24 rms_mutation_prediction False \n", - "25 rms_mutation_prediction False \n", 
- "26 rms_mutation_prediction False \n", - "27 rms_mutation_prediction False \n", - "28 rms_mutation_prediction False \n", - "29 rms_mutation_prediction False \n", - "30 rms_mutation_prediction False \n", - "31 rms_mutation_prediction False \n", - "32 rms_mutation_prediction False \n", - "33 rms_mutation_prediction False \n", - "34 rms_mutation_prediction False \n", - "35 rms_mutation_prediction False \n", - "36 rms_mutation_prediction False \n", - "37 rms_mutation_prediction False \n", - "38 rms_mutation_prediction False \n", - "\n", - " table_name \\\n", - "0 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "1 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "2 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "3 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "4 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "5 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "6 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "7 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "8 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "9 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "10 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "11 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "12 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "13 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "14 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "15 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "16 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "17 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "18 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "19 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "20 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "21 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "22 bigquery-public-data.idc_v18_clinical.rms_muta... 
\n", - "23 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "24 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "25 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "26 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "27 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "28 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "29 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "30 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "31 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "32 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "33 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "34 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "35 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "36 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "37 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "38 bigquery-public-data.idc_v18_clinical.rms_muta... \n", - "\n", - " column column_label \\\n", - "0 sample_id sample_id \n", - "1 primary_site primary_site \n", - "2 participant_id participant_id \n", - "3 age_at_diagnosis age_at_diagnosis \n", - "4 dicom_patient_id idc_provenance_dicom_patient_id \n", - "5 dicom_patient_id idc_provenance_dicom_patient_id \n", - "6 dicom_patient_id idc_provenance_dicom_patient_id \n", - "7 days_to_recurrence days_to_recurrence \n", - "8 sample_anatomic_site sample_anatomic_site \n", - "9 days_to_last_followup days_to_last_followup \n", - "10 participantparticipant_id participant.participant_id \n", - "11 participantparticipant_id participant.participant_id \n", - "12 participant_age_at_collection participant_age_at_collection \n", - "13 tumor_grade tumor_grade \n", - "14 diagnosis_id diagnosis_id \n", - "15 tumor_morphology tumor_morphology \n", - "16 sample_description sample_description \n", - "17 tumor_incidence_type tumor_incidence_type \n", - "18 tumor_stage_clinical_m tumor_stage_clinical_m \n", - "19 
tumor_stage_clinical_n tumor_stage_clinical_n \n", - "20 tumor_stage_clinical_t tumor_stage_clinical_t \n", - "21 progression_or_recurrence progression_or_recurrence \n", - "22 tissue_or_organ_of_origin tissue_or_organ_of_origin \n", - "23 site_of_resection_or_biopsy site_of_resection_or_biopsy \n", - "24 days_to_last_known_disease_status days_to_last_known_disease_status \n", - "25 primary_diagnosis_reference_source primary_diagnosis_reference_source \n", - "26 stage Stage \n", - "27 last_known_disease_status last_known_disease_status \n", - "28 metastasis_at_diagnosis Metastasis_at_diagnosis \n", - "29 histological_classification Histological_Classification \n", - "30 race race \n", - "31 source_batch idc_provenance_source_batch \n", - "32 source_batch idc_provenance_source_batch \n", - "33 source_batch idc_provenance_source_batch \n", - "34 sample_type sample_type \n", - "35 sample_tumor_status sample_tumor_status \n", - "36 gender gender \n", - "37 primary_diagnosis primary_diagnosis \n", - "38 disease_type disease_type \n", - "\n", - " data_type original_column_headers \\\n", - "0 String [['sample_id']] \n", - "1 String [['primary_site']] \n", - "2 String [['participant_id']] \n", - "3 float64 [['age_at_diagnosis']] \n", - "4 String [['idc_provenance_dicom_patient_id']] \n", - "5 String [['idc_provenance_dicom_patient_id']] \n", - "6 String [['idc_provenance_dicom_patient_id']] \n", - "7 String [['days_to_recurrence']] \n", - "8 String [['sample_anatomic_site']] \n", - "9 String [['days_to_last_followup']] \n", - "10 String [['participant.participant_id']] \n", - "11 String [['participant.participant_id']] \n", - "12 float64 [['participant_age_at_collection']] \n", - "13 String [['tumor_grade']] \n", - "14 String [['diagnosis_id']] \n", - "15 String [['tumor_morphology']] \n", - "16 String [['sample_description']] \n", - "17 String [['tumor_incidence_type']] \n", - "18 String [['tumor_stage_clinical_m']] \n", - "19 String [['tumor_stage_clinical_n']] \n", - 
"20 String [['tumor_stage_clinical_t']] \n", - "21 String [['progression_or_recurrence']] \n", - "22 String [['tissue_or_organ_of_origin']] \n", - "23 String [['site_of_resection_or_biopsy']] \n", - "24 String [['days_to_last_known_disease_status']] \n", - "25 String [['primary_diagnosis_reference_source']] \n", - "26 String [['Stage']] \n", - "27 String [['last_known_disease_status']] \n", - "28 String [['Metastasis_at_diagnosis']] \n", - "29 String [['Histological_Classification']] \n", - "30 String [['race']] \n", - "31 int64 [['idc_provenance_source_batch']] \n", - "32 int64 [['idc_provenance_source_batch']] \n", - "33 int64 [['idc_provenance_source_batch']] \n", - "34 String [['sample_type']] \n", - "35 String [['sample_tumor_status']] \n", - "36 String [['gender']] \n", - "37 String [['primary_diagnosis']] \n", - "38 String [['disease_type']] \n", - "\n", - " values \\\n", - "0 [] \n", - "1 [] \n", - "2 [] \n", - "3 [] \n", - "4 [] \n", - "5 [] \n", - "6 [] \n", - "7 [] \n", - "8 [] \n", - "9 [] \n", - "10 [] \n", - "11 [] \n", - "12 [] \n", - "13 [{'option_code': '', 'option_description': None}] \n", - "14 [{'option_code': '', 'option_description': None}] \n", - "15 [{'option_code': '', 'option_description': None}] \n", - "16 [{'option_code': '', 'option_description': None}] \n", - "17 [{'option_code': '', 'option_description': None}] \n", - "18 [{'option_code': '', 'option_description': None}] \n", - "19 [{'option_code': '', 'option_description': None}] \n", - "20 [{'option_code': '', 'option_description': None}] \n", - "21 [{'option_code': '', 'option_description': None}] \n", - "22 [{'option_code': '', 'option_description': None}] \n", - "23 [{'option_code': '', 'option_description': None}] \n", - "24 [{'option_code': '', 'option_description': None}] \n", - "25 [{'option_code': '', 'option_description': None}] \n", - "26 [{'option_code': '', 'option_description': Non... \n", - "27 [{'option_code': '', 'option_description': Non... 
\n", - "28 [{'option_code': '', 'option_description': Non... \n", - "29 [{'option_code': '', 'option_description': Non... \n", - "30 [{'option_code': '', 'option_description': Non... \n", - "31 [{'option_code': '0', 'option_description': No... \n", - "32 [{'option_code': '0', 'option_description': No... \n", - "33 [{'option_code': '0', 'option_description': No... \n", - "34 [{'option_code': 'Tumor', 'option_description'... \n", - "35 [{'option_code': 'Tumor', 'option_description'... \n", - "36 [{'option_code': 'Female', 'option_description... \n", - "37 [{'option_code': 'Rhabdomyosarcoma', 'option_d... \n", - "38 [{'option_code': 'Soft Tissue Tumors and Sarco... \n", - "\n", - " values_source \\\n", - "0 None \n", - "1 None \n", - "2 None \n", - "3 None \n", - "4 None \n", - "5 None \n", - "6 None \n", - "7 None \n", - "8 None \n", - "9 None \n", - "10 None \n", - "11 None \n", - "12 None \n", - "13 derived from inspection of values \n", - "14 derived from inspection of values \n", - "15 derived from inspection of values \n", - "16 derived from inspection of values \n", - "17 derived from inspection of values \n", - "18 derived from inspection of values \n", - "19 derived from inspection of values \n", - "20 derived from inspection of values \n", - "21 derived from inspection of values \n", - "22 derived from inspection of values \n", - "23 derived from inspection of values \n", - "24 derived from inspection of values \n", - "25 derived from inspection of values \n", - "26 derived from inspection of values \n", - "27 derived from inspection of values \n", - "28 derived from inspection of values \n", - "29 derived from inspection of values \n", - "30 derived from inspection of values \n", - "31 derived from inspection of values \n", - "32 derived from inspection of values \n", - "33 derived from inspection of values \n", - "34 derived from inspection of values \n", - "35 derived from inspection of values \n", - "36 derived from inspection of values \n", - "37 
derived from inspection of values \n", - "38 derived from inspection of values \n", - "\n", - " files sheet_names batch \\\n", - "0 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "1 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "2 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [participant] [0] \n", - "3 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "4 [] [] [] \n", - "5 [] [] [] \n", - "6 [] [] [] \n", - "7 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "8 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "9 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "10 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "11 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "12 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "13 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "14 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "15 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "16 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "17 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "18 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "19 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "20 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "21 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "22 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "23 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "24 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "25 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "26 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... 
[diagnosis] [0] \n", - "27 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "28 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "29 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "30 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [participant] [0] \n", - "31 [] [] [] \n", - "32 [] [] [] \n", - "33 [] [] [] \n", - "34 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "35 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [sample] [0] \n", - "36 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [participant] [0] \n", - "37 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "38 [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... [diagnosis] [0] \n", - "\n", - " column_numbers \n", - "0 [1] \n", - "1 [5] \n", - "2 [0] \n", - "3 [6] \n", - "4 [] \n", - "5 [] \n", - "6 [] \n", - "7 [9] \n", - "8 [3] \n", - "9 [13] \n", - "10 [0] \n", - "11 [0] \n", - "12 [4] \n", - "13 [6] \n", - "14 [1] \n", - "15 [10] \n", - "16 [12] \n", - "17 [11] \n", - "18 [9] \n", - "19 [8] \n", - "20 [7] \n", - "21 [14] \n", - "22 [12] \n", - "23 [15] \n", - "24 [11] \n", - "25 [4] \n", - "26 [8] \n", - "27 [10] \n", - "28 [7] \n", - "29 [5] \n", - "30 [1] \n", - "31 [] \n", - "32 [] \n", - "33 [] \n", - "34 [2] \n", - "35 [13] \n", - "36 [2] \n", - "37 [3] \n", - "38 [2] " - ], - "text/html": [ - "\n", - "
\n", - " | collection_id | \n", - "case_col | \n", - "table_name | \n", - "column | \n", - "column_label | \n", - "data_type | \n", - "original_column_headers | \n", - "values | \n", - "values_source | \n", - "files | \n", - "sheet_names | \n", - "batch | \n", - "column_numbers | \n", - "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "sample_id | \n", - "sample_id | \n", - "String | \n", - "[['sample_id']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[1] | \n", - "
1 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "primary_site | \n", - "primary_site | \n", - "String | \n", - "[['primary_site']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[5] | \n", - "
2 | \n", - "rms_mutation_prediction | \n", - "True | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "participant_id | \n", - "participant_id | \n", - "String | \n", - "[['participant_id']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[participant] | \n", - "[0] | \n", - "[0] | \n", - "
3 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "age_at_diagnosis | \n", - "age_at_diagnosis | \n", - "float64 | \n", - "[['age_at_diagnosis']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[6] | \n", - "
4 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "dicom_patient_id | \n", - "idc_provenance_dicom_patient_id | \n", - "String | \n", - "[['idc_provenance_dicom_patient_id']] | \n", - "[] | \n", - "None | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "
5 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "dicom_patient_id | \n", - "idc_provenance_dicom_patient_id | \n", - "String | \n", - "[['idc_provenance_dicom_patient_id']] | \n", - "[] | \n", - "None | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "
6 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "dicom_patient_id | \n", - "idc_provenance_dicom_patient_id | \n", - "String | \n", - "[['idc_provenance_dicom_patient_id']] | \n", - "[] | \n", - "None | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "[] | \n", - "
7 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "days_to_recurrence | \n", - "days_to_recurrence | \n", - "String | \n", - "[['days_to_recurrence']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[9] | \n", - "
8 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "sample_anatomic_site | \n", - "sample_anatomic_site | \n", - "String | \n", - "[['sample_anatomic_site']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[3] | \n", - "
9 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "days_to_last_followup | \n", - "days_to_last_followup | \n", - "String | \n", - "[['days_to_last_followup']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[13] | \n", - "
10 | \n", - "rms_mutation_prediction | \n", - "True | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "participantparticipant_id | \n", - "participant.participant_id | \n", - "String | \n", - "[['participant.participant_id']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[0] | \n", - "
11 | \n", - "rms_mutation_prediction | \n", - "True | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "participantparticipant_id | \n", - "participant.participant_id | \n", - "String | \n", - "[['participant.participant_id']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[0] | \n", - "
12 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "participant_age_at_collection | \n", - "participant_age_at_collection | \n", - "float64 | \n", - "[['participant_age_at_collection']] | \n", - "[] | \n", - "None | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[4] | \n", - "
13 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_grade | \n", - "tumor_grade | \n", - "String | \n", - "[['tumor_grade']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[6] | \n", - "
14 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "diagnosis_id | \n", - "diagnosis_id | \n", - "String | \n", - "[['diagnosis_id']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[1] | \n", - "
15 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_morphology | \n", - "tumor_morphology | \n", - "String | \n", - "[['tumor_morphology']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[10] | \n", - "
16 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "sample_description | \n", - "sample_description | \n", - "String | \n", - "[['sample_description']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[12] | \n", - "
17 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_incidence_type | \n", - "tumor_incidence_type | \n", - "String | \n", - "[['tumor_incidence_type']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[11] | \n", - "
18 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_stage_clinical_m | \n", - "tumor_stage_clinical_m | \n", - "String | \n", - "[['tumor_stage_clinical_m']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[9] | \n", - "
19 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_stage_clinical_n | \n", - "tumor_stage_clinical_n | \n", - "String | \n", - "[['tumor_stage_clinical_n']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[8] | \n", - "
20 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tumor_stage_clinical_t | \n", - "tumor_stage_clinical_t | \n", - "String | \n", - "[['tumor_stage_clinical_t']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[sample] | \n", - "[0] | \n", - "[7] | \n", - "
21 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "progression_or_recurrence | \n", - "progression_or_recurrence | \n", - "String | \n", - "[['progression_or_recurrence']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[14] | \n", - "
22 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "tissue_or_organ_of_origin | \n", - "tissue_or_organ_of_origin | \n", - "String | \n", - "[['tissue_or_organ_of_origin']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[12] | \n", - "
23 | \n", - "rms_mutation_prediction | \n", - "False | \n", - "bigquery-public-data.idc_v18_clinical.rms_muta... | \n", - "site_of_resection_or_biopsy | \n", - "site_of_resection_or_biopsy | \n", - "String | \n", - "[['site_of_resection_or_biopsy']] | \n", - "[{'option_code': '', 'option_description': None}] | \n", - "derived from inspection of values | \n", - "[CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | \n", - "[diagnosis] | \n", - "[0] | \n", - "[15] | \n", - "
24 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | days_to_last_known_disease_status | days_to_last_known_disease_status | String | [['days_to_last_known_disease_status']] | [{'option_code': '', 'option_description': None}] | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [11]
25 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | primary_diagnosis_reference_source | primary_diagnosis_reference_source | String | [['primary_diagnosis_reference_source']] | [{'option_code': '', 'option_description': None}] | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [4]
26 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | stage | Stage | String | [['Stage']] | [{'option_code': '', 'option_description': Non... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [8]
27 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | last_known_disease_status | last_known_disease_status | String | [['last_known_disease_status']] | [{'option_code': '', 'option_description': Non... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [10]
28 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | metastasis_at_diagnosis | Metastasis_at_diagnosis | String | [['Metastasis_at_diagnosis']] | [{'option_code': '', 'option_description': Non... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [7]
29 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | histological_classification | Histological_Classification | String | [['Histological_Classification']] | [{'option_code': '', 'option_description': Non... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [sample] | [0] | [5]
30 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | race | race | String | [['race']] | [{'option_code': '', 'option_description': Non... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [participant] | [0] | [1]
31 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | source_batch | idc_provenance_source_batch | int64 | [['idc_provenance_source_batch']] | [{'option_code': '0', 'option_description': No... | derived from inspection of values | [] | [] | [] | []
32 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | source_batch | idc_provenance_source_batch | int64 | [['idc_provenance_source_batch']] | [{'option_code': '0', 'option_description': No... | derived from inspection of values | [] | [] | [] | []
33 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | source_batch | idc_provenance_source_batch | int64 | [['idc_provenance_source_batch']] | [{'option_code': '0', 'option_description': No... | derived from inspection of values | [] | [] | [] | []
34 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | sample_type | sample_type | String | [['sample_type']] | [{'option_code': 'Tumor', 'option_description'... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [sample] | [0] | [2]
35 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | sample_tumor_status | sample_tumor_status | String | [['sample_tumor_status']] | [{'option_code': 'Tumor', 'option_description'... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [sample] | [0] | [13]
36 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | gender | gender | String | [['gender']] | [{'option_code': 'Female', 'option_description... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [participant] | [0] | [2]
37 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | primary_diagnosis | primary_diagnosis | String | [['primary_diagnosis']] | [{'option_code': 'Rhabdomyosarcoma', 'option_d... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [3]
38 | rms_mutation_prediction | False | bigquery-public-data.idc_v18_clinical.rms_muta... | disease_type | disease_type | String | [['disease_type']] | [{'option_code': 'Soft Tissue Tumors and Sarco... | derived from inspection of values | [CCDI_Submission_Template_v1.0.1_DM_v2.2023-02... | [diagnosis] | [0] | [2]
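Each row of the clinical index describes one column of one clinical table, so searching the index is ordinary DataFrame filtering. The sketch below rebuilds a few of the rows shown above in a small hypothetical DataFrame; the column names `collection_id`, `column`, `column_label` and `data_type` are assumptions made for illustration, so check `clinical_index.columns` in your own session before reusing the filters. It shows a case-insensitive keyword search over column labels and how to collapse `source_batch`, which appears to be listed once per clinical table of the collection.

```python
import pandas as pd

# Hypothetical excerpt mirroring a few rows of the clinical index output above.
# The column names below are assumptions for illustration, not the confirmed schema.
index_df = pd.DataFrame(
    {
        "collection_id": ["rms_mutation_prediction"] * 7,
        "column": [
            "days_to_last_known_disease_status",
            "stage",
            "source_batch",
            "source_batch",
            "source_batch",
            "gender",
            "primary_diagnosis",
        ],
        "column_label": [
            "days_to_last_known_disease_status",
            "Stage",
            "idc_provenance_source_batch",
            "idc_provenance_source_batch",
            "idc_provenance_source_batch",
            "gender",
            "primary_diagnosis",
        ],
        "data_type": ["String", "String", "int64", "int64", "int64", "String", "String"],
    }
)

# Case-insensitive regex search for labels mentioning "diagnosis" or "disease".
mask = index_df["column_label"].str.contains("diagnosis|disease", case=False, regex=True)
matches = index_df.loc[mask, "column"].tolist()
print(matches)  # ['days_to_last_known_disease_status', 'primary_diagnosis']

# source_batch repeats across tables; keep a single row per column name.
unique_columns = index_df.drop_duplicates(subset="column")["column"].tolist()
print(unique_columns)
```

Dropping duplicates on the column name alone discards the table each entry came from; keep the table name in `subset` if you need to know where every column lives.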
| | dicom_patient_id | source_batch | participantparticipant_id | sample_id | sample_type | sample_anatomic_site | participant_age_at_collection | histological_classification | tumor_grade | tumor_stage_clinical_t | tumor_stage_clinical_n | tumor_stage_clinical_m | tumor_morphology | tumor_incidence_type | sample_description | sample_tumor_status |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | RMS2325 | 0 | RMS2325 | PAWDLM | Tumor | Leg | 44.56 | | | | | | | | | Tumor
1 | RMS2124 | 0 | RMS2124 | PATMDI | Tumor | | 0.90 | BOTRYOID | | | | | | | | Tumor
2 | RMS2137 | 0 | RMS2137 | PATVPL | Tumor | | 0.83 | BOTRYOID | | | | | | | | Tumor
3 | RMS2140 | 0 | RMS2140 | PATYYW | Tumor | | 1.07 | BOTRYOID | | | | | | | | Tumor
4 | RMS2145 | 0 | RMS2145 | PAUKHP | Tumor | | 2.72 | BOTRYOID | | | | | | | | Tumor
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
398 | RMS2374 | 0 | RMS2374 | PAUPVA | Tumor | Paratesticular, left | 2.46 | SPINDLE CELL RHABDOMYOSARCOMA | | | | | | | | Tumor
399 | RMS2352 | 0 | RMS2352 | PASGZC | Tumor | Paratesticular, right | 0.68 | SPINDLE CELL RHABDOMYOSARCOMA | | | | | | | | Tumor
400 | RMS2205 | 0 | RMS2205 | PAMSJL | Tumor | Pelvis | 2.76 | MIXED ALVEOLAR AND EMBRYONAL RHABDOMYOSARCOMA | | | | | | | | Tumor
401 | RMS2267 | 0 | RMS2267 | PALWAA | Tumor | Soft tissue, abdomen | 17.96 | MIXED ALVEOLAR AND EMBRYONAL RHABDOMYOSARCOMA | | | | | | | | Tumor
402 | RMS2459 | 0 | RMS2459 | PAPRFM | Tumor | Prostate | 8.39 | EMBRYONAL RHABDOMYOSARCOMA WITH DIFFUSE ANAPLASIA | | | | | | | | Tumor
403 rows × 16 columns
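Once a clinical table like this one is loaded into pandas, summarizing it takes only a few lines. The sketch below copies five rows from the output above into a small hypothetical DataFrame (in the notebook you would use the full 403-row table instead). It assumes that blank cells are stored as empty strings, which is how pandas renders them here, and converts them to missing values before counting samples per histological subtype. Because the clinical index lists many columns as `String`, the age column is passed through `pd.to_numeric` before the numeric comparison.

```python
import pandas as pd

# Hypothetical excerpt: five rows copied from the sample table shown above.
samples = pd.DataFrame(
    {
        "dicom_patient_id": ["RMS2325", "RMS2124", "RMS2137", "RMS2374", "RMS2205"],
        "sample_anatomic_site": ["Leg", "", "", "Paratesticular, left", "Pelvis"],
        "participant_age_at_collection": [44.56, 0.90, 0.83, 2.46, 2.76],
        "histological_classification": [
            "",
            "BOTRYOID",
            "BOTRYOID",
            "SPINDLE CELL RHABDOMYOSARCOMA",
            "MIXED ALVEOLAR AND EMBRYONAL RHABDOMYOSARCOMA",
        ],
    }
)

# Assumption: blank cells are empty strings; turn them into proper missing values.
samples = samples.replace("", pd.NA)

# The age may be stored as text; coerce it to numbers (unparseable values become NaN).
samples["participant_age_at_collection"] = pd.to_numeric(
    samples["participant_age_at_collection"], errors="coerce"
)

# Samples per histological subtype; missing values are excluded by default.
counts = samples["histological_classification"].value_counts()
print(counts)

# Patients younger than one year at sample collection.
infants = samples[samples["participant_age_at_collection"] < 1.0]
print(infants["dicom_patient_id"].tolist())  # ['RMS2124', 'RMS2137']
```

Pass `dropna=False` to `value_counts()` if you also want to see how many samples have no histological classification.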