diff --git a/README.md b/README.md
index 3a94a7d..a739962 100644
--- a/README.md
+++ b/README.md
@@ -150,7 +150,7 @@ $$ -->
 | **default_edge_weight** | ***float, default = 0.1***
Default edge weight ($w$) assigned to any edge with missing weight |
 | **degree_threshold** | ***float, default = 0.5***
Edges with weight $w$ > degree_threshold are counted as 1 towards the node degree $d$ |
 | **gene_expression_nodes** | ***list, default = []***
A list of predictors (e.g. TFs) to use that typically is found as columns in the training gene expression data $X_{train}$.
Any `gene_expression_nodes` not found in the `edge_list` are added internally into the network prior `edge_list` using pairwise `default_edge_weight`. Specifying `gene_expression_nodes` is *optional* but may boost the speed of training and fitting NetREm models (by adjusting the network prior in the beginning). Thus, if the gene expression data ($X$) is available, it is recommended to input `gene_expression_nodes`. Otherwise, NetREm automatically determines `gene_expression_nodes` when fitting the model with $X_{train}$ gene expression data (when *fit(X,y)* is called), but needs time to recalibrate the network prior based on $X_{train}$ nodes and value set for `overlapped_nodes_only`. |
-| **overlapped_nodes_only** | ***boolean, default = False***
This determines if NetREm should focus on common nodes found in *network nodes* (from `edge_list`) and gene expression data (based on `gene_expression_nodes`). Here, *network nodes* not found in the gene expression data will always be removed. The priority is given to `gene_expression_nodes` since those have gene expression values that are used by the regression.
• If `overlapped_nodes_only = False`, the predictors will come from `gene_expression_nodes`, even if those are not found in the network `edge_list`. Some predictors may lack relationships in the prior network.
• If `overlapped_nodes_only = True`, the predictors used will need to be a common node: *network node* also found in the `gene_expression_nodes`. |
+| **overlapped_nodes_only** | ***boolean, default = False***
This determines if NetREm should focus on common nodes found in *network nodes* (from `edge_list`) and gene expression data (based on `gene_expression_nodes`). Here, *network nodes* not found in the gene expression data will always be removed. The priority is given to `gene_expression_nodes` since those have gene expression values that are used by the regression.
• If `overlapped_nodes_only = False`, the predictors will come from `gene_expression_nodes`, even if those are not found in the network `edge_list`. Some predictors may lack relationships in the prior network.
• If `overlapped_nodes_only = True`, the predictors used will need to be a common node: *network node* also found in the `gene_expression_nodes`.
See [overlapped_nodes_only.pdf](https://github.com/SaniyaKhullar/NetREm/blob/main/user_guide/overlapped_nodes_only.pdf) for hands-on examples. |
 | **standardize_X** | ***boolean, default = True***
This determines if NetREm should standardize $X$: subtracting the mean of $X$ and dividing by the standard deviation of $X$ using the training data. |
 | **center_y** | ***boolean, default = True***
This determines if NetREm should center $y$: subtracting the mean of $y$ based on the training data |
 | **y_intercept** | ***boolean, default = 'False'***
This is the `fit_intercept` parameter found in the [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) and [LassoCV](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html) classes in sklearn.
• If `y_intercept = True`, the model will be fit with a y-intercept term included.
• If `y_intercept = False`, the model will be fit with no y-intercept term. |
@@ -241,6 +241,7 @@ We can evaluate our model performance capabilities on data like testing data usi
 $$ MSE = \frac{1}{m} \sum_{i=1}^m (y_i - \hat{y_i})^2 $$
 =======
+
 ### Attributes:
 | Attribute | Definition |
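The `default_edge_weight` and `degree_threshold` rows describe how edge weights feed into the node degree $d$. The sketch below is one minimal reading of those two sentences, not NetREm's implementation. It assumes each `edge_list` entry is `[node1, node2]` or `[node1, node2, weight]`, treats edges as undirected, and assumes edges at or below `degree_threshold` contribute nothing to $d$ (the README does not say how they are counted). The function name `node_degrees` is hypothetical.

```python
from collections import defaultdict


def node_degrees(edge_list, default_edge_weight=0.1, degree_threshold=0.5):
    """Hypothetical sketch of node degrees from a weighted prior network.

    Assumptions (not taken from NetREm's source):
      * each edge is [node1, node2] or [node1, node2, weight]
      * edges are undirected
      * weight > degree_threshold adds 1 to both endpoints;
        weight <= degree_threshold adds nothing
    """
    degrees = defaultdict(int)
    for edge in edge_list:
        u, v = edge[0], edge[1]
        # A missing weight falls back to default_edge_weight.
        w = edge[2] if len(edge) > 2 else default_edge_weight
        # Register both endpoints even if this edge does not count.
        degrees[u] += 0
        degrees[v] += 0
        if w > degree_threshold:
            degrees[u] += 1
            degrees[v] += 1
    return dict(degrees)


edges = [["TF1", "TF2", 0.9], ["TF2", "TF3", 0.4], ["TF1", "TF3"]]
print(node_degrees(edges))  # {'TF1': 1, 'TF2': 1, 'TF3': 0}
```

With the defaults, an edge that only received `default_edge_weight = 0.1` stays below `degree_threshold = 0.5`, so under this sketch's assumption it never inflates a node's degree.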
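The `gene_expression_nodes` and `overlapped_nodes_only` rows describe which nodes end up as predictors. Below is a sketch of that rule using a hypothetical helper, `select_predictors`, following only the behavior the README states: network nodes without expression data are always dropped, and the flag decides whether expressed genes missing from the network are kept. It computes node sets only; how NetREm rebuilds the prior (the pairwise `default_edge_weight` edges) is not modeled here.

```python
def select_predictors(network_nodes, gene_expression_nodes, overlapped_nodes_only=False):
    """Hypothetical sketch of predictor selection; not NetREm's code."""
    # Keep expression-data order and drop duplicate names.
    expressed = list(dict.fromkeys(gene_expression_nodes))
    network = set(network_nodes)

    if overlapped_nodes_only:
        # Only nodes present in BOTH the network and the expression data.
        predictors = [g for g in expressed if g in network]
    else:
        # Every expressed gene, even ones with no edges in the prior.
        predictors = expressed

    # Network nodes with no expression values are always removed.
    removed = sorted(network - set(expressed))
    # Predictors absent from edge_list; per the README these would be
    # connected using default_edge_weight (not modeled here).
    missing_from_network = [g for g in predictors if g not in network]
    return predictors, removed, missing_from_network


net_nodes = ["TF1", "TF2", "TF4"]
expr_nodes = ["TF1", "TF2", "TF3"]
print(select_predictors(net_nodes, expr_nodes, overlapped_nodes_only=False))
# (['TF1', 'TF2', 'TF3'], ['TF4'], ['TF3'])
print(select_predictors(net_nodes, expr_nodes, overlapped_nodes_only=True))
# (['TF1', 'TF2'], ['TF4'], [])
```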
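`standardize_X` and `center_y` both rely on statistics computed from the training data only. A stdlib-only sketch under that reading follows. It assumes the population standard deviation (the convention sklearn's `StandardScaler` uses) and, like `StandardScaler`, leaves zero-variance columns unscaled; whether NetREm makes the same choices is an assumption. `fit_scalers` and `transform` are hypothetical names.

```python
import statistics


def fit_scalers(X_train, y_train):
    """Per-column means/stds of X_train and the mean of y_train."""
    columns = list(zip(*X_train))
    means = [statistics.fmean(col) for col in columns]
    # Population std (ddof=0); zero-variance columns get scale 1.0 (assumption).
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    y_mean = statistics.fmean(y_train)
    return means, stds, y_mean


def transform(X, y, means, stds, y_mean):
    """Standardize X and center y using TRAINING statistics."""
    X_std = [[(x - mu) / sd for x, mu, sd in zip(row, means, stds)] for row in X]
    y_centered = [v - y_mean for v in y]
    return X_std, y_centered


X_train = [[1.0, 10.0], [3.0, 30.0]]
y_train = [2.0, 4.0]
means, stds, y_mean = fit_scalers(X_train, y_train)

# Test data reuses the training statistics, not its own.
X_test_std, y_test_c = transform([[2.0, 40.0]], [5.0], means, stds, y_mean)
print(X_test_std, y_test_c)  # [[0.0, 2.0]] [2.0]
```

Transforming test data with the training means and standard deviations keeps information from the test set out of the preprocessing step.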
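One way to see why `y_intercept = False` is a reasonable default next to `standardize_X = True` and `center_y = True`: assume, as in sklearn's `Lasso`, that the intercept $\beta_0$ is not penalized, so the penalty is some $P(\beta)$ that does not involve $\beta_0$ (a generic placeholder, not NetREm's exact regularizer). Over $n$ training samples, setting the derivative of the squared-error term with respect to $\beta_0$ to zero gives

$$
\frac{\partial}{\partial \beta_0} \frac{1}{2n}\sum_{i=1}^{n} \left(y_i - \beta_0 - x_i^\top \beta\right)^2 = -\frac{1}{n}\sum_{i=1}^{n} \left(y_i - \beta_0 - x_i^\top \beta\right) = 0 \quad\Longrightarrow\quad \beta_0 = \bar{y} - \bar{x}^\top \beta
$$

When every training column of $X$ has mean zero ($\bar{x} = 0$) and $y$ is centered ($\bar{y} = 0$), the optimal intercept is exactly $\beta_0 = 0$, so fitting one adds nothing on the training data.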
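The MSE formula in the second hunk translates directly into code. A stdlib-only sketch follows (the name `mse` is hypothetical), where `y_true` holds the observed values $y_i$ and `y_pred` the predictions $\hat{y_i}$ over $m$ samples:

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/m) * sum_i (y_i - yhat_i)^2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    m = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / m


print(mse([3.0, -0.5], [2.5, 0.0]))  # 0.25
```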