{SLmetrics} Version 0.3-0 🚀 (#33)
> [!NOTE]
>
> See NEWS or commit history for detailed changes.

##  📚  What?

### 🚀 New features 

This update introduces four new features, described below.

**Cross-Entropy Loss (PR #34):** Weighted and unweighted
cross-entropy loss. The function can be used as follows,


``` r
# 1) define classes and
# observed classes (actual)
classes <- c("Class A", "Class B")

actual   <- factor(
  c("Class A", "Class B", "Class A"), 
  levels = classes
)

# 2) define probabilities
# and construct response_matrix
response <- c(
  0.2, 0.8, 
  0.8, 0.2, 
  0.7, 0.3
)

response_matrix <- matrix(
  response,
  nrow = 3,
  ncol = 2,
  byrow = TRUE
)

colnames(response_matrix) <- classes

response_matrix
#>      Class A Class B
#> [1,]     0.2     0.8
#> [2,]     0.8     0.2
#> [3,]     0.7     0.3

# 3) calculate entropy
SLmetrics::entropy(
  actual,
  response_matrix
)
#> [1] 1.19185
```
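The reported value can be checked by hand. Below is a minimal base-R sketch, assuming `entropy()` returns the mean negative log-probability assigned to the observed class. That convention is inferred from the output above, not taken from the package source, and `manual_entropy` is only an illustrative name.

``` r
# observed classes and predicted probabilities (same values as above)
classes <- c("Class A", "Class B")
actual  <- factor(c("Class A", "Class B", "Class A"), levels = classes)

response_matrix <- matrix(
  c(0.2, 0.8,
    0.8, 0.2,
    0.7, 0.3),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(NULL, classes)
)

# probability assigned to the observed class in each row
p_observed <- response_matrix[cbind(seq_along(actual), as.integer(actual))]

# assumed definition: mean negative log-likelihood
manual_entropy <- -mean(log(p_observed))
manual_entropy
#> [1] 1.19185
```

The observed-class probabilities are 0.2, 0.2 and 0.7, so the result is `-(log(0.2) + log(0.2) + log(0.7)) / 3`.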



**Relative Root Mean Squared Error (Commit
5521b5b49b1e268c50d6b8d61ae1c6243c4944b3):**

The function normalizes the Root Mean Squared Error (RMSE) by a factor.
There is no single agreed-upon way of normalizing it, so {SLmetrics}
offers three options: mean-, range- and IQR-normalization.
It can be used as follows,

```r
# 1) define values
actual <- rnorm(1e3)
predicted <- actual + rnorm(1e3)

# 2) calculate Relative Root Mean Squared Error
cat(
  "Mean Relative Root Mean Squared Error", SLmetrics::rrmse(
    actual        = actual,
    predicted     = predicted,
    normalization = 0
  ),
  "Range Relative Root Mean Squared Error", SLmetrics::rrmse(
    actual        = actual,
    predicted     = predicted,
    normalization = 1
  ),
  "IQR Relative Root Mean Squared Error", SLmetrics::rrmse(
    actual        = actual,
    predicted     = predicted,
    normalization = 2
  ),
  sep = "\n"
)

#> Mean Relative Root Mean Squared Error
#> 2751.381
#> Range Relative Root Mean Squared Error
#> 0.1564043
#> IQR Relative Root Mean Squared Error
#> 0.7323898
```
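To make the three options concrete, here is a sketch in base R. It assumes the normalizing factor is the mean, the range and the interquartile range of `actual` for `normalization = 0`, `1` and `2` respectively; the helper `rrmse_sketch()` is hypothetical and not part of {SLmetrics}.

``` r
# hypothetical helper, not the {SLmetrics} implementation
rrmse_sketch <- function(actual, predicted, normalization = 0) {
  rmse <- sqrt(mean((actual - predicted)^2))

  # assumed mapping of the normalization argument
  denominator <- switch(
    as.character(normalization),
    "0" = mean(actual),        # mean-normalization
    "1" = diff(range(actual)), # range-normalization
    "2" = IQR(actual),         # IQR-normalization
    stop("normalization must be 0, 1 or 2")
  )

  rmse / denominator
}

# small deterministic example: RMSE is exactly 1
actual    <- c(1, 2, 3, 4, 5)
predicted <- c(2, 3, 4, 5, 6)

rrmse_sketch(actual, predicted, normalization = 0) # 1 / 3
rrmse_sketch(actual, predicted, normalization = 1) # 1 / 4
rrmse_sketch(actual, predicted, normalization = 2) # 1 / 2
```

Under this assumption, the very large mean-normalized value in the output above would be explained by `actual` being drawn from a standard normal distribution, whose sample mean is close to zero.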

**Weighted Receiver Operator Characteristics and Precision-Recall Curves
(PR #31):**

These functions return the weighted versions of `TPR` and `FPR` in
`weighted.ROC()`, and of `precision` and `recall` in `weighted.prROC()`.
The `weighted.ROC()`-function[^1] can be used as follows,

```r
actual    <- factor(sample(c("Class 1", "Class 2"), size = 1e6, replace = TRUE, prob = c(0.7, 0.3)))
response  <- ifelse(actual == "Class 1", rbeta(sum(actual == "Class 1"), 2, 5), rbeta(sum(actual == "Class 2"), 5, 2))
w         <- ifelse(actual == "Class 1", runif(sum(actual == "Class 1"), 0.5, 1.5), runif(sum(actual == "Class 2"), 1, 2))
```

``` r
# Plot
plot(SLmetrics::weighted.ROC(actual, response, w))
```

![](https://i.imgur.com/YG9kqZa.png)<!-- -->
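
As a rough illustration of what a single point on a weighted ROC curve is, the sketch below computes the weighted TPR and FPR at one threshold in base R. It assumes each observation contributes its weight `w` instead of a count of one. `weighted_rates()` is a hypothetical helper; how `weighted.ROC()` chooses thresholds and the positive class is not covered here.

``` r
# hypothetical helper, not the {SLmetrics} implementation
weighted_rates <- function(actual, response, w, positive, threshold) {
  is_positive        <- actual == positive
  predicted_positive <- response >= threshold

  c(
    # weight of correctly flagged positives over total positive weight
    tpr = sum(w[is_positive & predicted_positive]) / sum(w[is_positive]),
    # weight of wrongly flagged negatives over total negative weight
    fpr = sum(w[!is_positive & predicted_positive]) / sum(w[!is_positive])
  )
}

actual   <- factor(c("Class 1", "Class 1", "Class 2", "Class 2"))
response <- c(0.9, 0.4, 0.6, 0.1)
w        <- c(1, 3, 2, 2)

rates <- weighted_rates(
  actual, response, w,
  positive  = "Class 1",
  threshold = 0.5
)
rates
#>  tpr  fpr 
#> 0.25 0.50
```

Repeating this over a grid of thresholds yields the points of a weighted curve.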

### ⚠️ Breaking Changes

- **Weighted Confusion Matrix:** The `w`-argument in `cmatrix()` has been
  removed in favor of the more verbose `weighted.cmatrix()`-function.
  See below,

Prior to version `0.3-0` the weighted confusion matrix was a part of
the `cmatrix()`-function and was called as follows,

``` r
SLmetrics::cmatrix(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
```

This solution, although simple, was inconsistent with the remaining
implementation of weighted metrics in {SLmetrics}. To regain consistency
and simplicity the weighted confusion matrix is now retrieved as
follows,

``` r
# 1) define factors
actual    <- factor(sample(letters[1:3], 100, replace = TRUE))
predicted <- factor(sample(letters[1:3], 100, replace = TRUE))
weights   <- runif(length(actual))

# 2) without weights
SLmetrics::cmatrix(
    actual    = actual,
    predicted = predicted
)
```

    #>    a  b  c
    #> a  7  8 18
    #> b  6 13 15
    #> c 15 14  4

``` r
# 3) with weights
SLmetrics::weighted.cmatrix(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
```

    #>          a        b        c
    #> a 3.627355 4.443065 7.164199
    #> b 3.506631 5.426818 8.358687
    #> c 6.615661 6.390454 2.233511
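
For intuition, a weighted confusion matrix can be reproduced in base R by summing the weights within each (actual, predicted) cell instead of counting observations. This is a sketch of the idea using `xtabs()`, not the {SLmetrics} implementation, and the small vectors are made up for illustration.

``` r
# small illustrative data (not the data used above)
actual    <- factor(c("a", "a", "b", "b", "c"), levels = c("a", "b", "c"))
predicted <- factor(c("a", "b", "b", "b", "a"), levels = c("a", "b", "c"))
weights   <- c(0.5, 1.5, 2.0, 1.0, 0.25)

# unweighted: number of observations per cell
counts <- table(actual, predicted)

# weighted: sum of weights per cell
weighted_counts <- xtabs(weights ~ actual + predicted)
weighted_counts
```

Here the cell for actual `b` and predicted `b` holds 2 observations in `counts` and a total weight of 3 in `weighted_counts`.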

### 🐛  Bug-fixes

- **Return named vectors:** The classification metrics did not return
  named vectors when `micro == NULL`. This has been fixed.


[^1]: The syntax is the same for `weighted.prROC()`
serkor1 authored Dec 30, 2024
1 parent 2c896fa commit 878972a
Showing 135 changed files with 4,862 additions and 3,746 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: SLmetrics
Title: Machine Learning Performance Evaluation on Steroids
-Version: 0.2-0
+Version: 0.3-0
Authors@R: c(
person(
given = "Serkan",
19 changes: 4 additions & 15 deletions Makefile
@@ -9,38 +9,27 @@ PKGNAME = SLmetrics
VERSION = $(shell grep "^Version:" DESCRIPTION | sed "s/Version: //")
TARBALL = $(PKGNAME)_$(VERSION).tar.gz

py-setup:
@echo "Setting up Python environment"
@echo "============================="
@python -m venv .venv
@echo "Activating virtual environment"
@pip cache purge
@python -m pip install --upgrade pip
@pip install numpy scipy torch torchmetrics scikit-learn imbalanced-learn mkl mkl-service mkl_fft mkl_random
@echo "Done!"

py-check:
@echo "Checking installed python modules"
@echo "================================="
@pip list

document:
clear
@echo "Documenting {$(PKGNAME)}"
@Rscript tools/document.R

build: document
@echo "Installing {$(PKGNAME)}"
rm -f src/*.o src/*.so
R CMD build .
R CMD INSTALL $(TARBALL)
rm -f $(TARBALL)
rm -f src/*.o src/*.so

check: document
@echo "Checking {$(PKGNAME)}"
rm -f src/*.o src/*.so
R CMD build .
R CMD check $(TARBALL)
rm -f $(TARBALL)
rm -rf $(PKGNAME).Rcheck
rm -f src/*.o src/*.so

build-site:
@echo "Building {pkgdown}"
18 changes: 18 additions & 0 deletions NAMESPACE
@@ -13,6 +13,7 @@ S3method(csi,cmatrix)
S3method(csi,factor)
S3method(dor,cmatrix)
S3method(dor,factor)
S3method(entropy,factor)
S3method(fallout,cmatrix)
S3method(fallout,factor)
S3method(fbeta,cmatrix)
@@ -28,6 +29,7 @@ S3method(fpr,factor)
S3method(huberloss,numeric)
S3method(jaccard,cmatrix)
S3method(jaccard,factor)
S3method(logloss,factor)
S3method(mae,numeric)
S3method(mape,numeric)
S3method(mcc,cmatrix)
@@ -61,6 +63,7 @@ S3method(recall,cmatrix)
S3method(recall,factor)
S3method(rmse,numeric)
S3method(rmsle,numeric)
S3method(rrmse,numeric)
S3method(rrse,numeric)
S3method(rsq,numeric)
S3method(selectivity,cmatrix)
@@ -79,19 +82,23 @@ S3method(tpr,cmatrix)
S3method(tpr,factor)
S3method(tscore,cmatrix)
S3method(tscore,factor)
S3method(weighted.ROC,factor)
S3method(weighted.accuracy,factor)
S3method(weighted.baccuracy,factor)
S3method(weighted.ccc,numeric)
S3method(weighted.ckappa,factor)
S3method(weighted.cmatrix,factor)
S3method(weighted.csi,factor)
S3method(weighted.dor,factor)
S3method(weighted.entropy,factor)
S3method(weighted.fallout,factor)
S3method(weighted.fbeta,factor)
S3method(weighted.fdr,factor)
S3method(weighted.fer,factor)
S3method(weighted.fpr,factor)
S3method(weighted.huberloss,numeric)
S3method(weighted.jaccard,factor)
S3method(weighted.logloss,factor)
S3method(weighted.mae,numeric)
S3method(weighted.mape,numeric)
S3method(weighted.mcc,factor)
@@ -103,11 +110,13 @@ S3method(weighted.phi,factor)
S3method(weighted.pinball,numeric)
S3method(weighted.plr,factor)
S3method(weighted.ppv,factor)
S3method(weighted.prROC,factor)
S3method(weighted.precision,factor)
S3method(weighted.rae,numeric)
S3method(weighted.recall,factor)
S3method(weighted.rmse,numeric)
S3method(weighted.rmsle,numeric)
S3method(weighted.rrmse,numeric)
S3method(weighted.rrse,numeric)
S3method(weighted.rsq,numeric)
S3method(weighted.selectivity,factor)
@@ -128,6 +137,7 @@ export(ckappa)
export(cmatrix)
export(csi)
export(dor)
export(entropy)
export(fallout)
export(fbeta)
export(fdr)
@@ -136,6 +146,7 @@ export(fmi)
export(fpr)
export(huberloss)
export(jaccard)
export(logloss)
export(mae)
export(mape)
export(mcc)
@@ -153,6 +164,7 @@ export(rae)
export(recall)
export(rmse)
export(rmsle)
export(rrmse)
export(rrse)
export(rsq)
export(selectivity)
@@ -162,19 +174,23 @@ export(specificity)
export(tnr)
export(tpr)
export(tscore)
export(weighted.ROC)
export(weighted.accuracy)
export(weighted.baccuracy)
export(weighted.ccc)
export(weighted.ckappa)
export(weighted.cmatrix)
export(weighted.csi)
export(weighted.dor)
export(weighted.entropy)
export(weighted.fallout)
export(weighted.fbeta)
export(weighted.fdr)
export(weighted.fer)
export(weighted.fpr)
export(weighted.huberloss)
export(weighted.jaccard)
export(weighted.logloss)
export(weighted.mae)
export(weighted.mape)
export(weighted.mcc)
@@ -186,11 +202,13 @@ export(weighted.phi)
export(weighted.pinball)
export(weighted.plr)
export(weighted.ppv)
export(weighted.prROC)
export(weighted.precision)
export(weighted.rae)
export(weighted.recall)
export(weighted.rmse)
export(weighted.rmsle)
export(weighted.rrmse)
export(weighted.rrse)
export(weighted.rsq)
export(weighted.selectivity)
112 changes: 109 additions & 3 deletions NEWS.Rmd
@@ -15,14 +15,120 @@ knitr::opts_chunk$set(
set.seed(1903)
```

-# Version 0.2-0
+# Version 0.3-0

-> Version 0.2-0 is considered pre-release of {SLmetrics}. We do not
+> Version 0.3-0 is considered pre-release of {SLmetrics}. We do not
> expect any breaking changes, unless a major bug/issue is reported and its nature
> forces breaking changes.
## Improvements

## New Feature

* **Relative Root Mean Squared Error:** The function normalizes the Root Mean Squared Error by a factor. There is no official way of normalizing it - and in {SLmetrics} the RMSE can be normalized using three options; mean-, range- and IQR-normalization. It can be used as follows,

```{r}
# 1) define values
actual <- rnorm(1e3)
predicted <- actual + rnorm(1e3)
# 2) calculate Relative Root Mean Squared Error
cat(
"Mean Relative Root Mean Squared Error", SLmetrics::rrmse(
actual = actual,
predicted = predicted,
normalization = 0
),
"Range Relative Root Mean Squared Error", SLmetrics::rrmse(
actual = actual,
predicted = predicted,
normalization = 1
),
"IQR Relative Root Mean Squared Error", SLmetrics::rrmse(
actual = actual,
predicted = predicted,
normalization = 2
),
sep = "\n"
)
```

* **Cross Entropy:** Weighted and unweighted Cross Entropy, with and without normalization. The function can be used as follows,

```{r}
# Create factors and response probabilities
actual <- factor(c("Class A", "Class B", "Class A"))
weights <- c(0.3,0.9,1)
response <- matrix(cbind(
0.2, 0.8,
0.8, 0.2,
0.7, 0.3
),nrow = 3, ncol = 2)
cat(
"Unweighted Cross Entropy:",
SLmetrics::entropy(
actual,
response
),
"Weighted Cross Entropy:",
SLmetrics::weighted.entropy(
actual = actual,
response = response,
w = weights
),
sep = "\n"
)
```

* **Weighted Receiver Operator Characteristics:** `weighted.ROC()`, the function calculates the weighted True Positive and False Positive Rates for each threshold.

* **Weighted Precision-Recall Curve:** `weighted.prROC()`, the function calculates the weighted Recall and Precision for each threshold.

## Breaking Changes

* **Weighted Confusion Matrix:** The `w`-argument in `cmatrix()` has been removed in favor of the more verbose weighted confusion matrix call `weighted.cmatrix()`-function. See below,

Prior to version `0.3-0` the weighted confusion matrix was a part of the `cmatrix()`-function and was called as follows,

```{r, eval = FALSE}
SLmetrics::cmatrix(
actual = actual,
predicted = predicted,
w = weights
)
```

This solution, although simple, was inconsistent with the remaining implementation of weighted metrics in {SLmetrics}. To regain consistency and simplicity the weighted confusion matrix is now retrieved as follows,

```{r}
# 1) define factors
actual <- factor(sample(letters[1:3], 100, replace = TRUE))
predicted <- factor(sample(letters[1:3], 100, replace = TRUE))
weights <- runif(length(actual))
# 2) without weights
SLmetrics::cmatrix(
actual = actual,
predicted = predicted
)
# 2) with weights
SLmetrics::weighted.cmatrix(
actual = actual,
predicted = predicted,
w = weights
)
```

## Bug-fixes

* **Return named vectors:** The classification metrics when `micro == NULL` were not returning named vectors. This has been fixed.

# Version 0.2-0

## Improvements

* **documentation:** The documentation has gotten some extra love, and now all functions have their formulas embedded, the details section have been freed from a general description of [factor] creation. This will make room for future expansions on the various functions where more details are required.

* **weighted classification metrics:** The `cmatrix()`-function now accepts the argument `w` which is the sample weights; if passed the respective method will return the weighted metric. Below is an example using sample weights for the confusion matrix,
@@ -40,7 +146,7 @@ SLmetrics::cmatrix(
)
# 2) with weights
-SLmetrics::cmatrix(
+SLmetrics::weighted.cmatrix(
actual = actual,
predicted = predicted,
w = weights