MLJ interface (#140)
mtsch authored Jan 25, 2021
1 parent f477a5a commit 01fa766
Showing 19 changed files with 511 additions and 85 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
# v0.16.4

Add [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) support.

# v0.16.3

New function: `midlife`.
9 changes: 6 additions & 3 deletions Project.toml
@@ -1,7 +1,7 @@
name = "Ripserer"
uuid = "aa79e827-bd0b-42a8-9f10-2b302677a641"
authors = ["mtsch <[email protected]>"]
version = "0.16.3"
version = "0.16.4"

[deps]
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
@@ -11,6 +11,7 @@ Future = "9fa8497b-333b-5362-9e8d-4d0656e87820"
IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
LightGraphs = "093fc24a-ae57-5d10-9952-331d41423f4d"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
MiniQhull = "978d7f02-9e05-4691-894f-ae31a51d76ca"
PersistenceDiagrams = "90b4794c-894b-4756-a0f8-5efeb5ddf7ae"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
@@ -25,8 +26,9 @@ DataStructures = "0.17, 0.18"
Distances = "0.8, 0.9, 0.10"
IterTools = "1"
LightGraphs = "1.3.3"
MLJModelInterface = "^0.3.5"
MiniQhull = "0.2"
PersistenceDiagrams = "^0.8.2"
PersistenceDiagrams = "0.9"
ProgressMeter = "1"
RecipesBase = "1"
StaticArrays = "0.12, 1"
@@ -36,10 +38,11 @@ julia = "1"
[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
SafeTestsets = "1bc83da4-3b8d-516f-aca4-4fe02f6d838f"
Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Aqua", "Documenter", "Random", "SafeTestsets", "Suppressor", "Test"]
test = ["Aqua", "Documenter", "MLJBase", "Random", "SafeTestsets", "Suppressor", "Test"]
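
The `[compat]` changes above rely on Pkg's caret semantics for pre-1.0 versions: `"^0.3.5"` admits `[0.3.5, 0.4.0)`, and the new `"0.9"` entry for PersistenceDiagrams admits `[0.9.0, 0.10.0)`, which excludes the previously allowed 0.8.x releases. A minimal sketch of those ranges follows; `in_range` is a hypothetical helper written for illustration, not a Pkg function.

```julia
# Hypothetical helper (not part of Pkg): is `v` inside the half-open range [lower, upper)?
in_range(v::VersionNumber, lower::VersionNumber, upper::VersionNumber) = lower <= v < upper

# MLJModelInterface = "^0.3.5" corresponds to [0.3.5, 0.4.0).
@show in_range(v"0.3.9", v"0.3.5", v"0.4.0")

# PersistenceDiagrams = "0.9" corresponds to [0.9.0, 0.10.0),
# so the old lower bound 0.8.2 is no longer accepted.
@show in_range(v"0.8.2", v"0.9.0", v"0.10.0")
```

The first check is true and the second is false, which is why this commit also requires a newer PersistenceDiagrams release.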
3 changes: 3 additions & 0 deletions docs/Project.toml
@@ -3,9 +3,12 @@ Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GLMNet = "8d5ece8b-de18-5317-b113-243142960cc6"
GR = "28b8d3ca-fb5f-59d9-8090-bfdbd6d07a71"
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
Images = "916415d5-f1e6-5110-898d-aaa5f9f070e0"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJDecisionTreeInterface = "c6f25543-311c-4c74-83dc-3ea6d1015661"
MultivariateStats = "6f286f6a-111f-5878-ab1e-185364afe411"
PersistenceDiagrams = "90b4794c-894b-4756-a0f8-5efeb5ddf7ae"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
14 changes: 14 additions & 0 deletions docs/src/api.md
@@ -114,6 +114,20 @@ Ripserer.Chain
Mod
```

## MLJ.jl Interface

```@docs
Ripserer.RipsPersistentHomology
```

```@docs
Ripserer.AlphaPersistentHomology
```

```@docs
Ripserer.CubicalPersistentHomology
```

## Experimental Features

```@docs
6 changes: 3 additions & 3 deletions docs/src/examples/cubical.jl
@@ -24,9 +24,9 @@ curve_plot = plot(curve; legend=false, title="Curve")
# will use a small, 240×240 pixel version of the image. Ripserer should have no problems
# with processing larger images, but this will work well enough for this tutorial.

blackhole_image = load(joinpath(
@__DIR__, "../assets/data/240px-Black_hole_-_Messier_87_crop_max_res.jpg"
))
blackhole_image = load(
joinpath(@__DIR__, "../assets/data/240px-Black_hole_-_Messier_87_crop_max_res.jpg")
)
blackhole_plot = plot(blackhole_image; title="Black Hole")

# To use the image with Ripserer, we have to convert it to grayscale.
57 changes: 55 additions & 2 deletions docs/src/examples/malaria.jl
@@ -1,9 +1,11 @@
# # Image Classification With Cubical Filtrations and Persistence Images
# # Image Classification With Cubical Persistent Homology

# In this example, we will show how to use Ripserer in an image classification
# context. Persistent homology is not a predictive algorithm, but it can be used to extract
# useful features from data.

# ## Setting up

using Ripserer
using PersistenceDiagrams
using Images # also required: ImageIO to read .png files
@@ -104,6 +106,8 @@ plot(plot(dim_1[end]; persistence=true), heatmap(image_1(dim_1[end]); aspect_rat

persims = [[vec(image_0(dim_0[i])); vec(image_1(dim_1[i]))] for i in 1:length(diagrams)]

# ## Fitting A Model

# Now it's time to fit our model. We will use
# [GLMNet.jl](https://github.com/JuliaStats/GLMNet.jl) to fit a regularized linear model.

@@ -137,7 +141,7 @@ nothing; # hide

# Get the classification accuracy.

accuracy = count(predictions .== test_y) / length(test_y)
count(predictions .== test_y) / length(test_y)

# Not half bad considering we haven't touched the images and we left pretty much all
# settings on default.
@@ -158,3 +162,52 @@ plot(

# These correspond to the area we identified at the beginning. Also note that in this case,
# the classifier does not care about ``H_1`` at all.

# ## Using MLJ

# Another, more straightforward way to execute a similar pipeline is to use Ripserer's
# [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl) integration. We will use a
# random forest classifier for this example.

# We start by loading MLJ and the classifier. Note that
# [MLJDecisionTreeInterface.jl](https://github.com/bensadeghi/DecisionTree.jl) needs to be
# installed for this to work.

using MLJ
tree = @load RandomForestClassifier pkg = "DecisionTree" verbosity = 0

# We create a pipeline of `CubicalPersistentHomology` followed by the classifier. In this
# case, `CubicalPersistentHomology` takes care of both the homology computation and the
# conversion to persistence images.

pipe = @pipeline(CubicalPersistentHomology(), tree)

# We train the pipeline the same way you would fit any other MLJ model. Remember, we need to
# use grayscale versions of images stored in `inputs`.

classes = coerce(classes, Binary)
train, test = partition(eachindex(classes), 0.7; shuffle=true, rng=1337)
mach = machine(pipe, inputs, classes)
fit!(mach; rows=train)

# Next, we predict the classes on the test data and print out the classification accuracy.

yhat = predict_mode(mach, inputs[test])
accuracy(yhat, classes[test])

# The result is quite a bit worse than before. We can try mitigating that by using a
# different vectorizer.

pipe.cubical_persistent_homology.vectorizer = PersistenceCurveVectorizer()
mach = machine(pipe, inputs, classes)
fit!(mach; rows=train)

yhat = predict_mode(mach, inputs[test])
accuracy(yhat, classes[test])

# The result could be improved further by choosing a different model and
# vectorizer. However, this is just a short introduction. Please see the [MLJ.jl
# documentation](https://alan-turing-institute.github.io/MLJ.jl/dev/) for more information
# on model tuning and selection, and the [PersistenceDiagrams.jl
# documentation](https://mtsch.github.io/PersistenceDiagrams.jl/dev/mlj/) for a list of
# vectorizers and their options.
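
The tutorial above scores the pipeline on a single 70/30 split, so the reported accuracy depends on that one partition. A minimal sketch of a k-fold alternative using MLJ's `evaluate!` follows. The helper name `cv_accuracy` and its defaults are assumptions made for illustration and are not part of this commit.

```julia
using MLJ

# Hypothetical helper (not part of MLJ or Ripserer): k-fold cross-validated
# accuracy for any supervised MLJ model or pipeline.
function cv_accuracy(model, X, y; nfolds=5, rng=1337)
    # Bind the model to the data, as in the tutorial.
    mach = machine(model, X, y)
    # Refit on each fold and score the mode of the predicted distribution.
    return evaluate!(
        mach;
        resampling=CV(; nfolds=nfolds, shuffle=true, rng=rng),
        measure=accuracy,
        operation=predict_mode,
        verbosity=0,
    )
end
```

With the tutorial's objects this would be called as `cv_accuracy(pipe, inputs, classes)`. The `measurement` field of the result holds the accuracy aggregated over folds. Expect the run to take roughly `nfolds` times as long as a single fit, because persistent homology is recomputed for every fold.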
1 change: 1 addition & 0 deletions docs/src/index.md
@@ -48,6 +48,7 @@ Ripserer and its companion package
* Various persistence diagram vectorization functions, implemented with persistence images
and persistence curves.
* Easy extensibility through a documented API.
* Integration with [MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl).
* Experimental shortest representative cycle computation.
* Experimental sparse circular coordinate computation.

8 changes: 7 additions & 1 deletion src/Ripserer.jl
@@ -25,6 +25,8 @@ using RecipesBase
using StaticArrays
using TupleTools

import MLJModelInterface

# This functionality is imported to avoid having to deal with name clashes. There is no
# piracy involved here.
import LightGraphs: vertices, edges, nv, adjacency_matrix
@@ -51,7 +53,10 @@ export Mod,
ripserer,
reconstruct_cycle,
Partition,
CircularCoordinates
CircularCoordinates,
RipsPersistentHomology,
AlphaPersistentHomology,
CubicalPersistentHomology

include("base/primefield.jl")
include("base/abstractcell.jl")
@@ -77,5 +82,6 @@ include("filtrations/edgecollapse.jl")

include("extra/cycles.jl")
include("extra/circularcoordinates.jl")
include("extra/mlj.jl")

end
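
The three newly exported types are MLJ models, and MLJ models conventionally provide a keyword constructor with defaults. One quick way to see what each model exposes for tuning is therefore to build it with no arguments and list its fields. This is a minimal sketch, assuming Ripserer v0.16.4 or later is installed. Only the `vectorizer` field is confirmed by this commit (via `pipe.cubical_persistent_homology.vectorizer` in the tutorial); the remaining field names are whatever the package defines.

```julia
using Ripserer

# Construct each model with its defaults and print its hyperparameter names.
for Model in (RipsPersistentHomology, AlphaPersistentHomology, CubicalPersistentHomology)
    model = Model()  # assumed: default keyword constructor, per MLJ convention
    println(Model, ": ", fieldnames(typeof(model)))
end
```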
8 changes: 6 additions & 2 deletions src/extra/cycles.jl
@@ -135,9 +135,13 @@ function reconstruct_cycle(
distances=distance_matrix(filtration),
) where {T}
if !hasproperty(interval, :representative)
throw(ArgumentError("interval has no representative! Run `ripserer` with `reps=true`"))
throw(
ArgumentError("interval has no representative! Run `ripserer` with `reps=true`")
)
elseif !(eltype(interval.representative) <: AbstractChainElement{<:AbstractCell{1}})
throw(ArgumentError("cycles can only be reconstructed for 1-dimensional intervals."))
throw(
ArgumentError("cycles can only be reconstructed for 1-dimensional intervals.")
)
elseif !(birth(interval) ≤ _birth_or_value(r) < death(interval))
return simplex_type(filtration, 1)[]
else

2 comments on commit 01fa766


@mtsch mtsch commented on 01fa766 Jan 25, 2021


@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/28595

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.16.4 -m "<description of version>" 01fa766ca525e174607c9c21a62d34ee8bf47a48
git push origin v0.16.4
