stream_viz : A stream analysis & visualization tool

Welcome to stream_viz, a data analysis and drift detection package for data streams. The library provides tools for encoding data, detecting several types of drift, visualizing feature changes over time, and visualizing missing data. It ships specialized encoders and detectors for normal data, data with missing values, and strategy data.

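Links

- Readme of our package (contains implementation details)
- User Guide
- Project Report
- Project Presentation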

Key Features

Data Encoding

Efficient and Flexible: Mechanisms for encoding various data types.
Specialized Encoders:
- NormalDataEncoder: For normal data without missing values.
- MissingDataEncoder: For normal data with missing values.
- KappaStrategyDataEncoder: For strategy data without missing values.

Drift Detection

Real Concept Drift: Algorithms for detecting changes in the conditional probability distribution $p(y|X)$.
- McDiarmid Drift Detection Method (MDDM): Uses a sliding window approach with arithmetic, geometric, and Euler weighting schemes.
Feature Drift: Detection using the Kolmogorov-Smirnov (KS) test and Population Stability Index (PSI) test.
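
To make the MDDM entry above more concrete, here is a minimal sketch of the arithmetic weighting scheme over a sliding window of prediction results. It is not the stream_viz implementation: the class name `ArithmeticMDDM` and the parameters `window_size`, `difference`, and `delta` are hypothetical, and the geometric and Euler schemes are not shown (they would only change how the weights are generated).

```python
import math
from collections import deque


class ArithmeticMDDM:
    """Sketch of a McDiarmid-style drift detector with arithmetic weights.

    Hypothetical helper for illustration; not part of stream_viz.
    """

    def __init__(self, window_size=100, difference=0.01, delta=1e-6):
        self.window = deque(maxlen=window_size)
        # Newer entries get larger weights: w_i = 1 + (i - 1) * difference
        self.weights = [1.0 + i * difference for i in range(window_size)]
        self.total = sum(self.weights)
        # McDiarmid bound: eps = sqrt(sum(v_i^2) / 2 * ln(1 / delta)),
        # where v_i = w_i / sum(w)
        squared = sum((w / self.total) ** 2 for w in self.weights)
        self.epsilon = math.sqrt(squared / 2.0 * math.log(1.0 / delta))
        self.max_mean = 0.0

    def update(self, correct):
        """Add one prediction result; return True when drift is flagged."""
        self.window.append(1.0 if correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        mean = sum(w * x for w, x in zip(self.weights, self.window)) / self.total
        self.max_mean = max(self.max_mean, mean)
        if self.max_mean - mean > self.epsilon:
            # Reset so the detector can adapt to the new concept
            self.window.clear()
            self.max_mean = 0.0
            return True
        return False


detector = ArithmeticMDDM(window_size=50)
stream = [True] * 200 + [False] * 200  # accuracy collapses at index 200
drift_points = [i for i, ok in enumerate(stream) if detector.update(ok)]
```

With this synthetic stream, the detector flags a single drift shortly after index 200, once enough recent errors have pulled the weighted mean below its running maximum by more than `epsilon`.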

Visualization

Feature Changes Over Time: Advanced techniques for analyzing feature changes.
Data Velocity: Visualization of the velocity of the stream.
Missingness Patterns: Techniques for visualizing missing data.
Learning Strategies: Compare performance metrics of different learning strategies over the course of the stream.

Goals

Research Stream Visualization Methods: Explore various methods for visualizing streaming data.
Implement Drift Detection: At least one method for concept drift detection.
Visualize Drift: Visualizations for both concept and feature drift.
Visualize Data Velocity: Techniques for visualizing the velocity of the stream.
Visualize Missing Values: Explore ways to visualize missing values.
Compare Learning Strategies: Visualizations to compare performance metrics of different learning strategies.

Achieved By

Integrating Implementations: All of the above implementations are integrated into a Python package in an object-oriented way.

Usage

For detailed explanations and usage, please refer to our User Guide, which is available as a Jupyter notebook and as a PDF.

For implementation details, please check the readme of our package.

Installation

To get started with stream-viz, follow these installation instructions:

Create a new Conda environment with Python version between 3.8.0 and 3.10.0:
```
conda create --name your_env_name python=3.9
```
Alternatively, ensure your machine has a compatible Python version within this range.
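
If you are not using Conda, a quick way to check an existing interpreter is a small script like the one below. This is a sketch, not part of stream_viz: the function name `is_supported_python` is hypothetical, and it assumes that every 3.10.x release counts as inside the stated range.

```python
import sys


def is_supported_python(version=None):
    """Return True if the (major, minor) version lies in the 3.8 to 3.10 range.

    Assumption: any 3.10.x interpreter is treated as supported.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 8) <= (major, minor) <= (3, 10)


if __name__ == "__main__":
    print("Supported Python:", is_supported_python())
```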

Clone the repository:

```
git clone https://github.com/aditya0by0/stream-viz.git
```

Navigate to the package directory and install the package:
```
cd stream-viz
pip install .
```

Example

Here’s a brief example to get you started:

```
from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector

# Encode data
encoder = NormalDataEncoder()
encoder.read_csv_data(filepath_or_buffer="path/to/your/data.csv")
encoder.encode_data()

# Detect drift
drift_detector = RealConceptDriftDetector()
streamer = DataStreamer(rcd_detector_obj=drift_detector)
streamer.stream_data(X_df=encoder.X_encoded_data, y_df=encoder.y_encoded_data)

# Visualize results
drift_detector.plot()
```

For more detailed usage examples, please refer to the User Guide.

Contributing

We welcome contributions! Please see our Contributing Guide for more details.

License

This project is licensed under the MIT License. See the LICENSE file for details.

stream_viz: Advanced data analysis and drift detection made easy. Happy analyzing!

This package provides tools to encode streaming data, detect drift in it, and visualize it, so that you can notice when a model or a feature distribution stops matching the incoming stream.

Few Examples with Visualizations

Data Encoder Implementations

```
from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.utils.constants import _NORMAL_DATA_PATH  # Variable only for developers

normal_encoder = NormalDataEncoder()
# Replace the variable below with the path to your file; it is for internal use only.
# Add any parameters supported by pandas.read_csv, if required.
normal_encoder.read_csv_data(filepath_or_buffer=_NORMAL_DATA_PATH)
normal_encoder.encode_data()
normal_encoder.X_encoded_data.head()
```

```
   c5_b  c7_b  c8_b  c9_b        n0        n1        n2        n3        n4
0     0     1     0     0  0.528245  0.598345  0.558432  0.482846  0.612024
1     0     0     1     1  0.662432  0.423329  0.487623  0.454495  0.452664
2     0     0     1     1  0.562990  0.576429  0.545916  0.370166  0.543403
3     0     0     1     1  0.475311  0.566046  0.539992  0.421434  0.544852
4     1     0     1     0  0.370579  0.554642  0.536804  0.223743  0.392332
```

```
from stream_viz.data_encoders.cfpdss_data_encoder import MissingDataEncoder
from stream_viz.utils.constants import _MISSING_DATA_PATH  # Variable only for developers

missing_encoder = MissingDataEncoder()
# Replace the path variable with the path to your file; it is for internal use only.
# Add any parameters supported by pandas.read_csv (here index_col), if required.
missing_encoder.read_csv_data(
    filepath_or_buffer=_MISSING_DATA_PATH,
    index_col=[0],
)
missing_encoder.encode_data()
missing_encoder.X_encoded_data.head()
```

```
   c5_b  c7_b  c8_b  c9_b        n0        n1        n2        n3        n4
0   0.0   1.0   0.0   0.0  0.530356  0.598345  0.519161  0.478557  0.620371
1   0.0   0.0   1.0   1.0  0.672618  0.423329  0.442055  0.449888  0.458838
2   0.0   0.0   1.0   1.0  0.567192  0.576429  0.505532  0.364614  0.550814
3   0.0   0.0   1.0   1.0  0.474236  0.566046  0.499081  0.416457  0.552283
4   1.0   0.0   1.0   0.0  0.363202  0.554642  0.495610  0.216550  0.397683
```

```
from stream_viz.data_encoders.strategy_data_encoder import KappaStrategyDataEncoder
from stream_viz.utils.constants import _LEARNING_STRATEGY_DATA_PATH  # Variable only for developers

kappa_encoder = KappaStrategyDataEncoder()
# Replace the path variable with the path to your file; it is for internal use only.
# Add any parameters supported by pandas.read_csv (here header and index_col), if required.
kappa_encoder.read_csv_data(
    filepath_or_buffer=_LEARNING_STRATEGY_DATA_PATH,
    header=[0, 1, 2],
    index_col=[0, 1],
)
kappa_encoder.encode_data()
kappa_encoder.encoded_data.head()
```

```
             model_all  model_optimal  model_label  model_feat  model_nafa  model_smraed_catc  model_smraed_sumc  model_smraed_prioc  model_smraed_
Batch_Start
50            0.593128       0.593128     0.432892    0.593128    0.432892           0.257426           0.257426            0.432892       0.593128
100           0.447950       0.409449     0.294671    0.332810    0.296875           0.334898           0.296875            0.221184       0.334898
150           0.838710       0.919614     0.388254    0.676375    0.384236           0.592834           0.634146            0.592834       0.426230
200           0.880000       0.840000     0.720000    0.760000    0.360000           0.680000           0.680000            0.680000       0.520000
250           0.918831       0.959612     0.720000    0.708819    0.295775           0.672131           0.708819            0.672131       0.573379
```

1. Real Concept Drift

```
# --------------- Real Concept Drift --------------------------------------------------
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector

# Initialize DataStreamer with drift detectors
dt_streamer = DataStreamer(
    rcd_detector_obj=RealConceptDriftDetector(),
)

# Stream data and apply drift detection
dt_streamer.stream_data(
    X_df=missing_encoder.X_encoded_data, y_df=missing_encoder.y_encoded_data
)

# Plot results
dt_streamer.rcd_detector_obj.plot(start_tpt=100, end_tpt=3000)
```

2. Feature Drift Detection

```
from stream_viz.data_streamer import DataStreamer
from stream_viz.feature_drift.f_drift_detector import FeatureDriftDetector

dt_streamer = DataStreamer(
    fd_detector_obj=FeatureDriftDetector(data_encoder=normal_encoder)
)
dt_streamer.stream_data(
    X_df=normal_encoder.X_encoded_data, y_df=normal_encoder.y_encoded_data
)

# Plot drift for a numerical feature
dt_streamer.fd_detector_obj.plot(feature_name="n0")

# Plot drift for a categorical feature
dt_streamer.fd_detector_obj.plot(feature_name="c5")
```
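
The Key Features section names the Kolmogorov-Smirnov test and the Population Stability Index as the feature drift tests. The sketch below shows what the two statistics measure when a reference window is compared with a current window of one numerical feature. It is an illustration under assumptions, not the `FeatureDriftDetector` code: the function names, the equal-width binning with `bins=10`, and the `eps` floor for empty bins are all choices made here. Only the KS statistic is computed, not its p-value.

```python
import math
from bisect import bisect_right


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant reference

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor empty bins at eps so the logarithm stays defined
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


reference_window = [float(v) for v in range(100)]
shifted_window = [v + 50.0 for v in reference_window]
psi_value = psi(reference_window, shifted_window)
ks_value = ks_statistic(reference_window, shifted_window)
```

Both statistics are zero for identical windows and grow as the current window moves away from the reference; which threshold counts as drift is a separate choice that this sketch does not make.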

3. Velocity Plots

```
from stream_viz.velocity.velocity_charts import FeatureVelocity

feature_vel = FeatureVelocity(missing_encoder)
# Plot categorical feature
feature_vel.plot(
    features="c5", chunk_size=100, start_period=10, end_period=35, x_label_every=5
)
```

```
from stream_viz.velocity.velocity_charts import FeatureVelocity

feature_vel = FeatureVelocity(missing_encoder)
# Plot numerical features
feature_vel.plot(features=["n0", "n1"], window_size=10, start_tp=200, end_tp=500)
```

Stream Graphs

```
from stream_viz.velocity.velocity_charts import StreamGraph

streamgraph = StreamGraph(normal_encoder)
streamgraph.plot("c5_b")
```

4. Missingness Plots

```
from stream_viz.data_missingness.missingness import HeatmapPlotter

plotter = HeatmapPlotter(missing_encoder.X_encoded_data)
plotter.display()
```

```
from stream_viz.data_missingness.missingness import StackedBarGraph

# Create the StackedBarGraph object and plot it
stacked_bar = StackedBarGraph(missing_encoder)
stacked_bar.plot("c5_b", 1000)
```
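
A missingness plot over a stream typically summarizes how many values are missing in each stretch of the stream. The sketch below computes that summary for one feature, assuming that a chunk is a fixed number of consecutive rows and that missing entries are `None` or NaN. The function `missing_share_per_chunk` is hypothetical and is not taken from stream_viz; whether the second argument of `StackedBarGraph.plot` above is such a chunk size is also an assumption.

```python
import math


def missing_share_per_chunk(values, chunk_size):
    """Return the fraction of missing entries (None or NaN) in each chunk."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    shares = []
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        shares.append(sum(is_missing(v) for v in chunk) / len(chunk))
    return shares


feature_values = [1.0, None, float("nan"), 0.0, 1.0, 1.0]
chunk_shares = missing_share_per_chunk(feature_values, chunk_size=3)
```

Here the first chunk holds two missing entries out of three and the second chunk holds none.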

Scatter Plot

```
from stream_viz.data_missingness.missingness import ScatterPlotter

scatter_plotter = ScatterPlotter(normal_encoder, missing_encoder)
# Scatter plot for a numerical feature
scatter_plotter.plot_numerical("n0")

# Scatter plot for a categorical feature
scatter_plotter.plot_categorical("c5_b")
```

Data Missingness HeatMap Plot (Additional/Optional)

```
from stream_viz.data_missingness.missingness import MarHeatMap

mar_hm = MarHeatMap(
    normal_encoder_obj=normal_encoder, missing_encoder_obj=missing_encoder
)
mar_hm.plot(start_tpt=0, end_tpt=1299, significance_level=0.05)
```

5. Learning Strategies Plot

Strategy Plot

A basic plot:

```
from stream_viz.learning_strategies.strategy_viz import LearningStrategyChart

# Create the learning strategy chart and plot it
LearningStrategyChart(kappa_encoder.encoded_data).plot(start_tpt=11000, end_tpt=12950)
```
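
The name `KappaStrategyDataEncoder` suggests that the per-batch scores being compared are Cohen's kappa values, although this README does not say so explicitly. Under that assumption, the sketch below shows how such a score could be computed per batch from true and predicted labels. Both function names are hypothetical and nothing here is taken from the stream_viz source.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (a single label on both sides); returning 1.0 is a
        # convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def kappa_per_batch(y_true, y_pred, batch_size):
    """Map each batch start index to the kappa score of that batch."""
    return {
        start: cohen_kappa(
            y_true[start:start + batch_size], y_pred[start:start + batch_size]
        )
        for start in range(0, len(y_true), batch_size)
    }


labels = [0, 0, 1, 1, 0, 0, 1, 1]
predictions = [0, 0, 1, 1, 0, 1, 0, 1]
scores = kappa_per_batch(labels, predictions, batch_size=4)
```

The first batch is predicted perfectly, while the second batch agrees with the labels only as often as chance would, so its kappa is zero.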
