stream_viz: A stream analysis & visualization tool

Welcome to stream_viz, a data analysis and drift detection package for data streams! The library provides tools for encoding data, detecting several types of drift, visualizing feature changes over time, and visualizing missing data. Whether you're dealing with normal data, data with missing values, or strategy data, the package has a specialized encoder and matching detectors for it.

Key Features

Data Encoding

  • Efficient and Flexible: Mechanisms for encoding various data types.
  • Specialized Encoders:
    • NormalDataEncoder: For normal data without missing values.
    • MissingDataEncoder: For normal data with missing values.
    • KappaStrategyDataEncoder: For strategy data without missing values.

Drift Detection

  • Real Concept Drift: Algorithms for detecting changes in the conditional probability distribution $p(y|X)$.
    • McDiarmid Drift Detection Method (MDDM): Uses a sliding window approach with arithmetic, geometric, and Euler weighting schemes.
  • Feature Drift: Detection using the Kolmogorov-Smirnov (KS) test and Population Stability Index (PSI) test.
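
As a rough illustration of the KS test mentioned above, the sketch below computes the two-sample KS statistic, that is, the largest gap between the empirical distribution functions of a reference window and a current window. It uses only the standard library and is not the package's implementation; the function name `ks_statistic` and the sample windows are made up for this example.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Return the largest gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap


# Hypothetical windows of one numerical feature
reference_window = [0.1, 0.2, 0.3, 0.4, 0.5]
same_window = [0.1, 0.2, 0.3, 0.4, 0.5]
shifted_window = [1.1, 1.2, 1.3, 1.4, 1.5]

print(ks_statistic(reference_window, same_window))     # 0.0 (identical samples)
print(ks_statistic(reference_window, shifted_window))  # 1.0 (fully separated samples)
```

A statistic close to 0 means the two windows look alike, while a value close to 1 means they barely overlap. Turning the statistic into a p-value and a drift decision is left out here.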

Visualization

  • Feature Changes Over Time: Advanced techniques for analyzing feature changes.
  • Data Velocity: Visualization of the velocity of the stream.
  • Missingness Patterns: Techniques for visualizing missing data.
  • Learning Strategies: Compare performance metrics of different learning strategies over the course of the stream.

Goals

  • Research Stream Visualization Methods: Explore various methods for visualizing streaming data.
  • Implement Drift Detection: At least one method for concept drift detection.
  • Visualize Drift: Visualizations for both concept and feature drift.
  • Visualize Data Velocity: Techniques for visualizing the velocity of the stream.
  • Visualize Missing Values: Explore ways to visualize missing values.
  • Compare Learning Strategies: Visualizations to compare performance metrics of different learning strategies.

Achieved By

  • Integrating Implementations: All of the above implementations are integrated into a Python package in an object-oriented way.

Usage

For detailed explanations and implementation details, please refer to our User Guide, which is available as a Jupyter notebook and as a PDF.

The package's own README describes the implementation in more detail.

Installation

To get started with stream-viz, follow these installation instructions:

  1. Create a new Conda environment with a Python version between 3.8.0 and 3.10.0:

    conda create --name your_env_name python=3.9

    Alternatively, ensure your machine has a compatible Python version within this range.

  2. Clone the repository:

    git clone https://github.com/aditya0by0/stream-viz.git

  3. Navigate to the package directory and install the package:

    cd stream-viz
    pip install .
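
If you are unsure whether an existing interpreter falls in the supported range, a quick check along these lines may help. It is only a sketch: the helper name `is_supported` is hypothetical, and treating both bounds as inclusive is an assumption, since the steps above do not say whether 3.10.0 itself is included.

```python
import sys

# Bounds taken from the installation steps; inclusiveness is an assumption.
MIN_VERSION = (3, 8, 0)
MAX_VERSION = (3, 10, 0)


def is_supported(version=None):
    """Return True if the given (major, minor, micro) version lies in the range."""
    if version is None:
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


print("Interpreter:", ".".join(str(part) for part in sys.version_info[:3]))
print("Within the documented range:", is_supported())
```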

Example

Here’s a brief example to get you started:

from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector

# Encode data
encoder = NormalDataEncoder()
encoder.read_csv_data(filepath_or_buffer="path/to/your/data.csv")
encoder.encode_data()

# Detect drift while streaming the encoded data
drift_detector = RealConceptDriftDetector()
streamer = DataStreamer(rcd_detector_obj=drift_detector)
streamer.stream_data(X_df=encoder.X_encoded_data, y_df=encoder.y_encoded_data)

# Visualize results
drift_detector.plot()

For more detailed usage examples, please refer to the User Guide.

Contributing

We welcome contributions! Please see our Contributing Guide for more details.

License

This project is licensed under the MIT License. See the LICENSE file for details.


stream_viz: Advanced data analysis and drift detection made easy. Happy analyzing!


This package provides the tools you need to encode streaming data, detect drift in it, and visualize it, so that you can notice when the data changes and keep your models and insights up to date.


A Few Examples with Visualizations

Data Encoder Implementations

from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.utils.constants import _NORMAL_DATA_PATH  # Variable only for developers

normal_encoder = NormalDataEncoder()
# Pass the path to your own file here; the variable below is for internal use only.
# Add any necessary parameters supported by pandas.read_csv, if required.
normal_encoder.read_csv_data(filepath_or_buffer=_NORMAL_DATA_PATH)
normal_encoder.encode_data()
normal_encoder.X_encoded_data.head()

   c5_b  c6_b  c7_b  c8_b  c9_b        n0        n1        n2        n3        n4
0     0     0     1     0     0  0.528245  0.598345  0.558432  0.482846  0.612024
1     0     0     0     1     1  0.662432  0.423329  0.487623  0.454495  0.452664
2     0     0     0     1     1  0.562990  0.576429  0.545916  0.370166  0.543403
3     0     0     0     1     1  0.475311  0.566046  0.539992  0.421434  0.544852
4     1     0     0     1     0  0.370579  0.554642  0.536804  0.223743  0.392332

from stream_viz.data_encoders.cfpdss_data_encoder import MissingDataEncoder
from stream_viz.utils.constants import (
    _MISSING_DATA_PATH,
)  # Variable only for developers

missing_encoder = MissingDataEncoder()
missing_encoder.read_csv_data(
    # Pass the path to your own file here; this variable is for internal use only.
    filepath_or_buffer=_MISSING_DATA_PATH,
    # Add any necessary parameters supported by pandas.read_csv, if required.
    index_col=[0],
)
missing_encoder.encode_data()
missing_encoder.X_encoded_data.head()

   c5_b  c6_b  c7_b  c8_b  c9_b        n0        n1        n2        n3        n4
0   0.0   0.0   1.0   0.0   0.0  0.530356  0.598345  0.519161  0.478557  0.620371
1   0.0   0.0   0.0   1.0   1.0  0.672618  0.423329  0.442055  0.449888  0.458838
2   0.0   0.0   0.0   1.0   1.0  0.567192  0.576429  0.505532  0.364614  0.550814
3   0.0   0.0   0.0   1.0   1.0  0.474236  0.566046  0.499081  0.416457  0.552283
4   1.0   0.0   0.0   1.0   0.0  0.363202  0.554642  0.495610  0.216550  0.397683

from stream_viz.data_encoders.strategy_data_encoder import KappaStrategyDataEncoder
from stream_viz.utils.constants import (
    _LEARNING_STRATEGY_DATA_PATH,
)  # Variable only for developers

kappa_encoder = KappaStrategyDataEncoder()
kappa_encoder.read_csv_data(
    # Pass the path to your own file here; this variable is for internal use only.
    filepath_or_buffer=_LEARNING_STRATEGY_DATA_PATH,
    # Add any necessary parameters supported by pandas.read_csv, if required.
    header=[0, 1, 2],
    index_col=[0, 1],
)
kappa_encoder.encode_data()
kappa_encoder.encoded_data.head()

             model_all  model_optimal  model_label  model_feat  model_nafa  model_smraed_catc  model_smraed_sumc  model_smraed_prioc  model_smraed_
Batch_Start
50            0.593128       0.593128     0.432892    0.593128    0.432892           0.257426           0.257426            0.432892       0.593128
100           0.447950       0.409449     0.294671    0.332810    0.296875           0.334898           0.296875            0.221184       0.334898
150           0.838710       0.919614     0.388254    0.676375    0.384236           0.592834           0.634146            0.592834       0.426230
200           0.880000       0.840000     0.720000    0.760000    0.360000           0.680000           0.680000            0.680000       0.520000
250           0.918831       0.959612     0.720000    0.708819    0.295775           0.672131           0.708819            0.672131       0.573379

1. Real Concept Drift

# --------------- Real Concept Drift --------------------------------------------------
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector

# Initialize DataStreamer with drift detectors
dt_streamer = DataStreamer(
    rcd_detector_obj=RealConceptDriftDetector(),
)

# Stream data and apply drift detection
dt_streamer.stream_data(
    X_df=missing_encoder.X_encoded_data, y_df=missing_encoder.y_encoded_data
)

# Plot results
dt_streamer.rcd_detector_obj.plot(start_tpt=100, end_tpt=3000)
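
To give a feel for what an MDDM-style detector does while the stream runs, here is a minimal sketch of the arithmetic weighting scheme as it is commonly described: keep a sliding window of prediction outcomes, weight recent outcomes more heavily, and signal drift when the weighted accuracy falls below its historical maximum by more than a McDiarmid bound. This is not the code behind `RealConceptDriftDetector`; the class name `ArithmeticMDDM`, its parameters, and the default values are assumptions made for illustration.

```python
import math
from collections import deque


class ArithmeticMDDM:
    """Illustrative sliding-window drift detector with arithmetic weights."""

    def __init__(self, window_size=100, difference=0.01, delta=1e-6):
        self.window = deque(maxlen=window_size)
        # Arithmetic weights: the newest outcome gets the largest weight.
        self.weights = [1.0 + i * difference for i in range(window_size)]
        self.total_weight = sum(self.weights)
        normalised_sq = sum((w / self.total_weight) ** 2 for w in self.weights)
        # McDiarmid bound on how far the weighted mean may drop by chance.
        self.epsilon = math.sqrt(normalised_sq / 2.0 * math.log(1.0 / delta))
        self.max_mean = 0.0

    def update(self, prediction_correct):
        """Feed one prediction outcome; return True if drift is signalled."""
        self.window.append(1.0 if prediction_correct else 0.0)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the window is full
        weighted_sum = sum(w * x for w, x in zip(self.weights, self.window))
        weighted_mean = weighted_sum / self.total_weight
        self.max_mean = max(self.max_mean, weighted_mean)
        if self.max_mean - weighted_mean >= self.epsilon:
            # Start over after a drift so the next concept is judged on its own.
            self.window.clear()
            self.max_mean = 0.0
            return True
        return False


# Hypothetical stream: the classifier is right 300 times, then always wrong.
detector = ArithmeticMDDM(window_size=100)
outcomes = [True] * 300 + [False] * 300
drift_points = [i for i, ok in enumerate(outcomes) if detector.update(ok)]
print(drift_points)
```

In this toy stream the detector fires once, shortly after the outcomes flip at index 300. The geometric and Euler schemes differ only in how the weights are generated.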


2. Feature Drift Detection

from stream_viz.data_streamer import DataStreamer
from stream_viz.feature_drift.f_drift_detector import FeatureDriftDetector

dt_streamer = DataStreamer(
    fd_detector_obj=FeatureDriftDetector(data_encoder=normal_encoder)
)
dt_streamer.stream_data(
    X_df=normal_encoder.X_encoded_data, y_df=normal_encoder.y_encoded_data
)
dt_streamer.fd_detector_obj.plot(feature_name="n0")

dt_streamer.fd_detector_obj.plot(feature_name="c5")
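
For readers unfamiliar with the Population Stability Index used for feature drift, the sketch below shows one common way to compute it: bin a reference sample and a current sample, then sum `(current_share - reference_share) * ln(current_share / reference_share)` over the bins. The equal-width binning, the small floor `eps` for empty bins, and the function name are assumptions of this sketch and may differ from what `FeatureDriftDetector` does internally.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two numerical samples, using equal-width bins of the reference."""
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins
    if width == 0:
        width = 1.0  # all reference values identical; avoid division by zero

    def bin_shares(values):
        counts = [0] * n_bins
        for value in values:
            index = int((value - low) / width)
            # Values outside the reference range fall into the edge bins.
            index = min(max(index, 0), n_bins - 1)
            counts[index] += 1
        # Floor empty bins so the logarithm below stays defined.
        return [max(count / len(values), eps) for count in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Hypothetical feature values before and after a shift
reference_sample = [i / 100 for i in range(100)]
shifted_sample = [0.5 + i / 200 for i in range(100)]

print(population_stability_index(reference_sample, reference_sample))  # 0.0
print(population_stability_index(reference_sample, shifted_sample))
```

A widely used rule of thumb treats values above roughly 0.25 as a substantial shift, but the threshold is a convention rather than a statistical guarantee.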

3. Velocity Plots

from stream_viz.velocity.velocity_charts import FeatureVelocity

feature_vel = FeatureVelocity(missing_encoder)
# Plot categorical feature
feature_vel.plot(
    features="c5", chunk_size=100, start_period=10, end_period=35, x_label_every=5
)
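
The `chunk_size` argument above suggests that the stream is cut into fixed-size chunks before plotting. How `FeatureVelocity` aggregates values inside a chunk is not documented here, so the following is only one plausible reading: count how often each category occurs per chunk. The function name and the toy stream are hypothetical.

```python
from collections import Counter


def category_counts_per_chunk(values, chunk_size):
    """Split a categorical stream into chunks and count categories in each."""
    return [
        Counter(values[start:start + chunk_size])
        for start in range(0, len(values), chunk_size)
    ]


# Hypothetical categorical feature arriving over time
stream = ["a", "a", "b", "a", "b", "b", "b", "a"]
chunk_counts = category_counts_per_chunk(stream, chunk_size=4)

for period, counts in enumerate(chunk_counts):
    print(period, dict(counts))
```

Here category "a" dominates the first chunk and "b" the second, which is the kind of change a per-chunk plot makes visible.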

from stream_viz.velocity.velocity_charts import FeatureVelocity

feature_vel = FeatureVelocity(missing_encoder)
# Plot numerical features
feature_vel.plot(features=["n0", "n1"], window_size=10, start_tp=200, end_tp=500)

Stream Graphs

from stream_viz.velocity.velocity_charts import StreamGraph

streamgraph = StreamGraph(normal_encoder)
streamgraph.plot("c5_b")

4. Missingness Plots

from stream_viz.data_missingness.missingness import HeatmapPlotter

plotter = HeatmapPlotter(missing_encoder.X_encoded_data)
plotter.display()
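
A missingness heatmap needs a grid of numbers to colour. One simple way to build such a grid, shown here as an assumption rather than as what `HeatmapPlotter` actually computes, is the share of missing values per feature within consecutive chunks of rows. The function name, the tuple-based row format, and the sample rows are made up for this sketch.

```python
import math


def missing_fraction_by_chunk(rows, feature_names, chunk_size):
    """Return one dict per chunk mapping feature name to its share of missing values."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    result = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        result.append({
            name: sum(is_missing(row[col]) for row in chunk) / len(chunk)
            for col, name in enumerate(feature_names)
        })
    return result


# Hypothetical rows with two features; None and NaN both count as missing
rows = [
    (0.0, 0.53),
    (None, 0.67),
    (0.0, float("nan")),
    (1.0, 0.47),
]
fractions = missing_fraction_by_chunk(rows, ["c5_b", "n0"], chunk_size=2)
print(fractions)  # [{'c5_b': 0.5, 'n0': 0.0}, {'c5_b': 0.0, 'n0': 0.5}]
```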

from stream_viz.data_missingness.missingness import StackedBarGraph

# Create the StackedBarGraph object and plot it
stacked_bar = StackedBarGraph(missing_encoder)
stacked_bar.plot("c5_b", 1000)

Scatter Plot

from stream_viz.data_missingness.missingness import ScatterPlotter

scatter_plotter = ScatterPlotter(normal_encoder, missing_encoder)
scatter_plotter.plot_numerical("n0")

scatter_plotter.plot_categorical("c5_b")

Data Missingness HeatMap Plot (Additional/Optional)

from stream_viz.data_missingness.missingness import MarHeatMap

mar_hm = MarHeatMap(
    normal_encoder_obj=normal_encoder, missing_encoder_obj=missing_encoder
)
mar_hm.plot(start_tpt=0, end_tpt=1299, significance_level=0.05)


5. Learning Strategies Plot

Strategy Plot

A basic plot:

from stream_viz.learning_strategies.strategy_viz import LearningStrategyChart

# Create the learning strategy chart and plot it
LearningStrategyChart(kappa_encoder.encoded_data).plot(start_tpt=11000, end_tpt=12950)
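
The name `KappaStrategyDataEncoder` and the `Batch_Start` index in the encoded data suggest that each value is a kappa score for one batch of predictions, although this README does not say so explicitly. Under that assumption, the sketch below shows how Cohen's kappa can be computed for a single batch: observed agreement is corrected by the agreement expected from the label frequencies alone. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions beyond chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement if predictions were drawn from their own label frequencies.
    expected = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one label occurs; 0.0 is a convention of this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical batch of four predictions
batch_true = [1, 1, 0, 0]
batch_pred = [1, 1, 0, 1]
print(cohen_kappa(batch_true, batch_pred))  # 0.5
```

Computing this score once per batch for each learning strategy would give one curve per strategy, similar in shape to the table produced by the encoder.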
