Welcome to stream_viz, our comprehensive data analysis and drift detection package! This library is designed to provide robust tools for encoding data, detecting various types of drifts, visualizing feature changes over time, and handling missing data with advanced visualization techniques. Whether you're dealing with normal data, data with missing values, or strategy data, our package has specialized encoders and detectors to meet your needs.
- Efficient and Flexible: Mechanisms for encoding various data types.
- Specialized Encoders:
  - NormalDataEncoder: For normal data without missing values.
  - MissingDataEncoder: For normal data with missing values.
  - KappaStrategyDataEncoder: For strategy data without missing values.
- Real Concept Drift: Algorithms for detecting changes in the conditional probability distribution $p(y|X)$.
  - McDiarmid Drift Detection Method (MDDM): Uses a sliding window approach with arithmetic, geometric, and Euler weighting schemes.
- Feature Drift: Detection using the Kolmogorov-Smirnov (KS) test and Population Stability Index (PSI) test.
- Feature Changes Over Time: Advanced techniques for analyzing feature changes.
- Data Velocity: Visualization of the velocity of the stream.
- Missingness Patterns: Techniques for visualizing missing data.
- Learning Strategies: Compare performance metrics of different learning strategies over the course of the stream.
- Research Stream Visualization Methods: Explore various methods for visualizing streaming data.
- Implement Drift Detection: At least one method for concept drift detection.
- Visualize Drift: Visualizations for both concept and feature drift.
- Visualize Data Velocity: Techniques for visualizing the velocity of the stream.
- Visualize Missing Values: Explore ways to visualize missing values.
- Compare Learning Strategies: Visualizations to compare performance metrics of different learning strategies.
- Integrating Implementations: All of the above implementations are integrated into a Python package in an object-oriented way.
For detailed explanations and implementation details, please refer to our User Guide, which is available as a Jupyter notebook and as a PDF.
The package README also describes the implementation details.
To get started with stream-viz, follow these installation instructions:
- Create a new Conda environment with a Python version between 3.8.0 and 3.10.0:
  conda create --name your_env_name python=3.9
  Alternatively, ensure your machine has a compatible Python version within this range.
- Clone the repository:
  git clone https://github.com/aditya0by0/stream-viz.git
- Navigate to the package directory and install the package:
  cd stream-viz
  pip install .
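Before installing, it may help to confirm that the active interpreter falls in the supported range. The snippet below is a minimal sketch: the helper name `is_supported_python` is illustrative and not part of stream_viz, and it assumes the stated range covers the 3.8, 3.9 and 3.10 release series.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is 3.8, 3.9 or 3.10.

    Hypothetical helper for illustration; not part of stream_viz.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (3, 8) <= (major, minor) <= (3, 10)


if __name__ == "__main__":
    print("Supported interpreter:", is_supported_python())
```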
Here’s a brief example to get you started:
from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector
# Encode data
encoder = NormalDataEncoder()
encoder.read_csv_data(filepath_or_buffer='path/to/your/data.csv')
encoder.encode_data()
# Detect drift
streamer = DataStreamer(rcd_detector_obj=RealConceptDriftDetector())
streamer.stream_data(X_df=encoder.X_encoded_data, y_df=encoder.y_encoded_data)
# Visualize results
streamer.rcd_detector_obj.plot()
For more detailed usage examples, please refer to the User Guide.
We welcome contributions! Please see our Contributing Guide for more details.
This project is licensed under the MIT License. See the LICENSE file for details.
stream_viz: Advanced data analysis and drift detection made easy. Happy analyzing!
This package provides all the tools you need to effectively encode, detect drifts, and visualize streaming data, ensuring that your models remain robust and your insights stay accurate.
from stream_viz.data_encoders.cfpdss_data_encoder import NormalDataEncoder
from stream_viz.utils.constants import _NORMAL_DATA_PATH # Variable only for developers
normal_encoder = NormalDataEncoder()
# Replace the variable below with the path to your own file; it is for internal use only.
# Add any relevant parameters supported by pandas.read_csv, if required.
normal_encoder.read_csv_data(filepath_or_buffer=_NORMAL_DATA_PATH)
normal_encoder.encode_data()
normal_encoder.X_encoded_data.head()
| | c5_b | c6_b | c7_b | c8_b | c9_b | n0 | n1 | n2 | n3 | n4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 0.528245 | 0.598345 | 0.558432 | 0.482846 | 0.612024 |
| 1 | 0 | 0 | 0 | 1 | 1 | 0.662432 | 0.423329 | 0.487623 | 0.454495 | 0.452664 |
| 2 | 0 | 0 | 0 | 1 | 1 | 0.562990 | 0.576429 | 0.545916 | 0.370166 | 0.543403 |
| 3 | 0 | 0 | 0 | 1 | 1 | 0.475311 | 0.566046 | 0.539992 | 0.421434 | 0.544852 |
| 4 | 1 | 0 | 0 | 1 | 0 | 0.370579 | 0.554642 | 0.536804 | 0.223743 | 0.392332 |
from stream_viz.data_encoders.cfpdss_data_encoder import MissingDataEncoder
from stream_viz.utils.constants import (
_MISSING_DATA_PATH,
) # Variable only for developers
missing_encoder = MissingDataEncoder()
missing_encoder.read_csv_data(
    # Replace with the path to your own file; this variable is for internal use only.
    filepath_or_buffer=_MISSING_DATA_PATH,
    # Add any relevant parameters supported by pandas.read_csv, if required.
    index_col=[0],
)
missing_encoder.encode_data()
missing_encoder.X_encoded_data.head()
| | c5_b | c6_b | c7_b | c8_b | c9_b | n0 | n1 | n2 | n3 | n4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.530356 | 0.598345 | 0.519161 | 0.478557 | 0.620371 |
| 1 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.672618 | 0.423329 | 0.442055 | 0.449888 | 0.458838 |
| 2 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.567192 | 0.576429 | 0.505532 | 0.364614 | 0.550814 |
| 3 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.474236 | 0.566046 | 0.499081 | 0.416457 | 0.552283 |
| 4 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.363202 | 0.554642 | 0.495610 | 0.216550 | 0.397683 |
from stream_viz.data_encoders.strategy_data_encoder import KappaStrategyDataEncoder
from stream_viz.utils.constants import (
_LEARNING_STRATEGY_DATA_PATH,
) # Variable only for developers
kappa_encoder = KappaStrategyDataEncoder()
kappa_encoder.read_csv_data(
    # Replace with the path to your own file; this variable is for internal use only.
    filepath_or_buffer=_LEARNING_STRATEGY_DATA_PATH,
    # Add any relevant parameters supported by pandas.read_csv, if required.
    header=[0, 1, 2],
    index_col=[0, 1],
)
kappa_encoder.encode_data()
kappa_encoder.encoded_data.head()
| Batch_Start | model_all | model_optimal | model_label | model_feat | model_nafa | model_smraed_catc | model_smraed_sumc | model_smraed_prioc | model_smraed_ |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 0.593128 | 0.593128 | 0.432892 | 0.593128 | 0.432892 | 0.257426 | 0.257426 | 0.432892 | 0.593128 |
| 100 | 0.447950 | 0.409449 | 0.294671 | 0.332810 | 0.296875 | 0.334898 | 0.296875 | 0.221184 | 0.334898 |
| 150 | 0.838710 | 0.919614 | 0.388254 | 0.676375 | 0.384236 | 0.592834 | 0.634146 | 0.592834 | 0.426230 |
| 200 | 0.880000 | 0.840000 | 0.720000 | 0.760000 | 0.360000 | 0.680000 | 0.680000 | 0.680000 | 0.520000 |
| 250 | 0.918831 | 0.959612 | 0.720000 | 0.708819 | 0.295775 | 0.672131 | 0.708819 | 0.672131 | 0.573379 |
# --------------- Real Concept Drift --------------------------------------------------
from stream_viz.data_streamer import DataStreamer
from stream_viz.real_drift.r_drift_detector import RealConceptDriftDetector
# Initialize DataStreamer with drift detectors
dt_streamer = DataStreamer(
rcd_detector_obj=RealConceptDriftDetector(),
)
# Stream data and apply drift detection
dt_streamer.stream_data(
X_df=missing_encoder.X_encoded_data, y_df=missing_encoder.y_encoded_data
)
# Plot results
dt_streamer.rcd_detector_obj.plot(start_tpt=100, end_tpt=3000)
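For readers who want intuition for the McDiarmid Drift Detection Method named in the feature list, the following is a simplified, standalone sketch of its arithmetic-weighting variant. It is not the code used by RealConceptDriftDetector. The function names and the defaults (`window_size=100`, `diff=0.01`, `delta=1e-6`) are assumptions chosen for illustration. The input is a sequence of prediction outcomes (1 for correct, 0 for incorrect).

```python
import math
from collections import deque


def arithmetic_weights(n, diff=0.01):
    """Weights 1, 1 + diff, 1 + 2*diff, ...; the newest element gets the largest."""
    return [1.0 + i * diff for i in range(n)]


def mcdiarmid_epsilon(weights, delta=1e-6):
    """Deviation bound from McDiarmid's inequality for a weighted mean."""
    total = sum(weights)
    sum_sq = sum((w / total) ** 2 for w in weights)
    return math.sqrt(0.5 * sum_sq * math.log(1.0 / delta))


def first_drift_index(outcomes, window_size=100, diff=0.01, delta=1e-6):
    """Return the index at which drift is first signalled, or None."""
    weights = arithmetic_weights(window_size, diff)
    total = sum(weights)
    eps = mcdiarmid_epsilon(weights, delta)
    window = deque(maxlen=window_size)  # oldest on the left, newest on the right
    max_mean = 0.0
    for t, outcome in enumerate(outcomes):
        window.append(outcome)
        if len(window) < window_size:
            continue  # wait until the sliding window is full
        mean = sum(w * x for w, x in zip(weights, window)) / total
        max_mean = max(max_mean, mean)
        # Drift: the weighted accuracy fell significantly below its best value.
        if max_mean - mean >= eps:
            return t
    return None


if __name__ == "__main__":
    stream = [1] * 300 + [0] * 300  # accuracy collapses at index 300
    print("Drift signalled at index:", first_drift_index(stream))
```

The geometric and Euler schemes would differ only in how the weights are generated; the bound and the comparison against the running maximum stay the same in this sketch.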
from stream_viz.data_streamer import DataStreamer
from stream_viz.feature_drift.f_drift_detector import FeatureDriftDetector
dt_streamer = DataStreamer(
fd_detector_obj=FeatureDriftDetector(data_encoder=normal_encoder)
)
dt_streamer.stream_data(
X_df=normal_encoder.X_encoded_data, y_df=normal_encoder.y_encoded_data
)
dt_streamer.fd_detector_obj.plot(feature_name="n0")
dt_streamer.fd_detector_obj.plot(feature_name="c5")
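The feature list states that feature drift is detected with the Kolmogorov-Smirnov (KS) test and the Population Stability Index (PSI). As a rough illustration of what these two measures compute, here is a dependency-free sketch that compares a reference window with a current window of one numerical feature. It is not the FeatureDriftDetector implementation; the function names, the 10 equal-width bins and the `eps` floor are assumptions made for this example.

```python
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_rejects(reference, current, alpha=0.05):
    """Compare the statistic with the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant reference: everything falls into the first bin

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # Floor at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print("KS rejects:", ks_rejects(reference, shifted))
    print("PSI:", psi(reference, shifted))
```

A common rule of thumb treats a PSI above roughly 0.25 as a substantial shift, although the thresholds used by the package may differ.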
from stream_viz.velocity.velocity_charts import FeatureVelocity
feature_vel = FeatureVelocity(missing_encoder)
# Plot categorical feature
feature_vel.plot(
features="c5", chunk_size=100, start_period=10, end_period=35, x_label_every=5
)
from stream_viz.velocity.velocity_charts import FeatureVelocity
feature_vel = FeatureVelocity(missing_encoder)
# Plot numerical features
feature_vel.plot(features=["n0", "n1"], window_size=10, start_tp=200, end_tp=500)
from stream_viz.velocity.velocity_charts import StreamGraph
streamgraph = StreamGraph(normal_encoder)
streamgraph.plot("c5_b")
from stream_viz.data_missingness.missingness import HeatmapPlotter
plotter = HeatmapPlotter(missing_encoder.X_encoded_data)
plotter.display()
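To give a sense of the kind of summary a missingness heatmap can be built from, the sketch below computes the fraction of missing values per feature for consecutive chunks of a stream. It is a standalone illustration under assumed conventions (rows as dictionaries, `None` or NaN meaning missing) and does not reproduce how HeatmapPlotter works internally; `missing_rate_by_chunk` is a hypothetical name.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_rate_by_chunk(rows, chunk_size):
    """Return one dict per chunk mapping feature name -> fraction missing.

    rows: list of dicts that all share the same feature names.
    """
    rates = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        rates.append(
            {
                feature: sum(is_missing(row[feature]) for row in chunk) / len(chunk)
                for feature in chunk[0]
            }
        )
    return rates


if __name__ == "__main__":
    rows = [
        {"n0": 1.0, "c5": None},
        {"n0": float("nan"), "c5": "a"},
        {"n0": 2.0, "c5": "b"},
        {"n0": 3.0, "c5": None},
    ]
    for chunk_rates in missing_rate_by_chunk(rows, chunk_size=2):
        print(chunk_rates)
```

Each resulting dictionary corresponds to one column of a chunk-by-feature grid, which a heatmap can then colour by missing fraction.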
from stream_viz.data_missingness.missingness import StackedBarGraph
# Create the StackedBarGraph object and plot it
stacked_bar = StackedBarGraph(missing_encoder)
stacked_bar.plot("c5_b", 1000)
from stream_viz.data_missingness.missingness import ScatterPlotter
scatter_plotter = ScatterPlotter(normal_encoder, missing_encoder)
scatter_plotter.plot_numerical("n0")
scatter_plotter.plot_categorical("c5_b")
from stream_viz.data_missingness.missingness import MarHeatMap
mar_hm = MarHeatMap(
normal_encoder_obj=normal_encoder, missing_encoder_obj=missing_encoder
)
mar_hm.plot(start_tpt=0, end_tpt=1299, significance_level=0.05)
A basic plot:
from stream_viz.learning_strategies.strategy_viz import LearningStrategyChart
# Create the learning strategy chart and plot it
LearningStrategyChart(kappa_encoder.encoded_data).plot(start_tpt=11000, end_tpt=12950)
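The chart above compares per-batch scores of several learning strategies. Assuming the "Kappa" in KappaStrategyDataEncoder refers to Cohen's kappa (the source does not say so explicitly), the sketch below shows how such a per-batch score could be computed from true and predicted labels. The function names and the batch size are illustrative and are not part of stream_viz.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if expected == 1.0:
        # Only one label on both sides: kappa is undefined; this sketch returns 0.0.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def kappa_per_batch(y_true, y_pred, batch_size=50):
    """Return (batch_start, kappa) pairs for consecutive batches."""
    return [
        (start, cohen_kappa(y_true[start:start + batch_size], y_pred[start:start + batch_size]))
        for start in range(0, len(y_true), batch_size)
    ]


if __name__ == "__main__":
    truth = [1, 1, 0, 0, 1, 0, 1, 0]
    preds = [1, 1, 0, 0, 1, 0, 0, 1]
    print(kappa_per_batch(truth, preds, batch_size=4))
```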