Automatic Exploratory Data Analysis Framework
A Python library that automates the exploratory data analysis (EDA) process. It provides tools for dataset profiling, hypothesis testing, clustering, distribution fitting, outlier detection, and feature engineering.
- Dataset Profiling: Basic information, missing values, data types
- Statistical Hypothesis Testing: Automated hypothesis generation and testing
- Clustering: Automatic clustering with K-means and DBSCAN
- Distribution Fitting: Fit probability distributions to numerical data
- Outlier Detection: Multiple methods (IQR, Z-score, Isolation Forest)
- Feature Engineering: Time-series features, transformations, interactions
- Comprehensive Analysis: Unified interface for complete EDA workflow
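
To make the outlier detection methods listed above more concrete, here is a minimal sketch of the IQR and Z-score rules written directly in pandas. It is an illustration under stated assumptions, not the library's own implementation: the function names `iqr_outliers` and `zscore_outliers`, the multiplier `k`, and the `threshold` parameter are hypothetical and chosen only for this example.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose absolute z-score exceeds `threshold` (hypothetical helper)."""
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # A constant column has no spread, so nothing can be an outlier.
        return pd.Series(False, index=values.index)
    z = (values - values.mean()) / std
    return z.abs() > threshold


# Six ordinary readings and one obvious anomaly.
sample = pd.Series([10, 12, 11, 13, 12, 11, 95])
iqr_flags = iqr_outliers(sample)
z_flags = zscore_outliers(sample, threshold=2.0)
print(sample[iqr_flags].tolist())
print(sample[z_flags].tolist())
```

The Z-score threshold is lowered to 2.0 in the usage lines because, with only seven points, no population z-score can reach 3. The IQR rule does not share this limitation on small samples, which is one reason the two methods are often offered side by side.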
pip install -r requirements.txt
python setup.py install

from autoeda import AutoEDA
import pandas as pd
# Initialize AutoEDA
eda = AutoEDA(random_state=42)
# Load your data
df = pd.read_csv('your_data.csv')
# Basic dataset info
info = eda.dataset_info(df)
# Automatic hypothesis testing
hypotheses = eda.suggest_and_test_hypotheses(df)
# Clustering analysis
clusters = eda.auto_cluster(df, method='kmeans')
# Time-series features
df_with_ts = eda.time_series_features(df, 'datetime_column')
# Comprehensive analysis
results = eda.comprehensive_analysis(df, 'datetime_col', 'target_col')

autoeda/
├── core.py # Main AutoEDA class
├── analysis/ # Statistical analysis modules
│ ├── clustering.py
│ ├── distributions.py
│ └── hypothesis.py
├── preprocessing/ # Data preprocessing
│ ├── feature_engineering.py
│ └── outliers.py
├── utils/ # Utility functions
│ └── time_series.py
└── example_usage.py # Usage examples
core.py (main AutoEDA class):
- Unified interface for all EDA functionality
- Dataset information and profiling
- Comprehensive analysis pipeline
analysis/:
- Clustering: Automatic grouping of numerical data
- Distributions: Fit and compare probability distributions
- Hypothesis Testing: Automated statistical testing
preprocessing/:
- Outlier Detection: Identify and handle anomalies
- Feature Engineering: Create new features automatically
utils/:
- Time Series: Extract datetime features (seasonality, trends, etc.)
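
As a rough illustration of what extracting datetime features can look like, the sketch below derives a few calendar columns with plain pandas. It does not reproduce what `time_series_features` does internally: the helper name `add_datetime_features` and the generated column names are assumptions made for this example.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with calendar features derived from `column`.

    Hypothetical helper for illustration; not part of the autoeda API.
    """
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out[f"{column}_year"] = ts.dt.year
    out[f"{column}_month"] = ts.dt.month
    out[f"{column}_dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out[f"{column}_hour"] = ts.dt.hour
    out[f"{column}_is_weekend"] = ts.dt.dayofweek >= 5
    return out


frame = pd.DataFrame({
    "datetime_column": ["2024-01-06 09:30:00", "2024-01-08 17:00:00"],
    "value": [1.0, 2.0],
})
features = add_datetime_features(frame, "datetime_column")
print(features.filter(like="datetime_column_"))
```

Working on a copy leaves the input frame unchanged, so the derived columns can be inspected or discarded without affecting the original data.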