EDAauto

Automatic Exploratory Data Analysis Framework

A Python library that automates exploratory data analysis (EDA). It provides tools for dataset profiling, statistical hypothesis testing, clustering, distribution fitting, outlier detection, and feature engineering.

Features

Dataset Profiling: Basic information, missing values, data types
Statistical Hypothesis Testing: Automated hypothesis generation and testing
Clustering: Automatic clustering with K-means and DBSCAN
Distribution Fitting: Fit probability distributions to numerical data
Outlier Detection: Multiple methods (IQR, Z-score, Isolation Forest)
Feature Engineering: Time-series features, transformations, interactions
Comprehensive Analysis: Unified interface for complete EDA workflow

Installation

pip install -r requirements.txt
python setup.py install

Quick Start

from autoeda import AutoEDA
import pandas as pd

# Initialize AutoEDA
eda = AutoEDA(random_state=42)

# Load your data
df = pd.read_csv('your_data.csv')

# Basic dataset info
info = eda.dataset_info(df)

# Automatic hypothesis testing
hypotheses = eda.suggest_and_test_hypotheses(df)

# Clustering analysis
clusters = eda.auto_cluster(df, method='kmeans')

# Time-series features
df_with_ts = eda.time_series_features(df, 'datetime_column')

# Comprehensive analysis
results = eda.comprehensive_analysis(df, 'datetime_col', 'target_col')

Project Structure

autoeda/
├── core.py              # Main AutoEDA class
├── analysis/            # Statistical analysis modules
│   ├── clustering.py
│   ├── distributions.py
│   └── hypothesis.py
├── preprocessing/       # Data preprocessing
│   ├── feature_engineering.py
│   └── outliers.py
├── utils/               # Utility functions
│   └── time_series.py
└── example_usage.py     # Usage examples

Key Modules

Core (AutoEDA class)

Unified interface for all EDA functionality

Dataset information and profiling

Comprehensive analysis pipeline
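
To make "dataset information and profiling" concrete, here is a minimal sketch of the kind of summary such a step might produce, written with plain pandas. The function name basic_profile and the keys of the returned dictionary are assumptions for illustration; they are not taken from the AutoEDA class, whose actual output may differ.

```python
import pandas as pd


def basic_profile(df: pd.DataFrame) -> dict:
    """Summarize shape, column dtypes, and missing values (hypothetical helper)."""
    missing = df.isna()
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing_count": missing.sum().to_dict(),
        # Share of missing values per column, as a percentage.
        "missing_pct": (missing.mean() * 100).round(2).to_dict(),
    }


# Small illustrative frame; the data is made up for the example.
df = pd.DataFrame(
    {"age": [34, None, 29, 41], "city": ["Oslo", "Lima", None, None]}
)
profile = basic_profile(df)
print(profile["missing_count"])
```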

Analysis

Clustering: Automatic grouping of numerical data

Distributions: Fit and compare probability distributions

Hypothesis Testing: Automated statistical testing
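
As an illustration of what "fit and compare probability distributions" can mean in practice, the sketch below fits a normal and an exponential distribution by maximum likelihood and ranks them by AIC, using only NumPy. The helper names (fit_normal, fit_exponential, aic) and the choice of AIC as the comparison criterion are assumptions made for this example; the library's distributions module may use different candidates or a different goodness-of-fit measure.

```python
import numpy as np


def fit_normal(x: np.ndarray) -> dict:
    """Maximum-likelihood fit of a normal distribution (hypothetical helper)."""
    mu, sigma = x.mean(), x.std()  # np.std defaults to ddof=0, the MLE
    loglik = np.sum(
        -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)
    )
    return {"params": {"mu": float(mu), "sigma": float(sigma)}, "loglik": float(loglik)}


def fit_exponential(x: np.ndarray) -> dict:
    """Maximum-likelihood fit of an exponential distribution (non-negative data)."""
    rate = 1.0 / x.mean()  # MLE of the rate parameter
    loglik = np.sum(np.log(rate) - rate * x)
    return {"params": {"rate": float(rate)}, "loglik": float(loglik)}


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion; lower means a better trade-off."""
    return 2 * n_params - 2 * loglik


# Synthetic, clearly exponential data so the ranking is easy to interpret.
rng = np.random.default_rng(42)
sample = rng.exponential(scale=2.0, size=1000)

fits = {"normal": fit_normal(sample), "exponential": fit_exponential(sample)}
scores = {name: aic(f["loglik"], len(f["params"])) for name, f in fits.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

AIC penalizes extra parameters, so a two-parameter family only wins when it improves the likelihood enough to justify the added flexibility.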

Preprocessing

Outlier Detection: Identify and handle anomalies

Feature Engineering: Create new features automatically
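
Two of the outlier methods named under Features, IQR and Z-score, can be sketched in a few lines of pandas. The function names, the default multiplier k=1.5, and the thresholds below are assumptions chosen for illustration; they are not the signatures of the library's outliers module.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def zscore_outliers(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values whose absolute z-score exceeds `threshold`."""
    z = (series - series.mean()) / series.std(ddof=0)
    return z.abs() > threshold


# Made-up sample with one obvious anomaly.
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 95])

iqr_mask = iqr_outliers(values)
# With only eight points a single extreme value inflates the standard
# deviation, so a lower threshold than the usual 3.0 is used here.
z_mask = zscore_outliers(values, threshold=2.0)

print(values[iqr_mask].tolist(), values[z_mask].tolist())
```

The IQR rule relies on quartiles and is therefore less affected by the outlier itself, while the z-score depends on the mean and standard deviation, both of which the outlier distorts.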

Utils

Time Series: Extract datetime features (seasonality, trends, etc.)
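
A minimal sketch of datetime feature extraction with the pandas .dt accessor is shown below. The helper add_datetime_features and the output column names are assumptions for illustration; the features that time_series_features actually produces are not documented here and may differ.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of `df` with calendar features derived from `column`."""
    out = df.copy()
    ts = pd.to_datetime(out[column])
    out["year"] = ts.dt.year
    out["month"] = ts.dt.month
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["hour"] = ts.dt.hour
    out["is_weekend"] = ts.dt.dayofweek >= 5
    return out


# Made-up rows: a Saturday morning and a Monday evening.
df = pd.DataFrame(
    {
        "datetime_column": ["2024-01-06 09:30", "2024-01-08 18:00"],
        "value": [1.0, 2.0],
    }
)
features = add_datetime_features(df, "datetime_column")
print(features[["day_of_week", "hour", "is_weekend"]])
```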
