
datannurpy

datannurpy is the Python builder for datannur. It scans files and databases, extracts metadata and statistics, then generates a ready-to-use catalog bundled with the datannur app.

Key features:

  • Broad format support - CSV, Excel, Parquet, Delta Lake, Iceberg, SAS, SPSS, Stata
  • Database introspection - PostgreSQL, MySQL, Oracle, SQL Server, SQLite, DuckDB
  • Remote and cloud storage - SFTP, S3, Azure Blob, GCS via fsspec
  • Metadata extraction - Schemas, statistics, frequencies, enumerations, auto-tagging
  • Incremental scans - Only rescan what changed between runs
  • YAML or Python API - Declarative configuration or programmatic control
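The incremental-scan idea can be illustrated with a small, generic sketch. This is not datannurpy's implementation. The helpers `fingerprint` and `files_to_rescan` are hypothetical. The sketch assumes a file counts as changed when its size or modification time differs from the previous run, and it keeps those fingerprints in a JSON state file between runs.

```python
import json
from pathlib import Path


def fingerprint(path: Path) -> str:
    # Cheap change marker (hypothetical): file size plus modification time in ns.
    stat = path.stat()
    return f"{stat.st_size}:{stat.st_mtime_ns}"


def files_to_rescan(folder: Path, state_file: Path) -> tuple[list[Path], list[str]]:
    """Return (new or modified files, names of files deleted since the last run).

    The current fingerprints are saved to state_file, so the next call only
    reports changes made after this one. Keep state_file outside `folder`.
    """
    previous: dict[str, str] = (
        json.loads(state_file.read_text()) if state_file.exists() else {}
    )
    current = {
        p.relative_to(folder).as_posix(): fingerprint(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }
    # New files have no previous fingerprint; modified files have a different one.
    changed = [folder / name for name, fp in current.items() if previous.get(name) != fp]
    # Files present last time but gone now must be dropped, not rescanned.
    removed = sorted(set(previous) - set(current))
    state_file.write_text(json.dumps(current, indent=2))
    return changed, removed
```

Comparing file metadata instead of hashing contents keeps each check cheap even for large files. The trade-off is that a file touched without edits gets rescanned. Deleted files are reported separately because their entries would be removed from the catalog rather than scanned again.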

Quick start

pip install datannurpy

# catalog.yml
app_path: ./my-catalog
open_browser: true

add:
  - folder: ./data
    include: ["*.csv", "*.xlsx", "*.parquet"]

  - database: sqlite:///mydb.sqlite

python -m datannurpy catalog.yml

This command scans the configured sources, generates the catalog into app_path (./my-catalog), and, because open_browser is true, opens the datannur app in your browser.
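To try the quick start without your own data, a small helper can create sources that match the configuration above: a CSV file under ./data and a SQLite database at mydb.sqlite. This is a hypothetical convenience script, not part of datannurpy, and it uses only the standard library. It assumes the `sqlite:///mydb.sqlite` URL resolves relative to the directory you run the command from. `matching_files` also assumes the `include` patterns behave like shell-style globs on file names, which may differ from datannurpy's exact matching rules.

```python
import csv
import fnmatch
import sqlite3
from pathlib import Path

# Same patterns as the `include` list in catalog.yml.
INCLUDE = ["*.csv", "*.xlsx", "*.parquet"]


def make_sample_sources(base: Path) -> None:
    """Create data/sales.csv, data/notes.txt and mydb.sqlite under `base`."""
    data_dir = base / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # A small CSV file that the folder source should pick up.
    with open(data_dir / "sales.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["north", 120], ["south", 80], ["north", 45]])

    # A file that none of the include patterns should select.
    (data_dir / "notes.txt").write_text("not a data file\n", encoding="utf-8")

    # A SQLite database for the `database: sqlite:///mydb.sqlite` source.
    con = sqlite3.connect(base / "mydb.sqlite")
    try:
        with con:  # commits the transaction on success
            con.execute(
                "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
            )
            con.executemany(
                "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
                [(1, "Ada"), (2, "Linus")],
            )
    finally:
        con.close()


def matching_files(folder: Path, patterns: list[str]) -> list[str]:
    """List files under `folder` whose names match any pattern (assumed glob semantics)."""
    return sorted(
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and any(fnmatch.fnmatch(p.name, pat) for pat in patterns)
    )
```

Call `make_sample_sources(Path("."))` from the directory containing catalog.yml, then run `python -m datannurpy catalog.yml`. Under the assumed glob semantics, `matching_files(Path("data"), INCLUDE)` returns `['sales.csv']`, since notes.txt matches none of the patterns. Running the helper twice is safe: the CSV is overwritten and `INSERT OR REPLACE` keeps the table at two rows.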

Documentation

📖 Full documentation: docs.datannur.com/builder

🗂️ datannur app: github.com/datannur/datannur

🌐 Website: datannur.com

🚀 Demo: dev.datannur.com

Contributing

For development documentation and contributing guidelines, see CONTRIBUTING.md.

License

MIT - see LICENSE. All dependencies are MIT/Apache 2.0/BSD compatible.