docs: Improve README.md with technical details and integrations #743
luisremis wants to merge 4 commits into
Pull request overview
This PR refreshes README.md to present the project as the “ApertureDB Python SDK”, adding badges and expanding documentation around installation, integrations, development setup, and test execution.
Changes:
- Added PyPI/Python/License/CI badges and restructured the intro content.
- Documented integrations/capabilities (ML frameworks, embeddings, Dask, cloud storage, Croissant, SPARQL) and reorganized documentation links.
- Reworked development + testing instructions and runtime configuration environment variable table.
ad-claw000 left a comment
This README refresh looks great! The addition of badges, clear integration outlines, and a clean environment variable table really improves the onboarding experience. Approving.
… install commands
ad-claw000 left a comment
Addressed the review comments by Copilot: fixed the CI badge workflow link, corrected the integrations text regarding dependencies, quoted the pip install command, and updated the dbinfo instructions to use integers and ephemeral ports.
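On the quoted pip install command: in shells such as zsh, square brackets are glob characters, so an unquoted `.[dev]` can be rejected by the shell before pip ever sees it. A small sketch with Python's standard `shlex` module shows the quoted form; the variable names are illustrative only and are not taken from the README.

```python
import shlex

# The argument exactly as it should reach pip.
extras_spec = ".[dev]"

# shlex.quote wraps the value in single quotes because "[" and "]"
# are not in its set of shell-safe characters.
command = f"pip install -e {shlex.quote(extras_spec)}"
print(command)
```

Single or double quotes both work here, since the spec contains nothing the shell would expand inside double quotes.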
Addressed the remaining PR review comments: updated the integrations section to mention explicit dependency requirements for
| ## 🚀 Integrations & Capabilities
|
| The SDK is designed for modern ML workflows and offers seamless integrations with (some require additional dependencies):
| * **Deep Learning Frameworks:** Seamless conversion of ApertureDB queries into `PyTorch` (`PyTorchDataset`) and `TensorFlow` (`TensorFlowDataset`) data loaders for immediate model training.

| * **Vector Search & Embeddings:** First-class support for storing and retrieving high-dimensional descriptors, including native embedding extraction utilizing `CLIP` (requires `openai-clip`) and `Facenet`.
| * **Distributed Data Processing:** Integration with `Dask` to handle parallelized data loading and large-scale query execution.
| * **Cloud Storage Integrations:** Easy handling of assets stored remotely using `Boto3` (AWS S3) and Google Cloud Storage.
| * **ML Croissant:** Native parsing and handling of datasets aligned with the ML Croissant metadata format.

| ```
|
| or an installation with only the core part of the SDK
| To install just the lightweight core client without the heavy ML dependencies:

| ```bash
| git clone https://github.com/aperture-data/aperturedb-python.git
| cd aperturedb-python
| pip install -e .[dev]

| DB_REST_HOST = 'localhost'
| DB_TCP_PORT = 55551 # Replace with the mapped host port for lenz
| DB_REST_PORT = 8443 # Replace with the mapped host port for nginx 443
| VERIFY_HOSTNAME = False # Required when connecting to localhost to skip cert hostname validation

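A minimal sketch of how the settings above might be sanity-checked before a test run, given that the review asked for integer ports. The `DbInfo` dataclass and the `validate` function are hypothetical names, not part of the SDK; only the field names and default values mirror the snippet.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DbInfo:
    """Hypothetical container mirroring the dbinfo settings shown above."""
    DB_REST_HOST: str = "localhost"
    DB_TCP_PORT: int = 55551
    DB_REST_PORT: int = 8443
    VERIFY_HOSTNAME: bool = False


def validate(info: DbInfo) -> list[str]:
    """Return a list of problems; an empty list means the ports look usable."""
    problems = []
    for name in ("DB_TCP_PORT", "DB_REST_PORT"):
        port = getattr(info, name)
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(port, int) or isinstance(port, bool):
            problems.append(f"{name} must be an integer, got {type(port).__name__}")
        elif not 1 <= port <= 65535:
            problems.append(f"{name}={port} is outside the valid port range 1-65535")
    return problems
```

Calling `validate(DbInfo(DB_TCP_PORT="55551"))` reports the string-valued port, which is the kind of mistake the integer wording in the instructions is meant to prevent.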
| # Reporting bugs
| Any error in the functionality / documentation / tests may be reported by creating a
| [GitHub issue](https://github.com/aperture-data/aperturedb-python/issues).
| `ADB_DEBUGGABLE` | `boolean` | Allows the application to register a fault handler that dumps a trace when `SIGUSR1` is sent to the process. | *Not set* |

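For readers unfamiliar with the mechanism the `ADB_DEBUGGABLE` row describes, here is a minimal sketch using Python's standard `faulthandler` module. The helper name `enable_debug_dump`, its `file` parameter, and the rule that any value of the variable enables the handler are assumptions for illustration; the SDK's actual handling may differ.

```python
import faulthandler
import os
import signal
import sys


def enable_debug_dump(env_var: str = "ADB_DEBUGGABLE", file=None) -> bool:
    """Register a SIGUSR1 traceback dump if env_var is set (hypothetical helper)."""
    # Assumption: the presence of the variable, with any value, enables the handler.
    if env_var not in os.environ:
        return False
    # SIGUSR1 does not exist on Windows, so there is nothing to register there.
    if not hasattr(signal, "SIGUSR1"):
        return False
    # Dump the tracebacks of all threads whenever SIGUSR1 arrives.
    # The target must be a real file with a file descriptor.
    target = file if file is not None else sys.stderr
    faulthandler.register(signal.SIGUSR1, file=target, all_threads=True)
    return True
```

With the handler registered, running `kill -USR1 <pid>` from another terminal prints the current Python stack of every thread without stopping the process.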
This PR updates the README.md by adding: