The ./code folder contains all the listings from the book. This README explains how to set up the prerequisites.
Note: Each Python file follows the book's naming convention (for example, listing_3.4.py is listing 3.4 from chapter 3).
Table of contents
- Creating the Python environment
- Specific libraries for chapter 5
- Leveraging Google AI
- Installing Alteryx
Before using the resources in this repository, we recommend creating a Python virtual environment.
Using a virtual environment is crucial for managing dependencies in Python projects. It helps to:
- Avoid Conflicts: Different projects may require different versions of the same package. Virtual environments keep dependencies isolated.
- Ensure Reproducibility: When sharing your project, others can install the same versions of dependencies you used.
- Maintain a Clean Global Environment: Prevent clutter in your global Python installation by installing packages only in the virtual environment.
There are several ways to create and manage virtual environments in Python:
- Using venv (native Python): Lightweight and included in Python 3.3 and later.
  - Documentation: Python venv
- Using virtualenv: A widely used external tool, compatible with Python 2 and older versions of Python 3.
  - Documentation: Virtualenv
- Using conda: An environment manager provided by Anaconda. Suitable for managing Python and non-Python dependencies.
  - Documentation: Conda
- Using pipenv: Combines pip and virtualenv for better dependency management.
  - Documentation: Pipenv
Ensure Python 3.12 or later is installed on your system.
- Navigate to Your Project Directory
  Open a terminal and navigate to the folder where your project is located:
  cd /path/to/your/project
- Create a Virtual Environment
  Run the following command to create a virtual environment named env:
  python3 -m venv env
  This will create a directory called env containing the virtual environment files.
- Activate the Virtual Environment
  - On Windows
    .\env\Scripts\activate
  - On macOS/Linux
    source env/bin/activate
  After activation, your terminal prompt will show (env), indicating the virtual environment is active.
- Install Dependencies
Use the requirements.txt that is available at the root of the repository and install all the needed libraries like this:
pip install -r requirements.txt
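To confirm the setup from Python itself, a short check along these lines can help. It is a minimal sketch: the function names are hypothetical, and the prefix comparison applies to environments created with venv (conda environments, for instance, are not detected this way).

```python
import sys

# Minimum interpreter version stated in this README.
MIN_VERSION = (3, 12)


def in_virtual_environment() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


def python_is_recent_enough(minimum: tuple[int, int] = MIN_VERSION) -> bool:
    """True when the running interpreter is at least the given (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
    print("Python version OK:", python_is_recent_enough())
```

Run it with the environment activated; both lines should report True before you start on the listings.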
Chapter 5 (listings 5.5 and 5.6) uses the spaCy library, which requires the en_core_web_sm model to be downloaded first. Run the command below:
python -m spacy download en_core_web_sm
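The download installs the model as a regular Python package, so its presence can be checked without loading spaCy. This is a minimal sketch; the function name model_is_installed is hypothetical.

```python
import importlib.util


def model_is_installed(model_name: str = "en_core_web_sm") -> bool:
    """True when a package with the given name can be found in the current environment."""
    # find_spec returns None for a top-level package that is not installed.
    return importlib.util.find_spec(model_name) is not None


if __name__ == "__main__":
    if model_is_installed():
        print("en_core_web_sm is available.")
    else:
        print("en_core_web_sm is missing; run: python -m spacy download en_core_web_sm")
```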
Listings 5.10 and 5.11 use the Tesseract OCR engine, which must be installed separately beforehand.
Windows
- Download: Download the installer from the official Tesseract website
- Installation:
  - Run the installer.
  - Choose the installation directory (e.g., C:\Program Files\Tesseract-OCR).
  - Select the desired language data to install (e.g., English, German, etc.).
  - Complete the installation.
- Environment Variable:
  - Add the installation directory (e.g., C:\Program Files\Tesseract-OCR) to your system's PATH environment variable. This allows you to run Tesseract commands from any command prompt.
Mac
- Install Homebrew (if not already installed) by running in a terminal:
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Tesseract with Homebrew (English language data is included):
  brew install tesseract
- Install additional language data (optional):
  brew install tesseract-lang
Linux
- Update package lists in a terminal:
  sudo apt update
- Install Tesseract and language data:
  sudo apt install tesseract-ocr tesseract-ocr-eng
Note: Verify the installation by opening a terminal or command prompt and running:
tesseract --version
In chapter 8 we'll use Google AI in the code to illustrate the book's topics. These are the steps to get ready:
- Obtaining a Google Key
- Creating the GEMINI_KEY environment variable
- Go to Google AI Studio
- Sign in with your Google Account
- Create an API Key:
  - Navigate to API Keys: Locate the section for API keys or credentials within Google AI Studio. At the time of writing, it is a button available at the top left.
  - Create a New Key: Follow the provided instructions to generate a new API key.
  - Restrict the Key (Optional): Enhance security by restricting key usage:
    - Application Restrictions: Specify the allowed websites or applications.
    - IP Address Restrictions: Limit usage to specific IP addresses or ranges.
  - Copy the Key: Carefully copy the generated API key. Treat it like a password!
Important Notes:
- Security: Regularly rotate API keys to minimize the risk of compromise.
- Usage Limits: Be aware of any usage limits or quotas associated with your API key.
Disclaimer:
- This information is for general guidance. Always refer to the official Google AI Studio documentation for the most up-to-date and accurate instructions.
- Open your System Properties:
  - Windows: Search for "Environment Variables" in the Start Menu. Alternatively, right-click "This PC" or "My Computer" and select "Properties", then go to "Advanced system settings" and click "Environment Variables".
  - macOS/Linux: Open your terminal.
- Create a New Variable:
  - Windows: In the "System variables" section, click "New".
    - Enter the Variable name: GEMINI_KEY.
    - Enter the Variable value (the API key you created beforehand).
    - Click "OK" to save.
  - macOS/Linux: Use the following command in the terminal:
    export GEMINI_KEY="your_actual_api_key"
    This sets the variable for the current terminal session. To make it persistent, add this line to your shell's configuration file (e.g., .bashrc, .zshrc).
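Once the variable is defined, Python code can read it from the environment instead of hard-coding the key. The sketch below shows one way to do this with a clear error when the variable is missing; the helper name get_gemini_key is hypothetical and the book's listings may read the variable differently.

```python
import os
from collections.abc import Mapping


def get_gemini_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key stored in GEMINI_KEY, or raise if it is missing or empty."""
    key = env.get("GEMINI_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_KEY is not set. Define it as an environment variable "
            "and restart your terminal or IDE."
        )
    return key


if __name__ == "__main__":
    try:
        key = get_gemini_key()
    except RuntimeError as error:
        print(error)
    else:
        # Never print the key itself; showing its length is enough to confirm it was read.
        print(f"GEMINI_KEY found ({len(key)} characters).")
```

On Windows, a terminal or IDE opened before the variable was created will not see it until it is restarted.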
Fortunately, Alteryx offers a free trial version, available for download here. The trial lasts for 30 days, providing ample time to explore its capabilities and evaluate its potential for data preparation and analysis. At the time of writing, installing the trial version is straightforward, so you can get up and running quickly.
- Download the Installer: Start by visiting the Alteryx Designer Trial page. Fill out the registration form with basic details to access the download link for the trial version.
- Run the Installation: Once the installer file is downloaded, launch it and follow the guided setup process. The installer will prompt you to accept the license agreement and choose a destination folder for the installation.
- Initial Setup and Activation: After the installation completes, open Alteryx Designer. You will be prompted to activate the 30-day trial by signing in with your Alteryx account (created during the download process).
- Access Sample Workflows: Upon activation, Alteryx Designer provides access to sample workflows and built-in tutorials to help you get started quickly. These resources are useful for understanding key functionalities like data input, blending, and transformation.
SYSTEM REQUIREMENTS
Alteryx Designer requires a Windows operating system, with recommended specifications including at least 8GB of RAM and adequate disk space for processing large datasets.
The entire process is designed to take only a few minutes, ensuring you can begin exploring Alteryx’s features right away without needing any advanced technical setup.
Note: the exports provided in this repository have been created with Alteryx v2024.1.1.93 Patch: 3
These are the steps to create a Databricks Community Edition account:
- Visit the Community Edition page
  - Go to databricks.com/try-databricks
  - Look for the "Get started with Community Edition" option
- Fill out the registration form
  - Enter your name, email address, and create a password
  - Accept the terms of service
  - Complete any CAPTCHA verification if presented
- Verify your email address
  - Check your inbox for a verification email from Databricks
  - Click the verification link in the email
- Set up your workspace
  - After verification, you'll be redirected to set up your Databricks workspace
  - Choose a workspace name (optional)
  - Select your preferred cloud provider region
- Access your Community Edition workspace
  - Once setup is complete, you'll be redirected to your new Databricks workspace
  - The interface will include a sidebar with options for notebooks, data, clusters, etc.
- Create your first cluster
  - Navigate to "Compute" or "Clusters" in the sidebar
  - Click "Create Cluster"
  - Configure your cluster settings (Community Edition has limitations on resources)
  - Start your cluster
- Begin using Databricks
  - Create a new notebook from the workspace homepage
  - Attach your notebook to the cluster you created
  - Start exploring Databricks' features and capabilities
Note that the Community Edition has some limitations compared to paid versions, including compute resources, storage, and certain features, but it's an excellent way to learn and experiment with Databricks.