Water Potability

1. Goal of the project

Develop a solution that predicts whether water is potable based on its water quality metrics.

The software will be implemented in a handheld device.

2. Business Understanding and Business Problem

Company occupation

Water For All is a global NGO specializing in providing access to potable water in disadvantaged areas.

Business Problem

Water For All wants to create a handheld device that analyzes the potability of water.

3. Dataset

The dataset has 3,276 rows and 10 columns (9 features and 1 target).

Feature 1	Feature 2	Feature 3	Feature 4	Feature 5	Feature 6	Feature 7	Feature 8	Feature 9
pH	Hardness	Solids	Organic_carbon	Turbidity	Chloramines	Sulfate	Conductivity	Trihalomethanes

Target
Potability

The dataset was found on Kaggle. Kaggle link

4. Process

4.1 Exploratory Data Analysis

The target variable "Potability" is imbalanced. We decided to keep the imbalance so that the model better predicts non-potable water, since non-potable water could be a risk for the consumer.
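A quick way to quantify the imbalance is to compute the share of each class. The sketch below is illustrative only: the label list is made up (not taken from the dataset), `class_shares` is a hypothetical helper, and the encoding 0 = not potable, 1 = potable is an assumption.

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of samples that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical labels for illustration only.
# Assumed encoding: 0 = not potable, 1 = potable.
labels = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0]

shares = class_shares(labels)
print(shares)
```

On the real data, the same count would be run on the "Potability" column.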

Is it possible to predict the potability of water based on the confidence intervals of the features for potable and non-potable data points?

We raised this question to check whether the hardware costs of the handheld device could be reduced by applying simple range filters to the samples instead of classification models.

The confidence intervals for the potable and non-potable data points overlap. It is therefore not possible to predict the potability of water with filters alone.
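The overlap check can be sketched as follows. This is a minimal illustration, assuming the intervals are 95% confidence intervals for each feature's mean per class, computed with a normal approximation. The pH readings are made up, and `mean_ci` and `intervals_overlap` are hypothetical helper names, not functions from the project notebook.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci(values, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    m = mean(values)
    half_width = z * stdev(values) / sqrt(len(values))
    return (m - half_width, m + half_width)

def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical pH readings, for illustration only (not taken from the dataset).
ph_potable = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3]
ph_not_potable = [7.0, 6.7, 7.5, 7.2, 6.6, 7.1]

ci_potable = mean_ci(ph_potable)
ci_not_potable = mean_ci(ph_not_potable)
print(ci_potable, ci_not_potable)
print(intervals_overlap(ci_potable, ci_not_potable))
```

When the intervals of a feature overlap like this, a fixed threshold on that feature cannot separate the two classes, which is why a filter-based device was ruled out.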

4.2 Modeling

Scores

The scores that guide the decisions:

Recall (sensitivity) is the ratio of correctly predicted positive observations (potable water) to all observations that are actually potable.
F1 Score is the harmonic mean of Precision and Recall. It therefore takes both false positives and false negatives into account.

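These scores were chosen because we want a low number of false positives (non-potable water predicted as potable) and a high number of true positives. Recall tracks the true positives, while the F1 Score, through Precision, also penalizes false positives.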
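As a worked example, both scores can be computed directly from confusion matrix counts. The counts below are hypothetical and are not results from this project. "Positive" means potable, as in the definitions above.

```python
def recall(tp, fn):
    """Share of actually potable samples that were predicted potable."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Share of samples predicted potable that are actually potable."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Hypothetical confusion matrix counts, for illustration only.
tp, fp, fn = 80, 20, 40

print(round(recall(tp, fn), 3))        # 80 / 120
print(round(precision(tp, fp), 3))     # 80 / 100
print(round(f1_score(tp, fp, fn), 3))  # 160 / 220
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot reach a high F1 Score by maximizing Recall alone.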

Feature Engineering | Model Selection | Hyperparameter Tuning | Loop

The scaler used for model selection was the Standard Scaler.
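Standard scaling transforms each feature to zero mean and unit variance. The sketch below reproduces that transformation in plain Python so the arithmetic is visible. It uses the population standard deviation, which matches how scikit-learn's `StandardScaler` computes it. The hardness values are made up, and `standardize` is a hypothetical helper rather than the project's code.

```python
from statistics import mean, pstdev

def standardize(column):
    """z-score a column: subtract the mean, divide by the population std."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical hardness values, for illustration only.
hardness = [150.0, 200.0, 250.0]

scaled = standardize(hardness)
print([round(v, 4) for v in scaled])
```

In a real pipeline the mean and standard deviation would be estimated on the training split only and then reused for the test split, to avoid leaking test information into the model.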

Model 1	Model 2	Model 3	Model 4	Model 5	Model 6	Model 7
Decision Tree Scaled	Decision Tree	KNN	SVM	Random Forest	Random Forest Scaled	Logistic Regression

Conclusions of Model Selection:

Decision Tree Scaled achieved the best scores for Recall and F1 Score.
The confusion matrices for Decision Tree Scaled and the unscaled Decision Tree are very similar.
The KNN confusion matrix shows a low number of false positives, which was one of the model goals, but its true positives are too low. It would not be useful for the handheld device.

Decision Tree Scaled is the best choice at this stage.
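The similarity between the scaled and unscaled Decision Tree results noted above is expected: a tree splits on thresholds, and standard scaling preserves the order of the values, so the same samples end up on each side of a split. The sketch below illustrates this with a single Gini-based split on one feature. The data is made up, `best_split_partition` is a hypothetical helper, and this is a simplified stand-in for a full decision tree, not the model trained in the notebook.

```python
from statistics import mean, pstdev

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_partition(xs, ys):
    """Find the threshold split with the lowest weighted Gini impurity.

    Returns the set of sample indices sent to the left branch.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, len(order)):
        # No threshold fits between two identical values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        score = (len(left) * gini([ys[i] for i in left])
                 + len(right) * gini([ys[i] for i in right])) / len(xs)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left

# Hypothetical feature values and labels, for illustration only.
xs = [120.0, 340.0, 180.0, 410.0, 290.0, 150.0]
ys = [0, 1, 0, 1, 1, 0]

# Standard scaling: zero mean, unit (population) standard deviation.
mu, sigma = mean(xs), pstdev(xs)
xs_scaled = [(x - mu) / sigma for x in xs]

same = best_split_partition(xs, ys) == best_split_partition(xs_scaled, ys)
print(same)
```

Small differences between scaled and unscaled trees can still appear in practice, for example from random tie-breaking between equally good splits.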

5. Presentation

To see the presentation, click on the picture below.
