Skip to content

StevenTapscott/SQL-Squirrel-Census-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Exploring Squirrel Census Data (SQL Server Project)

Overview

This project analyses the NYC Squirrel Census dataset using SQL Server to uncover patterns in squirrel population distribution and behaviour across different parks and environmental conditions.

The analysis focuses on population density, behavioural trends, and environmental influences such as location and vertical positioning.


Objectives

  • Analyse squirrel population distribution across parks and areas
  • Identify common behavioural patterns (running, foraging, eating, etc.)
  • Examine how environmental factors influence behaviour
  • Apply SQL for real-world data cleaning and analysis
  • Develop insights from an unconventional and unstructured dataset

Tools & Technologies

  • SQL Server Management Studio (SSMS)
  • T-SQL
  • CSV Data Import (Flat File Wizard)
  • Data Cleaning & Transformation
  • Aggregation & Analytical Queries

Dataset

NYC Squirrel Census Dataset (NYC Open Data)

Includes:

  • Location data (Park, Area, Coordinates)
  • Behavioural indicators (Running, Foraging, Eating, Climbing)
  • Environmental attributes (Location, Above Ground height)
  • Qualitative observations (notes and stories)

Data Preparation

Import Process

  • Imported datasets using SSMS Flat File Import Wizard
  • Ensured correct encoding (UTF-8) for text-heavy fields
  • Adjusted column data types to prevent truncation (e.g. nvarchar(MAX) for notes)

Data Cleaning

  • Resolved ParkID inconsistencies
    • Float values converted to integer-based ParkID_Clean
  • Handled invalid numeric values
    • Cleaned AboveGround_feet using TRY_CAST
  • Managed missing data
    • Categorised unknown or invalid values appropriately
  • Standardised structure
    • Clean column names and consistent data types

Data Modelling

Logical relationships established using: squirrel_data.ParkID_Clean → park_data.ParkID_Clean

Due to non-unique values in the park dataset, relationships were handled via validated SQL joins rather than enforced constraints.


📊 Analysis

1. Population Distribution

  • Analysed squirrel counts by:
    • Park
    • Area

➡️ Squirrel populations are concentrated in a small number of parks.


2. Behavioural Trends

  • Analysed:
    • Running
    • Foraging
    • Eating
    • Climbing

➡️ Foraging and movement behaviours dominate, reflecting survival-driven activity.


3. Environmental Impact

  • Behaviour analysed across:
    • Different parks
    • Location types
    • Vertical positioning

➡️ Behaviour varies depending on environmental context.


4. Vertical Position Analysis

  • Categories:
    • Above ground
    • Unknown

⚠️ Limitation:
A large proportion (~78%) of observations were classified as Unknown, limiting accurate vertical behaviour analysis.


📈 Key Insights

  • Squirrel populations are highly concentrated in specific parks
  • Foraging is the most common behaviour, followed by running and eating
  • Behaviour varies across locations, indicating environmental influence
  • Vertical behaviour analysis is limited due to missing data
  • Despite limitations, behavioural patterns remain consistent

⚠️ Challenges & Limitations

  • Data quality issues

    • ParkID imported as float → required cleaning
    • Invalid entries in height data
  • Missing data

    • High proportion of "Unknown" values
  • Schema limitations

    • Non-unique keys prevented PK/FK enforcement
  • Text-heavy fields

    • Required careful handling to avoid truncation and encoding issues

📊 Visualisation

Data was exported and visualised using Excel / Power BI:

  • Top Parks by Squirrel Count
  • Behaviour Distribution
  • Behaviour by Park
  • Vertical Position Analysis

🧠 Conclusion

This project demonstrates how SQL can be used to analyse real-world datasets with significant data quality challenges. The findings highlight the importance of environmental factors in shaping squirrel behaviour, while also emphasising the need for careful data cleaning and validation.


🚀 Skills Demonstrated

  • SQL data import and transformation
  • Data cleaning and validation
  • Aggregation and analytical querying
  • Handling imperfect real-world datasets
  • Deriving insights from behavioural and spatial data
  • Logical data modelling

About

SQL data analysis project exploring NYC Squirrel Census data using SQL queries, aggregations, filtering, and behavioural analysis to identify squirrel population patterns, activity trends, environmental influences, and location-based insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages