Sample CSV files

This repository contains sample Comma-Separated Values (CSV) files. CSV is a generic flat file format used to store structured data. The main datasets are split into five categories: Customers, People, Organizations, Leads and Products. Sample files range from 100 records up to 2 million records (the Leads files go up to 100,000 records). These CSV files can be used for testing purposes. They can be opened by any application that supports CSV files, or with a CSV editor.

The datasets are generated from random values, mostly with the Python Faker package.

Customers CSV Samples

customers-100.csv - Zip version - Customers CSV with 100 records
customers-1000.csv - Zip version - Customers CSV with 1000 records
customers-10000.csv - Zip version - Customers CSV with 10000 records
customers-100000.csv - Zip version - Customers CSV with 100000 records
customers-500000.csv - Customers CSV with 500000 records
customers-1000000.csv - Customers CSV with 1000000 records
customers-2000000.csv - Customers CSV with 2000000 records

Customer Schema

Index
Customer Id
First Name
Last Name
Company
City
Country
Phone 1
Phone 2
Email
Subscription Date
Website
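A quick sanity check before using a file in a test is to compare its header row with the schema above. This sketch assumes the header uses exactly the column names listed in the Customer schema; `check_header` and `CUSTOMER_COLUMNS` are hypothetical names.

```python
import csv
import io

# Column names as listed in the Customer schema; assumes the CSV header
# spells them exactly this way.
CUSTOMER_COLUMNS = [
    "Index", "Customer Id", "First Name", "Last Name", "Company", "City",
    "Country", "Phone 1", "Phone 2", "Email", "Subscription Date", "Website",
]


def check_header(handle, expected):
    """Return (missing, unexpected) column names for the CSV open in `handle`."""
    header = next(csv.reader(handle), [])
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected
```

Usage: open the file with `open("customers-100.csv", encoding="utf-8", newline="")` and pass the handle with `CUSTOMER_COLUMNS`; `([], [])` means the header matches the schema.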

People CSV Samples

people-100.csv - Zip version - People CSV with 100 records
people-1000.csv - Zip version - People CSV with 1000 records
people-10000.csv - Zip version - People CSV with 10000 records
people-100000.csv - Zip version - People CSV with 100000 records
people-500000.csv - People CSV with 500000 records
people-1000000.csv - People CSV with 1000000 records
people-2000000.csv - People CSV with 2000000 records

People Schema

Index
User Id
First Name
Last Name
Sex
Email
Phone
Date of birth
Job Title
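When tests need typed values instead of strings, each People row can be mapped to a small record type. A minimal sketch, assuming `Date of birth` is stored as an ISO 8601 date (YYYY-MM-DD) and `Index` is an integer; `Person` and `read_people` are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    index: int
    user_id: str
    first_name: str
    last_name: str
    sex: str
    email: str
    phone: str
    date_of_birth: date
    job_title: str


def read_people(handle):
    """Yield Person records from a People CSV opened in text mode."""
    for row in csv.DictReader(handle):
        yield Person(
            index=int(row["Index"]),
            user_id=row["User Id"],
            first_name=row["First Name"],
            last_name=row["Last Name"],
            sex=row["Sex"],
            email=row["Email"],
            phone=row["Phone"],
            # Assumption: dates use the ISO 8601 format YYYY-MM-DD.
            date_of_birth=date.fromisoformat(row["Date of birth"]),
            job_title=row["Job Title"],
        )
```

Because `read_people` is a generator, it also works on the 2 million record file without loading every row at once.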

Organizations CSV Samples

organizations-100.csv - Zip version - Organizations CSV with 100 records
organizations-1000.csv - Zip version - Organizations CSV with 1000 records
organizations-10000.csv - Zip version - Organizations CSV with 10000 records
organizations-100000.csv - Zip version - Organizations CSV with 100000 records
organizations-500000.csv - Organizations CSV with 500000 records
organizations-1000000.csv - Organizations CSV with 1000000 records
organizations-2000000.csv - Organizations CSV with 2000000 records

Organization Schema

Index
Organization Id
Name
Website
Country
Description
Founded
Industry
Number of employees
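The Organizations files lend themselves to simple aggregation tests. This sketch sums `Number of employees` per `Industry` while reading one row at a time; it assumes the employee column holds plain integers, and `employees_by_industry` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def employees_by_industry(handle):
    """Sum 'Number of employees' per 'Industry', streaming rows from `handle`."""
    totals = Counter()
    for row in csv.DictReader(handle):
        # Assumption: the column holds a plain integer such as "4122".
        totals[row["Industry"]] += int(row["Number of employees"])
    return totals
```

`Counter.most_common(n)` then gives the largest industries by head count.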

Leads CSV Samples

leads-100.csv - Zip version - Leads CSV with 100 records
leads-1000.csv - Zip version - Leads CSV with 1000 records
leads-10000.csv - Zip version - Leads CSV with 10000 records
leads-100000.csv - Zip version - Leads CSV with 100000 records

Lead Schema

Index
Account Id
Lead Owner
First Name
Last Name
Company
Phone 1
Phone 2
Email 1
Email 2
Website
Source
Deal Stage
Notes
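Lead rows mix contact details with free-text notes, so a common step is exporting only the columns a test needs. A minimal sketch with `csv.DictReader` and `csv.DictWriter`; `project_columns` is a hypothetical helper and the column choice in the usage line is only an example.

```python
import csv
import io


def project_columns(src, dst, columns):
    """Copy only `columns` (in that order) from the CSV in `src` to `dst`."""
    reader = csv.DictReader(src)
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    # extrasaction="ignore" drops every column that was not requested.
    writer = csv.DictWriter(dst, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

For example, `project_columns(src, dst, ["Email 1", "Deal Stage", "Notes"])`. The writer re-quotes fields such as notes that contain commas.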

Products CSV Samples

products-100.csv - Zip version - Products CSV with 100 records
products-1000.csv - Zip version - Products CSV with 1000 records
products-10000.csv - Zip version - Products CSV with 10000 records
products-100000.csv - Zip version - Products CSV with 100000 records
products-1000000.csv - Zip version - Products CSV with 1000000 records
products-2000000.csv - Zip version - Products CSV with 2000000 records

Products Schema

Index
Name
Description
Brand
Category
Price
Currency
Stock
EAN
Color
Size
Availability
Internal ID
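If the `EAN` column holds EAN-13 barcodes (an assumption, since the schema does not say which EAN variant is used), a test can verify the check digit. Read the column as a string so that leading zeros survive. `ean13_is_valid` is a hypothetical name.

```python
def ean13_is_valid(code):
    """Return True if `code` is a 13-digit string with a correct EAN-13 check digit."""
    if len(code) != 13 or not all(ch in "0123456789" for ch in code):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits, from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]
```

For example, `4006381333931` passes, while changing its last digit makes the check fail.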

Local setup to generate files

Python environment

Create a Python virtual environment:

python3 -m venv venv/sample-csv

Activate it:

source venv/sample-csv/bin/activate

Then install the dependencies:

pip install -r requirements.txt
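To confirm that dependencies will go into the virtual environment rather than the system interpreter, a quick check is sketched below. It relies on the standard behavior of `venv`, where `sys.prefix` points at the environment while `sys.base_prefix` keeps pointing at the base installation; `in_virtualenv` is a hypothetical name.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv created with `python3 -m venv`."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual env active:", in_virtualenv(), sys.prefix)
```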

Run script

python src/main.py

AI Processing CSV Samples

The generator also creates datasets for testing AI workflows on spreadsheet rows:

support-tickets: ticket classification, sentiment, and priority routing.
customer-reviews: sentiment analysis and topic classification.
messy-company-data: company name cleanup and industry classification.
product-catalog-ai: e-commerce classification, translation, and attribute extraction.
product-translation-ai: AI translation testing with realistic product names, product descriptions, feature bullets, glossary terms, and target languages.
lead-scoring-ai: ideal customer profile (ICP) fit and lead scoring prompts.
web-page-extraction-ai: structured extraction from page text.
research-questions-ai: AI agent and web research prompt testing.

AI datasets include synthetic Expected ... columns. Use them to compare prompt output with reference labels during testing.
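Comparing prompt output with a reference column can be as simple as an accuracy ratio. In this sketch, the model outputs are passed in as a list with one entry per row, in file order. The actual `Expected ...` column names depend on the dataset, so the column is a parameter; `Expected Sentiment` in the usage line is a hypothetical example, as is the helper name `label_accuracy`.

```python
import csv
import io


def label_accuracy(handle, expected_column, predictions):
    """Fraction of rows whose prediction matches row[expected_column].

    Comparison ignores surrounding whitespace and letter case.
    """
    rows = list(csv.DictReader(handle))
    if len(rows) != len(predictions):
        raise ValueError("need exactly one prediction per row")
    if not rows:
        return 0.0
    hits = sum(
        row[expected_column].strip().casefold() == pred.strip().casefold()
        for row, pred in zip(rows, predictions)
    )
    return hits / len(rows)
```

For example, `label_accuracy(f, "Expected Sentiment", outputs)`. Exact matching suits label-style columns; free-text outputs such as translations need a different comparison.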

Broken CSV Fixtures

The broken_csv generator creates intentionally malformed CSV files for parser and repair-tool testing:

broken encodings (Windows-1252, ISO-8859-1)
mixed or wrong delimiters
unescaped and missing quotes
unquoted multiline fields
ragged rows
duplicate headers
mixed line endings
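A parser under test usually starts by choosing an encoding and a delimiter. Below is a minimal standard-library sketch. The candidate encoding order and the helper names are assumptions, not the repository's broken_csv logic. Note that byte-level decoding cannot always tell Windows-1252 and ISO-8859-1 apart.

```python
import csv


def decode_bytes(data):
    """Return (text, encoding) using the first candidate that decodes cleanly."""
    for encoding in ("utf-8-sig", "cp1252"):  # utf-8-sig also strips a BOM
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            pass
    # ISO-8859-1 maps every byte to a character, so this last step never fails.
    return data.decode("latin-1"), "latin-1"


def sniff_delimiter(text, candidates=",;\t|"):
    """Guess the delimiter from the start of `text`; fall back to a comma."""
    try:
        return csv.Sniffer().sniff(text[:4096], delimiters=candidates).delimiter
    except csv.Error:
        return ","
```

Restricting `delimiters` keeps `csv.Sniffer` from picking letters or digits when a sample is small.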

When the repair is deterministic, a clean version is generated in files/broken_csv/expected-fixed.
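As an illustration of what a deterministic repair can look like, here is a hypothetical sketch for duplicate headers, short ragged rows and mixed line endings. It does not show how the broken_csv generator builds files/broken_csv/expected-fixed. Rows with extra fields are returned separately because fixing them would require guessing.

```python
import csv
import io


def dedupe_header(header):
    """Make column names unique by suffixing repeats: Name, Name_2, Name_3, ..."""
    used = set()
    result = []
    for name in header:
        candidate, n = name, 1
        while candidate in used:  # also avoids clashing with an existing "Name_2"
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


def repair(text, delimiter=","):
    """Return (clean_csv_text, long_rows) for CSV `text` parsed with `delimiter`.

    Blank lines are dropped, short rows are padded with empty fields, and rows
    with more fields than the header go to `long_rows` instead of the output.
    The output always uses commas and CRLF line endings.
    """
    # newline="" keeps \r\n, \n and \r untouched so csv.reader handles mixed endings.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader, [])
    width = len(header)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(dedupe_header(header))
    long_rows = []
    for row in reader:
        if not row:
            continue
        if len(row) > width:
            long_rows.append(row)
            continue
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue(), long_rows
```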

Tests

Create and activate a local virtual environment:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run the test suite:

pytest

Generate all configured CSV files and upload manifests:

python src/main.py
python src/broken_csv.py
