This repo contains various datasets in Georgian for NLP and other purposes: the entire text of "The Knight with the Panther skin" (vefxistyaosani.txt), Georgian aphorisms (aforizmebi.txt), first and last names of Georgian poets and writers (poetswriters.txt), baby names in Georgian (names.csv, © kids.ge), and the full Georgian alphabet (anbani.csv) with descriptions of the letters as they appear in Unicode.
Some of these datasets were fed to neural networks (char-rnn by Andrej Karpathy) to generate fake data: fake-aforizmebi.txt, fake-names.txt (trained on the Georgian-origin subset of the baby names), and fake-poetswriters.txt.
| Name | Description | Source | Lines | URL |
|---|---|---|---|---|
| vefxistyaosani.csv | Labeled text of "The Knight with the Panther skin" | | 6678 | GET |
| quotes.csv | Quotes from 184 famous people in Georgian | ka.wikiquote.org | 3683 | GET |
| aforizmebi.txt | Georgian aphorisms | various sources | 132 | GET |
| poetswriters.txt | First and Last names of Georgian Poets and Writers | ka.wikipedia.org | 544 | GET |
| names.csv | Baby names in Georgian with various origins | kids.ge © | 2094 | GET |
| anbani.csv | Full Georgian alphabet with descriptions and char codes | unicode.org | 175 | GET |
| vefxistyaosani.txt | Raw text of "The Knight with the Panther skin" | | 8524 | GET |
| Name | Description | Source | Lines | URL |
|---|---|---|---|---|
| fake-aforizmebi.txt | Georgian aphorisms generated using char-rnn | anbani.db | 17047 | GET |
| fake-poetswriters.txt | Fake poetic names trained on Georgian poets and writers | anbani.db | 2514 | GET |
| fake-names.csv | Fake names trained on Georgian subset of baby names | anbani.db | 60961 | GET |
| fake-vefxistyaosani.txt | Char-RNN mimicking Shota Rustaveli (not well) | anbani.db | 26032 | GET |
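As a rough illustration of how the plain-text files above might be consumed, here is a minimal Node.js sketch. It assumes a `.txt` dataset has been downloaded next to the script, is UTF-8 encoded, and holds one entry per line; the helper names `parseLines`, `loadEntries`, and `sample` are hypothetical and not part of this repo.

```javascript
const fs = require('fs');

// Split raw text into trimmed, non-empty lines (handles both \n and \r\n).
function parseLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Read a UTF-8 text file and return its entries, one per line (assumed layout).
function loadEntries(path) {
  return parseLines(fs.readFileSync(path, 'utf8'));
}

// Pick one entry uniformly at random.
function sample(entries) {
  return entries[Math.floor(Math.random() * entries.length)];
}

// Hypothetical usage: runs only if the file is present locally.
const file = 'aforizmebi.txt';
if (fs.existsSync(file)) {
  const entries = loadEntries(file);
  console.log(entries.length, 'entries; random sample:', sample(entries));
}
```

The CSV files are left out on purpose, since their column layout is not described here; inspect the header row before parsing them.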
Here are some of the resources you might like.
Generation of fake Georgian text and names is supported by anbani.js, a multifunctional JavaScript library for working with the Georgian alphabet. Read more about the package at [anbani / anbani.js].
npm install anbani

var anbani = require('anbani')
anbani.core.convert("ანბანი", "მხედრული", "ასომთავრული")
// 'ႠႬႡႠႬႨ'
anbani.lorem.names(3)
// ['დამერ გაშვითელი', 'სიბო ყორთელია', 'გიმოლ ვაწოშვილი']
anbani.lorem.sentences(10)
// 'მოეხვიდეს სიტირენ გიშიხარნი. წეითო გამიზრიან, ჰქონთავისთან გემრუფენ, უკრთებოდემნი მესმანცა მყივნე.'

For other awesome Georgian datasets, visit [bumbeishvili / awesome-georgian-datasets].
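The Mkhedruli to Asomtavruli conversion shown in the anbani.js example can also be approximated without the library, because the two scripts sit in parallel Unicode ranges. The sketch below is not how anbani.js is implemented (that is not documented here); it only relies on the fact that Mkhedruli letters U+10D0 to U+10F5 map to Asomtavruli letters U+10A0 to U+10C5 at a fixed offset of 0x30. The function name `toAsomtavruli` is hypothetical.

```javascript
// Mkhedruli letters with a direct Asomtavruli counterpart.
const MKHEDRULI_START = 0x10d0; // GEORGIAN LETTER AN
const MKHEDRULI_END = 0x10f5;   // GEORGIAN LETTER HOE
const OFFSET = 0x30;            // distance down to the Asomtavruli block

// Shift every Mkhedruli letter into the Asomtavruli range;
// all other characters (spaces, punctuation, Latin) pass through unchanged.
function toAsomtavruli(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp >= MKHEDRULI_START && cp <= MKHEDRULI_END
      ? String.fromCodePoint(cp - OFFSET)
      : ch;
  }).join('');
}

console.log(toAsomtavruli('ანბანი'));
// 'ႠႬႡႠႬႨ'
```

A fixed offset is enough for this one pair of scripts; other script pairs supported by a full library would likely need explicit lookup tables.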
Datasets are freely available for non-commercial purposes only. For commercial use, contact the corresponding source.