Merged
7 changes: 7 additions & 0 deletions docs/source/LICENSE.md
@@ -0,0 +1,7 @@
---
orphan: true
---
```{include} ../../LICENSE.md
```

101 changes: 0 additions & 101 deletions docs/source/code-docs.rst

This file was deleted.

30 changes: 30 additions & 0 deletions docs/source/code-docs/annotation.rst
@@ -0,0 +1,30 @@
Annotation
##########

Data Preparation
================

Preliminary Page Set Creation
------------------------------
.. automodule:: corppa.poetry_detection.annotation.create_pageset
.. Note: not including members for method docs, only top-level script usage

Add Metadata
------------
.. automodule:: corppa.poetry_detection.annotation.add_metadata
.. Note: not including members for method docs, only top-level script usage

Annotation Recipes
==================
.. automodule:: corppa.poetry_detection.annotation.annotation_recipes
.. Note: not including members for method docs, only top-level script usage

Command Recipes
===============
.. automodule:: corppa.poetry_detection.annotation.command_recipes
.. Note: not including members for method docs, only top-level script usage

Process Adjudication Data
=========================
.. automodule:: corppa.poetry_detection.annotation.process_adjudication_data
.. Note: not including members for method docs, only top-level script usage
10 changes: 10 additions & 0 deletions docs/source/code-docs/index.rst
@@ -0,0 +1,10 @@
Code Documentation
##################

.. toctree::
   :maxdepth: 2

   ocr
   utils
   annotation
   poetry-detection
12 changes: 12 additions & 0 deletions docs/source/code-docs/ocr.rst
@@ -0,0 +1,12 @@
OCR
###

.. automodule:: corppa.ocr.gvision_ocr
   :members:


Collate Texts
=============
.. automodule:: corppa.ocr.collate_txt
.. Note: not including the members for the method docs, *but* we should
.. improve the top-level module comment.
29 changes: 29 additions & 0 deletions docs/source/code-docs/poetry-detection.rst
@@ -0,0 +1,29 @@
Poetry Detection
################

Core Objects
============

.. automodule:: corppa.poetry_detection.core
   :members:

Reference Corpora
=================
.. automodule:: corppa.poetry_detection.ref_corpora
   :members:



Scripts
=======

refmatcha
---------

.. automodule:: corppa.poetry_detection.refmatcha

Merge Excerpts
--------------

.. automodule:: corppa.poetry_detection.merge_excerpts
   :members:
27 changes: 27 additions & 0 deletions docs/source/code-docs/utils.rst
@@ -0,0 +1,27 @@
Utils
#####

Filter Utility
==============
.. automodule:: corppa.utils.filter
.. Note: not including members for method docs, only top-level script usage

Path Utilities
==============
.. automodule:: corppa.utils.path_utils
   :members:

Generate PPA Page Set
=====================
.. automodule:: corppa.utils.generate_page_set
.. Note: not including members for method docs, only top-level script usage

Add Image (Relative) Paths
==========================
.. automodule:: corppa.utils.add_image_relpaths
.. Note: not including members for method docs, only top-level script usage

Build Text Corpus
=================
.. automodule:: corppa.utils.build_text_corpus
.. Note: not including members for method docs, only top-level script usage
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -9,7 +9,7 @@
from corppa import __version__

project = "corppa"
copyright = "2024,2025 Center for Digital Humanities, Princeton University"
copyright = "2024—2026 Center for Digital Humanities, Princeton University"
author = "Center for Digital Humanities RSE Team, Princeton University"
release = __version__

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -19,5 +19,5 @@ This repository is research software developed as part of the `Ends of Prosody <

   Overview <readme.md>
   Developer Notes <dev-notes.md>
   code-docs
   code-docs/index
   eop-docs
33 changes: 24 additions & 9 deletions src/corppa/poetry_detection/core.py
@@ -12,7 +12,7 @@

from Bio.Align import PairwiseAligner

# Table of supported detection methods and their corresponding prefixes
#: Supported detection methods with corresponding prefixes
DETECTION_METHODS = {
    "adjudication": "a",
    "manual": "m",
@@ -27,8 +27,11 @@ class Span:
    Span object representing a Pythonic "closed open" interval
    """

    #: start index
    start: int
    #: end index
    end: int
    #: label for the span
    label: str

    def __post_init__(self):
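The `Span` docstring describes a Pythonic "closed open" interval: `start` is included and `end` is excluded, the same convention as slicing. The sketch below illustrates that convention with a hypothetical `HalfOpenSpan` class and `overlap_length` helper; neither is part of corppa, and the helper does not reproduce `Span.overlap_factor`.

```python
from dataclasses import dataclass


@dataclass
class HalfOpenSpan:
    """Hypothetical stand-in for a closed-open interval [start, end)."""

    start: int
    end: int

    def __len__(self) -> int:
        # end is exclusive, so the number of covered positions is end - start
        return self.end - self.start

    def overlap_length(self, other: "HalfOpenSpan") -> int:
        # hypothetical helper: count shared positions; spans that only touch share none
        return max(0, min(self.end, other.end) - max(self.start, other.start))


text = "roses are red"
span = HalfOpenSpan(0, 5)
print(text[span.start : span.end])  # roses
print(len(span))  # 5
print(span.overlap_length(HalfOpenSpan(5, 9)))  # 0
print(span.overlap_length(HalfOpenSpan(3, 9)))  # 2
```

Because the end index is exclusive, adjacent spans such as `[0, 5)` and `[5, 9)` share no positions, and a span's length needs no `+ 1` correction.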
@@ -87,12 +90,13 @@ def overlap_factor(self, other: "Span", ignore_label: bool = False) -> float:

def field_real_type(field_type) -> type:
    """Return the real type for a dataclass field type annotation.
    For unions or optional values (e.g. `Optional[int]`), returns the first
    non-None type; for type aliases (e.g. `set[str]`, returns the original type
    For unions or optional values (e.g. ``Optional[int]``), returns the first
    non-None type; for type aliases (e.g. ``set[str]``), returns the original type
    that was used to create the alias. For example:
    - int -> int
    - Optional[int] -> int
    - set[str] -> set

    - ``int`` -> ``int``
    - ``Optional[int]`` -> ``int``
    - ``set[str]`` -> ``set``
    """
    # if it's a regular type, return unchanged
    if isinstance(field_type, type):
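The `field_real_type` docstring describes three cases: plain types, unions/optionals, and generic aliases. The following is a hedged re-implementation of that documented behavior using the standard `typing.get_origin` and `typing.get_args`; the name `field_real_type_sketch` is hypothetical and the body is not corppa's code, which this diff only shows in part.

```python
import types
from typing import Optional, Union, get_args, get_origin


def field_real_type_sketch(field_type) -> type:
    """Hypothetical re-implementation of the documented field_real_type behavior."""
    origin = get_origin(field_type)
    if origin is None:
        # plain classes such as int or str have no origin and are returned unchanged
        if isinstance(field_type, type):
            return field_type
        raise TypeError(f"unsupported annotation: {field_type!r}")
    # Optional[X] is Union[X, None]; on Python 3.10+ "X | None" has origin types.UnionType
    if origin is Union or origin is getattr(types, "UnionType", Union):
        # return the first member of the union that is not NoneType
        return next(arg for arg in get_args(field_type) if arg is not type(None))
    # generic aliases such as set[str] report the class they were created from
    return origin


print(field_real_type_sketch(int))  # <class 'int'>
print(field_real_type_sketch(Optional[int]))  # <class 'int'>
print(field_real_type_sketch(set[str]))  # <class 'set'>
```

This sketch checks `get_origin` before `isinstance(..., type)` on purpose: on some older Python versions, a generic alias such as `set[str]` can pass an `isinstance(..., type)` check.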
@@ -143,16 +147,21 @@ class Excerpt:
    """

    # PPA page related
    #: page id
    page_id: str
    #: ppa span start index
    ppa_span_start: int
    #: ppa span end index
    ppa_span_end: int
    #: ppa span text
    ppa_span_text: str
    # Detection methods
    #: Detection methods
    detection_methods: set[str]
    # Optional notes field
    #: Optional notes
    notes: Optional[str] = None
    # Excerpt id, set in post initialization
    # Note: Cannot be passed in at initialization
    #: excerpt identifier
    excerpt_id: str = field(init=False)

    def __post_init__(self):
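The comments on `excerpt_id` note that it is set in post initialization and cannot be passed in at initialization. That is the standard `dataclasses.field(init=False)` pattern, sketched below on a hypothetical, trimmed-down `ExcerptSketch` class. The id format built in `__post_init__` is invented for illustration and is not corppa's excerpt id format.

```python
from dataclasses import dataclass, field


@dataclass
class ExcerptSketch:
    """Hypothetical, trimmed-down stand-in for an excerpt record."""

    page_id: str
    ppa_span_start: int
    ppa_span_end: int
    # init=False removes the field from the generated __init__ signature
    excerpt_id: str = field(init=False)

    def __post_init__(self):
        # hypothetical id format, for illustration only
        self.excerpt_id = f"{self.page_id}@{self.ppa_span_start}:{self.ppa_span_end}"


excerpt = ExcerptSketch("page1", 10, 42)
print(excerpt.excerpt_id)  # page1@10:42

try:
    # passing excerpt_id explicitly fails because __init__ does not accept it
    ExcerptSketch("page1", 10, 42, excerpt_id="custom")
except TypeError as err:
    print("rejected:", err)
```

Since `init=False` fields are excluded from `__init__`, they may follow defaulted fields such as `notes` without triggering the "non-default argument follows default argument" error.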
@@ -336,14 +345,20 @@ class LabeledExcerpt(Excerpt):
    """

    # Reference poem related
    #: poem id
    poem_id: str
    #: reference corpus id
    ref_corpus: str
    #: reference span start index
    ref_span_start: Optional[int] = None
    #: reference span end index
    ref_span_end: Optional[int] = None
    #: reference span text
    ref_span_text: Optional[str] = None
    #: set of alternate poem ids, for merged excerpts with multiple ids
    alt_poem_ids: Optional[set[str]] = None

    # Identification methods
    #: Identification methods
    identification_methods: set[str]

    def __post_init__(self):