|
1 | 1 | # SimpleITKSpellChecking |
2 | 2 |
|
| 3 | + |
| 4 | + |
| 5 | + |
3 | 6 | A script that automatically spell checks the comments of a code base. |
4 | 7 | It is intended to be run on the SimpleITK and ITK code bases. |
5 | 8 |
|
6 | 9 | Here is how it is typically run: |
7 | 10 |
|
8 | 11 | python codespell.py --exclude Ancillary $SIMPLEITK_SOURCE_DIR/Code |
9 | 12 |
|
10 | | -This will recursively find all the '.h' files in a directory, extract |
11 | | -the C/C++ comments from the code and run a spell checker on them. |
12 | | -The '--exclude' flag tells the script to ignore any file that has |
13 | | -'Ancillary' in it's full path name. This flag will accept any |
| 13 | +This command will recursively find all the '.h' files in a directory, |
| 14 | +extract the C/C++ comments from the code, and run a spell checker on them. |
| 15 | +The **'--exclude'** flag tells the script to ignore any file that has |
| 16 | +'Ancillary' in its full path name. This flag will accept any |
14 | 17 | regular expression. |
15 | 18 |
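As a rough illustration of the file selection described above, the following sketch walks a directory for '.h' files and drops any whose full path matches the exclude pattern. The function names `filter_paths` and `find_headers` are hypothetical, and using `re.search` on the full path is an assumption about how the flag behaves, not a description of the actual script.

```python
import os
import re


def filter_paths(paths, exclude=None):
    """Return the '.h' paths that do not match the exclude regular expression.

    Assumption: the pattern may match anywhere in the full path.
    """
    pattern = re.compile(exclude) if exclude else None
    kept = []
    for path in paths:
        if not path.endswith(".h"):
            continue
        if pattern is not None and pattern.search(path):
            continue
        kept.append(path)
    return kept


def find_headers(root, exclude=None):
    """Recursively collect '.h' files under root, applying the exclude filter."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidates.append(os.path.join(dirpath, name))
    return filter_paths(candidates, exclude)
```

Under these assumptions, `find_headers(source_dir, "Ancillary")` would skip every header whose path contains 'Ancillary'.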
|
16 | | -In addition to pyenchant's English dictionary, we use the words in |
17 | | -**additional_dictionary.txt**. These are proper names and technical |
18 | | -terms harvest by hand from SimpleITK and ITK. |
| 19 | +In addition to pyenchant's English dictionary, we use the words in |
| 20 | +**additional_dictionary.txt**. These words are proper names and |
| 21 | +technical terms harvested by hand from the SimpleITK and ITK code bases. |
19 | 22 |
|
20 | | -In addition to checking each word against the dictionary, if a word |
21 | | -fails, we try two additional checks. |
| 23 | +If a word is not found in the dictionaries, we try two additional checks. |
22 | 24 |
|
23 | | -First, if the word starts with some know prefix, the prefix is removed |
24 | | -and the remaining word is checked. The prefixes currently checked |
25 | | -are 'sitk', 'itk', and 'vtk'. Additional prefixes can be specified |
26 | | -with the '--prefix' command line argument. |
| 25 | +1. If the word starts with some known prefix, the prefix is removed |
| 26 | +...and the remaining word is checked against the dictionary. The prefixes |
| 27 | +...used by default are **'sitk'**, **'itk'**, and **'vtk'**. Additional |
| 28 | +...prefixes can be specified with the **'--prefix'** command line argument. |
27 | 29 |
|
28 | | -Second, we attempt to split the word by capitalization and check each |
29 | | -sub-word. This is an attempt to detect camel-case words such as |
30 | | -'GetArrayFromImage', which would get split into 'Get', 'Array', 'From', |
31 | | -and 'Image'. Camel-case words are very commonly used for code elements. |
| 30 | +2. We attempt to split the word by capitalization and check each |
| 31 | +...sub-word against the dictionary. This method is an attempt to detect |
| 32 | +...camel-case words such as 'GetArrayFromImage', which would get split into |
| 33 | +...'Get', 'Array', 'From', and 'Image'. Camel-case words are very commonly |
| 34 | +...used for code elements. |
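The two fallback checks above can be sketched as follows. This is a minimal illustration, not the script's actual code: a plain Python set stands in for the pyenchant dictionary, the names `split_camel_case` and `is_known` are hypothetical, and comparing words in lowercase is an assumption.

```python
import re

# Default prefixes named in the text; extra ones would come from '--prefix'.
DEFAULT_PREFIXES = ("sitk", "itk", "vtk")


def split_camel_case(word):
    """Split a word at capital letters, e.g. 'GetArrayFromImage'."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+", word)


def is_known(word, dictionary, prefixes=DEFAULT_PREFIXES):
    """Check a word directly, then with a prefix removed, then as camel case."""

    def known(candidate):
        # Assumption: dictionary lookups are case-insensitive.
        return candidate.lower() in dictionary

    if known(word):
        return True

    # Check 1: strip a known prefix and look up the remainder.
    for prefix in prefixes:
        if word.startswith(prefix) and len(word) > len(prefix):
            if known(word[len(prefix):]):
                return True

    # Check 2: split by capitalization and look up every sub-word.
    parts = split_camel_case(word)
    return len(parts) > 1 and all(known(part) for part in parts)
```

With a dictionary containing 'get', 'array', 'from', and 'image', both 'sitkImage' and 'GetArrayFromImage' would pass under this sketch, while a word with an unknown sub-word would still be reported.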