Commit 3d3b1e2

committed
appa
1 parent e240585 commit 3d3b1e2

7 files changed

Lines changed: 973 additions & 282 deletions

00_tokenization.Rmd

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,8 @@ library(tidytext)
 
 ## Text
 
+Poem by Emily Dickinson
+
 ```{r}
 text <- c("Because I could not stop for Death -",
           "He kindly stopped for me -",
@@ -33,7 +35,7 @@ text_df
 ```
 
 
-## Todenization
+## Tokenization
 
 ```{r}
 text_df %>%
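
For context on what the corrected "Tokenization" section does, here is a minimal, self-contained sketch assembled from the fragments visible in this diff. It assumes the `dplyr` and `tidytext` packages are installed (the notebook itself loads `tidyverse`), and the name `tokens` is introduced here only for illustration.

```r
library(dplyr)     # provides %>% and re-exports tibble()
library(tidytext)  # provides unnest_tokens()

# The four lines of the poem, as they appear in the notebook
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

# One row per line of the poem
text_df <- tibble(line = 1:4, text = text)

# One row per word: by default unnest_tokens() lowercases the text
# and strips punctuation, keeping the `line` column alongside `word`
tokens <- text_df %>%
  unnest_tokens(word, text)

tokens
```

The result is a "tidy" table with one token per row, which is the shape the later counting and sentiment steps rely on.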

00_tokenization.nb.html

Lines changed: 4 additions & 3 deletions
@@ -265,6 +265,7 @@ <h4 class="date">2021-04-14</h4>
 <!-- rnb-text-begin -->
 <div id="text" class="section level2">
 <h2>Text</h2>
+<p>Poem by Emily Dickinson</p>
 <!-- rnb-text-end -->
 <!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxudGV4dCA8LSBjKFwiQmVjYXVzZSBJIGNvdWxkIG5vdCBzdG9wIGZvciBEZWF0aCAtXCIsXG4gICAgICAgICAgXCJIZSBraW5kbHkgc3RvcHBlZCBmb3IgbWUgLVwiLFxuICAgICAgICAgIFwiVGhlIENhcnJpYWdlIGhlbGQgYnV0IGp1c3QgT3Vyc2VsdmVzIC1cIixcbiAgICAgICAgICBcImFuZCBJbW1vcnRhbGl0eVwiKVxuXG50ZXh0XG5gYGAifQ== -->
@@ -301,8 +302,8 @@ <h2>A tidy table</h2>
 <!-- rnb-chunk-end -->
 <!-- rnb-text-begin -->
 </div>
-<div id="todenization" class="section level2">
-<h2>Todenization</h2>
+<div id="tokenization" class="section level2">
+<h2>Tokenization</h2>
 <!-- rnb-text-end -->
 <!-- rnb-chunk-begin -->
<!-- rnb-source-begin eyJkYXRhIjoiYGBgclxudGV4dF9kZiAlPiVcbiAgdW5uZXN0X3Rva2Vucyh3b3JkLCB0ZXh0KVxuYGBgIn0= -->
@@ -321,7 +322,7 @@ <h2>Todenization</h2>
 <!-- rnb-text-end -->
 </div>
 
324-
<div id="rmd-source-code">LS0tDQp0aXRsZTogInVubmVzdCB0b2tlbnMiDQphdXRob3I6ICJKb2huIExpdHRsZSINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCmFic3RyYWN0OiAiPGJyPiBUaGlzIGRvY3VtZW50IGRlcml2ZWQgZnJvbSBDaGFwdGVyIDEgb2YgU2lsZ2UncyBhbmQgUm9iaW5zb24ncyBUZXh0IE1pbm5pbmcgd2l0aCBSPGJyPmh0dHBzOi8vd3d3LnRpZHl0ZXh0bWluaW5nLmNvbS90aWR5dGV4dC5odG1sIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NCg0KDQpgYGB7ciBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KbGlicmFyeSh0aWR5dmVyc2UpDQpsaWJyYXJ5KHRpZHl0ZXh0KQ0KYGBgDQoNCg0KIyMgVGV4dA0KDQpgYGB7cn0NCnRleHQgPC0gYygiQmVjYXVzZSBJIGNvdWxkIG5vdCBzdG9wIGZvciBEZWF0aCAtIiwNCiAgICAgICAgICAiSGUga2luZGx5IHN0b3BwZWQgZm9yIG1lIC0iLA0KICAgICAgICAgICJUaGUgQ2FycmlhZ2UgaGVsZCBidXQganVzdCBPdXJzZWx2ZXMgLSIsDQogICAgICAgICAgImFuZCBJbW1vcnRhbGl0eSIpDQoNCnRleHQNCmBgYA0KDQojIyBBIHRpZHkgdGFibGUNCg0KYGBge3J9DQp0ZXh0X2RmIDwtIHRpYmJsZShsaW5lID0gMTo0LCB0ZXh0ID0gdGV4dCkNCg0KdGV4dF9kZg0KYGBgDQoNCg0KIyMgVG9kZW5pemF0aW9uDQoNCmBgYHtyfQ0KdGV4dF9kZiAlPiUNCiAgdW5uZXN0X3Rva2Vucyh3b3JkLCB0ZXh0KQ0KYGBgDQoNCg==</div>
325+
<div id="rmd-source-code">LS0tDQp0aXRsZTogInVubmVzdCB0b2tlbnMiDQphdXRob3I6ICJKb2huIExpdHRsZSINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCmFic3RyYWN0OiAiPGJyPiBUaGlzIGRvY3VtZW50IGRlcml2ZWQgZnJvbSBDaGFwdGVyIDEgb2YgU2lsZ2UncyBhbmQgUm9iaW5zb24ncyBUZXh0IE1pbm5pbmcgd2l0aCBSPGJyPmh0dHBzOi8vd3d3LnRpZHl0ZXh0bWluaW5nLmNvbS90aWR5dGV4dC5odG1sIg0Kb3V0cHV0OiBodG1sX25vdGVib29rDQotLS0NCg0KDQpgYGB7ciBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFfQ0KbGlicmFyeSh0aWR5dmVyc2UpDQpsaWJyYXJ5KHRpZHl0ZXh0KQ0KYGBgDQoNCg0KIyMgVGV4dA0KDQpQb2VtIGJ5IEVtaWx5IERpY2tpbnNvbg0KDQpgYGB7cn0NCnRleHQgPC0gYygiQmVjYXVzZSBJIGNvdWxkIG5vdCBzdG9wIGZvciBEZWF0aCAtIiwNCiAgICAgICAgICAiSGUga2luZGx5IHN0b3BwZWQgZm9yIG1lIC0iLA0KICAgICAgICAgICJUaGUgQ2FycmlhZ2UgaGVsZCBidXQganVzdCBPdXJzZWx2ZXMgLSIsDQogICAgICAgICAgImFuZCBJbW1vcnRhbGl0eSIpDQoNCnRleHQNCmBgYA0KDQojIyBBIHRpZHkgdGFibGUNCg0KYGBge3J9DQp0ZXh0X2RmIDwtIHRpYmJsZShsaW5lID0gMTo0LCB0ZXh0ID0gdGV4dCkNCg0KdGV4dF9kZg0KYGBgDQoNCg0KIyMgVG9rZW5pemF0aW9uDQoNCmBgYHtyfQ0KdGV4dF9kZiAlPiUNCiAgdW5uZXN0X3Rva2Vucyh3b3JkLCB0ZXh0KQ0KYGBgDQoNCg==</div>

01_textmining.Rmd

Lines changed: 40 additions & 7 deletions
@@ -3,7 +3,12 @@ title: "Refamiliarize with Silge"
 author: "John Little"
 date: "`r Sys.Date()`"
 abstract: "SA = algorithmically mapping the emotion or opinion of a text.\n\n"
-output: html_notebook
+output:
+  html_notebook: defult
+  rmdformats::html_clean:
+    highlight: kate
+    lightbox: TRUE
+    thumbnails: TRUE
 ---
 
 Find this repository: https://github.com/libjohn/workshop_textmining
@@ -19,6 +24,17 @@ library(tidytext)
 library(wordcloud2)
 ```
 
+```{r echo=FALSE}
+htmltools::img(src = knitr::image_uri(here::here("images", "Rfun_logo.png")),
+               alt = 'Rfun',
+               style = 'position:absolute; bottom:15px; left:0; padding:5px; border:0px;')
+
+htmltools::img(src = knitr::image_uri(here::here("images", "CDVS-logo_sm_Spring2020.png")),
+               alt = 'Rfun',
+               style = 'position:absolute; bottom:0; right:0; padding:5px; border:0px;')
+```
+
+
 ## Data
 
 We'll look at some books by [Jane Austen](https://en.wikipedia.org/wiki/Jane_Austen), an 18th century novelist. Austen explored women and marriage within the British upper class. The novelist has a unique and well earned following within literature. Her works is consistently discussed and honored. To this day, Austen's novels are the source of many adaptations, written and on-screen. Through the `janeaustenr` package we can access and mine the text of six Austen novels. We can call the collection of novels a corpra. An individual novel is a corpus.
@@ -116,7 +132,7 @@ bind_rows(get_stopwords(), stopwords_custom) # The default is "snowball"
 
 ### Calculate word frequency
 
-How many Austen countable words are there if we remove _snowball_ stop-words? There are `r library(magrittr); matchwords_books %>% distinct(word) %>% nrow()` countable words.
+How many Austen countable words are there if we remove _snowball_ stop-words? There are `r nrow(dplyr::distinct(matchwords_books, word))` countable words.
 
 ```{r}
 matchwords_books %>%
@@ -289,7 +305,7 @@ head(get_sentiments("nrc"))
 head(get_sentiments("afinn"))
 
 get_sentiments("nrc") %>%
-  count(sentiment, sort = TRUE)
+  count(sentiment, sort = TRUE)
 
 ```
 
@@ -374,8 +390,25 @@ emma_afinn %>%
 - Data Visualization with ggplot2: ([video](https://warpwire.duke.edu/w/80YEAA/) | [workshop](https://rfun.library.duke.edu/portfolio/ggplot_workshop/))
 
 
-## License
+---
+
+```{r include=FALSE}
+library(htmltools)
+tagList(rmarkdown::html_dependency_font_awesome())
+```
+
+<center>
+[John Little](https://johnlittle.info/)
+[Rfun](https://Rfun.library.duke.edu/)
+[Center for Data & Visualization Sciences](https://library.duke.edu/data/)
+
+<i class="fab fa-creative-commons fa-2x"></i> &nbsp; <i class="fab fa-creative-commons-by fa-2x"></i><i class="fab fa-creative-commons-nc fa-2x"></i>
+
+CC BY-NC
+Creative Commons: Attribution, Non-commercial
+https://creativecommons.org/licenses/by-nc/4.0/
+</center>
+
+&nbsp;
 
-CC BY-NC
-Creative Commons: Attribution, Non-commercial
-https://creativecommons.org/licenses/by-nc/4.0/
+&nbsp;
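
The inline-R change in the "Calculate word frequency" hunk swaps a piped expression (which needed `library(magrittr)` inside the inline chunk) for a nested, namespace-qualified call. As a rough check that the two forms should agree, here is a small sketch. `matchwords_books` is not defined anywhere in this diff, so the tibble below is a hypothetical stand-in with made-up words, not the notebook's data.

```r
library(dplyr)

# Hypothetical stand-in for matchwords_books (the real object is built
# elsewhere in the notebook and is not shown in this commit)
matchwords_books <- tibble(word = c("emma", "elinor", "emma", "darcy"))

# Old inline form: piped, relies on %>% being available
piped <- matchwords_books %>% distinct(word) %>% nrow()

# New inline form: nested call, no pipe required
nested <- nrow(dplyr::distinct(matchwords_books, word))

identical(piped, nested)
```

The nested form avoids attaching a package from inside an inline expression, which is presumably why the commit prefers it.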

01_textmining.html

Lines changed: 837 additions & 0 deletions
Large diffs are not rendered by default.

01_textmining.nb.html

Lines changed: 89 additions & 271 deletions
Large diffs are not rendered by default.

images/CDVS-logo_sm_Spring2020.png

20.1 KB

images/Rfun_logo.png

10 KB
