Commit ac715a8

committed
tokenization
1 parent 8dabd90 commit ac715a8

2 files changed

Lines changed: 430 additions & 0 deletions

00_tokenization.Rmd

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
---
title: "unnest tokens"
author: "John Little"
date: "`r Sys.Date()`"
abstract: "<br> This document is derived from Chapter 1 of Silge and Robinson's Text Mining with R<br>https://www.tidytextmining.com/tidytext.html"
output: html_notebook
---

```{r message=FALSE, warning=FALSE}
library(tidyverse)  # tibble() and the %>% pipe
library(tidytext)   # unnest_tokens()
```

## Text

```{r}
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")

text
```

## A tidy table

```{r}
text_df <- tibble(line = 1:4, text = text)

text_df
```

## Tokenization

```{r}
text_df %>%
  unnest_tokens(word, text)
```
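
As a rough illustration of what `unnest_tokens(word, text)` returns, the base-R sketch below approximates its default behaviour: lowercase the text, drop punctuation, and emit one row per word while keeping the `line` column. It is only an approximation and not how tidytext does it internally (the package relies on the tokenizers package); the splitting regex and the names `tokens` and `tidy_words` are assumptions made for this example.

```r
# Same input as the notebook, rebuilt here so the sketch is self-contained
text <- c("Because I could not stop for Death -",
          "He kindly stopped for me -",
          "The Carriage held but just Ourselves -",
          "and Immortality")
text_df <- data.frame(line = 1:4, text = text, stringsAsFactors = FALSE)

# Lowercase each line, then split on runs of anything that is not a
# letter or an apostrophe (assumed stand-in for a real word tokenizer)
tokens <- strsplit(tolower(text_df$text), "[^a-z']+")

# Drop any empty strings left behind by leading separators
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# One row per token, repeating the line number for each word on that line
tidy_words <- data.frame(
  line = rep(text_df$line, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

tidy_words
```

For these four lines the sketch yields 20 rows, one per word, which is the one-token-per-row shape that tidy text analysis works with.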

00_tokenization.nb.html

Lines changed: 388 additions & 0 deletions
Large diffs are not rendered by default.
