|
| 1 | +--- |
| 2 | +title: "<br>Text Mining" |
| 3 | +subtitle: "R case study" |
| 4 | +author: "John Little" |
| 5 | +institute: "Cntr for Data & Viz" |
| 6 | +date: "April 13, 2021" |
| 7 | +output: |
| 8 | + xaringan::moon_reader: |
| 9 | + lib_dir: libs |
| 10 | + css: |
| 11 | + - mystyles/xaringan-themer.css # fonts I do want https://pkg.garrickadenbuie.com/xaringanthemer/articles/xaringanthemer.html |
| 12 | + - mystyles/adirondack/story.css # https://story.xaprb.com/slides/adirondack/ |
| 13 | + - mystyles/adirondack/apron.css # layout |
| 14 | + #- mystyles/adirondack/adirondack.css # fonts I don't want |
| 15 | + - mystyles/adirondack/descartes.css # image position |
| 16 | + - mystyles/adirondack/tachyons.min.css # color, font weights, boxes |
| 17 | + # - mystyles/adirondack/monoblock.css # part of story/adirondack |
| 18 | + - mystyles/my-theme.css |
| 19 | + nature: |
| 20 | + ratio: '16:9' |
| 21 | + highlightStyle: github |
| 22 | + highlightLines: true |
| 23 | + countIncrementalSlides: true |
| 24 | +--- |
| 25 | + |
| 26 | +```{r setup, include=FALSE} |
| 27 | +options(htmltools.dir.version = FALSE) |
| 28 | +library(tidyverse) |
| 29 | +library(htmltools) |
| 30 | +tagList(rmarkdown::html_dependency_font_awesome()) |
| 31 | +# library(xaringanthemer) # run once; or use the pre-run css found in mystyles (xaringan-themer.css) |
| 32 | +# style_duo_accent( |
| 33 | +# primary_color = "#012169", |
| 34 | +# secondary_color = "#005587", |
| 35 | +# header_font_google = google_font("Josefin Sans"), |
| 36 | +# text_font_google = google_font("Montserrat", "300", "300i"), |
| 37 | +# code_font_google = google_font("Fira Mono") |
| 38 | +# ) |
| 39 | +``` |
| 40 | + |
| 41 | +## Duke University: Land Acknowledgement |
| 42 | + |
| 43 | +.f4[I would like to take a moment to honor the land in Durham, NC. Duke University sits on the ancestral lands of the Shakori, Eno and Catawba people. This institution of higher education is built on land stolen from those peoples. These tribes were here before the colonizers arrived. Additionally, this land has borne witness to over 400 years of the enslavement, torture, and systematic mistreatment of African people and their descendants. Recognizing this history is an honest attempt to break out beyond persistent patterns of colonization and to rewrite the erasure of Indigenous and Black peoples. There is value in acknowledging the history of our occupied spaces and places. I hope we can glimpse an understanding of these histories by recognizing the origins of collective journeys.] |
| 44 | + |
| 45 | + |
| 46 | +--- |
| 47 | + |
| 48 | +layout: true |
| 49 | + |
| 50 | +.footercc[ |
| 51 | +<i class="fab fa-creative-commons"></i> <i class="fab fa-creative-commons-by"></i><i class="fab fa-creative-commons-nc"></i> <a href = "https://JohnLittle.info"><span class = "opacity30">https://</span>JohnLittle<span class = "opacity30">.info</span></a> |
| 52 | +<span class = "opacity30"> | <a href="https://github.com/libjohn/workshop_textmining">https://github.com/libjohn/workshop_textmining</a> | `r Sys.Date()` </span> |
| 53 | +] |
| 54 | + |
| 55 | +--- |
| 56 | + |
| 57 | +## Demonstration Goals |
| 58 | + |
| 59 | +- Gather some tweets |
| 60 | + |
| 61 | +- Define APIs and the Twitter Developer portal (Academic Use) |
| 62 | + |
| 63 | +- Rudimentary text analysis and visualization |
| 64 | + |
| 65 | +- Point out useful documentation / resources |
| 66 | + |
| 67 | + |
| 68 | +*** |
| 69 | + |
| 70 | +.f6.i.moon-gray.center[This is not a text analysis workshop. The foundations of text analysis require considerably more time than we have. |
| 71 | +This is a demonstration of how to leverage two tidy packages (tidyverse and tidytext) and a chance to share resources. ] |
| 72 | + |
| 73 | + |
| 74 | +--- |
| 75 | + |
| 76 | +class: img-right-full |
| 77 | + |
| 78 | + |
| 79 | + |
| 80 | +# Three tenets |
| 81 | + |
| 82 | + |
| 83 | +- Just numbers |
| 84 | +- Benefits of review |
| 85 | +- Dashboard fatigue is a real thing |
| 86 | + |
| 87 | + |
| 88 | +??? |
| 89 | + |
| 90 | +- The implications of dashboard fatigue might be the most interesting thing to discuss in the Q&A |
| 91 | + |
| 92 | +--- |
| 93 | +layout: false |
| 94 | +class: img-left-full |
| 95 | + |
| 96 | + |
| 97 | + |
| 98 | +## Drivers |
| 99 | + |
| 100 | +- Goal: create a dashboard of workshop attendance |
| 101 | +- CDVS motivated by the possibility of exploring data |
| 102 | +- Dashboard can be the basis of another workshop |
| 103 | + |
| 104 | +.footercc[ |
| 105 | +<i class="fab fa-creative-commons"></i> <i class="fab fa-creative-commons-by"></i><i class="fab fa-creative-commons-nc"></i> <a href = "https://JohnLittle.info"><span class = "opacity30">https://</span>JohnLittle<span class = "opacity30">.info</span></a> |
| 106 | +<span class = "opacity30"> | <a href="https://github.com/libjohn/workshop_textmining">https://github.com/libjohn/workshop_textmining</a> | `r Sys.Date()` </span> |
| 107 | +] |
| 108 | + |
| 109 | +??? |
| 110 | + |
| 111 | +- These are not exactly the best drivers for creating a dashboard. They’re not bad either. |
| 112 | + |
| 113 | + |
| 114 | +--- |
| 115 | +layout: false |
| 116 | +class: middle, center |
| 117 | + |
| 118 | +<br> |
| 119 | + |
| 120 | +.bg-washed-blue.b--navy.ba.bw2.br3.shadow-5.ph4.mt5[ |
| 121 | + |
| 122 | + |
| 123 | + |
| 124 | +## John R Little |
| 125 | + |
| 126 | +.prussian[ |
| 127 | +.f5[Data Science Librarian |
| 128 | +Center for Data & Visualization Sciences |
| 129 | +Duke University Libraries |
| 130 | +] |
| 131 | +] |
| 132 | + |
| 133 | +.f7[https://johnlittle.info |
| 134 | +https://Rfun.library.duke.edu |
| 135 | +https://library.duke.edu/data |
| 136 | +] |
| 137 | +] |
| 138 | + |
| 139 | + |
| 140 | + |
| 141 | +<i class="fab fa-creative-commons fa-2x"></i> <i class="fab fa-creative-commons-by fa-2x"></i><i class="fab fa-creative-commons-nc fa-2x"></i> |
| 142 | +.f6.moon-gray[Creative Commons: Attribution-NonCommercial 4.0] |
| 143 | +.f7.moon-gray[https://creativecommons.org/licenses/by-nc/4.0] |
| 144 | + |
| 145 | +--- |
| 146 | +class: inverse |
| 147 | + |
| 148 | +# Appendix |
| 149 | + |
| 150 | +## screen shots |
| 151 | + |
| 152 | +--- |
| 153 | +layout: true |
| 154 | + |
| 155 | +.footercc[ |
| 156 | +<i class="fab fa-creative-commons"></i> <i class="fab fa-creative-commons-by"></i><i class="fab fa-creative-commons-nc"></i> <a href = "https://JohnLittle.info"><span class = "opacity30">https://</span>JohnLittle<span class = "opacity30">.info</span></a> |
| 157 | +<span class = "opacity30"> | <a href="https://github.com/libjohn/workshop_textmining">https://github.com/libjohn/workshop_textmining</a> | `r Sys.Date()` </span> |
| 158 | +] |
| 159 | + |
| 160 | +--- |
| 161 | + |
| 162 | + |
| 163 | + |
| 164 | +## Technology stack |
| 165 | + |
| 166 | + |
| 167 | + |
| 168 | +- R |
| 169 | + - R is a data-first coding language |
| 170 | + - R can be a universal interface for analysis and workflow |
| 171 | +- Tidyverse is a well-developed approach to workflow & the data lifecycle |
| 172 | +- Bias towards enabling reproducibility |
| 173 | + - scripting |
| 174 | + - reporting |
| 175 | + |
| 176 | + |
| 177 | + |
| 178 | + |
| 179 | + |
| 180 | + |
| 181 | + |
| 182 | + |
| 183 | + |
| 184 | +??? |
| 185 | + |
| 186 | +- Reuse analysis code to produce reports, email alerts, interactive dashboards, etc. |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +## Lesson |
| 191 | + |
| 192 | +.fl-10.w-60.bg.b.ba.bw1.br3.shadow-5.ph4.mt4.center.prussian[The last thing you should do is |
| 193 | +build the dashboard |
| 194 | +] |
| 195 | + |
| 196 | +- Identify target audience and scope |
| 197 | +- Create summary reports |
| 198 | +- Build a static analysis |
| 199 | +- Generate push-reports based on dynamic thresholds |
| 200 | +- Advanced: Build a reporting application |
| 201 | + |
| 202 | +??? |
| 203 | +Or, in this case, build a workshop attendance application |
| 204 | + |
| 205 | +--- |
| 206 | +## Other important question(s) |
| 207 | + |
| 208 | +- If developing the dashboard in R... |
| 209 | + - Flexdashboard (dashboards) |
| 210 | + - Shiny (Web applications) |
| 211 | + |
| 212 | +Not mutually exclusive, but Flexdashboard has a significantly lower barrier to entry |
| 213 | + |
| 214 | +.center[] |
| 215 | + |
| 216 | +--- |
| 217 | +## Actual Goals |
| 218 | + |
| 219 | +- Host **cleaned and disaggregated data** |
| 220 | + |
| 221 | +- Provide a **summary of attendance** |
| 222 | + |
| 223 | + |
| 224 | + |
| 225 | + |
| 226 | + |
| 227 | +??? |
| 228 | + |
| 229 | +- Host **cleaned and disaggregated data** |
| 230 | + - A data archive for clean data |
| 231 | + - exported from the SpringShare registration system |
| 232 | + - accounts for attendance |
| 233 | +- Provide a **summary of attendance** so that staff can |
| 234 | + - Assess their workshop’s impact over time (as measured by attendance and registration) |
| 235 | + - See current semester attendance totals within the context of multi-year totals |
| 236 | + |
| 237 | +--- |
| 238 | +class: center |
| 239 | + |
| 240 | + |
| 241 | + |
| 242 | + |
| 243 | + |
| 244 | + |
| 245 | +.prussian[ |
| 246 | +.absolute.w-5-12th.pa-3.l-4-12th.t-8-12th.b.ba.bw-4.br-4.shadow-5.bg-white-80[ |
| 247 | +Collage of dashboard screens |
| 248 | +] |
| 249 | +] |
| 250 | + |
| 251 | + |