class: center, middle, inverse, title-slide

.title[
# Quantitative Methods for LEL
]
.subtitle[
## Week 3
]
.author[
### Dr Stefano Coretta
]
.institute[
### University of Edinburgh
]
.date[
### 2023/10/03
]

---

<iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/651ac0766e9fba81222d2941" width="100%"></iframe>

---

## Data visualisation

.center[

]

---

## Good data visualisation

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
Alberto Cairo has identified four common features of good data visualisation ([Spiegelhalter 2019](https://www.penguin.co.uk/books/294857/the-art-of-statistics-by-spiegelhalter-david/9780241258767):64-66):

1. It contains **reliable information**.

2. The design has been chosen so that relevant **patterns become noticeable**.

3. It is presented in an **attractive** manner, but appearance should not get in the way of **honesty, clarity and depth**.

4. When appropriate, it is organized in a way that **enables some exploration**.
]

---
layout: false

<iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/650c1c85fc9e77ddc2cf39e6" width="100%"></iframe>

---

<iframe allowfullscreen frameborder="0" height="100%" mozallowfullscreen style="min-width: 500px; min-height: 355px" src="https://app.wooclap.com/events/SQQFXB/questions/650c1d1c6829de60e682066b" width="100%"></iframe>

---

## Endangerment status

```r
glot_status
```

```
## # A tibble: 8,345 × 18
## ID Language_ID Parameter_ID Value Code_ID Comment Source codeReference status Name Macroarea
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <lgl> <fct> <chr> <chr>
## 1 kolp1236… kolp1236 aes 3 aes-sh… Kol (1… hh:he… NA shift… Kol … Papunesia
## 2 tana1288… tana1288 aes 3 aes-sh… Tanahm… hh:he… NA shift… Tana… Papunesia
## 3 touo1238… touo1238 aes 3 aes-sh… Touo (… hh:he… NA shift… Touo Papunesia
## 4 bert1248… bert1248 aes 3 aes-sh… Fadash… hh:he… NA shift… Berta Africa
## 5 sius1254… sius1254 aes 6 aes-ex… Siusla… hh:he… NA extin… Sius… North Am…
## 6 cent2045… cent2045 aes 6 aes-ex… Jalaa … <NA> NA extin… Jalaa Africa
## 7 else1239… else1239 aes 3 aes-sh… Elseng… hh:he… NA shift… Else… Papunesia
## 8 taia1239… taia1239 aes 4 aes-mo… Taiap … hh:he… NA morib… Taiap Papunesia
## 9 pyuu1245… pyuu1245 aes 3 aes-sh… Pyu (4… hh:he… NA shift… Pyu Papunesia
## 10 mato1253… mato1253 aes 6 aes-ex… Arára … hh:he… NA extin… Mato… South Am…
## # ℹ 8,335 more rows
## # ℹ 7 more variables: Latitude <dbl>, Longitude <dbl>, Glottocode <chr>, ISO639P3code <chr>,
## # Countries <chr>, Family_ID <chr>, Language_ID.y <chr>
```

---

## Bar chart

<img src="index_files/figure-html/status-bar-1.png" width="60%" style="display: block; margin: auto;" />

???

Bar charts are great for counts (of anything).
The *x*-axis shows the levels of status, while the *y*-axis shows the number of languages per status level.
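
A bar chart like this one can be drawn with ggplot2's `geom_bar()`, which counts the rows for each level of the variable mapped to *x*; adding `position = "fill"` gives the filled (proportion) variant used later in the deck. This is a minimal sketch, not the code behind these slides: it assumes the ggplot2 package is installed, and `toy_status` is a small made-up data frame standing in for `glot_status` (only the column names `status` and `Macroarea` are borrowed; the rows and status labels are illustrative).

```r
# Minimal sketch, assuming ggplot2 is installed.
# `toy_status` is hypothetical toy data: only the column names `status` and
# `Macroarea` mirror `glot_status`; the rows and labels are made up.
library(ggplot2)

toy_status <- data.frame(
  status = c("not endangered", "threatened", "shifting", "shifting",
             "extinct", "not endangered", "moribund", "extinct"),
  Macroarea = c("Africa", "Africa", "Papunesia", "Papunesia",
                "Africa", "Eurasia", "Papunesia", "Eurasia")
)
# Order the status levels from least to most endangered
toy_status$status <- factor(
  toy_status$status,
  levels = c("not endangered", "threatened", "shifting", "moribund", "extinct")
)

# Bar chart: geom_bar() counts the rows for each status level
p_bar <- ggplot(toy_status, aes(x = status)) +
  geom_bar()

# Filled bar chart: position = "fill" stretches each bar to span 0 to 1,
# but the y-axis label still defaults to "count", so it is changed by hand
p_fill <- ggplot(toy_status, aes(x = Macroarea, fill = status)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
```

Printing `p_bar` or `p_fill` draws the plot; `layer_data(p_bar)` returns the computed counts if you want to check what the bars show.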
---
layout: true

## Stacked bar chart

---

<img src="index_files/figure-html/status-stack-1-1.png" width="60%" style="display: block; margin: auto;" />

???

In this plot I separated endangered from non-endangered languages.
Within the endangered languages, I further show the counts of the different status levels.

---

<img src="index_files/figure-html/status-stack-2-1.png" width="60%" style="display: block; margin: auto;" />

???

Here, the *x*-axis corresponds to the language macro-areas in the data.
Within each bar, the counts for each of the status levels are given.

---
layout: false

## Stacked proportion (filled) bar chart

<img src="index_files/figure-html/status-filled-1.png" width="60%" style="display: block; margin: auto;" />

???

So far we have seen raw counts.
What about proportions?

You can show proportions by using a "filled" bar chart.
Each bar is stretched so that it covers the entire range from 0 to 1.

Note that proportions are between 0 and 1, while percentages are between 0 and 100%.

By default, `geom_bar()` labels the *y*-axis "count", so you have to change the label to "proportions" manually.

---

## Dot matrix chart

<img src="index_files/figure-html/status-matrix-1.png" width="45%" style="display: block; margin: auto;" />

---

## Mosaic plot

<img src="index_files/figure-html/status-mosaic-1.png" width="60%" style="display: block; margin: auto;" />

---

## Formant values

```r
formants
```

```
## # A tibble: 24,012 × 22
## speaker file word time f3 f0 language gender glottocode item ipa c1 c1_phonation
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
## 1 it01 it01-001 pugu 0.111 2280. 137. Italian m ital1282 20 pugu p voiceless
## 2 it01 it01-001 pugu 0.111 2280. 137. Italian m ital1282 20 pugu p voiceless
## 3 it01 it01-001 pugu 0.222 2124. 134. Italian m ital1282 20 pugu p voiceless
## 4 it01 it01-001 pugu 0.222 2124. 134. Italian m ital1282 20 pugu p voiceless
## 5 it01 it01-001 pugu 0.333 2314. 134. Italian m ital1282 20 pugu p voiceless
## 6 it01 it01-001 pugu 0.333 2314. 134. Italian m ital1282 20 pugu p voiceless
## 7 it01 it01-001 pugu 0.444 2374. 135. Italian m ital1282 20 pugu p voiceless
## 8 it01 it01-001 pugu 0.444 2374. 135. Italian m ital1282 20 pugu p voiceless
## 9 it01 it01-001 pugu 0.556 2307. 137. Italian m ital1282 20 pugu p voiceless
## 10 it01 it01-001 pugu 0.556 2307. 137. Italian m ital1282 20 pugu p voiceless
## # ℹ 24,002 more rows
## # ℹ 9 more variables: vowel <chr>, anteropost <chr>, height <chr>, c2 <chr>, c2_phonation <chr>,
## # c2_place <chr>, formant <chr>, value <dbl>, id <chr>
```

---
layout: true

## Line plot

---

<img src="index_files/figure-html/forms-line-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/forms-point-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/forms-line-point-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false

## Infant gestures

```r
gestures
```

```
## # A tibble: 1,620 × 11
## dyad background months task gesture count_raw count ct_raw ct pro_rata id
## <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 b01 Bengali 10 five reach 5 5 1 1 no 3
## 2 b01 Bengali 10 five point 0 0 0 0 no 2
## 3 b01 Bengali 10 five ho_gv 0 0 0 0 no 1
## 4 b01 Bengali 10 tp1 reach 0 0 0 0 no 6
## 5 b01 Bengali 10 tp1 point 0 0 0 0 no 5
## 6 b01 Bengali 10 tp1 ho_gv 0 0 0 0 no 4
## 7 b01 Bengali 10 tp2 reach 0 0 0 0 no 9
## 8 b01 Bengali 10 tp2 point 0 0 0 0 no 8
## 9 b01 Bengali 10 tp2 ho_gv 0 0 0 0 no 7
## 10 b01 Bengali 11 five reach 7 8 3 3 yes 3
## # ℹ 1,610 more rows
```

---
layout: true

## More line plots

---

<img src="index_files/figure-html/gest-line-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-line-facet-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img
src="index_files/figure-html/gest-line-facet-2-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false
layout: true

## Connected dots plot

---

<img src="index_files/figure-html/gest-conn-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/gest-conn-2-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false

## Phonetics of politeness

```r
polite
```

```
## # A tibble: 224 × 27
## subject gender birthplace musicstudent months_ger scenario task attitude total_duration
## <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 F1 F seoul_area yes 18 6 not inf 55.2
## 2 F1 F seoul_area yes 18 6 not pol 28.5
## 3 F1 F seoul_area yes 18 7 not inf 60.3
## 4 F1 F seoul_area yes 18 7 not pol 40.8
## 5 F1 F seoul_area yes 18 1 dct pol 18.4
## 6 F1 F seoul_area yes 18 1 dct inf 13.6
## 7 F1 F seoul_area yes 18 2 dct pol 5.22
## 8 F1 F seoul_area yes 18 2 dct inf 4.25
## 9 F1 F seoul_area yes 18 3 dct pol 6.79
## 10 F1 F seoul_area yes 18 3 dct inf 4.13
## # ℹ 214 more rows
## # ℹ 18 more variables: articulation_rate <dbl>, f0mn <dbl>, f0sd <dbl>, f0range <dbl>, inmn <dbl>,
## # insd <dbl>, inrange <dbl>, shimmer <dbl>, jitter <dbl>, HNRmn <dbl>, H1H2 <dbl>,
## # breath_count <dbl>, filler_count <dbl>, hiss_count <dbl>, nasal_count <dbl>, sil_count <dbl>,
## # ya_count <dbl>, yey_count <dbl>
```

---
layout: true

## Strip chart

---

<img src="index_files/figure-html/pol-strip-f0-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-strip-hnr-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false
layout: true

## Density plot

---

<img src="index_files/figure-html/pol-dens-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-2-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-3-1.png" width="60%"
style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-dens-4-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false
layout: true

## Violin plot

---

<img src="index_files/figure-html/pol-vio-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-2-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-3-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-vio-4-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false
layout: true

## Scatter plot

---

<img src="index_files/figure-html/pol-sca-1-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-2-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-3-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-4-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/pol-sca-5-1.png" width="60%" style="display: block; margin: auto;" />

---

<img src="index_files/figure-html/mald-1-1.png" width="60%" style="display: block; margin: auto;" />

---
layout: false
class: center middle reverse

## DO'S AND DON'TS

---
layout: true

## DO

---

<img src="index_files/figure-html/mald-bar-1-1.png" width="60%" style="display: block; margin: auto;" />

???

Bar charts should be used for discrete numeric variables, not for continuous variables.

---

<img src="index_files/figure-html/mald-bar-2-1.png" width="60%" style="display: block; margin: auto;" />

???

If you want to show proportions instead of raw counts, use proportion bar charts (aka filled bar charts).

---

<img src="index_files/figure-html/mald-bar-3-1.png" width="60%" style="display: block; margin: auto;" />

???
To show proportions from multiple subjects/items, use strip charts.

---
layout: false

## DON'T

<img src="index_files/figure-html/mald-dont-1.png" width="60%" style="display: block; margin: auto;" />

???

Never ever ever use bar charts with error bars to show mean proportions.
They are misleading:

- The bars do not indicate discrete numeric values: mean proportions are continuous variables.

- Error bars mask the true variability of the data: show raw proportions instead.

For more, see: https://www.data-to-viz.com/caveat/error_bar.html and https://stats.stackexchange.com/questions/349422/does-it-make-sense-to-add-error-bars-in-a-bar-chart-of-frequencies/367889#367889

---

## DO

<img src="index_files/figure-html/pol-do-1.png" width="60%" style="display: block; margin: auto;" />

???

For continuous variables, like acoustic measures or reaction times, use violins with overlaid strip charts.
You can include very narrow box plots, but remember that box plots mask variability in the raw data.

---

## DON'T

<img src="index_files/figure-html/pol-dont-1.png" width="60%" style="display: block; margin: auto;" />

???

Can you see what difference it makes to use box plots only?

---

## Summary

.bg-washed-blue.b--dark-blue.ba.bw2.br3.shadow-5.ph4.mt2[
- Carefully think about which type of variable you are working with: **continuous or discrete**?

- The type of variable allows you to select appropriate types of plots. Your **go-to plots** are:
  - Bar charts (and variants).
  - Strip charts.
  - Line plots.
  - Density plots.
  - Violin plots.

- Be mindful of the **DOs and DON'Ts** of plotting.
]
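
---

## Violin plot with strip chart: code sketch

The **DO** advice for continuous variables (a violin plot with an overlaid strip chart, optionally with a very narrow box plot) can be sketched with ggplot2 as below. This is a minimal sketch, not the code behind the figures in these slides: it assumes the ggplot2 package is installed, and `toy_polite` is made-up data with simulated values, borrowing only the column names `attitude` and `f0mn` from `polite`.

```r
# Minimal sketch, assuming ggplot2 is installed.
# `toy_polite` is hypothetical: the values are simulated, and only the column
# names `attitude` and `f0mn` mirror the `polite` data.
library(ggplot2)

set.seed(2023)
toy_polite <- data.frame(
  attitude = rep(c("inf", "pol"), each = 30),
  f0mn = c(rnorm(30, mean = 200, sd = 40), rnorm(30, mean = 190, sd = 40))
)

p_do <- ggplot(toy_polite, aes(x = attitude, y = f0mn)) +
  # Violin: the estimated density of the raw values, mirrored, per group
  geom_violin() +
  # Very narrow box plot; its outlier points are hidden because the
  # strip chart below already shows every observation
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  # Strip chart: every raw value, jittered horizontally only
  geom_jitter(width = 0.1, height = 0, alpha = 0.5)
```

Setting `height = 0` in `geom_jitter()` spreads the points sideways only, so the values on the *y*-axis are shown exactly as measured.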