Durational reduction in English (and Polish) words shows a contextual frequency effect

APAP 2019

Kamil Kaźmierski, WA at AMU in Poznań

June 21st, 2019

1 / 35

Overall frequency effect

  • Higher lexical frequency → Shorter duration (e.g. Jurafsky et al. 2001)

    • Perhaps caused by speakers: frequency of use causes articulatory reduction

    • Perhaps moderated by listeners' expectations: highly frequent forms don't have to be pronounced carefully

  • Such reduction could result from online computation, performed on abstract phonological representations (cf. Levelt et al. 1999)

2 / 35

Contextual frequency effect on variation

Language | Effect | Source
English | Prevocalic word-final /t/ glottalized more in words typically followed by consonants | Eddington & Channer (2010)
English | Word-final /t,d/ deletion more likely in words typically followed by consonants | Raymond et al. (2016)
English | Unstressed ING more likely to be /ɪn/ in words frequently occurring in /ɪn/-favoring contexts | Forrest (2017)
Spanish | Latin /fV-/ words frequently occurring after word-final non-high vowels likely to be /ØV-/ in MSS | Brown & Raymond (2012)
New Mexican Spanish | Word-initial /s/ more likely to be reduced ([s], [h], [Ø]) in words often preceded by word-final non-high vowels | Raymond & Brown (2012)
English | Words that are typically predictable reduce more in duration | Seyfarth (2014)
3 / 35

Predictability and Informativity

4 / 35

Predictability

How likely is the current word given the word the speaker said right before it?

  • Transitional/conditional probability (cf. Jurafsky et al. 2001): $P(w_i \mid w_{i-1}) = \frac{C(w_{i-1} w_i)}{C(w_{i-1})}$

  • One problem: bigrams with 0 occurrences in the corpus

  • A solution: add 1 to each bigram count

  • A better solution: Modified Kneser-Ney smoothing (cf. Chen & Goodman 1998) (r-cmscu R package)
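
The unsmoothed estimate and the add-one fix listed above can be sketched in Python as follows. This is only a minimal illustration on a made-up toy corpus: the function names (`bigram_counts`, `cond_prob_mle`, `cond_prob_add_one`) are hypothetical, and the sketch does not implement modified Kneser-Ney smoothing.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count contexts C(w_{i-1}) and bigrams C(w_{i-1} w_i) in a token list."""
    # Contexts are counted only where a following word exists.
    context_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return context_counts, pair_counts

def cond_prob_mle(prev, word, context_counts, pair_counts):
    """Unsmoothed P(word | prev) = C(prev word) / C(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return pair_counts[(prev, word)] / context_counts[prev]

def cond_prob_add_one(prev, word, context_counts, pair_counts, vocab_size):
    """Add-one smoothed P(word | prev): every possible bigram count is raised by 1."""
    return (pair_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

# Toy corpus (an assumption for illustration, not data from the study).
tokens = "a nice home is a nice place and a home is a home".split()
context_counts, pair_counts = bigram_counts(tokens)
vocab_size = len(set(tokens))

p_seen = cond_prob_mle("nice", "home", context_counts, pair_counts)
p_unseen = cond_prob_mle("a", "place", context_counts, pair_counts)
p_unseen_smoothed = cond_prob_add_one("a", "place", context_counts, pair_counts, vocab_size)
```

In this toy corpus the bigram "a place" never occurs, so its unsmoothed probability is zero, while add-one smoothing assigns it a small positive value.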

5 / 35

High vs. low predictability

nice home

Lower (< 0.001) predictability of home given nice

fortress-like home

Higher (0.412) predictability of home given fortress-like

6 / 35

Higher predictability → More reduction

nice home

Lower predictability of home given nice

Less reduction in home

fortress-like home

Higher predictability of home given fortress-like

More reduction in home

7 / 35

Higher predictability → More reduction

Right-context predictability: How likely is this word, given the word the speaker is about to say next?

home wrecker

Higher predictability of home given wrecker

More reduction in home

home course

Lower predictability of home given course

Less reduction in home
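
Right-context predictability mirrors the left-context estimate, conditioning on the following word instead of the preceding one. A minimal Python sketch, assuming an unsmoothed estimate and a made-up toy corpus (the function name `right_context_prob` is hypothetical):

```python
from collections import Counter

def right_context_prob(word, following, tokens):
    """Unsmoothed P(word | following) = C(word following) / C(following)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Count each word only in positions where it has a preceding word.
    following_counts = Counter(tokens[1:])
    if following_counts[following] == 0:
        return 0.0
    return pair_counts[(word, following)] / following_counts[following]

# Toy corpus (an assumption for illustration, not data from the study).
tokens = "home wrecker home course golf course home wrecker main course".split()

p_home_before_wrecker = right_context_prob("home", "wrecker", tokens)
p_home_before_course = right_context_prob("home", "course", tokens)
```

Here "wrecker" is always preceded by "home", whereas "course" is preceded by several different words, so "home" is more predictable before "wrecker" than before "course".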

8 / 35

Two different predictability "profiles"

9 / 35

Informativity: How unpredictable from its context is this word on average?

10 / 35

Calculating Informativity

$P(W = w \mid C = c_i)$

11 / 35

Take Kneser-Ney smoothed probability of a particular word token in a particular context

Calculating Informativity

$\log P(W = w \mid C = c_i)$

12 / 35

Seyfarth (2014) used bans (base-10 logarithm), which is apparently the usual choice. I used the natural logarithm (R's default) for the NSP data and corrected this to bans for the GPSC data.
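
The two logarithm bases differ only by a constant factor, so values in natural-log units can be converted to bans by dividing by ln(10). A small Python sketch of the conversion, with a hypothetical helper name (`nats_to_bans`) and an arbitrary example probability:

```python
import math

def nats_to_bans(value_in_nats):
    """Convert a natural-log quantity to base-10 units (bans)."""
    return value_in_nats / math.log(10)

# Arbitrary example probability (an assumption for illustration).
p = 0.001
neg_log_nats = -math.log(p)    # natural logarithm
neg_log_bans = -math.log10(p)  # base-10 logarithm

converted = nats_to_bans(neg_log_nats)
```

Because the conversion is a constant rescaling, the choice of base changes the scale of the values but not their ordering.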

Calculating Informativity

$\sum_{i=1}^{N} \log P(W = w \mid C = c_i)$

13 / 35

Calculating Informativity

$\frac{1}{N} \sum_{i=1}^{N} \log P(W = w \mid C = c_i)$

14 / 35

Calculating Informativity

$-\frac{1}{N} \sum_{i=1}^{N} \log P(W = w \mid C = c_i)$

Steven T. Piantadosi, Harry Tily & Edward Gibson. 2011. "Word lengths are optimized for efficient communication". Proceedings of the National Academy of Sciences 108(9), 3526-3529.
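
Following the definition in Piantadosi et al. (2011), informativity is the negative mean log probability of a word across the N contexts in which it occurs. A minimal Python sketch, assuming the per-token conditional probabilities have already been estimated; the function name `informativity` and the probability values are made up for illustration:

```python
import math

def informativity(context_probs):
    """Negative mean base-10 log probability of a word over its N contexts.

    context_probs holds one P(W = w | C = c_i) per token of the word.
    """
    if not context_probs:
        raise ValueError("need at least one context probability")
    n = len(context_probs)
    return -sum(math.log10(p) for p in context_probs) / n

# Hypothetical probabilities (not taken from any corpus).
# A word that mostly occurs in low-predictability contexts:
info_high = informativity([0.001, 0.01, 0.001])
# A word that mostly occurs in high-predictability contexts:
info_low = informativity([0.1, 0.1, 0.01])
```

The first word is rarely predictable from its contexts and so receives the higher informativity value; the second is usually predictable and receives the lower one.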

15 / 35

Informativity vs. Frequency

16 / 35

Predictability vs. Informativity

Word frequently occurs in low-predictability contexts → high informativity

Word frequently occurs in high-predictability contexts → low informativity

17 / 35

Target study: Seyfarth (2014)

  • Durational reduction in Buckeye (Pitt et al. 2007) and Switchboard-1 Release 2 (Calhoun et al. 2009; Godfrey & Holliman 1997)

  • Predictability and informativity estimated from the Fisher Part 2 corpus (Cieri et al. 2005)

  • Findings:

    • Higher left-context and right-context predictability → More reduction

    • Higher right-context (both corpora) or left-context (Switchboard) informativity → Less reduction

  • Implications:

    • Predictability and reduction: could be an online effect

    • Informativity and reduction: suggests storage of reduced forms

18 / 35

Research questions

RQ1: Do the results of Seyfarth (2014) replicate on another English dataset?

RQ2: Do these effects generalize to Polish?

19 / 35

Method

20 / 35

Source of English data

Nationwide Speech Project Corpus

(Clopper & Pisoni 2006)

21 / 35

Model architecture (N = 7,158)

  • Response:
    • Word duration
  • Predictors of theoretical interest:
    • Predictability given previous, Informativity given previous, Predictability given following, Informativity given following
  • "Control" predictors:
    • Part of speech, Orthographic length, No. of syllables, Dialect, Average speaking rate, Rate deviation
  • Random terms:
    • (1|Word), (1 + Informativity given following + Informativity given previous | Speaker)
22 / 35

Results

23 / 35

Predictability given following

β = -0.029, p < 0.001

24 / 35

Informativity given following

β = 0.025, p < 0.001

25 / 35

Source of Polish data (N = 10,588)

Greater Poland Spoken Corpus

wa.amu.edu.pl/korpuswlkp

(Kaźmierski, Kul & Zydorowicz in press)

26 / 35

Predictability given previous

β = -0.007, p < 0.001

27 / 35

Predictability given following

β = -0.016, p < 0.001

28 / 35

Informativity given following

β = 0.02, p < 0.001

29 / 35

Summary of results

RQ1: Do the results of Seyfarth (2014) replicate on another English dataset?

✓ Yes

Differences: predictability given previous; informativity given previous (Switchboard)

30 / 35

Summary of results

RQ2: Do these effects generalize to Polish?

✓ Yes

Local predictability given both the previous and the following word, as well as informativity given the following word, are significant predictors of word duration in Polish

31 / 35

Conclusions

→ The effect of local predictability is stronger for right-hand context than left-hand context

→ On top of local predictability, both English and Polish show the effect of informativity

→ The latter effect suggests phonological storage of reduced forms

32 / 35

Thank you!

Right-context informativity influences word durations in English and in Polish

kamil.kazmierski@wa.amu.edu.pl

33 / 35

Model summary: English

34 / 35

Model summary: Polish

35 / 35
