## Abstract

Homogeneous Markov chains with discrete state-space and time are very straightforward to simulate, due to the memorylessness property. In this talk, we consider sampling from the normalised restriction of a Markov chains joint distribution to a subset defined by two kinds of constraints: univariate constraints, restricting individual chain variables to belong to certain sets, and multivariate constraints, restricting non-adjacent groups of chain variables to be unequal but equivalent under a predefined equivalence relation. The multivariate constraints, in particular, present some challenges. An efficient sampling method involving Metropolis-Hastings with some pre-tabulated distributions is described.

We demonstrate an application to text generation, and in particular the random generation of rhyming, scanning lyrics, as a component of an algorithmic songwriting project. In this setting, the states of the Markov chain are syllables (that know which words they are from, so that for instance the final syllables of perilous and marvellous are two different states). The univariate constraints enforce the metre of the text, by indicating which syllables must be stressed or unstressed (for correct scansion) and which syllables must be word-final (so that words do not straddle lines). The multivariate constraints enforce a rhyming scheme, by requiring certain groups of syllables to rhyme without being equal.

This work relies on the Carnegie Mellon Pronouncing Dictionary (CMUdict; Weide, 1998) and sample text. CMUdict contains over 130000 entries, comprising words, proper nouns et cetera as used in spoken American English. This dictionary indicates the phonemic and stress pronunciation of each entry.

A Markov chain including the entire dictionary would require over 330000 syllable states. We have used sample texts, both to reduce the state space, and to build a first-order Markov model of word sequence, or equivalently, syllable sequence. The problem of interest is to sample from the Markov chain, subject to the constraints imposed by a given rhyming scheme.

Our algorithm firstly samples the multivariately constrained chain variables. After location of a feasible subsequence of rhymed syllables, ratios of joint probabilities are calculated and compared in a Metropolis-Hastings algorithm, where each candidate arises from the incumbent by replacing one set of rhyming syllables with another drawn from a pre-tabulated distribution. After a burn-in period, the skeleton provided by the rhymed syllables is fleshed out by sampling the unrhymed syllables according to the Markov chain, subject to metrical and boundary conditions.

We demonstrate an application to text generation, and in particular the random generation of rhyming, scanning lyrics, as a component of an algorithmic songwriting project. In this setting, the states of the Markov chain are syllables (that know which words they are from, so that for instance the final syllables of perilous and marvellous are two different states). The univariate constraints enforce the metre of the text, by indicating which syllables must be stressed or unstressed (for correct scansion) and which syllables must be word-final (so that words do not straddle lines). The multivariate constraints enforce a rhyming scheme, by requiring certain groups of syllables to rhyme without being equal.

This work relies on the Carnegie Mellon Pronouncing Dictionary (CMUdict; Weide, 1998) and sample text. CMUdict contains over 130000 entries, comprising words, proper nouns et cetera as used in spoken American English. This dictionary indicates the phonemic and stress pronunciation of each entry.

A Markov chain including the entire dictionary would require over 330000 syllable states. We have used sample texts, both to reduce the state space, and to build a first-order Markov model of word sequence, or equivalently, syllable sequence. The problem of interest is to sample from the Markov chain, subject to the constraints imposed by a given rhyming scheme.

Our algorithm firstly samples the multivariately constrained chain variables. After location of a feasible subsequence of rhymed syllables, ratios of joint probabilities are calculated and compared in a Metropolis-Hastings algorithm, where each candidate arises from the incumbent by replacing one set of rhyming syllables with another drawn from a pre-tabulated distribution. After a burn-in period, the skeleton provided by the rhymed syllables is fleshed out by sampling the unrhymed syllables according to the Markov chain, subject to metrical and boundary conditions.

Original language | English |
---|---|

Title of host publication | 22nd International Congress on Modelling and Simulation |

Editors | G. Syme, D. Hatton MacDonald, B. Fulton, J. Piantodosi |

Place of Publication | Hobart |

Publisher | Modelling & Simulation Society Australia & New Zealand |

Pages | 466-472 |

Number of pages | 7 |

ISBN (Electronic) | 9780987214379 |

Publication status | Published - Dec 2017 |

Event | International Congress on Modelling and Simulation (22nd : 2017) - Hobart, Australia Duration: 3 Dec 2017 → 8 Dec 2017 |

### Conference

Conference | International Congress on Modelling and Simulation (22nd : 2017) |
---|---|

Country/Territory | Australia |

City | Hobart |

Period | 3/12/17 → 8/12/17 |

## Keywords

- Constrained Markov chain
- Metropolis
- rhyme
- text generation