Problems With Text Setting

What's up, people.
Miggy Torres here.

I’m officially calling out text setting as problematic. You heard me: text setting is cancelled!

Just kidding.

BUT. I did want to talk a little bit about some issues regarding text setting that have been rolling around in my head for a while, as they relate to both choral music and vocal music in general. I have a few æsthetic questions that I’ve found myself asking again and again, and I wanted to share them with you. Grab your take.

Okay so. The biggest problem—

Well, wait. Actually, I’m gonna go small-to-big.

The Smallest Problem: Lost in Transmutation

The smallest problem. Is that when someone sets a text to music—usually a poem, but can be whatever; can be a newspaper article, can be a fortune cookie, can be anything—there’s information that’s lost. This is especially true with regard to poetry. There’s tons of information there that’s lost as the work gets transmigrated into the realm of music: the rhythm and meter of the verse, for example, are totally lost. Well. Maybe they’re not totally lost, but they’re certainly transformed. And depending on the poem and how it’s treated musically, there’s a tradeoff. As the composer injects the work with musical metaphor, it loses its speech-rhythm identity and takes on a new note-rhythm identity.

For example—and I use this example not because I think it’s a particularly good poem, but because it’s a poem with a setting that, love it or hate it, pretty much everyone reading this knows—in Sleep by Eric Whitacre, Chuck Silvestri’s poem (does he go by Chuck?) is obviously a sort of modification, trope, literary parody of Stopping by Woods on a Snowy Evening, by Bob Frost (does he go by Bob?). Like the original, Sleep is in iambic tetrameter:

the EVE-ning HANGS be-NEATH the MOON.

Da-DUM da-DUM da-DUM da-DUM!

When you listen to the song Sleep, it’s anacrusic—it starts on an upbeat to put the stressed syllable on the following downbeat—, but other than that?

All quarter notes.
It’s all the same rhythm.
Barring cadences, every note is the same length.

Nothing against Sleep. It’s obviously a very effective piece for various reasons. But the natural rhythmic complexities inherent in the spoken verse are completely bulldozed when it’s set to music.1 And I don’t just mean the iambic tetrameter. When reading poetry—unless the meter is super in-your-face—you generally don’t read it as it scans. That is to say, if you were reading the first line of Sleep, you wouldn’t say, “The EVE-ning HANGS be-NEATH the MOON.” In fact, the natural prosodic rhythm would be just as steamrolled—if not more so—if it were set to an inégale, hyper-iambic rhythm. The idea, of course, is just to speak it as you would normally, allowing the natural stresses of the line to emerge in counterpoint with the implied meter.

In the case of Sleep—and Stopping by Woods—the meter is emphasized in the opening since the first two lines each encapsulate a full phrase. The pause at the end of each phrase is in consonance with each line break.

The evening hangs beneath the moon,
A silver thread on darkened dune.
With closing eyes and resting head
I know that sleep is coming soon.

The following line, however, flows into the next uninterrupted, creating a hemiolic rhythmic counterpoint against the line break (this counterpoint is more noticeable in Bob’s original poem, where there’s virtually no noticeable pause in “He will not see me standing here/To watch his woods fill up with snow.” Probably because it makes more sense to start a phrase with “I know” than with “to watch,” so the “to watch” flows more directly from the previous line). This large-scale rhythmic counterpoint is captured by Whitacre’s setting—he delays the rhythmic cadence until “coming soon”—but there is a smaller-scale counterpoint, between how the line scans and how the line reads, that disappears when the length of every syllable is homogenized. The way the natural rhythm of the text flexes and moves—the way it inflects and fluctuates over the course of a line—is lost.

Below I have three readings of the first line of Toni’s Tune (does he spell it Toni?). None of the three readings is necessarily better than another. I just want to illustrate how much rhythm—and rhythmic variation—exists in the spoken text compared to the version with quarter notes.

There’s so much there! It’s not even that good a poem! Like, even for being strictly iambic, there’s already so much there! It’s implying a very specific and regular underlying meter, and yet we’re still able to draw out a lot of rhythmic complexity from the text. If we were examining a free-verse poem, or even prose, we might find even more rhythmic variation.

But when we set a text to music, so often we default to quantizing all the rhythms into a bunch of quarter notes or a gaggle of eighth notes. To make matters worse, many composers who do this claim to be drawing the rhythm of the music from the “natural rhythm of the text.” A cursory glance at their music, however, reveals nothing but consecutive eighth notes (with maybe a quarter note triplet thrown in for good measure), and the realization sets in: aside from setting unstressed syllables anacrusically, the composer’s claim is little short of excreta odifera. One might argue that as long as the stresses are on downbeats, conductor’s rubato is enough to imbue streams of successive eighths with a more speech-like lilt. But I wouldn’t be so sure. To explain why, I would like to introduce you to linguistic isochrony.

Linguistic Isochrony: How Speech in Language is Timed

Alright.
So.

Linguistic isochrony is a fancy-sounding phrase that you can use at parties and composition seminars to describe the rhythmic divisions of prosody in different languages. In other words, when we talk about a certain language’s isochrony, we’re describing how speakers of that language spread sounds out over time during sentence-long stretches of normally-spoken prose. There are three main ways that linguists have described how languages are “timed”:

  1. Speakers of syllable-timed languages spend a relatively even amount of time speaking each syllable of a phrase.
  2. Speakers of mora-timed languages spend a relatively even amount of time speaking each mora of a phrase.
  3. In stress-timed languages the amount of time between stressed syllables is relatively even, with all other syllables getting squashed into the intervals between stresses.

English is stress-timed.

This is huge. English is stress-timed. I cannot stress this enough.2 To elaborate further, this short excerpt from Wikipedia illustrates the idea nicely:

In a stress-timed language, syllables may last different amounts of time, but there is perceived to be a fairly constant amount of time (on average) between consecutive stressed syllables. Consequently, unstressed syllables between stressed syllables tend to be compressed to fit into the time interval: if two stressed syllables are separated by a single unstressed syllable, as in delicious tea, the unstressed syllable will be relatively long, while if a larger number of unstressed syllables intervene, as in tolerable tea, the unstressed syllables will be shorter.3
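To put rough numbers on that excerpt, here is a minimal Python sketch of the two idealized timing models. It is a toy, not a linguistic measurement: the function names, the two-unit gap between stresses, and the assumption that a stressed syllable takes half of that gap (with the unstressed syllables splitting the rest evenly) are all invented for illustration.

```python
from fractions import Fraction


def syllable_timed(stresses, unit=Fraction(1)):
    """Idealized syllable timing: every syllable lasts the same, stressed or not."""
    return [unit] * len(stresses)


def stress_timed(stresses, interval=Fraction(2), stressed_share=Fraction(1, 2)):
    """Idealized stress timing: stressed syllables begin `interval` units apart.

    Assumption (hypothetical, for illustration only): a stressed syllable takes
    `stressed_share` of the interval, and the unstressed syllables after it
    split the remainder evenly. Unstressed syllables before the first stress
    are treated as a pickup that fills the same "remainder" slot.
    """
    # Group syllables into feet: each stressed syllable opens a new foot.
    feet = []
    for stressed in stresses:
        if stressed or not feet:
            feet.append([])
        feet[-1].append(stressed)

    weak_total = interval * (1 - stressed_share)
    durations = []
    for foot in feet:
        if foot[0]:
            weak = len(foot) - 1
            if weak == 0:
                # Nothing to squash in: the stressed syllable fills the interval.
                durations.append(interval)
            else:
                durations.append(interval * stressed_share)
                durations.extend([weak_total / weak] * weak)
        else:
            # Pickup before the first stressed syllable.
            durations.extend([weak_total / len(foot)] * len(foot))
    return durations


# de-LI-cious TEA  and  TOL-er-a-ble TEA  (True marks a stressed syllable)
delicious_tea = [False, True, False, True]
tolerable_tea = [True, False, False, False, True]

for name, pattern in [("delicious tea", delicious_tea), ("tolerable tea", tolerable_tea)]:
    print(name)
    print("  syllable-timed:", [str(d) for d in syllable_timed(pattern)])
    print("  stress-timed:  ", [str(d) for d in stress_timed(pattern)])
```

Under these assumptions the gap between the two stresses is two units in both phrases, but the lone unstressed syllable in delicious tea gets a whole unit while each of the three in tolerable tea gets only a third of one. A real speaker is far less tidy than this, which is rather the point of everything that follows.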

Examples of syllable-timed languages include Italian and Spanish. You’ll note the toccata-like rhythm of these languages as the duration of every individual syllable is almost exactly the same as the duration of every other. As someone who grew up listening to Puerto Rican Spanish, I find the Mexican accent to be particularly syllable-timed.

Japanese is a mora-timed language. Very similar to syllable-timed languages, except that certain sounds that may not constitute a syllable can still be a mora (e.g., double vowels). Languages like Spanish and Japanese are, in many ways, more suited to constant streams of eighth notes—at least insofar as the composer claims that they replicate the “natural rhythm of the text”—, but with a stress-timed language like English, it just doesn’t make any sense.

Note that these three categories of isochrony are theoretical archetypes. Real living languages don’t fit neatly into one category or another, but may exhibit degrees of syllable-timing or stress-timing depending on the context. Still, looking at language this way can help us think differently about how rhythmic features of a language can be re-coded musically.

To draw your rhythms from the ‘natural rhythm of the text’ in English, you need to take into account the stress-timing. The time between stressed syllables in a sentence is relatively even, and all the unstressed syllables get squished in between these stresses. Moreover, this squishing doesn’t always occur evenly. That is to say, one would think that the first sentence of this paragraph,

To draw your rhythms from the ‘natural rhythm of the text’ in English, you need to take into account the stress-timing.

could be approximated simply by putting all the stressed syllables on downbeats as prosodic pillars buttressed by evenly distributed unstressed syllables, as below:

But even this isn’t quite right. English is a language of incredible rhythmic variation, and a close listen reveals that while stressed syllables occur at relatively even intervals, unstressed syllables are often distributed asymmetrically across these linguistic pylons. Accounting for this, the transcription becomes:

As we can see, this is a far cry from the usual straight eighths and quarters (and obligatory triplet 😌):

An ostensibly astute reader may rebut, “But Miggy, English is famously known to be an iambic language, so if a passage is iambic and you place all your stresses on downbeats, then it makes sense that each stressed syllable would be written on a downbeat and every other syllable would be on the intervening offbeat,” to which I would reply, “The simple fact is—as exemplified by this very sentence—English is not iambic.” Moreover, as described above, even if a certain line scans iambic, there is much more rhythmic variation when it’s actually read than an even oscillation of short-long-short-long-short-long.4

The rhythmic variation of normal speech can be heard especially well in harmonizer-style videos on YouTube, where people create harmonic accompaniment to the inherent rhythm (and pitch) of famous Vine videos and such.5 As you can see from the example below, the musician is constantly changing tempo and rhythmic feel to account for the natural changes in the prosodic rhythm.

Note the flexing and stretching of English’s stress-timed isochrony. Skip to 1:07 for a particularly vivified example.

I’d also like to share an example of speech-like rhythm in textless music. During the opening of Many, Many Cadences by Sky Macklay, the string quartet sounds like it’s talking to you—an effect that’s achieved partly by the rhythmic variation of the notes (articulation and register also play a big part). The way Macklay varies the duration of each note from one gesture to the next gives the passage a stress-timed rhythmic feel. Note that the rhythms sound complex, but when you look at the score, they’re easy to count—it’s just 8ths, 16ths, and triplet 8ths (which don’t even cross barlines). If this were just 8th notes, it wouldn’t have the same jittery, chit-chatty character to it. It would probably just sound like a 20th-Century counterpoint exercise.

• • •

There may be perfectly good artistic reasons, of course, for reducing the natural speech-rhythm of a text into straight, pixelated eighths, quarters, and the like. Maybe you want to evoke an ecclesiastic feel. Maybe you want a mechanical or Baroque sound. Maybe you’re more interested in playing with metric variation than with syncopation (though the two go hand-in-hand). Maybe you’re working within a regular grid-like groove. Maybe you want to slow the text waaay down. Maybe you just don’t care about having interesting rhythms and want to make the rhythm really easy and feel that eighths are good enough (not really an artistic reason, but a reason nonetheless, I suppose). Maybe you simply think, that’s just the way I hear it.

There are tons of reasons for writing simple rhythm—and I’ll talk more about issues with rhythm in vocal music in an upcoming article—but if you’re going to claim that the rhythmic design of the music is based on the natural rhythms of the spoken text, then, linguistically, there’s just no way to back that up in English if it’s just eighth notes (or something similar).

In any case, whether or not a composer chooses to attempt to preserve some of the original rhythm of the spoken text, no matter how you look at it, they can’t get around the fact that information embedded in the original work becomes lost upon setting it to music. Going further, one could argue that the act of prescribing the rhythm at all—even in a really intricate transcriptional way—removes a certain improvisational vitality from the rhythm of the text. I’m not sure I would go this far, mainly because at a certain point, a performer’s rubato can make up for this. But it’s an interesting idea with powerful implications:

In order to set a poet’s work to music, one must first destroy a part of it.

This brings me to my second problem with text setting, namely that of authorial intent.

Composer as Illuminator: Poetry and Authorial Intent

If, in setting a work to music, you are changing it such that a certain amount of information is lost, the question then becomes, “Does that matter?” We have this idea as composers of choral music that our settings are all written “in service to” the text, that our job is to illuminate the poet’s words.

But like… is it?

While the precept that “the composer’s first duty is to the text” has permeated the ethos of the choral community, it raises some serious questions:

Are you trying to enhance what the poet was originally trying to say? How do you know that your setting is doing that? How can you be sure? Won’t the poet mind that you’ve bulldozed their speech-rhythm and replaced it with your own note-rhythm? Setting a work to music inherently changes its form. Does that matter? Does it matter how the poem looks on the page (more information that is lost upon musical transmutation)? How can you be sure you even know what the poet originally meant? Some poets are quite strict with what they feel their poetry means, whereas others allow for more multivalent interpretations. How do you know you’re getting the interpretation of this poem “right”? One could argue (dubiously, of course) that if the poem you’ve chosen to set isn’t complex enough to warrant such questions about the interpretation of textual ambiguities, well, then, maybe it’s a superficial poem.

That might not be a fair judgement. I guess that’s the other thing: in literature studies there’s this thing called reader-response theory, which is basically the idea that a work of art is not complete until it’s been interpreted—and therefore paraphrased—by a reader, listener, observer, etc. If art only exists as meaning, and meaning only exists in our own minds, then the artwork cannot be completed extrinsically. It can only be completed inside your head.

Works interpreted this way are viewed through the lens of the personal experiences of the observer. When this observation occurs, it solidifies the so-called meaning of the artwork as an idea in the observer’s mind—one that’s personal to that observer and that observer alone. In this way, the work of art can become extremely multivalent, as each observer creates their own valid version of the work, simply through the act of thoughtful interpretation. So, then, under this idea, authorial intent will only get you so far, since everyone’s interpretation of a text is going to be different anyway—sometimes only slightly different, sometimes radically different, but different nonetheless.

Continuing, one could argue that if for a poem to be complete it must be interpreted, then a composer’s setting of a work completes it, insofar as it’s a valid interpretation by a reader. Seems extremely self-important to presume that I could “complete” another artist’s work by decorating it with my own. But isn’t that exactly what we do every time we set a work of poetry to music?

Maybe. An important point of reader-response theory is that the reader’s interpretation supersedes the author’s intent. So the composer’s setting would no longer be about what the poet meant. It would be about the composer’s interpretation of what the poet meant. And taking this idea one step further, for an audience member listening to the setting, the art would only exist as their interpretation of the composer’s interpretation of the poet’s work. Even the composer’s ostensible completion of the poet’s work would be rendered incomplete until finally interpreted by an audience member.

Finally, at the end of the day, we have to face the inescapable fact that had the poet originally wanted their poem to be a song, they would have written it as a song—as all of the tens, perhaps hundreds, of thousands of poet-composers we dub songwriters do every day—or they would have worked with a composer to write one. For example, Emily Dickinson was known by her family and friends to be an expert improvisor (composer?) at the piano. She attended concerts, sang in church, took voice and piano lessons, collected sheet music, and by all accounts was intelligent enough and musically literate enough to set any of her poems as art songs or choral works had she wanted.6 But (as far as we know) she didn’t.

So maybe authorial intent isn’t such a big deal! If you set a certain poem a certain way that contradicts or distorts the poet’s original intent, and you have good reasons for doing so, is that a bad thing? Is that a good thing? I don’t know. It depends how much the poet cares, I suppose. And how much you care that the poet cares. But because different poets have different attitudes toward the interpretation of their works, unless you sit down with one of them and ask them, it’s virtually impossible to know, for any given poem, whether your recasting of it exceeds the threshold of alteration with which that poet is comfortable.

So at this point we kind of have to throw out the somewhat self-aggrandizing idea that our work is done “in service to the poet” or “in service to the text.” The truth is, our work is done in service to our idea of the text, our idea of what we think the poet wants or means, or—perhaps more accurately—our idea of what we want or mean. If you give six different composers the same poem and ask them to “illuminate the text,” you’re going to be left with six different pieces, each informed by what the composer thinks the poet wants, and each perhaps saying something completely different.7

That’s not to say that the author of the poem wouldn’t be happy with all six recontextualizations of their words. But as composers, we don’t constantly check in with the poets as we transmute their works into song,

Do you like this? What about this? Is this too fast? Is it too long? Tell me what you like. I wanna give it to you. Don’t stop.

Nor do most composers feel comfortable with poets backseat-driving the compositional process,

Don’t do that. Do it like this. It needs to go faster. Not that fast. Yes. Higher. Oh yes, right there!

Rather, when we actually write the damn thing, we work to realize our own conceptions of a work inspired by the poetry, which often includes the use of the text itself.

Of course, we may still want to respect the perceived will of authors, to try to communicate what we think they probably meant. A lot of poetry is quite complex and nuanced, but a lot is also simple and straightforward. There are plenty of poems where the reader can grasp the central thesis of the work, even if they might not be able to read the poet’s mind, verbatim. Surely in cases like this, the composer has a duty to serve the text.

Once again—however—this is dubious. It’s perfectly acceptable to know full well what an author generally meant. It’s also perfectly acceptable to disagree with the poet’s ideas and frame their work critically or even satirically, underscoring their words with a big musical eye-roll. If they’re a dead poet whose works have since crossed the great divide into the public domain—Shakespeare, Teasdale, early Pound, Horace, Wordsworth, early Yeats, Petrarch, early Frost, etc.—you’ll probably accomplish this with little trouble. If it’s a living poet, however—or a poet with a living estate—you may run into some problems. Still, from a purely æsthetic point of view there’s nothing wrong with what essentially amounts to taking responsibility for your own artistic decisions and thinking for yourself.

This brings us to the third, final, and largest of my issues with text setting. And that is this: at the end of the day, whenever we set another artist’s work to music, we are creating a derivative work. Cutting our way through the forest of linguistic isochrony, hacking through the thicket of authorial intent, we find ourselves on the brink—staring across a deep æsthetic mire. To wade through it, join me for Part 2 of this sub-series on text setting!

Cheers!

• • •

Notes

1 Incidentally, I should mention that in the case of Sleep, things are a little complicated since the music was originally a setting of Stopping by Woods which then received new words by Tony Silvestri—I’m sure we all know the story. While the meter and rhyme scheme match the original, it’s worth mentioning that these words weren’t set to music in the traditional manner. That is, the sounds of the words themselves weren’t taken into account before the music was written. Rather they were fitted into a casting molded from another—similarly shaped—verse.

2 lmao

3 Wikipedia, s.v. “Isochrony.”

4 One of the greatest disappointments in the English language is that the word iamb is not an iamb, but a trochee. Similarly, palindrome’s unidirectionality elicits comparable dismay, as does the fact that perhaps nothing could be less onomatopœic than onomatopœia. Thankfully, we may take solace in the fact that trochee is, in fact, a trochee.

5 RIP Vine. 😞

6 George Boziwick, “MY Business Is to Sing: Emily Dickinson, Musician and Poet,” The New York Public Library (The New York Public Library, December 10, 2015), https://www.nypl.org/blog/2014/12/09/my-business-sing-emily-dickinson.

7 But if they’re Choral Composers, then all the works will sound the same no matter what you give them! Zinggg! 😎.