Thursday, May 21, 2015

The Colloquial and Dysfunctional Two Gentlemen

In a previous post, Two Late Dates for The Two Gentlemen of Verona, I discussed the curious fact that two recent stylometric studies had come up with composition dates for The Two Gentlemen of Verona completely at odds with the commonly held belief that the play is a very youthful Shakespeare work, possibly even his first. In one study, MacDonald Jackson demonstrated that, when speech length is used as a chronological indicator for Shakespeare’s plays, Two Gentlemen, far from being his first play, ranks as Shakespeare’s fifteenth, putting it roughly around the date of late 1597/early 1598 I had proposed in my paper Why A Dog? A Late Date For The Two Gentlemen Of Verona. In the other study, Neal Fox, Omran Ehmoda and Eugene Charniak, using a different stylometric technique that classified Shakespeare’s plays into two categories - ‘early’ or ‘late’ - placed Two Gentlemen in the late category, meaning it had more characteristics in common with plays written in the second half than the first half of Shakespeare’s career.

The now conventional wisdom that Two Gentlemen may have been Shakespeare's first play (check the blurb that accompanies every new production of the play) is in large part due to the authority invested by the casual observer in the 'Oxford Chronology' put forward by Gary Taylor in his (with Stanley Wells) William Shakespeare: A Textual Companion. Under the Oxford chronology, Two Gentlemen is dated 1590-1, along with The Taming of the Shrew, and is either Shakespeare’s first or second play (Taylor places it as the first).

Given the influence of Taylor's chronology, I recently decided to revisit The Canon and Chronology of Shakespeare's Plays section of the Textual Companion to see how Two Gentlemen had fared in the various stylometric studies that Taylor mentions there. Much to my surprise I found that in addition to referencing a range of studies by others, Taylor had conducted two studies himself, and in both of them Two Gentlemen came out as a serious outlier. Somehow, over the last few years I'd managed to forget these two studies, so I'm going to belatedly discuss them now.

One of Taylor's studies involved the application of a “colloquialism-in-verse” test to determine the chronology of Shakespeare's works (for the relevance of colloquialisms as a chronology test see my Stylistic Markers in Guy of Warwick). Based on the use of colloquialisms like ’t, th’, ’em, ’ll etc, Taylor calculated a “colloquialism quotient” for each play, with As You Like It as a reference point. A negative quotient meant that a play was calculated to be earlier than As You Like It, and a positive quotient that it was later. The more negative the quotient, the earlier the play was, and the more positive the quotient, the later the play was.

As a broad chronological test this seems fine, as can be seen in Taylor’s own graph of the results shown below (ordered by the dates in Taylor's chronology, so Two Gentlemen appears first). There is clearly a general movement towards a greater use of colloquialisms as Shakespeare's career progresses.



Now let's look specifically at Two Gentlemen. I have highlighted in red its exact quotient value under Taylor's test. You should be able to see clearly enough that there are actually a large number of plays with a more negative quotient than Two Gentlemen i.e. based on this colloquialism-in-verse test, they are earlier than Two Gentlemen. To spare you needless squinting at the graph, I have listed all the plays ranked earlier than Two Gentlemen below:

  • 1 Henry 6
  • King John
  • Titus Andronicus
  • 1 Henry 4
  • 2 Henry 4
  • Merchant of Venice
  • Richard 3
  • Merry Wives of Windsor
  • Midsummer Night’s Dream
  • Romeo and Juliet
  • Much Ado About Nothing
  • Henry 5
  • Richard 2
  • Comedy of Errors
  • Duke of York (3 Henry 6)
  • Julius Caesar
Under Taylor’s chronological test, there are sixteen Shakespeare plays ranked as being earlier than Two Gentlemen. Now I know that tests like these can't be expected to produce pin-point accuracy with the ranking of individual plays, and Taylor himself would never claim as much. But sixteen plays ranked earlier? I wonder if this gave Taylor pause when he decided that Two Gentlemen was Shakespeare's first play. If it did, he doesn't mention it. His only comment is that the result is 'ambiguous'. Personally, I'd call the result highly suspicious, and point out that it just happens to place Two Gentlemen pretty much in the same late position as MacDonald Jackson's chronological test using an entirely different approach.

Taylor’s other test was an authorship test, not a chronological test, but the result for Two Gentlemen was such a massive outlier that it raises serious questions about the play in general. Taylor used a function word test, a common and reasonably reliable test of authorship. Concentrating on ten function words, he highlighted cases where the frequency of a particular function word in a play was well outside the norm for Shakespeare (defined as more than two standard deviations from it). The result for Two Gentlemen was quite remarkable. Out of ten function words examined, three were outside the norm for Shakespeare: ‘for’ (extremely high frequency), ‘that’ (extremely high frequency) and ‘the’ (extremely low frequency). How remarkable this is can be seen when you realize that the vast majority of the other plays have no function words outside the norm, and the handful that do (As You Like It, Henry 5, Twelfth Night, King Lear and Coriolanus) have only one function word outside the norm. That Two Gentlemen has three makes it a very, very serious outlier.
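To make the "outside two standard deviations" idea concrete, here is a minimal Python sketch that flags plays whose rate for a single function word lies more than two standard deviations from the mean across plays. The play labels and rates are invented purely for illustration, the function name flag_outliers is hypothetical, and I am assuming the norm is a simple mean across plays; the Textual Companion's exact procedure is not reproduced here.

```python
from statistics import mean, stdev

def flag_outliers(rates, threshold=2.0):
    """Return {play: z_score} for plays more than `threshold` SDs from the mean.

    ASSUMPTION: the 'norm' is the plain mean of all plays' rates, and the
    spread is the sample standard deviation, both including the play tested.
    """
    values = list(rates.values())
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return {}  # every play has the same rate, so nothing can be an outlier
    flagged = {}
    for play, rate in rates.items():
        z = (rate - centre) / spread
        if abs(z) > threshold:
            flagged[play] = round(z, 2)
    return flagged

# Invented rates of one function word per 1,000 words (NOT Taylor's figures).
hypothetical_rates = {
    "Play A": 10.0, "Play B": 10.5, "Play C": 9.5, "Play D": 10.2,
    "Play E": 9.8, "Play F": 10.1, "Play G": 9.9, "Play H": 15.0,
}

print(flag_outliers(hypothetical_rates))
```

Running the same check over ten function words and counting how many times each play is flagged would give the kind of tally discussed above, where most plays score zero and a handful score one.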

How do we explain why Two Gentlemen is so extraordinarily abnormal here compared to the rest of the canon? Taylor's methodology did come in for some criticism, but none of it impacts on these particular results. Why is the use of 'for' and 'that' so high, but the use of 'the' so low? And why is 'for' in particular so extreme? It is almost three standard deviations away from the Shakespearean norm, a level of deviation that Taylor himself describes as "highly suspicious".

Given that Taylor's function word test was for authorship, the obvious conclusion is that the results for Two Gentlemen are pointing to collaboration. Surprisingly, Taylor himself didn't suggest anything of the kind, even as a possibility. But it's hard to avoid the thought. Maybe the disintegrationists were right after all, and there are parts of the play written by someone other than Shakespeare? Someone with different function word proclivities.

Frankly, that's not a conclusion I'm comfortable with, if only because the arguments for it are usually subjective and involve a crude sectioning of the play into the parts the arguers like (therefore by Shakespeare) and the parts they don't like (therefore by anyone-other-than-the-immortal-Bard). Still, Taylor's function word test is an objective analysis, so we have to at least countenance the possibility that it is genuinely pointing to collaborative authorship in Two Gentlemen. This intriguing play just got more intriguing.

Monday, June 10, 2013

Two Words I Hate

The words are ‘Snapsack’ and ‘Oleo’. Not ones you think about much, I know, but for our  understanding of Guy of Warwick they may be very important.

Helen Cooper noted in Guy of Warwick, Upstart Crows and Mounting Sparrows that these two words found in Guy were “first recorded by the OED (both editions) only in the seventeenth century”. ‘Snapsack’ was “first attested in 1632 but possibly current in dialect before that”. ‘Oleo’ (OED s.v. ‘olio’), meaning ‘any mixture of many heterogeneous elements; a hotchpotch, medley, jumble [OED 2a]’ was “a word frequently recorded towards the middle of the seventeenth century but unattested elsewhere in the sixteenth”.

These mid-seventeenth-century first dates for ‘snapsack’ and ‘oleo’ are a serious problem for the prevailing opinion, including mine, that Guy is from the 1590s. Cooper herself did not dwell too long on the issue, other than to say that the existence of ‘snapsack’ and ‘oleo’ in Guy “tends to push” the date of the play “forwards” i.e. later. But that’s an understatement. The earliest date for ‘oleo’, in particular, is – let’s be clear about this - over half a century later than the 1590s. That’s too big a gap to ignore, much as we might like to.

I’m reasonably unstressed about the existence of the word ‘snapsack’ in Guy. It’s clearly related to the earlier 'knapsack', and may well have been current in dialect before 1632, as Cooper suggests. Even if it wasn't, it’s easy enough to imagine a silent or unconscious changing of 'knapsack' to 'snapsack'  during printing.

'Oleo', though, is a worry. It occurs in Time’s chorus to Act 2 of Guy:

Enter Time.
Devotion and Divine Atchievments cause
Great Guy of Warwick to neglect all Lawes,
Of Nuptial League, he leaves his pregnant VVife,
Countrey and Kindred for a holy Life,
But in his progresse, makes himself a prize
To multitudes of matchlesse miseries;
By which it may be justly understood,
He is not truly great, that is not good:
In Holy Lands abroad his spirits roame
And not in Deanes and Chapters lands at home,
His sacred fury menaceth that Nation,
VVhich hath Indea under Sequestration:
He doth not strike at Surplices and Tippits,
(To bring an Oleo in of Sects in Sippits) [my italics]
But deales his warlike and dead-doing blowes,
Against his Saviours and his Soveraigns foes;
That Coat of Armour fears no change of weather,
Where sanctity and souldier go together:
So doth our Champion march up to the fight,
Sit, silent, pray, Time will bring all to light.

An ‘oleo’ (or ‘olio’/’oglio’) was  'A spiced meat and vegetable stew of Spanish and Portuguese origin. Hence: any dish containing a great variety of ingredients' [OED 1]. However, the word later took on the figurative meaning of 'any mixture of many heterogeneous elements etc', the sense in which it is used in Guy:

He doth not strike at Surplices and Tippits,
(To bring an Oleo in of Sects in Sippits)

The earliest OED example of the use of ‘oleo’ in this figurative sense comes from the Eikon Basilike, The Pourtraicture of His Sacred Majestie in His Solitudes and Sufferings, a series of meditations supposedly written by King Charles I of England (the authorship is disputed), and published very shortly after his beheading in 1649:

'Tis strange that so wise men, as they would be esteemed, should not conceive, That differences of perswasion in matters of Religion may easily fall out, where there is the samenesse of duty, Allegiance, and subjection. The first they owne as men, and Christians to God; the second, they owe to Me in Common, as their KING; different professions in point of Religion cannot (any more than in civill Trades) take away the community of relations either to Parents, or to Princes: And where is there such an Oglio or medley of various Religions in the world again, as those men entertain in their service (who find most fault with me) without any scruple, as to the diversity of their Sects and Opinions!

Hmm. Two things concern me here. First, the fact that the Eikon Basilike refers to “Oglio or medley”, rather than just “Oglio”, which suggests that the author (whoever he was) thought the use of the word 'oleo' in this sense was sufficiently new in or about 1649 that it needed to be explained as 'medley'. Second, Guy and the Eikon Basilike just happen to use ‘oleo’ in the same context i.e. in reference to religious division arising from ‘sects’:

He doth not strike at Surplices and Tippits,
(To bring an Oleo in of Sects in Sippits)

And where is there such an Oglio or medley of various Religions in the world again, as those men entertain in their service (who find most fault with me) without any scruple, as to the diversity of their Sects and Opinions!

If you didn’t know any better, you’d have to suspect that Time’s lines in Act 2 of Guy are alluding to this passage in the Eikon Basilike. Truth is, I don’t know any better. I think it’s a distinct possibility. Guy was printed in 1661, shortly after the restoration of the monarchy in 1660 - a perfect time to make an allusion to the Eikon Basilike. This leads to the disquieting thought that Time’s lines in Act 2 of Guy may have been written sometime during the period 1649 to 1661.

Strictly speaking, of course, we need only conclude that the couplet containing the word ‘oleo’ was written during that period, not the passage as a whole. We could then just see the two lines as late topical additions to a play that was itself much older. Though I’d like to believe this, I’ve got my doubts. The passage seems of a piece. You can’t really detach the couplet from the surrounding lines, so I think we have to accept the possibility that Time’s chorus to Act 2 in its entirety was written sometime around the middle of the seventeenth century. If so, perhaps all of Time’s choruses in Guy were written around the middle of the seventeenth century. Perhaps the whole play was.

As you can see, the implications of this single word ‘oleo’ can lead to a cascading series of possibilities, none of which are at all palatable to those who, like myself, argue that Guy of Warwick is a play from the 1590s. At this point, though, I’m going to say no more on the subject. I just don’t have the time at the moment to work through all the complexities raised by these possibilities. Maybe later.

What I'd prefer is for someone out there to tell me that I actually don't need to return to the subject, because that someone has found a usage of 'oleo' decades earlier than 1649, and therefore I have made a big issue out of nothing. I can't find a single such usage, but if you can, please let me know!

Saturday, January 19, 2013

The Speech Length of The Two Gentlemen of Verona

In my previous post I looked at two recent stylometric studies that suggested that The Two Gentlemen of Verona was probably written later than usually thought, certainly later than the dates proposed by those who think it may have been Shakespeare’s first play. One of the studies I discussed was MacD. P. Jackson’s analysis of Hartmut Ilsemann’s use of speech length as a chronological indicator for Shakespeare’s plays.

In this post, I am going to look at one of Ilsemann’s own papers on the subject, where he analyses his speech length data in a very different way to Jackson, and comes up with some interesting results. In More statistical observations on speech lengths in Shakespeare's plays, Ilsemann graphs the speech length distribution of all Shakespeare's plays to determine whether there are any patterns linking individual plays. To minimise stylistic differences due to genre, he analyses the Histories, Comedies and Tragedies separately. Below are his graphs for the comedies (the white line is the average of the individual curves):


As you can see from the graphs, Ilsemann determined that there were three distinct patterns of speech length distribution for the comedies. The first and largest group comprised plays generally believed to have been written in the late 1590s to early 1600s. Their speech length distribution is characterised by (using Ilsemann’s words) “a steep rise [that] goes up to four words, followed by the gentle decline towards the value twenty.” The second group comprises just two plays: The Two Gentlemen of Verona and Love’s Labour’s Lost. The broad shape of their speech length distribution is similar to that of the first group, but with a more gradual rise to a peak of nine instead of four, then a very sharp drop to longer speech lengths.

The big outlier is the third group, comprising The Taming of the Shrew and The Comedy of Errors, which “have two clearly distinct maxima, a smaller maximum at four words and the dominant one at nine words.” From Ilsemann’s graph, it is quite obvious that these two plays have remarkably similar speech length distributions, and ones that are significantly different to all the other comedies. This is a particularly interesting result, because Shrew and Errors - along with Two Gentlemen - are the plays which have most often been put forward as Shakespeare’s first comedy, perhaps even his first play.
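For anyone who wants to try this kind of analysis on a play text, here is a minimal Python sketch of how a speech length distribution and its peaks might be computed. It is only an illustration: how Ilsemann delimits speeches and counts words is not described here, so splitting on whitespace is my assumption, the function names are hypothetical, and the toy data is invented to show a two-peaked shape like the one described for Shrew and Errors.

```python
from collections import Counter

def speech_lengths(speeches):
    """Word count of each speech, splitting on whitespace (an assumption)."""
    return [len(speech.split()) for speech in speeches]

def length_distribution(speeches, max_len=20):
    """Share of speeches at each length from 1 to max_len words."""
    counts = Counter(n for n in speech_lengths(speeches) if 1 <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in range(1, max_len + 1)}

def peaks(distribution):
    """Lengths whose share is strictly higher than both neighbouring lengths."""
    return [
        n for n, share in distribution.items()
        if share > distribution.get(n - 1, 0) and share > distribution.get(n + 1, 0)
    ]

# Invented toy data: a small peak at four words and a dominant one at nine.
toy_lengths = [3, 4, 4, 4, 5, 8, 9, 9, 9, 9, 9, 10]
toy_speeches = [" ".join(["word"] * n) for n in toy_lengths]

dist = length_distribution(toy_speeches)
print(peaks(dist))              # lengths with a local maximum
print(max(dist, key=dist.get))  # the dominant (modal) length
```

On real data the curve would be much noisier, so some smoothing would probably be needed before local maxima meant anything.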

How much can we deduce from Ilsemann's analysis? It does seem to indicate that at the broad level speech length distribution is a useful chronological indicator. There is a very marked difference between The Merry Wives of Windsor, Much Ado About Nothing, As You Like It, Twelfth Night, Measure For Measure and All's Well That Ends Well as a group and The Two Gentlemen of Verona, Love's Labour's Lost, The Taming of the Shrew and The Comedy of Errors taken together as another group.  Virtually all proposed chronologies for the comedies, including my own, would see the second group of plays as having been written before the first group (with some doubt, perhaps, about The Merry Wives of Windsor).

What can we deduce from the distribution patterns within that second group of four early comedies? Not too much. We're on far less firm ground here, because there are only four plays in the group, and the more you drill down with any dataset, the less reliable the data becomes. Ilsemann says that "the shape of the distribution of speech lengths suggests a kinship between The Two Gentlemen of Verona, and Love's Labour's Lost" i.e. Two Gentlemen has a kinship with a play usually dated in the range 1594-6. By itself, it's not enough to prove anything, but it's something you just might want to keep in mind the next time you read any claim that Two Gentlemen was Shakespeare's first play and originated in the 1580s*.

[* Roger Warren makes this claim in his 2008 Oxford Shakespeare edition of The Two Gentlemen of Verona. You can read Gabriel Egan's review of Warren's edition here.]

Sunday, November 11, 2012

Two Late Dates for The Two Gentlemen of Verona

In my paper Why A Dog? A Late Date For The Two Gentlemen Of Verona,  in the September 2007 issue of Notes and Queries, I put forward the hypothesis that Shakespeare’s The Two Gentlemen of Verona was written later than scholars had previously suspected. In particular, I challenged the notion, espoused most notably by Stanley Wells and Gary Taylor in their Oxford William Shakespeare: A Textual Companion, that Two Gentlemen is a very early work, perhaps even Shakespeare’s first play. Wells and Taylor proposed a date of 1590-91 for Two Gentlemen. Roger Warren, in the latest Oxford edition of the play, even speculates that it may be as early as 1587.

In the same issue of Notes and Queries as Why a Dog? there also appeared a paper by MacD. P. Jackson, A New Chronological Indicator for Shakespeare's Plays and for Hand D of Sir Thomas More, based on the work of Hartmut Ilsemann, who had noted that “in plays written up to 1599 the speech-length most frequently used was of nine words and that thereafter it fell to four words”. Ilsemann's point was a broad one - that the opening of the Globe in 1599 changed the way Shakespeare wrote - but Jackson analyses his data in more detail to show that speech-length provides a useful chronological indicator for Shakespeare's plays overall.

To illustrate this, Jackson divided Shakespeare’s plays into six groups, based on the chronological order given in the Oxford Textual Companion. He then calculated for each group the speeches of 3-6 words as a percentage of all speeches of 3-10 words. The results are shown in his Table 1 below.

Table 1

(a)                                                (b)                 (c)
The Two Gentlemen of Verona to Titus Andronicus    1590-1 to 1592      33.6
Richard III to A Midsummer Night’s Dream           1592-3 to 1595      37.4
King John to Much Ado About Nothing                1595 to 1598        46.9
Henry V to Troilus and Cressida                    1598-9 to 1602      58.8
Measure for Measure to Macbeth                     1603 to 1606        62.7
Antony and Cleopatra to Henry VIII                 1606 to 1613        65.0

(a) plays in groups of six in chronological order (last group contains seven plays)
(b) dates of composition
(c) speeches of 3-6 words as percentage of all speeches of 3-10 words.
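The figure in column (c) is simple to compute once you have the length in words of every speech in a play or group of plays. A minimal Python sketch follows; the function name and the toy list of lengths are my own inventions, not Jackson's data.

```python
def short_speech_percentage(lengths):
    """Speeches of 3-6 words as a percentage of all speeches of 3-10 words.

    `lengths` is a list of speech lengths in words. Speeches shorter than 3
    or longer than 10 words are ignored, as in column (c) of Table 1.
    """
    in_range = [n for n in lengths if 3 <= n <= 10]
    if not in_range:
        raise ValueError("no speeches of 3-10 words to measure")
    short = sum(1 for n in in_range if n <= 6)
    return 100.0 * short / len(in_range)

# Invented speech lengths, purely to show the arithmetic.
toy_lengths = [1, 3, 4, 6, 7, 9, 10, 12, 5, 8]

# Eight speeches fall in the 3-10 range and four of those are 3-6 words long.
print(short_speech_percentage(toy_lengths))  # 50.0
```

A higher percentage means short speeches dominate, which on Jackson's figures goes with a later date.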

Presented in these broad groupings, speech-length data certainly does seem to provide a strong chronological indicator for Shakespeare’s plays. Jackson then goes on to look at the data for each individual play, to see how well it matches the Oxford chronology. The results are shown in his Table 2 below.

Table 2

(a)   (b)                            (c)    (d)     (e)
1     The Two Gentlemen of Verona    8      46.2    15
2     The Taming of the Shrew        8      43.7    13
3     2 Henry VI                     9      35.9    7
4     3 Henry VI                     9      14.5    1
5     1 Henry VI                     8      19.7    2
6     Titus Andronicus               9      26.2    3
7     Richard 3                      9      32.2    5
8     Comedy of Errors               10     28.4    4
9     Love's Labour's Lost           9      45.4    14
10    Richard 2                      9      37.1    9
11    Romeo and Juliet               9      40.5    10
12    A Midsummer Night's Dream      9      32.9    6
13    King John                      9      37.0    8
14    The Merchant of Venice         8      42.5    12
15    1 Henry IV                     6      49.1    16
16    The Merry Wives of Windsor     5      51.8    18
17    2 Henry IV                     6      52.7    19
18    Much Ado About Nothing         8/9    42.0    11
19    Henry V                        5      54.0    20
20    Julius Caesar                  4      55.3    21
21    As You Like It                 5      51.4    17
22    Hamlet                         4      65.7    32
23    Twelfth Night                  6      56.0    23
24    Troilus and Cressida           4      62.9    28
25    Measure for Measure            4      60.5    25
26    Othello                        4      63.6    29
27    All’s Well That Ends Well      4      55.7    22
28    Timon of Athens                5      62.8    27
29    King Lear                      4      65.1    30
30    Macbeth                        4      69.2    37
31    Antony and Cleopatra           4      66.1    34
32    Pericles                       4      57.1    24
33    Coriolanus                     4      66.0    33
34    The Winter’s Tale              4      61.6    26
35    Cymbeline                      4      68.0    36
36    The Tempest                    4      65.4    31
37    Henry VIII                     4      67.9    35

(a) position of play in Oxford Textual Companion’s chronological order
(b) title of play
(c) most frequently used speech length in terms of number of words in speech
(d) speeches of 3-6 words as percentage of all speeches of 3-10 words
(e) position of play in order of size of figure in previous column.

In general, while the exact chronological position usually varies, each play sits within the same broad position. There are, however, two major exceptions: The Two Gentlemen of Verona and The Taming of the Shrew, which, as Jackson notes, “are placed much later on the ‘short-speeches’ scale than in the Oxford chronology”.

Let’s look more closely at The Two Gentlemen of Verona. Firstly, the play is clearly the biggest outlier when comparing the Oxford and speech-length based chronologies. Listed as Shakespeare’s first play in the Oxford chronology, Two Gentlemen comes up as the fifteenth play in the speech-length chronology. Secondly, its speech-length ratio of 46.2 clearly aligns it with the group ‘King John to Much Ado About Nothing (1595 to 1598)’, which has a ratio of 46.9. Lastly, under the speech-length chronology, it comes after both Romeo and Juliet and The Merchant of Venice, a more natural position for it than in the Oxford chronology, where the clear affinities between the plays need to be tortuously explained as Two Gentlemen 'anticipating' the 'later' works.
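The first two points can be checked directly from the figures in Tables 1 and 2. The Python sketch below re-derives column (e) by sorting the percentages in column (d), measures how far each play moves from its Oxford position, and finds the Table 1 group whose ratio is closest to 46.2. The data is copied from the two tables; the function and variable names are my own.

```python
# (title, percentage from column d), listed in Oxford chronological order.
TABLE_2 = [
    ("The Two Gentlemen of Verona", 46.2), ("The Taming of the Shrew", 43.7),
    ("2 Henry VI", 35.9), ("3 Henry VI", 14.5), ("1 Henry VI", 19.7),
    ("Titus Andronicus", 26.2), ("Richard 3", 32.2), ("Comedy of Errors", 28.4),
    ("Love's Labour's Lost", 45.4), ("Richard 2", 37.1),
    ("Romeo and Juliet", 40.5), ("A Midsummer Night's Dream", 32.9),
    ("King John", 37.0), ("The Merchant of Venice", 42.5), ("1 Henry IV", 49.1),
    ("The Merry Wives of Windsor", 51.8), ("2 Henry IV", 52.7),
    ("Much Ado About Nothing", 42.0), ("Henry V", 54.0), ("Julius Caesar", 55.3),
    ("As You Like It", 51.4), ("Hamlet", 65.7), ("Twelfth Night", 56.0),
    ("Troilus and Cressida", 62.9), ("Measure for Measure", 60.5),
    ("Othello", 63.6), ("All's Well That Ends Well", 55.7),
    ("Timon of Athens", 62.8), ("King Lear", 65.1), ("Macbeth", 69.2),
    ("Antony and Cleopatra", 66.1), ("Pericles", 57.1), ("Coriolanus", 66.0),
    ("The Winter's Tale", 61.6), ("Cymbeline", 68.0), ("The Tempest", 65.4),
    ("Henry VIII", 67.9),
]

# Group ratios from Table 1.
TABLE_1 = {
    "The Two Gentlemen of Verona to Titus Andronicus": 33.6,
    "Richard III to A Midsummer Night's Dream": 37.4,
    "King John to Much Ado About Nothing": 46.9,
    "Henry V to Troilus and Cressida": 58.8,
    "Measure for Measure to Macbeth": 62.7,
    "Antony and Cleopatra to Henry VIII": 65.0,
}

def speech_length_ranks(table):
    """Rank plays from lowest to highest percentage (column e of Table 2)."""
    ordered = sorted(table, key=lambda row: row[1])
    return {title: rank for rank, (title, _) in enumerate(ordered, start=1)}

def displacements(table):
    """Speech-length rank minus Oxford position; positive means 'later'."""
    ranks = speech_length_ranks(table)
    return {
        title: ranks[title] - oxford_position
        for oxford_position, (title, _) in enumerate(table, start=1)
    }

def nearest_group(percentage, groups):
    """Name of the group whose ratio is closest to the given percentage."""
    return min(groups, key=lambda name: abs(groups[name] - percentage))

shifts = displacements(TABLE_2)
biggest = max(shifts, key=lambda title: abs(shifts[title]))
print(biggest, shifts[biggest])      # The Two Gentlemen of Verona 14
print(nearest_group(46.2, TABLE_1))  # King John to Much Ado About Nothing
```

The next largest shifts are The Taming of the Shrew (11 places later) and Hamlet (10 places later).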

Obviously, I was very pleased to read Jackson’s results, and I was even more pleased recently to discover another study where an entirely different stylometric approach to dating Shakespeare’s plays also points in the direction of Two Gentlemen being later than usually thought. The paper is Statistical Stylometrics and the Marlowe-Shakespeare Authorship Question, by Neal Fox, Omran Ehmoda and Eugene Charniak. As the title implies, the paper is mainly concerned with a stylometric approach to determining authorship (it was joint winner of the 2011 Calvin & Rose G Hoffman Prize). However, it also looks at whether the approach can be used for dating, which is the part I will concentrate on here.

Fox, Ehmoda and Charniak divided Shakespeare's plays into two corpuses, defined as 'Early' (plays written up to and including 1601) and 'Late' (plays written after 1601). The date for each play was taken from the Third Edition of the Annals of English Drama, 975 - 1700 (Harbage, 1989). They then examined each individual play using two different approaches ('General Vocabulary' and 'Generative Model', details provided in their paper), and compared the results against the early/late corpus to which the play belonged. In the majority of cases, there was no difference. For example, Romeo and Juliet, a member of the 'Early' corpus, was also defined as 'Early' using both the General Vocabulary and Generative Model approaches. Six plays, however, diverged from the corpus of which they were a member, as shown below:


Play                                 Predicted Date:       Predicted Date:     Corpus Date
                                     General Vocabulary    Generative Model
The Two Gentlemen of Verona (1593)   Late                  Early               Early
The Merry Wives of Windsor (1597)    Late                  Early               Early
As You Like It (1599)                Late                  Early               Early
Julius Caesar (1599)                 Late                  Early               Early
Hamlet (1601)                        Late                  Late                Early
Twelfth Night (1601)                 Late                  Late                Early


Fox et al describe the results for these plays as “misclassifications”, which, strictly speaking, is correct. However, apart from The Two Gentlemen of Verona, all the plays are dated near, or actually in, 1601 i.e. they are near or on the cusp of the year used to divide Shakespeare's plays into ‘early’ and ‘late’ corpuses. In some ways, this is more of a vindication of the approach used than a cause to doubt it.

The only serious outlier here is Two Gentlemen. Although a member of the 'Early' corpus, the General Vocabulary test classifies it as 'Late'. To put this into perspective, remember that ‘late’ here means 1602 or later. The result is saying that Two Gentlemen has characteristics that make it more compatible with a corpus of Shakespeare’s plays written after 1601 than a corpus of those written earlier – surprising for a play the Oxford chronology lists as Shakespeare’s first play, written 1590/1!
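How far each of the six divergent plays sits from the early/late dividing line is easy to tabulate. This small Python sketch uses the dates given in Fox et al's table; the variable names are mine, and the gap in years is only my rough way of measuring "near the cusp", not something taken from their paper.

```python
CUTOFF_YEAR = 1601  # 'Early' means written up to and including 1601

# Dates as given for the six divergent plays in the table from Fox et al.
DIVERGENT_PLAYS = {
    "The Two Gentlemen of Verona": 1593,
    "The Merry Wives of Windsor": 1597,
    "As You Like It": 1599,
    "Julius Caesar": 1599,
    "Hamlet": 1601,
    "Twelfth Night": 1601,
}

# Years between each play's date and the cutoff.
gaps = {play: CUTOFF_YEAR - year for play, year in DIVERGENT_PLAYS.items()}

for play, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{play}: {gap} years before the cutoff")

farthest = max(gaps, key=gaps.get)
print(farthest)  # The Two Gentlemen of Verona
```

On these dates Two Gentlemen is eight years from the dividing line, twice as far as the next play on the list.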

We are left, then, with the interesting situation that since I suggested that The Two Gentlemen of Verona was written later than usually thought, two independent stylometric studies have come up with results suggesting that … The Two Gentlemen of Verona may have been written later than usually thought.

‘Curious’, isn't it?