

Elementary Principles of HPSG
Georgia M. Green
January, 2000
Draft, Jan. 2000 - COMMENTS SOLICITED; PLEASE DO NOT QUOTE

Introduction

This chapter describes the theoretical foundations and descriptive mechanisms of Head-Driven Phrase Structure Grammar (HPSG), as well as proposed treatments for a number of familiar grammatical phenomena. The anticipated reader has some familiarity with syntactic phenomena and the function of a theory of syntax, but not necessarily any expertise with modern theories of phrase-structure grammar. The goal of this chapter is not so much to provide a tutorial in some consistent (and inevitably dated) version of HPSG as it is to explicate the philosophy and techniques of HPSG grammars. In my opinion, the best way to fully understand this approach, to be able to write and read HPSG grammars, is to build an HPSG grammar from scratch, inventing and revising the details as one goes along, in accordance with the constraints imposed by the formal model (but not necessarily by every constraint stipulated in the language of that model).

Section 1 of this chapter describes the character of HPSG grammars, and the elements and axioms of the system. Section 2 describes how linguistic entities are modelled, and how grammars describe the modelled entities. The third section describes the ontology of feature-structure descriptions in HPSG, and Section 4 deals with the expression of constraints, especially those involving the notion 'same' or 'matching'. Section 5 describes the compositional treatment of semantics in HPSG. Section 6 discusses the representation of constituent structure, and Section 7 addresses the treatment of the order of elements within constituents. HPSG is very much a lexicon-driven theory, and Section 8 describes the organization of the lexicon, relations among lexical items, and the nature of lexical rules relating them. Section 9 describes treatments of complementation, including the treatment of Equi and Raising constructions, and their interaction with expletive noun phrases. Section 10 describes variations on the treatment of so-called extraction constructions and other unbounded dependencies (e.g., pied piping), with some attention to multiple extractions and so-called parasitic gaps, as well as the nature of alleged empty categories like traces and zero pronouns. It concludes with a discussion of constraints on where extraction gaps can occur. The last section describes the HPSG account of the binding of pronouns and anaphors. Two appendices summarize salient aspects of the sort inheritance hierarchies discussed, and the constraints embedded within them.

This work was supported in part by the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. Some parts are reworked versions of material that appears in Green & Levine. I am grateful to Ash Asudeh, Bob Borsley, Bob Levine, Carl Pollard and Ivan Sag for comments on earlier versions, and useful advice about consistency and clarity in describing a theory that (like all theories, and maybe all organisms) evolves piecemeal, a few systems at a time.

1 Grammars, types, constraints

Two assumptions underlie the theory defining head-driven phrase structure grammars. The first is that languages are systems of sorts of linguistic objects at a variety of levels of abstraction, not just collections of sentence(-type)s. Thus, the goal of the theory is to be able to define the grammars (or I-languages) that generate the sets of sentences (E-languages) that represent the set of natural human languages, assigning empirically satisfactory structural descriptions and semantic interpretations, in a way that is responsible to what is known about human sentence processing.¹ The other is that grammars are best represented as process-neutral systems of declarative constraints (as opposed to constraints defined in terms of operations on objects, as in transformational grammar). Thus, a grammar (and for that matter, a theory of universal grammar) is seen as consisting of an inheritance hierarchy of sorts (an is-a hierarchy),² with constraints of various kinds on the sorts of linguistic object in the hierarchy. A simple sort hierarchy can be represented as a taxonomic tree representing the sort to which belong all the linguistic entities with which the grammar deals. For each local tree in the hierarchy, the sort names which label the daughter nodes partition the sort which labels the mother; that is, they are necessarily disjoint subsorts which exhaust the sort of the mother. For example, subsorts of the sort head can be 'parts of speech' (not words!) of various kinds, and some of those sorts are further partitioned, as illustrated in (1).

(1) part-of-speech
      n
      verbal
        v
        complementizer
      a
      p
      d
      ...

    A partial inheritance hierarchy for 'parts of speech'

A multiple-inheritance hierarchy is an interlocking set of simple hierarchies, each representing a dimension of analysis that intersects with other dimensions.
This is particularly striking in the lexicon: verbs are usefully classed by the number and syntactic characteristics of the arguments they require, but they may also need to be classed according to inflectional class (conjugation), and by semantic properties of the relations they describe (e.g., whether they represent states or properties or events; whether their subjects represent agents or experiencers, and so on; Green 1974, Levin 1993).

A grammar is thus a system of constraints, both unique and inherited, on sorts of linguistic objects. It would be naive to assume that all grammars have the same sorts or the same constraints on whatever sorts they might have in common. Nevertheless, all grammars are hierarchies of sorts of phrases and words and the abstract linguistic entities that need to be invoked to define them. All grammars constrain these various sorts in terms of properties of their component parts. One may speculate that grammars are as alike as they are as a result of there being only a small number of economical solutions to the problems posed by competing forces present in languages generally. For example, languages with free word order enable subtle (non-truth-conditional) distinctions to be expressed by variation in phrase order, while languages with fixed word order simplify the task of parsing by limiting the possibilities for subsequent phrases. An elaborate inflectional system reduces ambiguity (especially temporary ambiguity), while relatively uninflected languages simplify the choices that have to be made in speech production. At the same time, whatever psychological properties and processes guide the incremental learning about the world that is universal among human beings in their first years of life must contribute to constraining grammars to be systems of information that can be learned incrementally.³

Sorts can be atomic (unanalyzed) like masc, fem, +, and sg, or they can be complex. Complex sorts of linguistic objects are defined in terms of the attributes they have (represented as features), and by the value-types of those features. In HPSG, a feature's value may be defined to be one of four possible types:

- an atomic sort (like +, or finite)
- a feature structure of a particular sort
- a set of feature structures⁴
- a list of feature structures⁵

If a value is not specified in a feature-structure description, the value is still constrained by the sort declarations to be one of the possible values for that feature. That is, it amounts to specifying a disjunction of the possible values. Thus, if the possible values for the feature num are the atomic sorts sg and pl, then specifying either NP[num] or NP amounts to specifying NP[num sg ∨ pl], and similarly for all the other possible attributes of NPs (i.e., all the features they can have).

Sort declarations are expressed in formulae of a logic for linguistic representations (King 1989, Pollard 1999), and can be perspicuously abbreviated in labelled attribute-value matrices (AVMs) as in (2), where F1, ..., Fn are feature names and sort_i, ..., sort_k are sort names.

(2) [ sort0
      F1   sort_i
      F2   sort_j
      ...         ]

Sort definitions thus specify what attributes an instance of the sort has, and what kinds of things the values of those attributes can be, and sometimes what particular value an attribute must have (either absolutely, or relative to the value of some other attribute⁶). Sorts inherit all of the attributes of their supersorts and all of the restrictions on the values of those attributes.

¹ See Chomsky (1986).
² More exactly, it is a multiple-inheritance hierarchy; see Section.
The set of feature structures defined by a grammar is partially ordered by subsumption, i.e., by a transitive, reflexive, and anti-symmetric relation. Thus, linguistic expressions, or signs, are words or phrases, and this is reflected in the fact that the sort sign subsumes both phrase and word and no other sort. In fact, since the specifications for phrase and word are mutually exclusive (phrases have attributes which specify their immediate constituents, and words don't), the sorts phrase and word partition the sort sign. Sorts which have no subsorts are termed 'maximal sorts' because they are maximally informative or specific.

³ See Green 199, in prep., for some discussion.
⁴ Set values are represented as sequences within curly brackets: slash { [1], [2] }. The empty set is denoted { }, while { [ ] } denotes a singleton set.
⁵ List values are represented as sequences within angled brackets: subcat ⟨ NP, VP[inf] ⟩. The empty list is denoted ⟨ ⟩, while ⟨ [ ] ⟩ denotes a singleton list.
⁶ See Section below.

2 Feature structures, feature structure descriptions

All linguistic entities (including both expression types, and the abstract objects that are invoked to describe them) are modelled in HPSG as feature structures.⁷ A feature structure is a complete specification of all the properties of the object it models. To keep the distinction clear between a feature structure, which models a maximal sort, and the feature structure descriptions which are used to describe grammatically-relevant classes of feature structures in the generalizations that constitute the statements of the grammar, feature structures themselves are represented as directed graphs. A feature structure for a simplified account of the English verb phrase sleeps is given below:

(3) [Figure: directed graph of the feature structure for the verb phrase sleeps. Node sorts: phrase, word, synsem, synsem-list, v, index, finite, nil, 3rd, sg. Arc labels: SYNSEM, HD-DTR, NON-HEAD-DTRS, HEAD, SUBJ, COMPS, AGR, VFORM, PER, NUM.]

The feature structure in (3) reflects the following information: the phrase in question has "internal" syntactic and semantic properties represented by the feature synsem, as well as the property of having a head daughter (head-dtr) but no other subconstituents; its nh-dtrs (non-head daughters) attribute has the value nil. Its "part of speech" (head) value is of subsort verb, of finite inflectional form, and required to agree with something whose agreement (agr) value is 3rd person and singular, and its head daughter's part of speech (head) value is exactly the same. In addition, the phrase subcategorizes for (i.e., requires) a subject, but no complements, and the phrase it subcategorizes for is precisely the phrase its head daughter subcategorizes for. As is clear from this example, the directed graphs that represent feature structures differ from the directed graphs conventionally used to represent constituent structure diagrams, in that distinct nodes can be the source for paths to (i.e., to "dominate") a single node. This situation, as indicated by the convergence of the arrows in (3), represents the fact that the part-of-speech (of subtype v) of the head daughter is the same feature structure as the part-of-speech of the phrase itself.

Graph representations of feature structures are awkward both to display and to read, so descriptions of feature structures in the form of attribute-value matrices (AVMs) are commonly used instead. Attribute or feature names are typically written in upper case in AVMs, and values are written to the right of the feature name, in lower case italics if they are atomic, as in (4).

(4) [ PER  3rd
      NUM  sg
      GEN  fem ]

Feature structures are the entities constrained by the grammar. It is crucially important to distinguish between feature structures (fully specified objects that model linguistic expressions) and feature structure descriptions, representations (usually underspecified) that (partially) describe feature structures, and which feature structures allowed by a grammar must satisfy. Feature structure descriptions characterize classes of objects. For example, the NP she could be represented by a fully specified feature structure (representable as a directed graph), but "NP" is (an abbreviation for) a feature structure description, and could not be so represented. Put another way, a partial description such as a feature structure description represented by an AVM is a constraint on members of a class of feature structures, while a total description is a constraint which limits the class to a single member. For the most part, grammar specification deals with generalizations over classes of words and phrases, and therefore with (partial) feature structure descriptions.

⁷ For discussion see Pollard and Sag 1994: 8, 1-18, Pollard 1999, and for background, Shieber 1986, Pollard and Moshier 1990.
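The difference between a fully specified feature structure and a partial description that it must satisfy can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: feature structures are encoded as nested dicts, descriptions map a feature to a set of admissible atomic values (or to a nested description), and the `APPROPRIATE` table stands in for sort declarations. The values sg, pl, masc, fem and 3rd come from the text; the remaining person and gender values are hypothetical fillers.

```python
# Hypothetical stand-in for sort declarations: the values each feature may take.
APPROPRIATE = {
    "PER": {"1st", "2nd", "3rd"},
    "NUM": {"sg", "pl"},
    "GEN": {"masc", "fem", "neut"},
}


def satisfies(structure, description):
    """True if the fully specified `structure` meets every constraint in `description`.

    A description maps a feature either to a set of admissible atomic values
    or to a nested description of a complex value.
    """
    for feature, constraint in description.items():
        if feature not in structure:
            return False
        value = structure[feature]
        if isinstance(constraint, dict):
            if not isinstance(value, dict) or not satisfies(value, constraint):
                return False
        elif value not in constraint:
            return False
    return True


def expand(description):
    """Spell out the disjunction that an unmentioned feature implicitly stands for."""
    return {feat: description.get(feat, values) for feat, values in APPROPRIATE.items()}


# A total object (the index of "she") and two partial descriptions of indices.
she = {"PER": "3rd", "NUM": "sg", "GEN": "fem"}
singular = {"NUM": {"sg"}}
plural = {"NUM": {"pl"}}
```

Here `satisfies(she, singular)` holds and `satisfies(she, plural)` does not, while `expand(singular)` makes explicit that leaving PER and GEN unmentioned is equivalent to allowing any of their declared values. The less a description mentions, the more objects satisfy it.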
3 Signs and their attributes

Head-driven phrase structure grammars describe languages in terms of the constraints on linguistic expressions (signs) of various types. Signs are, as in the Saussurean model, associations of form and meaning, and have two basic subsorts: phrases, which have immediate constituents; and words, which don't. Signs are abstractions, of course; an act of uttering a linguistic expression that is modelled by a particular sign amounts to intentionally producing a sound, gesture, or graphical object that satisfies the phonological constraints on that sign, with the intent that the product of that act be understood as intended to have syntactic, semantic, and contextual properties that are modelled by the respective attributes of that sign.⁸

Signs have phonological, syntactico-semantic, and contextual properties, each represented by the value of a corresponding feature. Thus all signs have phon and synsem attributes, recording their phonological and syntactico-semantic structures, respectively.⁹ The value of the synsem attribute is a feature structure which represents the constellation of properties that can be grammatically selected for. It has a loc(al) attribute, whose value (of type local) has cat(egory), cont(ent), and context attributes. Local values are what is shared by filler and gap in so-called extraction constructions. The synsem value also has a nonloc(al) attribute, which in effect encodes information about all types of unbounded dependency constructions (UDCs).

⁸ For more on the nature of this modelling, see Pollard & Sag 1994: -10, 8.
⁹ phon values are usually represented in standard orthography, solely for the sake of convenience and readability.

The category attribute takes as its value an entity of the sort category, whose attribute head has a part-of-speech as its value and whose subj, comps, and spr attributes have lists of synsems as their values. These type declarations, and others discussed in this chapter, are summarized in Appendix A.

Within words, categories also have an argument structure (arg-st) feature whose value is a list of the synsems which denote the sign's arguments. They are ordered by the obliqueness of the grammatical relations they represent,¹⁰ and the arg-st list represents the obliqueness record that is invoked in constraining binding relations (cf. Pollard & Sag 1994: Ch., or Sag & Wasow 1999: Ch.).¹¹ The valence attributes of a sign (subj, spr, comps) record what the subcategorization requirements of the sign are. These attributes take lists of synsems as their values; in S's and referential NPs, all the lists are empty lists. In most cases, the arg-st list is the concatenation of the contents of the subject, specifier and complements lists, in that order. The exceptions are that so-called null pronouns and the gap-synsems¹² representing "extracted" elements are on arg-st lists but not on valence lists.

In Pollard & Sag 1994, the value of the content attribute is a nominal-object if the sign is a referring expression, but a parameterized state-of-affairs (or psoa) if it is a predicative expression, or a quantifier. Psoas are representations of propositional content as feature structures. They are sub-typed by the relation they express, and have attributes for the roles of their arguments, while nominal-objects have index-valued index attributes and psoa-set valued restr(iction) attributes. As illustrated in (5),¹³ more current versions of the theory¹⁴ include an index and a restr attribute for predicative as well as referring expressions.

(5) a. [ INDEX  index
         RESTR  { [ nominate-rel
                    NOMINATOR  index
                    NOMINEE    index ] } ]

    b. [ INDEX  index
         RESTR  { [ persuade-rel
                    PERSUADER         index
                    PERSUADEE         index
                    STATE-OF-AFFAIRS  psoa ] } ]

    c. [ INDEX  [1]
         RESTR  { [ coconut-rel
                    SITUATION  index
                    INSTANCE   [1] ] } ]

Indices for expressions that denote individuals are of the subsort indiv-ind, while indices for expressions that denote properties or propositions (situations) are of the sort sit-ind.¹⁵ Indices for nominal entities in turn have attributes for per(son), num(ber), and gen(der). For perspicuity, in abbreviated AVMs, index values are often represented as subscripts on category designations: NP_[1], for example. The content specification is abbreviated as a tag following a colon after a category designation; VP:[1] represents a VP with the content value [1].

Finally, the context attribute records indexical information (in the values of the speaker, addressee, and location features), and is supposed to represent, in the value of its background attribute, linguistically relevant information that is generally considered pragmatic. For some discussion, see Green (199).

¹⁰ Arguments are ordered from least oblique to most oblique on the ranking familiar since Keenan & Comrie (1977): subject > direct object > secondary object > oblique argument.
¹¹ N.b.: in certain other respects the binding theory presented in Sag & Wasow reflects its character as an introductory textbook, and does not correspond to the binding theory in Pollard & Sag.
¹² Gap-synsems and PRO-synsems, which describe "extracted" elements and implicit Equi (or PRO) subjects, respectively, are never the synsem value of a syntactic constituent. Syntactic constituents are signs, and signs are constrained to have synsem values of the subsort canonical, with a nonempty value for phon. Null pronouns also have a non-canonical synsem type, and thus can be present in arg-st lists, but can never satisfy a subj or comps requirement, since only signs can do that.
¹³ This representation of propositional content does not reflect an essential property of HPSG. It would make no difference if some other kind of coherent representation of a semantic analysis was substituted, as long as it provided a way of indicating what properties can be predicated of which arguments, how arguments are linked to individuals in a model of the universe of discourse, and how the meaning of each constituent is a function of the meaning of its parts. In other words, the exact form of the representation is not crucial as long as it provides a compositional semantics.
¹⁴ For example, Copestake et al., To appear, Sag & Wasow 1999, Ginzburg & Sag To appear.

4 Constraints, Structure-sharing, and Subcategorization

As indicated in Section 2, constraints on feature structures are expressed in terms of feature-structure descriptions. The more underspecified a description is, the larger the class of objects that satisfy it, and the greater the generalization it expresses. Anything that is entailed in sort definitions (including lexical representations) or in universal or language-specific constraints¹⁶ does not have to be explicitly mentioned in the constraints on (i.e., descriptions of) classes of linguistic objects. For example, since the Head Feature Principle requires that the head value of the head-daughter of a phrase be the same as the head value of the phrase itself, the details of this value only need to be indicated once in each representation of a phrase. The notion of the values of two attributes being the same is modelled in feature structures as the sharing of structure.
This is represented in feature structures by means of distinct paths of arcs terminating at the same node as in (3),¹⁷ and in descriptions by means of identical boxed integers (tags like [1]) prefixed to feature-structure descriptions, denoting that they are constrained to describe the same structure, as illustrated above in (5c).¹⁸ Structure-sharing is a crucial property of HPSG. Because it refers to token-identity, and not just type-matching, it does not have a direct counterpart in transformational theories. Structure-sharing amounts to the claim that the value of some instance of an attribute is the same feature structure as the value of some other instance of an attribute, i.e., it is the same thing, not something that just shares significant properties with it, or a different thing which happens to have all the same properties. Thus, the following three AVMs are equivalent descriptions of the same feature structure (a representation of a third person singular noun phrase consisting of a single head noun).¹⁹

(6) a. [ SYNSEM|CONTENT|INDEX       [1] [ PER 3rd, NUM sg, GEN fem ]
         HEAD-DTR|SYNSEM|CONT|INDEX [1]
         NON-HD-DTRS                ⟨ ⟩ ]

    b. [ SYNSEM|CONTENT|INDEX       [1]
         HEAD-DTR|SYNSEM|CONT|INDEX [1] [ PER 3rd, NUM sg, GEN fem ]
         NON-HD-DTRS                ⟨ ⟩ ]

    c. [ SYNSEM|CONTENT|INDEX       [1] [ NUM sg ]
         HEAD-DTR|SYNSEM|CONT|INDEX [1] [ PER 3rd, GEN fem ]
         NON-HD-DTRS                ⟨ ⟩ ]

All three descriptions convey the same information, since there is only one way to satisfy the token-identities in the three descriptions. Structure-sharing is a key descriptive mechanism in HPSG. For example, the structure-sharing required by the description of a topicalization structure, given in (7), requires the synsem|local value of the filler daughter to be the same as the single member of the slash value of the head daughter, as explained in Section 10.

(7) [ hd-filler-ph
      HEAD-DTR     [ phrase
                     SYNSEM [ LOCAL|CAT       [ HEAD verb, SUBJ ⟨ ⟩, COMPS ⟨ ⟩ ]
                              NON-LOCAL|SLASH { [1] } ] ]
      NON-HD-DTRS  ⟨ [ phrase
                       SYNSEM [ LOCAL           [1]
                                NONLOCAL|SLASH  { } ] ] ⟩ ]

¹⁵ See Appendix A for more details.
¹⁶ These may be expressed as sort definitions for higher-level sorts. Sag (1997) is an example of this approach.
¹⁷ This property is sometimes referred to as re-entrancy for this reason.
¹⁸ Technically, a tag refers to feature structures described by the unification of all of the feature structure descriptions with the same tag. The unification of two feature structure descriptions is a consistent feature structure description that contains all of the information in each one.
¹⁹ In the following AVMs, sort annotations are omitted, and feature-name pathways like [A [B [C x]]] are represented as A|B|C x, as is conventional. For perspicuity, sometimes values are labelled with the name (in italics) of the sort that structures their content, but such information is usually omitted when predictable. The attributes head-dtr and non-head-dtrs organize information about the constituent structure of phrases. Their properties are described in subsequent paragraphs and elaborated in Section.
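The contrast between token identity and mere type matching has a close analogue in the difference between object identity and equality in a programming language. The Python sketch below is an illustration only: nested dicts stand in for feature structures, the helper names (`get_path`, `structure_shared`, `make_np`) are hypothetical, and the example is the third person singular feminine NP described above.

```python
def get_path(fs, path):
    """Follow a path such as 'SYNSEM|CONTENT|INDEX' through nested dicts."""
    for feature in path.split("|"):
        fs = fs[feature]
    return fs


def structure_shared(fs, path_a, path_b):
    """True only if both paths lead to the very same object (token identity)."""
    return get_path(fs, path_a) is get_path(fs, path_b)


def make_np(share):
    """Build the NP; with share=False the two INDEX values are mere look-alikes."""
    index = {"PER": "3rd", "NUM": "sg", "GEN": "fem"}
    dtr_index = index if share else dict(index)  # dict(index) is a distinct copy
    return {
        "SYNSEM": {"CONTENT": {"INDEX": index}},
        "HEAD-DTR": {"SYNSEM": {"CONT": {"INDEX": dtr_index}}},
        "NON-HD-DTRS": [],
    }


MOTHER = "SYNSEM|CONTENT|INDEX"
HEAD_DTR = "HEAD-DTR|SYNSEM|CONT|INDEX"

shared = make_np(share=True)       # one index, reached by two paths
lookalike = make_np(share=False)   # two indices with identical contents
```

For `shared`, `structure_shared(shared, MOTHER, HEAD_DTR)` holds: the two paths converge on one node, as in a re-entrant graph. For `lookalike`, the two values compare equal with `==` but are different objects, which is the "different thing which happens to have all the same properties" that structure-sharing excludes.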

There are a variety of restrictions that generalize across various subtypes of phrases. These are, in general, constraints on the highest type they apply to in the phrase-type hierarchy. Several depend on the notion of structure-sharing to constrain feature-value correspondences between sisters, or between mother and some daughter, for particular features. These include familiar principles like the Head-Feature Principle (which constrains the head value of a phrase to be the same as the head value of its head-daughter) and the Valence Principle, as well as a principle (e.g., the Nonlocal Feature Principle of Pollard & Sag 1994) which governs the projection of the unbounded dependency features (slash, rel, and que). Principles which constrain the content value of a phrase to have a certain relation to the content values of its daughters, depending on what subtype of phrase it is, are specified in the sort declarations for particular subsorts of phrase.

An example of a principle which is represented as part of the description of a subsort of phrase is the Valence Principle.²⁰ It constrains subcategorization relations of every object of the sort headed-phrase so that the value of each valence feature corresponds to the respective valence value of its head daughter, minus elements which correspond to elements with the same synsem values in the non-hd-dtrs list for that phrase. In other words, the Valence Principle says that the subj, comps and spr values of a phrase correspond to the respective subj, comps and spr values of its head daughter except that the synsems on those lists that correspond to phrases constituting any non-head daughters are absent from the valence attributes of the mother. In versions of HPSG with defaults,²¹ the Valence Principle can be formulated as a constraint on headed phrases to the effect that the values of the valence features of a phrase are the same as those of its head daughter, except where specified to be different, as in (8).²²
(8) [ headed-phrase
      SYNSEM|LOCAL|CAT         [ SUBJ  / [1]
                                 SPR   / [2]
                                 COMPS / [3] ]
      HD-DTR|SYNSEM|LOCAL|CAT  [ SUBJ  / [1]
                                 SPR   / [2]
                                 COMPS / [3] ] ]

Valence features of the phrase would only be specified to be different in sort declarations for particular headed-phrase types where the synsems of the signs that are sisters to the head are absent from the appropriate valence feature on the phrase, as discussed in Section.²³

Other aspects of selection also rely on the notion of structure-sharing. Adjuncts select heads via a head feature mod, and determiners (and markers) select heads via a head feature spec in very much the same way that heads select arguments by valence features. Structure-sharing is the essence of the Head-Feature Principle. The HFP is described in Pollard & Sag 1994 as an independent constraint, but is perhaps more perspicuously represented as part of the sort declaration for the sort headed-phrase, as shown in (9).

(9) [ headed-phrase
      SYNSEM|LOCAL|CATEGORY|HEAD  [1]
      HEAD-DTR  [ SYNSEM|LOCAL|CATEGORY|HEAD  [1] ] ]

HPSG licenses phrase types either through Immediate Dominance Schemata (Pollard & Sag 1994), or through definitions of particular sorts of phrasal constructions (Sag 1997). It does not treat constituent-structure trees as formal objects, although constituent structure is represented by the various daughters attributes of phrasal signs, and trees are used as a convenient graphic representation of the immediate constituents and linear order properties of phrasal signs. In informal arboreal representations, nodes are labelled by analyzable category names in the form of AVMs,²⁴ linear order is imposed, and branches may be annotated to indicate the relation (e.g., head, adjunct, complement) of daughter to mother. The AVMs are usually abbreviated, with predictable parts of paths suppressed as in (10).

²⁰ A reformulation of the Subcategorization Principle of Pollard & Sag (1994).
²¹ I.e., in nonmonotonic formulations such as Sag 1997 or Ginzburg & Sag (To appear).
²² The right-leaning slash in a default specification has the interpretation 'unless otherwise specified, has the following value.' The logic of default inheritance and the notation are described in Lascarides & Copestake (1999).
²³ If valence is an attribute of categories and its valence value is a feature structure with subj, comps and spr values as described, then the Valence Principle can be very simply represented as:

    [ headed-phrase
      SYNSEM|LOCAL|CAT|VALENCE         / [1]
      HD-DTR|SYNSEM|LOCAL|CAT|VALENCE  / [1] ]

²⁴ Or very underspecified abbreviations for them, like NP or NP[sg].
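The joint effect of the Head-Feature Principle and the Valence Principle on a headed phrase can be sketched procedurally, with the caveat that HPSG states them as declarative constraints, not as a construction procedure. In the Python sketch below all names are assumptions made for illustration: synsems are abbreviated as strings such as "NP" and "VP[bse]", the `project` function is hypothetical, and its `discharged` argument stands in for the phrase type, which determines which valence list the sisters of the head satisfy.

```python
VALENCE = ("SUBJ", "SPR", "COMPS")


def project(head_dtr, sister_synsems, discharged):
    """Return the mother's HEAD and valence values for a headed phrase.

    head_dtr       -- dict with HEAD, SUBJ, SPR and COMPS values
    sister_synsems -- synsems of the non-head daughters
    discharged     -- the valence feature those sisters satisfy
    """
    # Head-Feature Principle: the mother's HEAD is the same object, not a copy.
    mother = {"HEAD": head_dtr["HEAD"]}
    for feature in VALENCE:
        if feature == discharged:
            remaining = list(head_dtr[feature])
            for synsem in sister_synsems:
                # ValueError here means the sister was not selected by the head.
                remaining.remove(synsem)
            mother[feature] = remaining
        else:
            # Valence Principle default: same value as on the head daughter.
            mother[feature] = head_dtr[feature]
    return mother


# A toy auxiliary verb selecting an NP subject and a base-form VP complement.
can = {"HEAD": {"POS": "v", "AUX": True}, "SUBJ": ["NP"], "SPR": [], "COMPS": ["VP[bse]"]}
verb_phrase = project(can, ["VP[bse]"], "COMPS")   # head-complement phrase
sentence = project(verb_phrase, ["NP"], "SUBJ")    # head-subject phrase
```

After the first step the COMPS list is empty and the SUBJ requirement is still pending; after the second all valence lists are empty, as the text says they are in S's. At every level `HEAD` is the identical object found on the lexical head, which is the structure-sharing that the Head-Feature Principle demands.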

(10) Kim can swim.

     [ PHON Kim can swim, SYNSEM [ HEAD [1], SUBJ ⟨ ⟩, COMPS ⟨ ⟩ ] ]
       subj: [ PHON Kim, SYNSEM [2] [ HEAD n ] ]
       head: [ PHON can swim, SYNSEM [ HEAD [1], SUBJ ⟨ [2] ⟩, COMPS ⟨ ⟩ ] ]
         head: [ PHON can, SYNSEM [ HEAD [1] v [ AUX +, INV - ], SUBJ ⟨ [2] ⟩, COMPS ⟨ [3] ⟩ ] ]
         comp: [ PHON swim, SYNSEM [3] [ HEAD v [ AUX -, INV - ], SUBJ ⟨ [2] ⟩, COMPS ⟨ ⟩ ] ]

5 Semantics

As mentioned in Section 3, every sign has a content feature which represents its semantics, and as HPSG has been committed from the start to providing an account of how the meaning of each phrase is a function of the meanings of its parts (compositionality), principles defining this function have been a part of every version of HPSG. Naturally, what this function is depends on what the representations of the semantics of the different types of phrases look like. In P&S (1994), the sort that is the value of the content feature varies according to whether the sign's head value is of type noun, quantifier, or something else. For most parts of speech, the content value is a parameterized-state-of-affairs (psoa), a structure representing a particular kind of relation, with attributes for each of the roles defined for it. The values of those attributes are indexes or psoas that must be structure-shared with particular items on the arg-st list, as in the abbreviated representation in (11).

(11) [ word
       HEAD        v
       SUBJ        ⟨ [3] ⟩
       COMPS       ⟨ [4] ⟩
       ARG-ST      ⟨ [3] NP [ CONT|INDEX [1] ], [4] NP [ CONT|INDEX [2] ] ⟩
       CONT|RESTR  [ choose-relation
                     AGENT      [1]
                     UNDERGOER  [2] ] ]

In (11) choose-relation is the name of a subsort of psoa that represents the relation that the verb choose refers to. The immense and elaborate cross-classifying taxonomy of psoa-sorts is in effect a representation of encyclopedic (world) knowledge, as seen through the eyes of a language.²⁵

As illustrated in (12) the value of content for a [head noun] sign is a feature structure of sort nominal-object.

(12) [ sign
       CAT|HEAD  n
       CONT      [ nom-object
                   INDEX  index
                   RESTR  set(psoa) ] ]

As mentioned, indexes have person, number and gender attributes. Number and Gender values may be a function of the object in the world that is referenced by the utterance of the nominal expression (the object it is anchored to), or reflect an arbitrary property of the word, or both, depending on the language.²⁶ Nominal-objects also have a restriction attribute, whose value is a set of psoas restricting the referent of the nominal-object to have certain properties.

Quantifiers have a third kind of feature structure as their content value. As illustrated in (13), they have an attribute det, whose value is an object of sort semantic-determiner, and an attribute restind (for Restricted Index), which is a nominal-object of the subsort nonpronoun, whose index value is always a referential index (as opposed to an expletive index).

(13) [ word
       CAT|HEAD  det
       CONT      [ quant
                   DET      semdet
                   RESTIND  [ npro
                              INDEX  ref
                              RESTR  set(psoa) ] ] ]

²⁵ See Davis 199 for a cogent discussion of the complexity of the factors entering into the classification, and an illustration of how their interaction can be represented to reflect the correct generalizations about how semantic roles are linked to syntactic argument indexes. One major achievement of this work is the multiple-inheritance hierarchy of relations according to entailments and role-types (reminiscent of Dowty's (1991) proto-roles, but reified).
²⁶ For some discussion, see Green 199.

The first formulation of the compositionality principle that constrains the content of a phrase in terms of the content values of the phrase's immediate constituents is relatively simple: the content of a phrase is structure-shared with the content of its head-daughter. Because of the nature of psoa representations for all predicative expression types (phrases headed by verbs, and predicative prepositions, adjectives, and nouns), this works fine for phrases consisting of a head and complements, or a head and a subject. It doesn't work at all for phrases consisting of a head and a modifier, like [[eats peas] slowly]. This necessitates a second, disjunctive formulation which adds the condition that if the non-head daughter is an adjunct, then the content of the phrase is structure-shared with the content of the adjunct daughter. This appears to give exactly the right results for modifiers of VPs, as long as those modifiers are analyzed like functions which take their heads as arguments, as is familiar from many semantic traditions. However, it necessitates a somewhat strained analysis of attributive adjectives. Because the semantics of a head-adjunct phrase is the same as that of the adjunct, the content value of attributive adjectives has to have the same semantic type as nouns and NPs have (nominal objects). That means they have an index attribute (whose value has to be stipulated to be the same as that of the nominal head they modify), and a restriction attribute (whose value has to be stipulated to be whatever properties are contributed by the adjective, unioned with whatever properties are contributed by the nominal expression that is modified). In any case, that clause of the P&S (1994) Semantics Principle has been convincingly shown to make incorrect predictions (Kasper 199). The clause defining the contribution of quantifiers and attempting to account for the ambiguity of scope is so complex (P&S: ) that the informal version of it requires four to six clauses to state, and I will not try to summarize it.
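The two non-quantificational clauses of this principle are simple enough to state as a short Python sketch. It is a simplification under stated assumptions: signs are dicts with a CONTENT value, the function name `phrase_content` and the attribute names RELN and ARG are hypothetical, and the quantifier clause, which the text declines to summarize, is left out entirely.

```python
def phrase_content(head_dtr, non_head_dtr=None, non_head_is_adjunct=False):
    """Return the CONTENT of a phrase, structure-shared with one daughter.

    By default the phrase shares the content of its head daughter; if the
    non-head daughter is an adjunct, it shares the adjunct's content instead.
    """
    if non_head_is_adjunct:
        return non_head_dtr["CONTENT"]
    return head_dtr["CONTENT"]


# Toy contents for [[eats peas] slowly]: the adverb's content takes the
# content of the VP it modifies as its argument.
eats_peas = {"CONTENT": {"RELN": "eat", "EATEN": "peas"}}
slowly = {"CONTENT": {"RELN": "slow", "ARG": eats_peas["CONTENT"]}}

vp_content = phrase_content(eats_peas)
modified_content = phrase_content(eats_peas, slowly, non_head_is_adjunct=True)
```

Both results are shared objects, not copies: `vp_content` is the head daughter's content, and `modified_content` is the adverb's content, which in turn contains the VP's content as its argument. That nesting is what makes the modifier behave like a function applied to its head.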
At the same time as the empirical problems involved in adjunct semantics were being brought to light, work on computational implementations showed a need for semantic representations that minimized recursion. One motivation for minimizing recursion relates to the fact that in many of the natural language processing applications which utilize implementations of HPSG, it is unnecessary to resolve ambiguities of quantifier scope, as in (14).

(14)  A $000 investment is enough to become a shareholder in thousands of mutual funds.

Minimal Recursion Semantics (referred to as MRS; see Copestake, Flickinger, Pollard and Sag (To appear)) enables semantic representations which are underspecified with respect to quantifier scope. Another motivation relates to the fact that computing with semantic representations with unrestricted recursion consumes inordinate quantities of computational resources. As a consequence of minimizing recursion, the content values in Sag & Wasow (1999) are more uniform, and allow a simpler compositionality statement, though at the cost of additional typed indices. In addition, the factoring entailed by the minimal recursion approach to semantics enables a feature geometry which enforces the Locality Constraint (that only immediate complements can be selected, not arguments embedded within complements); the list of semantic objects involved in a representation just has to be a sign-level attribute, rather than an attribute within content.

In MRS-style analyses, content values have three attributes: mode, index and restriction. The possible values of mode are the atomic modes proposition, directive, interrogative, and reference. References have index values which are indices to either entities (in the case that the expression is a referring expression) or situations (in the case that the expression is predicative (e.g., a verbal, adjectival, or predicative prepositional or nominal expression) like the italicized expressions in (15)).

Sag & Wasow (1999) do not attempt an analysis of quantifiers.

(15)  a. Kim laughed.
      b. Kim is funny.
      c. Kim is under the table.
      d. Kim is a pediatrician.

Proposition-, directive-, and interrogative-valued contents always have a situation-valued index. Restriction values are sets of typed predications, similar in structure to psoas in P&S (199), except that each one has a situation attribute with a sit-ind value. To illustrate, Kim, pediatrician and Kim is a pediatrician in (15d) would have the content values in (16a, b, c), respectively.

(16)  a.  mode   ref
          index  [1] indiv-ind
          restr  { [3] called [sit [5], entity [1], name Kim] }

      b.  mode   prop
          index  [2]
          restr  { [4] pediatrician [sit [2], instance [1]] }

      c.  mode   prop
          index  [2]
          restr  { [3], [4] }

The theory as sketched and the representation in (16b) entail that it is a representation of a proposition whose index is of type situation-index, and the situation indexed is required to be described by the one-place predication that something satisfies the predicate of pediatricianhood. In the representation of Kim is a pediatrician in (16c), that something is required to be whatever satisfies the predication in (16a) that something bears the name Kim. The content value in (16c) illustrates conformity to the principles of Semantic Compositionality and Semantic Inheritance:

    Semantic Compositionality: A phrase's restr value is the union of the restr values of the daughters.

    Semantic Inheritance: A headed-phrase's mode and index values are structure-shared with those of the head daughter.

This analysis is a synthesis of the analysis of Pollard & Sag (199) as refined in Section 8.. (pp. -), and the Minimal Recursion Semantics analysis as simplified in Sag & Wasow's (1999) introductory-level textbook.

The terminology of Minimal Recursion Semantics is explained in detail in Copestake, Flickinger, Pollard & Sag (To appear).
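The principles of Semantic Compositionality and Semantic Inheritance stated above lend themselves to a short worked example. The sketch below is only an approximation under several assumptions: predications are encoded as tuples, indices as plain strings ("i" for the individual, "s" and "s0" for situations), and the class name Content and the function headed_phrase_content are invented for the illustration. It also assumes, for simplicity, that the head daughter of Kim is a pediatrician carries a content like that of pediatrician.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class Content:
    mode: str               # "prop", "dir", "int" or "ref"
    index: str              # a string standing in for an index
    restr: FrozenSet[tuple]  # set of predications, each encoded as a tuple


def headed_phrase_content(head: Content, non_heads: Sequence[Content]) -> Content:
    """Combine daughters' contents into the mother's content."""
    # Semantic Compositionality: restr is the union of the daughters' restr values.
    restr = head.restr
    for daughter in non_heads:
        restr = restr | daughter.restr
    # Semantic Inheritance: mode and index come from the head daughter.
    return Content(mode=head.mode, index=head.index, restr=restr)


# Hypothetical encodings of the two predications in the Kim example.
called = ("called", ("sit", "s0"), ("entity", "i"), ("name", "Kim"))
pediatrician = ("pediatrician", ("sit", "s"), ("instance", "i"))

kim = Content(mode="ref", index="i", restr=frozenset({called}))
is_a_pediatrician = Content(mode="prop", index="s", restr=frozenset({pediatrician}))

sentence = headed_phrase_content(is_a_pediatrician, [kim])
```

The resulting content has mode prop, the situation index of the head, and a restr set containing both predications, which is the shape of the content value for the whole sentence in the example.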

These amount to additional constraints in the definition of phrase and headed-phrase, respectively.^28

(17)  phrase
        synsem|local|cont|restr    [0] ∪ ... ∪ [n]
        head-dtr|synsem|local|cont|restr  [0]
        non-hd-dtrs  ⟨[...restr [1]], ..., [...restr [n]]⟩

(18)  headed-phrase
        synsem|local|cont           [mode [1], index [2]]
        head-dtr|synsem|local|cont  [mode [1], index [2]]

6 Constituent structure

As with many other aspects of grammar, HPSG allows both monotonic and default-logic accounts of constituent structure. In the monotonic account of Pollard & Sag (199), information about the constituent structure of phrases (as well as information about the relation of the constituent parts to each other) is recorded in the various daughters attributes (head-dtr, comps-dtrs, subj-dtr, filler-dtr, adjunct-dtr, spr-dtr (specifier-dtr)) of particular phrase types. These features are all list-valued, enabling them to be present but empty, though head-dtr, subj-dtr, and spr-dtr seem to have to be limited to being no longer than singleton lists. Thus, a description like (19) indicates a tree with three daughters: a verb head daughter, and two complement daughters (an NP and a PP).

28. In a path-description (i.e., in a description of a path in a feature-structure from a node through attributes in complex values), "...fsd..." denotes any valid path through a feature-structure satisfying the constraint fsd. In a set-description, "..., fsd, ..." denotes a feature-structure-description that is satisfied by any feature structure which contains an arc to a node that satisfies the description fsd.

(19)  phrase
        phon  ⟨gives, a, book, to, Sandy⟩
        synsem|local|cat
              head   v
              subj   [2] ⟨synsem [head n, spr ⟨ ⟩]⟩
              comps  [4] ⟨ ⟩
              spr    [6] ⟨ ⟩
        head-dtr|synsem|local|cat
              subj   App-synsems([1], [2])
              comps  App-synsems([3], [4])
              spr    App-synsems([5], [6])
        subj-dtrs  [1] ⟨ ⟩
        comp-dtrs  [3] ⟨NP, PP⟩
        spr-dtrs   [5] ⟨ ⟩

This analysis employs an App(end)-synsems function which appends its second argument (a list of synsems) to a list of the synsem values of its first argument (which is a list of phrases). In the case of (19), appending the list of synsems [2] to the list of the synsems of the elements of the list [1] yields the list of synsems [2], because the list [1] is the empty list. Appending the list of synsems [4] to the list of the synsems of the elements of the list [3] yields the list of the synsems of the elements of [3], because the list [4] is the empty list. Appending the list of synsems [6] to the list of the synsems of the elements of the list [5] yields an empty list, because it amounts to appending the empty list [6] to the empty list of synsems of the elements of the empty list [5]. It is important to note that "NP" and "PP" are abbreviations for phrases, not synsems, since the values of subj-dtrs, comps-dtrs, etc. are lists of phrases, while the values of subj, comps etc. are lists of synsems.

Sag (199) offers an alternative representation which eliminates the redundancy of daughters-features with the valence features by distinguishing subtypes of phrases in terms of relations between the values of their valence features and a non-head-daughters list. Considering the Valence Principle to constrain the values of the valence features of the phrases relative to their values on the head daughter and to the synsem values of the non-head daughters, as described in Section , a head-subject phrase (e.g., a finite declarative clause) is defined as in (20a), and a head-complement phrase as in (20b).^29

(20)  a.  hd-subj-ph
            synsem|loc|cat           [subj ⟨ ⟩]
            head-dtr|synsem|loc|cat  [subj ⟨[1]⟩, spr ⟨ ⟩, comps ⟨ ⟩]
            non-hd-dtrs              ⟨[synsem [1]]⟩

29. The sequence union of lists m and n, m ⊕ n, is the list consisting of n appended to m.

      b.  hd-comps-ph
            synsem|loc|cat  [comps ⟨ ⟩]
            head-dtr        word [synsem|loc|cat [spr ⟨ ⟩, comps ⟨[1], ..., [n]⟩]]
            non-hd-dtrs     ⟨[synsem [1]], ..., [synsem [n]]⟩

As shown in (20), some valence values of the phrase and the head daughter are required to be different, but the valence constraint in (8) ensures that all the valence values not specified to be different will be the same. As a consequence, in this approach, the analysis of the verb phrase gives a book to Sandy is as in (21).

(21)  head-comps-ph
        phon  ⟨gives, a, book, to, Sandy⟩
        synsem|local|cat
              head   v
              subj   [1]
              comps  ⟨ ⟩
        head-dtr|synsem|local|cat
              subj   [1] ⟨synsem [head n, spr ⟨ ⟩]⟩
              spr    ⟨ ⟩
              comps  ⟨[2], [3]⟩
        non-head-dtrs  ⟨NP[synsem [2]], PP[synsem [3]]⟩

7 Constituent order

The general outlines of the HPSG approach to constituent order derive from the theory of linear precedence rules developed within the tradition of Generalized Phrase Structure Grammar, as motivated in Gazdar & Pullum (1981), and summarized in Gazdar, Klein, Pullum, & Sag (198). There are generalizations about the order of constituents in a phrase relative to one another that standard versions of X-Bar theory (cf. Pullum 198, Kornai & Pullum 1990) are too restrictive to capture without positing a multitude of empty nodes and forced movement chains. The theory of Linear Precedence (LP) rules allows ordering constraints to be stated in terms of any attributes of constituents, as long as the ordering relations hold of every set of sister constituents licensed by the grammar, and this proviso of Exhaustive Constant Partial Ordering (ECPO) imposes a very restrictive constraint on possible grammars. Because these ordering constraints are partial, they allow unconstrained pairs of elements to be ordered freely relative to each other. Thus, as in GPSG, so-called "free word order" (i.e., free phrase order) is a consequence of not constraining the order of constituents at all.
(Genuinely free word order, where (any) words of one phrase can precede (any) words of any other phrase, requires a word-order function that allows constituents of one phrase to be interleaved with constituents of a sister phrase (Pullum (198a), Pollard & Sag (198)).)
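The way partial ordering constraints leave unconstrained pairs free can be made concrete with a small enumeration. The following sketch rests on assumptions of my own: sisters are modelled as dictionaries with a name and a boolean lexical_head attribute, an LP rule is a pair of predicates (a, b) read as "anything satisfying a precedes any sister satisfying b", and the function name licensed_orders is hypothetical. It is an illustration of the idea, not an implementation of any published LP formalism.

```python
from itertools import permutations


def licensed_orders(sisters, lp_rules):
    """Return every ordering of the sisters that violates no LP rule."""
    def ok(order):
        for i, later in enumerate(order):
            for earlier in order[:i]:
                # Violation: some rule requires `later` to precede `earlier`.
                if any(a(later) and b(earlier) for a, b in lp_rules):
                    return False
        return True

    return [tuple(s["name"] for s in order)
            for order in permutations(sisters) if ok(order)]


# Hypothetical sisters in a head-complement phrase.
sisters = [
    {"name": "V", "lexical_head": True},
    {"name": "NP", "lexical_head": False},
    {"name": "PP", "lexical_head": False},
]

# One LP rule: a lexical head precedes all of its sisters.
head_first = (lambda x: x["lexical_head"], lambda y: not y["lexical_head"])

with_rule = licensed_orders(sisters, [head_first])
without_rules = licensed_orders(sisters, [])
```

With the single head-first rule, only the two orders that begin with V survive, and NP and PP remain freely ordered relative to each other; with no rules at all, all six orders are licensed, which corresponds to free phrase order arising from the absence of ordering constraints.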

LP rules for HPSG were discussed at some length in Pollard & Sag (198, Ch. ). It was envisioned that linear precedence constraints would be constraints on the phon^30 values of phrases with content along the following lines:

    A lexical head precedes all of its sisters (or follows, depending on the language).
    Fillers precede phrasal heads.
    Less oblique complements not headed by V precede more oblique phrasal complements.

Presumably an Order-phon function would apply to the list consisting of the (append of the) head-daughter and the list of the non-head-daughters to constrain the ordering in the phon value of the phrase in terms of the relevant properties of the various daughter phrases. Such a function might amount to something paraphrasable as: The phon value of the filler-daughter precedes the phon value of the head-daughter, and the phon values of daughters that are words precede those of daughters that are phrases, etc.

As serious grammar development for a number of languages (most notably German and French) has made clear, word order constraints are not always compatible with the semantic and syntactic evidence for constituency. German is a "configurational" language, verb-second in main clauses, verb-final in subordinate clauses. However, the constituents of certain types of phrases may be ordered discontinuously (interleaved rather than concatenated) with sister constituents, so that the position of complements (and parts of complements!) is remarkably (to English speakers, anyway) free. The German sentences glossed in (22) provide representative examples of the problem.

(22)  a. Kaufen glaube ich nicht, dass Maria das Auto will.
         buy believe I not that Maria the car wants
         I don't believe Maria wants to buy the car.

      b. [Das Manuskript gezeigt] hat Maria dem Studenten.
         the manuscript shown has Maria the student-dat
         Maria has shown the manuscript to the student.

      c. Ich glaube dass der Mann das Lied hat singen koennen.
         I believe that the man the song has sing-inf can-inf.
         I believe that the man has been able to sing the song.

Thus, in (22a), the head of an embedded complement appears in initial position in the main clause, with its arguments and the finite verb of the complement in their canonical places.^31

30. Order is represented in phon values because phon is the feature that represents the physical reflections of the constituents, which are abstract, postulated objects of type sign.

31. In (22b), the head of the complement and one of its arguments appear in the initial position of the main clause, with the other two arguments in their normal position after the verb. In (22c), the head (koennen) of the complement of the main verb of the highest embedded clause follows the finite (main) verb hat, while the complement das Lied of the more deeply embedded verb singen precedes hat.

The resolution to this dilemma constitutes a lively topic in current research. Kathol (199), Nakazawa & Hinrichs (1999) and Reape (199, 199) explore these issues in more detail (see also Dowty (199)).

8 The lexicon, lexical relations, lexical rules

As in Lexical Functional Grammar, most of the detail in individual Head-Driven Phrase-Structure grammars is encoded in the lexical entries for particular lexical elements: everything that isn't in the (mostly) universal definitions of phrase-types (which include most of the various named Principles (e.g., the Head Feature Principle, the Valence Principle)). But if the specification of phrase types is hierarchical and multi-dimensional, the lexicon is hierarchical and multi-dimensional with a vengeance.

8.1 Organization of the lexicon

What kind of phrase will license a particular lexical head is, as described in earlier sections, a function of the argument structure of that head (literally, of the arg-st value of the lexical head): what sort of arguments it needs as its complements, subject and/or specifier, whether any of them has a gap in it (see Sec. 10), whether the subject of an infinitive complement must be the same as, or the same in reference as, some other argument of the predicate (see Sec. 9). This information is to a large extent predictable: verbs of a certain class require at least an NP direct object; verbs of a certain subclass of that class require that NP object to be identified with the unexpressed subject of an infinitive VP complement. Third-person singular present tense verbs differ systematically from other present tense verbs, and past tense and participial forms of verbs differ systematically from stem (base) forms. In addition, auxiliary verbs differ systematically from main verbs, but this distinction cross-cuts several other ones. Similarly, nouns are classed by whether or not they allow or require a determiner when they are singular (pronouns and proper nouns don't allow a determiner), and what sort of determiner it can be.
For example, the quantifier much goes only with a mass noun, many goes only with a plural count noun, a/an requires a singular count noun, and the and some are not selective. Facts such as these motivate classifying the elements of the lexicon in multiple dimensions. Argument-taking predicates are classed by transitivity, by argument-coreference with a VP complement's subject (Equi-predicates), by argument-identity with a VP complement's subject (Raising predicates). Nominal expressions are classed by properness, pronominality, and so on.

The fact that verbs of every class have inflectional forms drawn from the same set, the fact that different forms of individual verbs encode the same semantic roles, whether or not any argument is unexpressed, and the fact that, in more-inflected languages, nouns (and adjectives) of every class have the same sets of morphological cases motivate lexical representations which link lexemes with their regular and idiosyncratic inflectional characteristics (Miller & Sag (199), Abeillé, Godard & Sag (1999), Sag & Wasow (1999)). The lexeme dimension of the lexical hierarchy encodes syntactic and semantic information that distinguishes the lexical element from others, including information inherited from higher sorts, and information specifying how semantic roles are linked to grammatical relations and morphological cases (see Davis 199), as well as any idiosyncratic syntactic or semantic information. The inflectional dimension relates to information that might be reflected in morphological inflection: for example, on a verb, the person, number, and pronominality of its arguments, as well as the presence of all arguments; on a determiner, whether it is count or mass, singular or plural, or indifferent to either distinction, and so on. Thus, there is a lexeme sort give, and whole families of words of the form give, gives, giving, gave, given.

Finally, there is no pretense that the lexicon is quite as systematic as the foregoing description makes it sound. There is no denying that some properties of lexemes can be described at the most

general level only if provision is made for them to have occasional exceptions: either individual lexemes, or particular subclasses whose requirements are contrary to what is true of the class as a whole. Consequently, it is assumed that at least some specifications in the hierarchical lexicon should be represented as having default values, which is to say that inheritance within the lexicon is not strictly monotonic; values specified for a supersort can be contradicted in specifications for a particular subsort (including an individual lexeme) of the sort bearing the default value. For example, the overwhelming majority of nominal lexemes are non-reflexive ([ana −]), but reflexive pronouns have to be represented as such ([ana +]) so that the binding theory (see Sec. 11) can refer to them and constrain their distribution. Thus, the specifications for the sort noun-lexeme indicate that the property of being non-reflexive is a default: the specification [ana / −] can be overridden in the specification for the reflexive pronoun subsort of pronoun-lexeme.

8.2 Relations among lexical entries

Three kinds of relations among lexemes motivate lexical rules. First, as anyone who has ever done an electronic search, compiled a concordance, or dealt with word frequency lists knows, words that have the same stem and differ only in their inflection (e.g., for number, tense, agreement) count in one sense as "instances of the same word," since they have the same meaning, require the same arguments filling the same roles, and so on. In another sense, of course, they are clearly not instances of "the same word," since they have different inflections. Lexical rules allow the shared characteristics to be stated once in a relatively underspecified lexeme, with the non-shared characteristics specified by lexical rules that depend on the class (or classes) to which that lexeme belongs.

Lexemes can be (somewhat less freely) related derivationally, as well as by inflection.
Thus, languages typically have means of making nouns that correspond in regular ways to verbs (e.g., agent nominalizations (do-er), zero-affix deverbal result nouns (kick, spit), as well as deverbal nominalizations in -tion). Languages also have means of making verbs that correspond in regular ways to adjectives (e.g., de-adjectival causative verbs in -ify and -ize), and to nouns (e.g., en-prefixed denominal verbs (enact, empower, emplane, engulf)), and to zero-affixed instrumental verbs (hammer) and change-of-state verbs (tile, bone, bottle). These relations typically embed the meaning of a lexeme with one part of speech within the meaning of a lexeme with a very general meaning, often with a different part of speech. The derived lexeme has a phonology that is a function of both factors.

A third, much more restricted kind of relation involves valence alternations, where two verbs with the same phonology and roughly the same meaning map the same semantic relation to different lists of syntactic categories. Some familiar examples are:

(23)  dative alternation:     Kim sent Lee a letter. / Kim sent a letter to Lee.
      causative alternation:  Fido walked. / Kim walked Fido.
      telic alternations:     Dale walked. / Dale walked a mile.
                              Dale ate. / Dale ate lunch.

Levin (199) offers the most complete description of English valence alternations. In the case of verb alternations, one of the alternants may have more semantic restrictions in addition to having

In zero-affixation cases, the matrix lexeme contributes no phonology at all.

When denominal verbs get formed from nouns with irregular plurals, or deverbal nouns get formed from verbs with irregular third singular present tense forms, their categorial properties are determined by the matrix lexeme, while their inflections are completely regular:
      one leaf, two leaves  but  to leaf, it leafs
      to do, he [dʌz]       but  a do, two [duz]