Empirical Likelihood Ratio in Terms of Cumulative Hazard Function for Censored Data

Journal of Multivariate Analysis 80, 166–188 (2002)
doi:10.1006/jmva.2000.1977, available online at http://www.idealibrary.com

Xiao-Rong Pan and Mai Zhou
University of Kentucky and Searle
E-mail: mai@ms.uky.edu

Received April 7, 2000; published online July 9, 2001

It has been shown that (with complete data) empirical likelihood ratios can be used to form confidence intervals and test hypotheses about a linear functional of the distribution function, just as in the parametric case. We study here the empirical likelihood ratios for right censored data with parameters that are linear functionals of the cumulative hazard function. Martingale techniques make the asymptotic analysis easier, even for random weighting functions. It is shown that the empirical likelihood ratio in this setting can be easily obtained by solving a one-parameter monotone equation. © 2001 Elsevier Science

AMS 1991 subject classifications: 62G10; 62G05.
Key words and phrases: weighted hazard; one sample log rank test; stochastic constraint; median.

1. INTRODUCTION

Based on the likelihood function there are three different methods to produce confidence intervals, namely Wald's method, Rao's method, and Wilks' method. Among the three, the Wilks likelihood ratio (LR) method does not require calculating the information or its inverse. It automatically turns the statistic $-2\log LR$ into a pivotal quantity. This can be a real advantage when the information (or its inverse) is difficult to estimate. Even when all three are easy to obtain, the LR method still holds some unique advantages. For example, the confidence intervals produced by the LR method are always range respecting (the confidence bounds lie inside the parameter space), while those of the other two methods are not. Therefore, a transformation of the parameter is often used with Wald's and Rao's methods to overcome the range problem. However, the choice of the transformation is ad hoc, and for new parameters it is often unclear what transformation to use.
In this respect, the LR method can be described as achieving a result comparable to Wald's method with the best transformation, but without the need to find that transformation explicitly.

Recently, Owen (1988, 1990) and many others showed that, after some modification, the likelihood ratio method can also be used to produce confidence intervals in nonparametric settings. He termed this the empirical likelihood ratio method. The empirical likelihood (EL) of i.i.d. observations $X_i$ is just $EL(F)=\prod_{i=1}^n\Delta F(X_i)$. Without any restrictions, the empirical distribution function $\hat F_n(t)=\frac1n\sum_{i=1}^n I[X_i\le t]$ maximizes the EL among all possible distribution functions; therefore it is referred to as the nonparametric maximum likelihood estimator, or NPMLE. With a linear constraint of the form
$$\int g(t)\,dF(t)=\mu, \qquad (1.1)$$
Owen (1988, 1990) showed that the distribution function that maximizes the EL subject to the constraint can be calculated using the Lagrange multiplier method. He showed that such a distribution function $F$ has a jump at $X_i$ equal to
$$\Delta F(X_i)=\Delta\hat F_n(X_i)\,\frac{1}{1+\lambda\,(g(X_i)-\mu)},$$
where $\lambda$ is defined by the equation
$$\sum_{i=1}^n\Delta\hat F_n(X_i)\,\frac{g(X_i)}{1+\lambda\,(g(X_i)-\mu)}=\mu.$$
Once the constrained maximum is obtained, it can be shown that the empirical likelihood ratio statistic, $-2\log ELR(\mu)$, converges in distribution to a chi-square distribution.

However, generalizing the above setting to right censored data is difficult: no explicit maximization under constraint (1.1) can be obtained. In the analysis of censored data, it is often more convenient to model the data in terms of the (cumulative) hazard function $\Lambda(t)$, which is defined by
$$\Lambda(t)=\int_{[0,t)}\frac{dF(s)}{1-F(s-)}. \qquad (1.2)$$
It gives rise to a martingale formulation of the observations. For example, the regression model in terms of hazard leads to the Cox proportional hazards model; nonparametric estimation in terms of cumulative hazard
leads to the Nelson–Aalen estimator, which is much easier to analyze than the Kaplan–Meier estimator. Also, information in terms of hazard (Efron and Johnstone, 1990) and the Hellinger distance in terms of hazard (Ying, 1992) have both been studied and shown to be informative. Therefore it is natural to look at the empirical likelihood in terms of hazard, with constraints in terms of hazard as in (2.6). It turns out that the theory for the EL in terms of hazard is much simpler for right censored data. Also, the martingale formulation makes it easy to handle even stochastic (predictable) weight functions. We obtained results for general parameters of the following types:
(1) $\theta=\int g(t)\,d\Lambda(t)$ for an arbitrary given $g(t)$.
(2) $\theta_n=\int g_n(t)\,d\Lambda(t)$, where $g_n(t)$ is a random but predictable function that depends on the sample size $n$; $\theta_n$ can also change with the sample size.
(3) $\theta$ is defined implicitly by $\int g(t,\theta)\,d\Lambda(t)=C$ for a constant $C$.
Parameters of the first type can arise in the context of a Cox model with a time-dependent covariate. In such a model the cumulative hazard for a person with a time-dependent multiplicative covariate $g(t)$ can be computed as $\Lambda_i(\tau)=\int_0^\tau g(t)\,d\Lambda_b(t)$, where $\Lambda_b$ is the baseline cumulative hazard. The parameter of the second type is prompted by one sample log-rank type tests. The weight function of the one sample log-rank test takes the form $g_n(t)=Y(t)/n$, where $Y(t)$ is the size of the risk set at time $t$. See, for example, Andersen et al. (1993, Sect. V.1) for details and other similar tests. As a further example of a stochastic weight function $g_n$, we take the mean, which can be obtained by integrating against the cumulative hazard with $g(t)=t[1-F(t-)]$. Since $F$ is unknown, we may use $g_n(t)=t[1-\hat F_n(t-)]$. The prime example of implicitly defined parameters is the quantiles. For example, the median $\theta$ may be defined implicitly by $\int I[t\le\theta]\,d\Lambda(t)=\log 2$.
Another purpose of this paper is to serve as a starting point for comparing the two different types of empirical likelihood with right censored data, (2.4) and (2.5).
Section 4 shows that for continuous $F$ and as $n\to\infty$ the two are equivalent, but there are many differences when $F$ is discrete and/or $n$ is small. We shall present the differences for the three types of parameters discussed above in a forthcoming paper.
Murphy (1995) also studied the empirical likelihood ratio using counting process formulations. She obtained an explicit result when the constraint is the cumulative hazard function itself evaluated at a point, $\Lambda(t_0)=-\log[1-F(t_0)]$. Li (1995), building on the earlier work of Thomas and Grunkemeier (1975), studied the empirical likelihood method for censored data, but only for parameters of the form $F(t_0)$. Murphy and Van der Vaart (1997) proved a very general result, but in each specific case one still needs to work out the often non-trivial conditions; also, it is not clear how the empirical
likelihood should be computed. Our result gives a more explicit way to compute such intervals: we need only find the root of a monotone univariate function, and once the root is found the likelihood ratio is easily obtained (see (3.2) or (4.1)). Besides, none of the above papers deals with stochastic constraints. Because the technical treatment of the three types of constraints is similar, we present the detailed proof only for the first type and omit the proofs for the other two types.

The rest of the paper is organized as follows. Section 2 defines the likelihood in terms of hazard and calculates the maximum of the likelihood under a constraint of type 1. Section 3 studies the asymptotic behavior of the likelihood ratio and shows that it converges to a chi-square distribution. Section 4 looks at the difference between two versions of the likelihood. Section 5 deals with the stochastic constraint and the implicit constraint. Section 6 contains some examples. Finally, some technical proofs are collected in the Appendix.

2. LIKELIHOOD IN TERMS OF HAZARD AND ITS MAXIMUM UNDER A CONSTRAINT OF TYPE 1

Suppose that $X_1,\dots,X_n$ are i.i.d. nonnegative random variables denoting lifetimes with a continuous distribution function $F_0$. Independent of the lifetimes, there are censoring times $C_1,\dots,C_n$, which are i.i.d. with distribution $G_0$. Only the censored observations are available to us:
$$T_i=\min(X_i,C_i),\qquad \delta_i=I[X_i\le C_i],\qquad i=1,2,\dots,n. \qquad (2.1)$$
The empirical likelihood based on the censored observations $(T_i,\delta_i)$ pertaining to $F$ is
$$EL(F)=\prod_{i=1}^n[\Delta F(T_i)]^{\delta_i}[1-F(T_i)]^{1-\delta_i}. \qquad (2.2)$$
Since the NPMLEs of the distribution $F$ and of the hazard $\Lambda$ are both known to be purely discrete functions (namely the Kaplan–Meier and Nelson–Aalen estimators), it is reasonable to restrict the analysis of the likelihood ratio to purely discrete functions dominated by their NPMLEs. This is similar to the use of sieves in likelihood analysis. See Owen (1988) for more discussion of this restriction.
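To make the purely discrete NPMLEs concrete, here is a minimal Python sketch (an illustration, not the authors' S-Plus code). It assumes no ties among uncensored times and places uncensored observations before censored ones at equal times. The helper names `nelson_aalen_jumps` and `kaplan_meier_from_hazard` are hypothetical. The Nelson–Aalen jump at the $i$th ordered observation is $\delta_i/(n-i+1)$, and the Kaplan–Meier survival curve is the running product of $1-\Delta\Lambda_{NA}$.

```python
import numpy as np

def nelson_aalen_jumps(time, status):
    """Sort a right-censored sample; return (T, delta, jumps of Lambda_NA).

    Assumes no ties among uncensored times; at a tie between an uncensored
    and a censored time, the uncensored one is placed first (an assumption).
    """
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    order = np.lexsort((1 - status, time))    # primary key: time
    t, d = time[order], status[order]
    at_risk = len(t) - np.arange(len(t))      # n - i + 1 for i = 1, ..., n
    return t, d, d / at_risk                  # censored points get jump 0

def kaplan_meier_from_hazard(jumps):
    """Survival 1 - F(T_i) as the running product of (1 - jump)."""
    return np.cumprod(1.0 - jumps)

# Hypothetical toy sample (1 = death observed, 0 = censored).
t, d, w = nelson_aalen_jumps([3, 4, 5.7, 6.5, 8.4, 10], [1, 0, 0, 1, 0, 1])
print(w)                            # jumps of the Nelson-Aalen estimator
print(kaplan_meier_from_hazard(w))  # Kaplan-Meier survival at each T_i
```

When the largest observation is uncensored, its jump equals one, as the remark about proper discrete hazards later in this section requires.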
Using the relation between hazard and distribution
$$1-F(t)=\prod_{s\le t}\bigl(1-\Delta\Lambda(s)\bigr)\qquad\text{and}\qquad\Delta\Lambda(t)=\frac{\Delta F(t)}{1-F(t-)} \qquad (2.3)$$
that is valid for purely discrete distributions, we can rewrite (2.2) in terms of the cumulative hazard function. The empirical likelihood (2.2) becomes
$$EL(\Lambda)=\prod_{i=1}^n[\Delta\Lambda(T_i)]^{\delta_i}\prod_{j:\,T_j<T_i}\bigl(1-\Delta\Lambda(T_j)\bigr)^{\delta_i}\prod_{j:\,T_j\le T_i}\bigl(1-\Delta\Lambda(T_j)\bigr)^{1-\delta_i}. \qquad (2.4)$$
The hazard function that maximizes the likelihood $EL(\Lambda)$ without any constraint is the Nelson–Aalen estimator; see, e.g., Andersen et al. (1993). We denote the Nelson–Aalen estimator by $\Lambda_{NA}(t)$. On the other hand, a simpler version of the likelihood can be obtained if we merge the second and third factors in (2.4) and replace them by $\exp[-\Lambda(T_i)]$, which Murphy (1994) called a Poisson extension of the likelihood:
$$AL(\Lambda)=\prod_{i=1}^n[\Delta\Lambda(T_i)]^{\delta_i}\exp[-\Lambda(T_i)]. \qquad (2.5)$$
See also Gill (1989) for a detailed discussion of different extensions of the likelihood function to discrete distributions. Notice that we have applied, to a discrete distribution, a formula that is valid only for continuous distributions. The difference, however, is small and negligible for large $n$, as we shall see later. Moreover, for finite $n$ the maximizer of $AL(\Lambda)$ is also the Nelson–Aalen estimator, which gives AL some legitimacy. We shall use AL in our analysis first because of its simplicity, and examine the difference between AL and EL later.
The first and crucial step in our analysis is to find a (discrete) cumulative hazard function that maximizes $AL(\Lambda)$ under the constraint (of type 1)
$$\int g(t)\,d\Lambda(t)=\theta, \qquad (2.6)$$
where $g(t)$ is a given function that satisfies some moment conditions and $\theta$ is a given constant. We point out before proceeding that the last jump of a (proper) discrete cumulative hazard function must be one. This is evident from the second equation of (2.3). This restriction is similar to the "jumps sum to one" restriction on discrete distribution functions. The consequence is that any discrete cumulative hazard function dominated by the
Nelson–Aalen estimator must, at the last observation, have the same jump as the Nelson–Aalen estimator. In light of this we rewrite the constraint (2.6) in terms of jumps. For simplicity we assume there are no ties among the uncensored observations. Without loss of generality we assume $T_1\le T_2\le\cdots\le T_n$, where the only possible ties are between censored observations. Let $w_i=\Delta\Lambda(T_i)$ for $i=1,2,\dots,n$, and notice that $w_n=\delta_n$. For any $\Lambda$ dominated by the Nelson–Aalen estimator, the constraint (2.6) can be written as
$$\sum_{i=1}^{n-1}\delta_i g(T_i)w_i+g(T_n)\delta_n=\theta. \qquad (2.7)$$
Similarly, the likelihood AL at this $\Lambda$ can be written in terms of the jumps:
$$AL=\prod_{i=1}^n[w_i]^{\delta_i}\exp\Bigl\{-\sum_{j=1}^{i}w_j\Bigr\}. \qquad (2.8)$$
Another important issue is that the constraint equation may not have a solution for certain values of $\theta$. An obvious example is $g(t)\le 0$ and $\theta>0$. Thus, for each given $g(t)$ and sample, we study in detail only the feasible constraints, i.e., those values of $\theta$ for which (2.7) has at least one solution. For values without a solution we define the likelihood under the constraint to be zero. Note that to qualify as a solution we must have $0\le w_i<1$ for $i=1,2,\dots,n-1$.
To find the maximizer of AL under constraint (2.7), we use the Lagrange multiplier method. Once the constrained maximizer is found (recall that the unconstrained maximizer is known to be the Nelson–Aalen estimator), we can proceed to study the empirical likelihood ratio.

Theorem 1. The feasible values of $\theta$ in the constraint (2.7) are given by the interval $V$ defined at the end of the proof. If the constraint (2.7) is feasible, then the maximum of AL under the constraint is attained when
$$w_i=W_i=\frac{\delta_i}{(n-i+1)+n\lambda g(T_i)\delta_i}=\frac{\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i)/(n-i+1)}, \qquad (2.9)$$
where $\lambda$ in turn is the solution of the equation $l(\lambda)=\theta$, with
$$l(\lambda)\equiv\sum_{i=1}^{n-1}\frac{g(T_i)\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i)/(n-i+1)}+g(T_n)\delta_n. \qquad (2.10)$$
Proof. To use the Lagrange multiplier, we form the target function
$$G=\sum_{i=1}^n\delta_i\log w_i-\sum_{i=1}^n\sum_{j=1}^i w_j+n\lambda\Bigl[\theta-\sum_{i=1}^{n-1}\delta_i g(T_i)w_i-\delta_n g(T_n)\Bigr].$$
Taking the partial derivative with respect to $w_i$, for $i=1,\dots,n-1$, and setting it equal to zero, we obtain
$$\frac{\partial G}{\partial w_i}=\frac{\delta_i}{w_i}-(n-i+1)-n\lambda g(T_i)\delta_i=0,\qquad i=1,2,\dots,n-1.$$
Solving this equation gives the explicit expression
$$W_i=\frac{\delta_i}{(n-i+1)+n\lambda g(T_i)\delta_i}=\frac{\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i)/(n-i+1)}=\frac{\Delta\Lambda_{NA}(T_i)}{1+\lambda\,n\delta_i g(T_i)/(n-i+1)}$$
for $i=1,2,\dots,n-1$, where $\lambda$ has to be chosen to satisfy the constraint (2.7). Plugging $W_i$ into (2.7), we see that $\lambda$ can be obtained as a solution of the equation
$$l(\lambda)\equiv\sum_{i=1}^{n-1}\frac{g(T_i)\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i)/(n-i+1)}+g(T_n)\delta_n=\theta.$$
The function $l(\lambda)$ is continuous and monotone decreasing in $\lambda$, as can be verified by differentiating $l(\lambda)$ with respect to $\lambda$. On the other hand, any legitimate value of $\lambda$ must produce, through (2.9), values $w_i$ that are bona fide jumps of a discrete cumulative hazard function, and these must lie between zero and one. This restriction leads to the following legitimate range $J$ for $\lambda$.

All max and min in the following definitions are taken over the domain $\{i\le n-1,\ \delta_i=1,\ g(T_i)\ne 0\}$; any additional restriction is specified in each individual case.
Case 1. When $\min g(T_i)>0$:
$$J=\Bigl(\max\frac{i-n}{n\,g(T_i)},\ \infty\Bigr)=(\lambda_*,\infty).$$
Case 2. When $\max g(T_i)<0$:
$$J=\Bigl(-\infty,\ \min\frac{i-n}{n\,g(T_i)}\Bigr)=(-\infty,\lambda^*).$$
Case 3. When $\max g(T_i)>0>\min g(T_i)$:
$$J=\Bigl(\max_{g(T_i)>0}\frac{i-n}{n\,g(T_i)},\ \min_{g(T_i)<0}\frac{i-n}{n\,g(T_i)}\Bigr)=(\lambda_*,\lambda^*).$$
Since the function $l(\cdot)$ is continuous and monotone, the corresponding range of $\theta$ values that makes Eq. (2.10) feasible (i.e., has a solution that is a bona fide cumulative hazard function) is as follows. These $\theta$ values also make the constraint (2.7) feasible.
Case 1.
$$V=\Bigl(g(T_n)\delta_n,\ \sum_{i=1}^{n-1}\frac{\delta_i g(T_i)}{(n-i+1)+n\lambda_* g(T_i)}+g(T_n)\delta_n\Bigr).$$
Case 2.
$$V=\Bigl(\sum_{i=1}^{n-1}\frac{\delta_i g(T_i)}{(n-i+1)+n\lambda^* g(T_i)}+g(T_n)\delta_n,\ g(T_n)\delta_n\Bigr).$$
Case 3.
$$V=\Bigl(\sum_{i=1}^{n-1}\frac{\delta_i g(T_i)}{(n-i+1)+n\lambda^* g(T_i)}+g(T_n)\delta_n,\ \sum_{i=1}^{n-1}\frac{\delta_i g(T_i)}{(n-i+1)+n\lambda_* g(T_i)}+g(T_n)\delta_n\Bigr).$$
∎
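Theorem 1 reduces the constrained maximization to one-dimensional monotone root finding. The Python sketch below illustrates this under stated assumptions. It is not the authors' S-Plus function. The helper names `prepare`, `solve_lambda`, `log_al` and `neg2_log_alr` are hypothetical, and plain bisection on the legitimate range $J$ stands in for any particular root finder. It writes $Z_i=n\delta_ig(T_i)/(n-i+1)$, so every denominator in (2.9)–(2.10) is $1+\lambda Z_i$. The condition $W_i<1$ becomes $\lambda Z_i>-(n-i)/(n-i+1)$. The statistic $-2\log ALR(\theta)$ is evaluated directly from (2.8), once at the constrained jumps $W_i$ and once at the Nelson–Aalen jumps.

```python
import numpy as np

def prepare(time, status, g):
    """Sort the sample; return T, delta, n-i+1 and Z_i = n*delta_i*g(T_i)/(n-i+1)."""
    time = np.asarray(time, dtype=float)
    status = np.asarray(status, dtype=int)
    order = np.lexsort((1 - status, time))     # by time; uncensored first on ties
    t, d = time[order], status[order]
    n = len(t)
    at_risk = n - np.arange(n)                 # n - i + 1, i = 1, ..., n
    return t, d, at_risk, n * d * g(t) / at_risk

def solve_lambda(z, at_risk, last, theta, iters=200):
    """Solve l(lambda) = theta, with l from (2.10), by bisection on J."""
    n = len(z)
    zi, ri = z[:-1], at_risk[:-1]              # indices i = 1, ..., n-1
    l = lambda lam: np.sum(zi / (1.0 + lam * zi)) / n + last
    # W_i < 1  <=>  lam * Z_i > -(n-i)/(n-i+1); this gives J = (lo, hi), lo < 0 < hi.
    bound = -(ri - 1) / (ri * np.where(zi == 0, 1.0, zi))
    lo = bound[zi > 0].max() if np.any(zi > 0) else -np.inf
    hi = bound[zi < 0].min() if np.any(zi < 0) else np.inf
    l_lo = l(lo) if np.isfinite(lo) else last  # l decreases from l_lo to l_hi on J
    l_hi = l(hi) if np.isfinite(hi) else last
    if not l_hi < theta < l_lo:
        raise ValueError("theta is not in the feasible interval V")
    if l(0.0) > theta:                         # root has lambda > 0
        a, b = 0.0, (hi if np.isfinite(hi) else 1.0)
        while l(b) > theta:                    # only loops when hi is infinite
            b *= 2.0
    else:                                      # root has lambda <= 0
        a, b = (lo if np.isfinite(lo) else -1.0), 0.0
        while l(a) < theta:                    # only loops when lo is infinite
            a *= 2.0
    for _ in range(iters):                     # invariant: l(a) >= theta >= l(b)
        mid = 0.5 * (a + b)
        if l(mid) > theta:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

def log_al(w, d, at_risk):
    """log AL of (2.8): sum_i delta_i log w_i - sum_j (n-j+1) w_j."""
    logw = np.log(np.where(d == 1, w, 1.0))    # censored terms contribute 0
    return np.sum(d * logw) - np.sum(at_risk * w)

def neg2_log_alr(time, status, g, theta):
    """Return (-2 log ALR(theta), lambda, constrained jumps W, sorted T)."""
    t, d, at_risk, z = prepare(time, status, g)
    lam = solve_lambda(z, at_risk, g(t[-1]) * d[-1], theta)
    w_na = d / at_risk                         # Nelson-Aalen jumps
    w = w_na.copy()
    w[:-1] = w_na[:-1] / (1.0 + lam * z[:-1])  # W_i of (2.9); W_n = delta_n
    return 2.0 * (log_al(w_na, d, at_risk) - log_al(w, d, at_risk)), lam, w, t
```

At $\theta=\hat\theta=\int g\,d\Lambda_{NA}$ the root is $\lambda=0$ and the statistic vanishes. Bisection is used only because monotonicity of $l$ on $J$ makes it safe; any bracketing root finder would do.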

3. ASYMPTOTIC PROPERTIES

Now we study the large sample behavior of the empirical likelihood under constraint (2.6). First, we present a lemma about the large sample behavior of the solution $\lambda$ of (2.10).

Lemma 1. Suppose $g(t)$ is a left continuous function and
$$0<\int\frac{|g(x)|^m}{(1-F_0(x))(1-G_0(x))}\,d\Lambda_0(x)<\infty,\qquad m=1,2.$$
Then $\theta_0=\int g(t)\,d\Lambda_0(t)$ is feasible with probability approaching 1 as $n\to\infty$, and the solution $\lambda$ of (2.10) with $\theta=\theta_0$ satisfies
$$n\lambda^2\xrightarrow{D}\chi^2_{(1)}\Bigl(\int\frac{g^2(x)\,d\Lambda_0(x)}{(1-F_0(x))(1-G_0(x))}\Bigr)^{-1}\qquad\text{as } n\to\infty.$$
Proof. See the Appendix. ∎

Next we define the empirical likelihood ratio in terms of the hazard for the constraint (2.7) as
$$ALR(\theta)=\frac{\sup\{AL(\Lambda):\ \Lambda\ll\Lambda_{NA},\ \Lambda\text{ satisfies (2.7)}\}}{AL(\Lambda_{NA})}.$$
By Theorem 1, when the constraint is feasible, $ALR(\theta)$ can be computed using the $W_i$ defined there and the known property of $\Lambda_{NA}$: $\Delta\Lambda_{NA}(T_i)=\delta_i/(n-i+1)$.

Theorem 2. Let $(T_1,\delta_1),\dots,(T_n,\delta_n)$ be pairs of random variables as defined in (2.1). Suppose $g$ is a left continuous function and
$$0<\int\frac{|g(x)|^m}{(1-F_0(x))(1-G_0(x))}\,d\Lambda_0(x)<\infty,\qquad m=1,2.$$
Then $\theta_0=\int g(t)\,d\Lambda_0(t)$ will be a feasible value with probability approaching one as $n\to\infty$, and
$$-2\log ALR(\theta_0)\xrightarrow{D}\chi^2_{(1)}\qquad\text{as } n\to\infty.$$
Proof. In view of Lemma 1, we need only prove the last claim, $-2\log ALR(\theta_0)\xrightarrow{D}\chi^2_{(1)}$ as $n\to\infty$. To this end, define
$$Z_i=\frac{n\,\delta_i g(T_i)}{n-i+1}\qquad\text{for } i=1,2,\dots,n, \qquad (3.1)$$
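and consider (using $W_n=\delta_n=\Delta\Lambda_{NA}(T_n)$, so that the $i=n$ terms cancel)
$$\begin{aligned}-2\log ALR(\theta_0)&=2\Bigl[\sum_{i=1}^n\delta_i\log\Delta\Lambda_{NA}(T_i)-\sum_{i=1}^n(n-i+1)\Delta\Lambda_{NA}(T_i)\Bigr]-2\Bigl[\sum_{i=1}^n\delta_i\log W_i-\sum_{i=1}^n(n-i+1)W_i\Bigr]\\&=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)-2\sum_{i=1}^{n-1}(n-i+1)\Delta\Lambda_{NA}(T_i)+2\sum_{i=1}^{n-1}\frac{(n-i+1)\Delta\Lambda_{NA}(T_i)}{1+\lambda Z_i}\\&=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)-2\sum_{i=1}^{n-1}\frac{\delta_i\lambda Z_i}{1+\lambda Z_i}\\&=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)-2\sum_{i=1}^{n-1}\delta_i\lambda Z_i+2\sum_{i=1}^{n-1}\frac{\delta_i\lambda^2Z_i^2}{1+\lambda Z_i}.\end{aligned}\qquad(3.2)$$
Notice $\max_i|\lambda Z_i|=O_p(n^{-1/2})\max_i|Z_i|$ by Lemma 1. Now using Lemma A2 with $h=g/\sqrt{(1-F_0)(1-G_0)}$ and Zhou (1991), we have
$$\max_i|Z_i|\le\max_i\Bigl|\frac{\delta_i g(T_i)}{(1-F_0(T_i))(1-G_0(T_i))}\Bigr|\cdot\max_i\frac{(1-F_0(T_i))(1-G_0(T_i))}{(n-i+1)/n}=o_p(n^{1/2})\,O_p(1)=o_p(n^{1/2}). \qquad (3.3)$$
Thus $\max_{i\le n-1}|\lambda Z_i|=O_p(n^{-1/2})\,o_p(n^{1/2})=o_p(1)$, and we may expand
$$\log(1+\lambda Z_i)=\lambda Z_i-\tfrac12\lambda^2Z_i^2+O_p(\lambda^3)Z_i^3. \qquad (3.4)$$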

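Substituting (3.4) into the expression for $-2\log ALR(\theta_0)$, we have
$$\begin{aligned}-2\log ALR(\theta_0)&=2\sum_{i=1}^{n-1}\delta_i\lambda Z_i-\sum_{i=1}^{n-1}\delta_i\lambda^2Z_i^2+\sum_{i=1}^{n-1}O_p(\lambda^3)Z_i^3-2\sum_{i=1}^{n-1}\delta_i\lambda Z_i+2\sum_{i=1}^{n-1}\delta_i\lambda^2Z_i^2-2\lambda^3\sum_{i=1}^{n-1}\frac{\delta_iZ_i^3}{1+\lambda Z_i}\\&=\lambda^2\sum_{i=1}^{n-1}\delta_iZ_i^2+\sum_{i=1}^{n-1}O_p(\lambda^3)Z_i^3-2\lambda^3\sum_{i=1}^{n-1}\frac{\delta_iZ_i^3}{1+\lambda Z_i},\end{aligned}\qquad(3.5)$$
where, as $n\to\infty$ and noting $\delta_iZ_i^3=Z_i^3$,
$$\Bigl|\sum_{i=1}^{n-1}O_p(\lambda^3)Z_i^3\Bigr|\le O_p(n^{-1/2})\,o_p(n^{1/2})\Bigl[\lambda^2\sum_{i=1}^{n-1}Z_i^2\Bigr],\qquad\Bigl|2\lambda^3\sum_{i=1}^{n-1}\frac{\delta_iZ_i^3}{1+\lambda Z_i}\Bigr|\le O_p(n^{-1/2})\,o_p(n^{1/2})\Bigl[\lambda^2\sum_{i=1}^{n-1}Z_i^2\Bigr].$$
By Lemma A3 and (3.3) we have
$$\operatorname{Plim}\frac1n\sum_{i=1}^{n-1}Z_i^2=\operatorname{Plim}\frac1n\sum_{i=1}^{n-1}\delta_iZ_i^2=\operatorname{Plim}\frac1n\sum_{i=1}^{n}Z_i^2=\int\frac{g^2(x)\,d\Lambda_0(x)}{(1-F_0(x))(1-G_0(x))}<\infty,$$
where Plim denotes the limit in probability as $n\to\infty$. Since $n\lambda^2=O_p(1)$ by Lemma 1, the bracketed factor $\lambda^2\sum Z_i^2=n\lambda^2\cdot\frac1n\sum Z_i^2$ is $O_p(1)$, so the last two terms in (3.5) are negligible. The first term converges to a $\chi^2_{(1)}$ distribution in view of Lemma 1, Lemma A3, and the Slutsky theorem. Thus, as $n\to\infty$,
$$-2\log ALR(\theta_0)\xrightarrow{D}\chi^2_{(1)}.$$
∎

4. COMPARISON OF TWO VERSIONS OF LIKELIHOOD

In this section we examine the difference between the two versions of the likelihood, EL and AL, as defined in (2.4) and (2.5). We shall prove that if we replace AL in Theorem 2 by EL and keep everything else the same, the likelihood ratio statistic $-2\log ELR(\theta_0)$ still converges to $\chi^2_{(1)}$ as $n\to\infty$.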

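Define
$$ELR(\theta)=\frac{EL(\Lambda^*)}{EL(\Lambda_{NA})},$$
where $\Lambda^*$ is given by the jumps $W_i$ defined in Theorem 1.

Theorem 3. Suppose all the conditions of Theorem 2 hold. Then $-2\log ELR(\theta_0)\xrightarrow{D}\chi^2_{(1)}$ as $n\to\infty$.

Proof. We shall prove that the two likelihood ratio statistics are asymptotically equivalent, in the sense that their difference goes to zero in probability. By (3.2) we have
$$-2\log ALR(\theta_0)=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)-2\sum_{i=1}^{n-1}\frac{\delta_i\lambda Z_i}{1+\lambda Z_i},$$
where $Z_i$ is defined as in (3.1). On the other hand, we also have
$$-2\log ELR(\theta_0)=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)+2\sum_{i=1}^{n-1}(n-i+1-\delta_i)\log\bigl(1-\Delta\Lambda_{NA}(T_i)\bigr)-2\sum_{i=1}^{n-1}(n-i+1-\delta_i)\log\Bigl(1-\frac{\Delta\Lambda_{NA}(T_i)}{1+\lambda Z_i}\Bigr). \qquad (4.1)$$
Observe
$$\log\Bigl(1-\frac{\Delta\Lambda_{NA}(T_i)}{1+\lambda Z_i}\Bigr)=\log\Bigl(1-\Delta\Lambda_{NA}(T_i)+\Delta\Lambda_{NA}(T_i)\frac{\lambda Z_i}{1+\lambda Z_i}\Bigr).$$
By the same reasoning as in (3.3) and (3.4), we may expand
$$\log\Bigl(1-\Delta\Lambda_{NA}(T_i)+\Delta\Lambda_{NA}(T_i)\frac{\lambda Z_i}{1+\lambda Z_i}\Bigr)$$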
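$$\begin{aligned}&=\log\bigl(1-\Delta\Lambda_{NA}(T_i)\bigr)+\frac{\Delta\Lambda_{NA}(T_i)}{1-\Delta\Lambda_{NA}(T_i)}\cdot\frac{\lambda Z_i}{1+\lambda Z_i}-\Bigl(\frac{\Delta\Lambda_{NA}(T_i)}{1-\Delta\Lambda_{NA}(T_i)}\Bigr)^2\eta_i^2\\&=\log\bigl(1-\Delta\Lambda_{NA}(T_i)\bigr)+\frac{\delta_i}{n-i+1-\delta_i}\cdot\frac{\lambda Z_i}{1+\lambda Z_i}-\Bigl(\frac{\delta_i}{n-i+1-\delta_i}\Bigr)^2\eta_i^2,\end{aligned}\qquad(4.2)$$
where $\eta_i$ is a remainder term with $|\eta_i|\le|\lambda Z_i/(1+\lambda Z_i)|$. Substituting (4.2) into the expression for $-2\log ELR(\theta_0)$, we obtain
$$-2\log ELR(\theta_0)=2\sum_{i=1}^{n-1}\delta_i\log(1+\lambda Z_i)-2\sum_{i=1}^{n-1}\frac{\delta_i\lambda Z_i}{1+\lambda Z_i}+2\sum_{i=1}^{n-1}\frac{\delta_i\eta_i^2}{n-i+1-\delta_i}.$$
Therefore
$$-2\log ELR(\theta_0)+2\log ALR(\theta_0)=2\sum_{i=1}^{n-1}\frac{\delta_i\eta_i^2}{n-i+1-\delta_i},$$
where, since $\max_i|\lambda Z_i|=o_p(1)$,
$$0\le\sum_{i=1}^{n-1}\frac{\delta_i\eta_i^2}{n-i+1-\delta_i}\le\bigl(1+o_p(1)\bigr)\,\lambda^2\sum_{i=1}^{n-1}\frac{Z_i^2}{n-i+1-\delta_i}.$$
By Lemma 1 and Lemma A3 (terms with $\delta_i=0$ vanish because then $Z_i=0$) we have
$$\lambda^2\sum_{i=1}^{n-1}\frac{Z_i^2}{n-i+1-\delta_i}=n\lambda^2\cdot\frac1n\sum_{i=1}^{n-1}\frac{Z_i^2}{n-i}=O_p(1)\,o_p(1)=o_p(1).$$
Therefore $-2\log ELR(\theta_0)+2\log ALR(\theta_0)\xrightarrow{P}0$ as $n\to\infty$. In view of Theorem 2, we have $-2\log ELR(\theta_0)\xrightarrow{D}\chi^2_{(1)}$ as $n\to\infty$. ∎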
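The two statistics can be compared numerically through (3.2) and (4.1). The following Python sketch is only an illustration: all names are hypothetical, and it re-implements the bisection of Theorem 1 so that the block stands alone. It adopts the convention of Section 2 that an infeasible $\theta$ has likelihood zero, so both statistics are then $+\infty$. The Monte Carlo design borrows the distributions of Example 1 in Section 6 (exponential lifetimes and censoring with rates 1 and 0.35, $g(t)=e^{-t}$, $\theta_0=1$), but with an assumed $n=100$ and 400 replications. The difference $-2\log ELR+2\log ALR$ is the nonnegative sum displayed in the proof above, so the code should always return an ELR statistic at least as large as the ALR one.

```python
import numpy as np

def _setup(time, status, g):
    """Sort the sample; return delta, n-i+1, Z_i of (3.1) and g(T_n)*delta_n."""
    time, status = np.asarray(time, float), np.asarray(status, int)
    order = np.lexsort((1 - status, time))      # by time; uncensored first on ties
    t, d = time[order], status[order]
    r = len(t) - np.arange(len(t))              # n - i + 1
    return d, r, len(t) * d * g(t) / r, g(t[-1]) * d[-1]

def _lambda(z, r, last, theta, iters=100):
    """Bisection for l(lambda) = theta on the legitimate range J (Theorem 1)."""
    n, zi, ri = len(z), z[:-1], r[:-1]
    l = lambda lam: np.sum(zi / (1.0 + lam * zi)) / n + last
    bound = -(ri - 1) / (ri * np.where(zi == 0, 1.0, zi))
    lo = bound[zi > 0].max() if np.any(zi > 0) else -np.inf
    hi = bound[zi < 0].min() if np.any(zi < 0) else np.inf
    top = l(lo) if np.isfinite(lo) else last
    bottom = l(hi) if np.isfinite(hi) else last
    if not bottom < theta < top:
        raise ValueError("infeasible theta")
    if l(0.0) > theta:
        a, b = 0.0, (hi if np.isfinite(hi) else 1.0)
    else:
        a, b = (lo if np.isfinite(lo) else -1.0), 0.0
    while l(b) > theta:                         # expand only an infinite end
        b *= 2.0
    while l(a) < theta:
        a *= 2.0
    for _ in range(iters):
        mid = 0.5 * (a + b)
        a, b = (mid, b) if l(mid) > theta else (a, mid)
    return 0.5 * (a + b)

def neg2_log_ratios(time, status, g, theta):
    """(-2 log ALR, -2 log ELR) at theta via (3.2) and (4.1); inf if infeasible."""
    d, r, z, last = _setup(time, status, g)
    try:
        lam = _lambda(z, r, last, theta)
    except ValueError:                          # likelihood taken as zero
        return np.inf, np.inf
    d, r, z = d[:-1], r[:-1], z[:-1]            # i = 1, ..., n-1
    u = lam * z
    alr = 2 * np.sum(d * np.log1p(u)) - 2 * np.sum(d * u / (1 + u))
    jump, m = d / r, r - d                      # Delta Lambda_NA and n-i+1-delta_i
    elr = (2 * np.sum(d * np.log1p(u)) + 2 * np.sum(m * np.log1p(-jump))
           - 2 * np.sum(m * np.log1p(-jump / (1 + u))))
    return alr, elr

def coverage(n=100, reps=400, seed=0):
    """Fraction of replications with statistic <= 3.84 at theta0 = 1."""
    rng = np.random.default_rng(seed)
    g = lambda s: np.exp(-s)
    hits = np.zeros(2)
    for _ in range(reps):
        x, c = rng.exponential(1.0, n), rng.exponential(1 / 0.35, n)
        stats = neg2_log_ratios(np.minimum(x, c), (x <= c).astype(int), g, 1.0)
        hits += np.array(stats) <= 3.84
    return hits / reps                          # [ALR coverage, ELR coverage]
```

For moderate $n$ both coverages would be expected near the nominal 95%, in the spirit of Theorems 2 and 3. They are not guaranteed to match it, and the ELR coverage can only be lower or equal.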

5. STOCHASTIC CONSTRAINTS AND IMPLICIT CONSTRAINTS

5.1. Stochastic Constraints

Some applications, specifically one sample log-rank type tests (cf. Andersen et al., 1993, p. 334), require a random weight function $g(t)=g_n(t)$ in the constraint. Also, to obtain the mean by integrating against the cumulative hazard, we need to let $g(t)=g_n(t)=t[1-\hat F_n(t-)]$, again a random function. To accommodate this, we allow the function $g$ to depend on the sample (of size $n$), but require that it be predictable with respect to a filtration that makes $\Lambda_{NA}(t)-\Lambda(t)$ a martingale, for example
$$\mathcal F_t=\sigma\{T_kI[T_k\le t],\ \delta_kI[T_k\le t];\ k=1,2,\dots,n\}. \qquad (5.1)$$
Furthermore, we require that for some nonrandom left continuous function $g(t)$,
$$\sup_t|g_n(t)-g(t)|=o_p(1)\qquad\text{and}\qquad\sup_i\Bigl|\frac{g_n(T_i)}{g(T_i)}\Bigr|=O_p(1)\qquad\text{as } n\to\infty. \qquad (5.2)$$
The weight functions for the one sample log-rank test and for the mean can be shown to satisfy these requirements. The stochastic version of the constraint is therefore
$$\int g_n(t)\,d\Lambda(t)=\theta_n. \qquad (5.3)$$
The value $\theta_n$ may also depend on $n$. For example, if we are testing the hypothesis $H_0:\Lambda\equiv\Lambda_0$, then we should take $\theta_n=\int g_n(t)\,d\Lambda_0(t)$. The empirical likelihood ratio for the stochastic constraint is defined as
$$ALR_s(\theta_n)=\frac{\sup\{AL(\Lambda):\ \Lambda\ll\Lambda_{NA},\ \Lambda\text{ satisfies (5.3)}\}}{AL(\Lambda_{NA})},$$
where the numerator can be computed as in Theorem 1 with $g_n(t)$ and $\theta_n$ replacing $g(t)$ and $\theta$.

Theorem 4. Let $(T_1,\delta_1),\dots,(T_n,\delta_n)$ be pairs of random variables as defined in (2.1). Suppose $g_n(t)$ is a sequence of functions that are predictable with respect to the filtration (5.1) and satisfy (5.2). Also assume
$$0<\int\frac{|g(x)|^m}{(1-F_0(x))(1-G_0(x))}\,d\Lambda_0(x)<\infty,\qquad m=1,2.$$
Then $\theta_{n0}=\int g_n(t)\,d\Lambda_0(t)$ will be a feasible value with probability approaching one as $n\to\infty$, and $-2\log ALR_s(\theta_{n0})\xrightarrow{D}\chi^2_{(1)}$ as $n\to\infty$.

5.2. Implicit Constraints

For the implicit functional constraint, we require that (i)
$$\int g(t,\theta)\,d\Lambda(t) \qquad (5.4)$$
be monotone in $\theta$ for any given cumulative hazard function $\Lambda$, and (ii)
$$\int g(t,\theta)\,d\Lambda_0(t)=C \qquad (5.5)$$
uniquely defines the parameter $\theta_0$. The likelihood ratio in this case is formed similarly. For a given $\theta$ we first solve the following equation for $\lambda$:
$$\sum_{i=1}^{n-1}\frac{g(T_i,\theta)\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i,\theta)/(n-i+1)}+g(T_n,\theta)\delta_n=C, \qquad (5.6)$$
where $C$ is a given constant. Then $ALR_i(\theta)$ is defined as the ratio of two ALs, with the numerator computed from (2.8) with
$$w_i=\frac{\delta_i}{n-i+1}\cdot\frac{1}{1+\lambda\,n\delta_i g(T_i,\theta)/(n-i+1)}$$
and the denominator computed from (2.8) with $w_i=\delta_i/(n-i+1)$ as before.

Theorem 5. Let $(T_1,\delta_1),\dots,(T_n,\delta_n)$ be pairs of random variables as defined in (2.1). Suppose $g(t,\theta)$ is a function satisfying (5.4) and (5.5). Also assume
$$0<\int\frac{|g(x,\theta_0)|^m}{(1-F_0(x))(1-G_0(x))}\,d\Lambda_0(x)<\infty,\qquad m=1,2.$$
Then $-2\log ALR_i(\theta_0)\xrightarrow{D}\chi^2_{(1)}$ as $n\to\infty$.
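For the median, take $g(t,\theta)=I[t\le\theta]$ and $C=\log 2$. Then every $Z_i=n\delta_iI[T_i\le\theta]/(n-i+1)$ is nonnegative, so only Case 1 of Theorem 1 arises and (5.6) can again be solved by bisection. Because (5.6) has the same structure as (2.10), $-2\log ALR_i(\theta)$ takes the closed form (3.2). The Python sketch below makes several assumptions. The function names are hypothetical. It is meant for simulated data, not the AIDS data of Section 6. The confidence set is approximated on a grid, on the assumption that the accepted values of $\theta$ form an interval.

```python
import numpy as np

def neg2_log_alr_median(time, status, theta):
    """-2 log ALR_i(theta) for g(t, theta) = I[t <= theta] and C = log 2.

    Returns np.inf if theta is infeasible (likelihood taken as zero).
    """
    time, status = np.asarray(time, float), np.asarray(status, int)
    order = np.lexsort((1 - status, time))      # by time; uncensored first on ties
    t, d = time[order], status[order]
    n = len(t)
    r = n - np.arange(n)                        # n - i + 1
    z = n * d * (t <= theta) / r                # Z_i >= 0, so Case 1 of Theorem 1
    last = float(d[-1] * (t[-1] <= theta))      # g(T_n, theta) * delta_n
    zi, ri, di = z[:-1], r[:-1], d[:-1]
    C = np.log(2.0)
    l = lambda lam: np.sum(zi / (1.0 + lam * zi)) / n + last
    if not np.any(zi > 0):
        return np.inf
    pos = zi > 0
    lo = np.max(-(ri[pos] - 1) / (ri[pos] * zi[pos]))   # J = (lo, inf)
    if not last < C < l(lo):                    # V = (last, l(lo))
        return np.inf
    a, b = (0.0, 1.0) if l(0.0) > C else (lo, 0.0)
    while l(b) > C:                             # expand the upper end if needed
        b *= 2.0
    for _ in range(100):                        # bisection: l(a) > C >= l(b)
        mid = 0.5 * (a + b)
        a, b = (mid, b) if l(mid) > C else (a, mid)
    u = 0.5 * (a + b) * zi
    return 2 * np.sum(di * np.log1p(u)) - 2 * np.sum(di * u / (1 + u))

def median_ci(time, status, grid, cutoff=3.84):
    """Grid approximation of {theta: -2 log ALR_i(theta) < cutoff}."""
    stats = np.array([neg2_log_alr_median(time, status, th) for th in grid])
    accepted = grid[stats < cutoff]             # assumed non-empty and an interval
    return (accepted.min(), accepted.max()), stats
```

A finer grid, or a root finder applied to $-2\log ALR_i(\theta)-3.84$ on each side of the minimum, would sharpen the endpoints. The grid keeps the sketch short.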

6. SIMULATIONS AND EXAMPLES

Notice that our results in Section 2 reduce the maximization to a single parameter $\lambda$. All we need is to solve the constraint equation for $\lambda$, and the constraint function is monotone decreasing in $\lambda$. An S-Plus function that computes the empirical likelihood ratio described in this paper is available from the second author.

Example 1. For a small sample simulation, we generate censored survival data from the following setting:
Survival time distribution: $F_0(t)=1-e^{-t}$
Censoring distribution: $G_0(t)=1-e^{-0.35t}$
Cumulative hazard function: $\Lambda_0(t)=t$
Sample size: $n=20$
Weight function $g$: $g(t)=e^{-t}$
Parameter: $\theta_0=\int_0^\infty g(t)\,d\Lambda_0(t)=1$
The 95% confidence interval for $\theta_0$ can be constructed as $\{\theta:-2\log ALR(\theta)\le 3.84\}$. Each time we compute $-2\log ALR(\theta=1)$ and check whether it is less than 3.84 (i.e., inside the interval). In 1000 independent runs we recorded 947 inside, for intervals that are supposed to have an asymptotic nominal coverage probability of 95%. For the same data, the Wald confidence interval based on the Nelson–Aalen type estimator had 920 inside out of the 1000 runs.

Example 2. For a concrete example we took the remission times of solid tumor patients ($n=10$). These are a slightly modified (tie-broken) version of Lee (1992, Example 4.2): 3, 6.5, 6.5, 10, 12, 15, 8.4+, 4+, 5.7+, and 10+. Suppose we want a 95% confidence interval for the cumulative hazard at time $t=9.8$, $\Lambda_0(9.8)$; hence $\theta_0=\Lambda_0(9.8)$. In this case $g$ is the indicator function $g(t)=I[t\le 9.8]$. The 95% confidence interval for $\Lambda_0(9.8)$ based on the empirical likelihood ratio $-2\log ALR$ is (0.0024, 1.097). On the other hand, the Wald confidence interval based on the Nelson–Aalen estimator and Greenwood's formula is (−0.063, 0.882). Since the cumulative hazard function is nonnegative, this shows that the empirical likelihood ratio based
confidence interval inherits some of the advantages of its parametric cousin.

Example 3. For the implicit function example we look at the data on Australian AIDS patients. The description of the data and some analysis can be found in Venables and Ripley (1994). We take the 1780 cases from the State of New South Wales and ignore the other covariates, i.e., we treat the 1780 cases as i.i.d. observations from one population. The implicit function we illustrate here is the median. Since the median may not be uniquely defined for discrete distributions such as the empirical distribution, some smoothing or other modification may be needed, particularly for small sample sizes. Such modifications, however, become negligible for large samples. We shall discuss discrete distributions in another paper, and in this example we ignore the discreteness in view of the sample size. Another aspect of the AIDS data is that it contains many ties. Since the formulas developed in this paper assume no ties, we break the ties by subtracting a small amount (0.00001) from successive tied observations. This is equivalent to assuming that the survival time of AIDS patients is a continuous random variable and that ties in the data are due to rounding (to the nearest day). We therefore suppose that the distribution $F_0$ is continuous and that the median is uniquely defined for $F_0$. We take $g(t,\theta)=I[t\le\theta]$ and the constraint $\int g(t,\theta)\,d\Lambda(t)=\log 2$. The 95% confidence interval (434.8, 492.8) for the median of the AIDS survival data is obtained as $\{\theta:-2\log ALR_i(\theta)<3.84\}$ under this constraint. The 0.8 in the interval endpoints is due to the addition of 0.9 to the original data by Venables and Ripley and our subtraction of a small amount to break ties.

APPENDIX

Lemma A1. For any random variable $Y$, if $E|Y|^k<\infty$, then for an i.i.d. sample $Y_1,Y_2,\dots,Y_n$ with the same distribution as $Y$ we have
$$\max_{i\le n}|Y_i|=o(n^{1/k})\qquad\text{a.s.}$$
Proof. See Chow and Teicher (1980, p. 3, problem No. 8). ∎

Lemma A2. Let $(T_1,\delta_1),\dots,(T_n,\delta_n)$ be i.i.d. pairs of random variables, where each $(T_i,\delta_i)$ is defined by (2.1). Let also $T^*=\max_iT_i$. If $\int h^2(x)\,d\Lambda_0(x)<\infty$, then
$$\max_i\frac{\delta_i|h(T_i)|}{\sqrt{(1-F_0(T_i))(1-G_0(T_i))}}=o(n^{1/2})\ \text{a.s.}\qquad\text{and}\qquad\delta^*h(T^*)=o_p(1),$$
where $\delta^*$ is the indicator corresponding to $T^*$.
Proof. Since $\int h^2(x)\,d\Lambda_0(x)<\infty$, we have
$$E_{F_0,G_0}\Bigl[\frac{\delta_ih^2(T_i)}{(1-F_0(T_i))(1-G_0(T_i))}\Bigr]=\int h^2(x)\,d\Lambda_0(x)<\infty.$$
Therefore, by Lemma A1 with $k=2$,
$$\max_i\frac{\delta_i|h(T_i)|}{\sqrt{(1-F_0(T_i))(1-G_0(T_i))}}=o(n^{1/2}) \qquad (A.1)$$
with probability 1 as $n\to\infty$. The fact that
$$\frac{\delta^*|h(T^*)|}{\sqrt{(1-F_0(T^*))(1-G_0(T^*))}}\le\max_i\frac{\delta_i|h(T_i)|}{\sqrt{(1-F_0(T_i))(1-G_0(T_i))}}$$
implies
$$\frac{\delta^*|h(T^*)|}{\sqrt{(1-F_0(T^*))(1-G_0(T^*))}}=o(n^{1/2}) \qquad (A.2)$$
with probability 1 as $n\to\infty$. Let $H_0(t)$ be the distribution function of $T_i=\min(X_i,C_i)$. Then $1-H_0(t)=(1-F_0(t))(1-G_0(t))$. If we can show
$$1-H_0(T^*)=O_p(n^{-1}), \qquad (A.3)$$
or equivalently $\sqrt{(1-F_0(T^*))(1-G_0(T^*))}=O_p(n^{-1/2})$, then it follows from (A.2) that $\delta^*h(T^*)=o_p(1)$.

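Now we show $1-H_0(T^*)=O_p(n^{-1})$. For any $\varepsilon>0$ there exists $M_0>0$ such that $\exp(-M_0)<\varepsilon$. For $M>M_0$ consider
$$P\Bigl(\frac{1-H_0(T^*)}{n^{-1}}>M\Bigr)=P\Bigl(1-\max_iH_0(T_i)>Mn^{-1}\Bigr)=P\Bigl(\max_iH_0(T_i)<1-n^{-1}M\Bigr)=\Bigl(1-\frac Mn\Bigr)^n\le\exp(-M)<\varepsilon.$$
Therefore $1-H_0(T^*)=O_p(n^{-1})$. ∎

Lemma A3. Under the assumptions of Theorem 2, we have, for $Z_i$ defined in (3.1),
$$\frac1n\sum_{i=1}^nZ_i^2=\sum_{i=1}^n\frac{\delta_ig^2(T_i)}{(n-i+1)^2/n}=\int\frac{g^2(t)}{Y(t)/n}\,d\Lambda_{NA}(t)\xrightarrow{P}\int\frac{g^2(t)\,d\Lambda_0(t)}{(1-F_0(t))(1-G_0(t))} \qquad (A.4)$$
and
$$\frac1n\sum_{i=1}^{n-1}\frac{Z_i^2}{n-i}=\int I[Y(t)>1]\,\frac{n\,g^2(t)}{(Y(t)-1)\,Y(t)}\,d\Lambda_{NA}(t)\xrightarrow{P}0\qquad\text{as } n\to\infty, \qquad (A.5)$$
where $Y(t)=\sum_{i=1}^nI[T_i\ge t]$.
Proof. For (A.5), use Lenglart's inequality on the integral to switch to a similar integral with respect to $\Lambda_0(t)$, and then use uniform convergence of the empirical distributions to finish the proof. The proof of (A.4) is similar. ∎

Lemma A4. Under the assumptions of Theorem 2, we have, for $Z_i$ defined in (3.1),
$$\sqrt n\Bigl(\frac1n\sum_{i=1}^nZ_i-\theta_0\Bigr)=\sqrt n\Bigl(\sum_{i=1}^ng(T_i)\,\Delta\Lambda_{NA}(T_i)-\theta_0\Bigr)\xrightarrow{D}N\bigl(0,\sigma^2_\Lambda(g)\bigr),$$
where $\sigma^2_\Lambda(g)=\int g^2(x)\,d\Lambda_0(x)/[(1-F_0(x))(1-G_0(x))]$ and $\theta_0=\int g(t)\,d\Lambda_0(t)$.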
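Lemma A4 can also be checked by simulation. The sketch below is an illustration, not part of the proof. It borrows the design of Example 1 in Section 6 ($F_0$ and $G_0$ exponential with rates 1 and 0.35, $g(t)=e^{-t}$), for which $\theta_0=1$ and
$$\sigma^2_\Lambda(g)=\int_0^\infty\frac{e^{-2t}}{e^{-t}e^{-0.35t}}\,dt=\int_0^\infty e^{-0.65t}\,dt=\frac{1}{0.65}\approx1.54.$$
The function name `lemma_a4_draws`, the sample size $n=400$ and the 2000 replications are assumptions. At moderate $n$ the draws are shifted slightly downward, because the Nelson–Aalen estimator puts no mass beyond the largest observation.

```python
import numpy as np

def lemma_a4_draws(n=400, reps=2000, seed=3):
    """Draws of sqrt(n) * (sum_i g(T_i) dLambda_NA(T_i) - theta0) under the assumed design."""
    rng = np.random.default_rng(seed)
    draws = np.empty(reps)
    for k in range(reps):
        x = rng.exponential(1.0, n)             # lifetimes, F0(t) = 1 - exp(-t)
        c = rng.exponential(1 / 0.35, n)        # censoring, G0(t) = 1 - exp(-0.35 t)
        t_all = np.minimum(x, c)
        order = np.argsort(t_all)
        t = t_all[order]
        d = (x <= c)[order].astype(float)
        jumps = d / (n - np.arange(n))          # Nelson-Aalen jumps delta_i/(n-i+1)
        draws[k] = np.sqrt(n) * (np.sum(np.exp(-t) * jumps) - 1.0)
    return draws

draws = lemma_a4_draws()
sigma2 = 1 / 0.65                               # sigma^2_Lambda(g) for this design
print(draws.mean(), draws.var(), sigma2)
```

The sample variance should be close to $\sigma^2_\Lambda(g)$, up to Monte Carlo error and the small truncation effect noted above.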

Proof. Notice that the summation can be written as an integral,
$$\sum_{i=1}^ng(T_i)\,\Delta\Lambda_{NA}(T_i)-\theta_0=\int g(t)\,d[\Lambda_{NA}(t)-\Lambda_0(t)].$$
Now a counting process and martingale argument similar to Andersen et al. (1993, Chap. 4) can be used to analyze the integral (since $g(\cdot)$ is left continuous, it is predictable). An application of the martingale central limit theorem finishes the proof. ∎

Proof of Lemma 1. First we notice that if we set $\lambda=0$ in the constraint equation (2.10), the jumps $W_i$ reduce to those of the Nelson–Aalen estimator. Hence $\theta=\hat\theta=\int g(t)\,d\Lambda_{NA}(t)$ is always a feasible value, i.e., $\hat\theta\in V$. On the other hand, the derivative is
$$\frac{\partial l(\lambda)}{\partial\lambda}=-\sum_{i=1}^{n-1}\frac{Z_i\,\delta_ig(T_i)}{n-i+1}\cdot\frac{1}{[1+\lambda Z_i]^2}=-\frac1n\sum_{i=1}^{n-1}\frac{Z_i^2}{[1+\lambda Z_i]^2},$$
and when evaluated at $\lambda=0$ we have
$$\frac{\partial l(\lambda)}{\partial\lambda}\Big|_{\lambda=0}=-\frac1n\sum_{i=1}^{n-1}Z_i^2.$$
By Lemmas A2 and A3 it converges (in fact almost surely) to
$$-\int\frac{g^2(x)\,d\Lambda_0(x)}{(1-F_0(x))(1-G_0(x))}.$$
The integral is positive by assumption. Therefore the derivative of $l(\lambda)$ at $\lambda=0$ is bounded away from zero; in fact $l'(0)<-c<0$ for some constant $c>0$, at least for large $n$. Consequently, if the legitimate range $J$ of $\lambda$ contains an open interval centered at 0 of length $1/o_p(n^{1/2})$ for all large $n$, then the feasible range $V$ of $\theta$ also contains an open interval of length $1/o_p(n^{1/2})$ centered at $\hat\theta$. Since $\hat\theta-\theta_0=O_p(n^{-1/2})$, this ensures that $\theta_0$ lies in $V$, i.e., is a feasible value, for large $n$. That $J$ contains such an interval centered at zero for all large $n$ can easily be seen from the definition of $J$ by noticing that $1/|\lambda_*|=o_p(n^{1/2})$, which can be proved similarly to (3.3). The argument for $\lambda^*$ is the same.

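Now we turn to the asymptotic distribution of the solution $\lambda$ when $\theta=\theta_0$. The first step is to show that $\lambda=O_p(n^{-1/2})$, where $\lambda$ is the solution of (2.10), so that we can use expansions later. Recall the definition of $Z_i$ in (3.1) and its bound (3.3):
$$\max_{i\le n-1}|Z_i|\le\max_{i\le n}|Z_i|=o_p(n^{1/2}).$$
We rewrite (2.10) in terms of the $Z_i$'s as
$$\begin{aligned}0=|l(\lambda)-\theta_0|&=\Bigl|\frac1n\sum_{i=1}^{n-1}\frac{Z_i}{1+\lambda Z_i}+\frac{Z_n}{n}-\theta_0\Bigr|\\&=\Bigl|\Bigl(\frac1n\sum_{i=1}^nZ_i-\theta_0\Bigr)-\frac\lambda n\sum_{i=1}^{n-1}\frac{Z_i^2}{1+\lambda Z_i}\Bigr|\\&\ge\frac{|\lambda|}{1+|\lambda|\max_i|Z_i|}\cdot\frac1n\sum_{i=1}^{n-1}Z_i^2-\Bigl|\frac1n\sum_{i=1}^nZ_i-\theta_0\Bigr|.\end{aligned}\qquad(A.6)$$
The second term of (A.6) is $O_p(n^{-1/2})$ by Lemma A4. Now consider the first term of (A.6). Since $\frac1n\sum_{i=1}^{n-1}Z_i^2=\frac1n\sum_{i=1}^nZ_i^2-\frac1nZ_n^2$ and, by (3.3), $\frac1nZ_n^2=o_p(1)$, Lemma A3 gives
$$\frac1n\sum_{i=1}^{n-1}Z_i^2\xrightarrow{P}\int\frac{g^2(x)}{(1-F_0(x))(1-G_0(x))}\,d\Lambda_0(x), \qquad (A.7)$$
and it follows that $|\lambda|/(1+|\lambda|\max_i|Z_i|)=O_p(n^{-1/2})$, which implies that
$$\lambda=O_p(n^{-1/2}). \qquad (A.8)$$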

Expanding (2.10), we obtain
$$0=\frac1n\sum_{i=1}^nZ_i-\theta_0-\frac\lambda n\sum_{i=1}^{n-1}\frac{Z_i^2}{1+\lambda Z_i}=\frac1n\sum_{i=1}^nZ_i-\theta_0-\frac\lambda n\sum_{i=1}^{n-1}Z_i^2+\frac{\lambda^2}{n}\sum_{i=1}^{n-1}\frac{Z_i^3}{1+\lambda Z_i}. \qquad (A.9)$$
By (A.8), (3.3), and Lemma A3, the last term in (A.9) is bounded by
$$\Bigl|\frac{\lambda^2}{n}\sum_{i=1}^{n-1}\frac{Z_i^3}{1+\lambda Z_i}\Bigr|\le\frac{\lambda^2\max_i|Z_i|}{\min_i(1+\lambda Z_i)}\cdot\frac1n\sum_{i=1}^{n-1}Z_i^2=O_p(n^{-1})\,o_p(n^{1/2})\,O_p(1)=o_p(n^{-1/2}).$$
Therefore we get the expression
$$\lambda=\frac{\frac1n\sum_{i=1}^nZ_i-\theta_0}{\frac1n\sum_{i=1}^{n-1}Z_i^2}+o_p(n^{-1/2}). \qquad (A.10)$$
By Lemma A4, as $n\to\infty$,
$$\sqrt n\Bigl(\frac1n\sum_{i=1}^nZ_i-\theta_0\Bigr)=\sqrt n\Bigl(\sum_{i=1}^ng(T_i)\,\Delta\Lambda_{NA}(T_i)-\theta_0\Bigr)\xrightarrow{D}N\bigl(0,\sigma^2_\Lambda(g)\bigr).$$
Thus, by the Slutsky theorem and (A.7), as $n\to\infty$,
$$n\lambda^2\xrightarrow{D}\chi^2_{(1)}\Bigl(\int\frac{g^2(x)\,d\Lambda_0(x)}{(1-F_0(x))(1-G_0(x))}\Bigr)^{-1}. \qquad (A.11)$$
∎

REFERENCES

1. P. K. Andersen, Ø. Borgan, R. D. Gill, and N. Keiding, "Statistical Models Based on Counting Processes," Springer-Verlag, New York, 1993.
2. Y. S. Chow and H. Teicher, "Probability Theory," Springer-Verlag, New York, 1980.
3. B. Efron and I. M. Johnstone, Fisher's information in terms of the hazard rate, Ann. Statist. 18 (1990), 38–62.
4. R. D. Gill, Non- and semi-parametric maximum likelihood estimators and the von Mises method, Part I, Scand. J. Statist. 16 (1989), 97–128.
5. E. T. Lee, "Statistical Methods for Survival Data Analysis," Wiley, New York, 1992.
6. G. Li, On nonparametric likelihood ratio estimation of survival probabilities for censored data, Statist. Probab. Lett. 25 (1995), 95–104.
7. S. A. Murphy, Likelihood ratio based confidence intervals in survival analysis, J. Amer. Statist. Assoc. 90 (1995), 1399–1405.
8. S. A. Murphy and A. W. Van der Vaart, Semi-parametric likelihood ratio inference, Ann. Statist. 25 (1997), 1471–1509.
9. A. B. Owen, Empirical likelihood ratio confidence intervals for a single functional, Biometrika 75 (1988), 237–249.
10. A. B. Owen, Empirical likelihood confidence regions, Ann. Statist. 18 (1990), 90–120.
11. X. R. Pan and M. Zhou, Empirical likelihood ratio, one parameter sub-family of distribution functions and censored data, J. Statist. Plann. Inference 75 (1999), 379–392.
12. X. R. Pan and M. Zhou, "Empirical Likelihood in Terms of Cumulative Hazard Function for Censored Data," Tech. Rep. 36, Department of Statistics, University of Kentucky, 1997b.
13. D. R. Thomas and G. L. Grunkemeier, Confidence interval estimation of survival probabilities for censored data, J. Amer. Statist. Assoc. 70 (1975), 865–871.
14. Z. L. Ying, Minimum Hellinger-type distance estimation for censored data, Ann. Statist. 20 (1992), 1361–1390.
15. W. N. Venables and B. D. Ripley, "Modern Applied Statistics with S-Plus," Springer-Verlag, New York, 1994.
16. M. Zhou, Some properties of the Kaplan–Meier estimator for independent nonidentically distributed random variables, Ann. Statist. 19 (1991), 2266–2274.