From dfdd2de99ced45424131915a036a0eb2b96b2ca6 Mon Sep 17 00:00:00 2001
From: Sagar Sumit
Date: Sat, 2 Apr 2022 00:19:23 +0530
Subject: [PATCH] [HUDI-3225] [RFC-45] for async metadata indexing (#4640)

* Add RFC for async metadata indexing

Add more details

* Add changes since last discussion

* Add another race condition handling

* Update rfc
---
 rfc/README.md                       |   2 +-
 rfc/rfc-45/async_metadata_index.png | Bin 0 -> 39199 bytes
 rfc/rfc-45/rfc-45.md                | 376 ++++++++++++++++++++++++++++
 3 files changed, 377 insertions(+), 1 deletion(-)
 create mode 100644 rfc/rfc-45/async_metadata_index.png
 create mode 100644 rfc/rfc-45/rfc-45.md

diff --git a/rfc/README.md b/rfc/README.md
index 5ec12dc66..4d8aba380 100644
--- a/rfc/README.md
+++ b/rfc/README.md
@@ -68,7 +68,7 @@ The list of all RFCs can be found here.
 | 42 | [Consistent Hashing Index](./rfc-42/rfc-42.md) | `UNDER REVIEW` |
 | 43 | [Compaction / Clustering Service](./rfc-43/rfc-43.md) | `UNDER REVIEW` |
 | 44 | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | `UNDER REVIEW` |
-| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `UNDER REVIEW` |
+| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `IN PROGRESS` |
 | 46 | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | `UNDER REVIEW` |
 | 47 | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | `UNDER REVIEW` |
 | 48 | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | `UNDER REVIEW` |
\ No newline at end of file
diff --git a/rfc/rfc-45/async_metadata_index.png b/rfc/rfc-45/async_metadata_index.png
new file mode 100644
index 0000000000000000000000000000000000000000..cc044d6c8f3fabafaf0b51997d8c7b1548213a8f
GIT binary patch
literal 39199
[base85-encoded PNG data omitted]

+
+# RFC-45: Asynchronous Metadata Indexing
+
+## Proposers
+
+- @codope
+- @manojpec
+
+## Approvers
+
+- @nsivabalan
+- @vinothchandar
+
+## Status
+
+JIRA: [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488)
+
+## Abstract
+
+Metadata indexing (aka metadata bootstrapping) is the process of creating one
+or more metadata-based indexes, e.g. the data-partitions-to-files index, that are
+stored in the Hudi metadata table. Currently, the metadata table (referred to as
+MDT hereafter) supports a single partition, which is created synchronously with
+the corresponding data table, i.e. commits are first applied to the metadata
+table and then to the data table. Our goal for MDT is to support multiple
+partitions to boost the performance of existing index and record lookups.
+However, synchronous metadata indexing does not scale well as we add more
+partitions to the MDT, because the regular writers (writing to the data table)
+have to wait until the MDT commit completes. In this RFC, we propose a design to
+support asynchronous metadata indexing.
+
+## Background
+
+We can read more about the MDT design
+in [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements).
+Here is a quick summary of the current state (Hudi v0.10.1). MDT is an
+internal Merge-on-Read (MOR) table that has a single partition called `files`,
+which stores the data partitions to files index that is used in file listing.
+MDT is co-located with the data table (inside the `.hoodie/metadata` directory
+under the basepath). To handle multi-writer scenarios, users configure a lock
+provider and only one writer can access MDT in read-write mode. Hence, any write
+to MDT is guarded by the data table lock. This ensures only one write is
+committed to MDT at any point in time and thus guarantees serializability.
+However, locking overhead adversely affects the write throughput and will reach
+its scalability limits as we add more partitions to the MDT.
+
+## Goals
+
+- Support indexing one or more partitions in MDT while regular writers and table
+  services (such as cleaning or compaction) are in progress.
+- Locking to be as lightweight as possible.
+- Keep required config changes to a minimum to simplify deployment / upgrade in
+  production.
+- Do not require specific ordering of how writers and table service pipelines
+  need to be upgraded / restarted.
+- If an external long-running process is being used to initialize the index, the
+  process should be made idempotent so it can handle errors from previous runs.
+- To re-initialize the index, make it as simple as running the external
+  initialization process again without having to change configs.
+
+## Implementation
+
+### High Level Design
+
+#### A new Hudi action: INDEXING
+
+We introduce a new action `index` which will denote the index building process,
+the mechanics of which are as follows:
+
+1. From an external process, users can issue a CREATE INDEX or run a job to
+   trigger indexing for an existing table.
+    1. This will schedule the INDEXING action and add
+       a `.index.requested` to the timeline, which contains the
+       indexing plan.
+       Index scheduling will also initialize the filegroup for
+       the partitions for which indexing is planned. The creation of filegroups
+       will be done within a lock.
+    2. From here on, the index building process will continue to build an index
+       up to instant time `t`, where `t` is the latest completed instant time on
+       the timeline without any
+       "holes" i.e. no pending async operations prior to it.
+    3. The indexing process will write these out as base files within the
+       corresponding metadata partition. A metadata partition cannot be used if
+       there is any pending indexing action against it. As and when indexing is
+       completed for a partition, the table config (`hoodie.properties`) will
+       be updated to indicate that the partition is available for reads or
+       synchronous updates. Hudi table config will be the source of truth for
+       the current state of the metadata index.
+
+2. Any inflight writers (i.e. with instant time `t'` > `t`) will check for any
+   new indexing request on the timeline prior to preparing to commit.
+    1. Such writers will proceed to additionally add log entries corresponding
+       to each such indexing request into the metadata partition.
+    2. There is always a TOCTOU (time-of-check to time-of-use) issue here, where
+       the inflight writer may not see an indexing request that was just added
+       and proceed to commit without it. We will correct this during indexing
+       action completion. In the average case, this may not happen and the
+       design has liveness.
+
+3. When the indexing process is about to complete (i.e. indexing up to
+   instant `t` is done but before completing the indexing commit), it will check
+   all completed commit instants after `t` to ensure each of them added entries
+   per its indexing plan, otherwise simply abort after a configurable timeout.
+   Let's call this the **indexing catchup**. So, the indexer will not only write
+   base files but also ensure that log entries due to instants after `t` are in
+   the same filegroup i.e.
+   no new filegroup is initialized by writers while
+   indexing is in progress.
+    1. The corner case here would be that the indexing catchup does not factor
+       in the inflight writer just about to commit. But given indexing would
+       take some finite amount of time to go from requested to completion (or we
+       can add a configurable artificial delay here, say 60 seconds), an
+       inflight writer that is just about to commit concurrently has a very
+       high chance of seeing the indexing plan and aborting itself.
+
+We can just introduce a lock for adding events to the timeline and these races
+would vanish completely, still providing great scalability and asynchrony for
+these processes. The indexer will error out if there is no lock provider
+configured.
+
+#### Multi-writer scenario
+
+![](./async_metadata_index.png)
+
+Let us walk through a concrete multi-writer scenario to understand the above
+indexing mechanism. In this scenario, let instant `t0` be the last completed
+instant on the timeline. Suppose the user triggered index building from an
+external process at `t3`. This will create the `t3.index.requested` file with
+the indexing plan. The plan contains the metadata partitions that need to be
+created and the last completed instant, e.g.
+
+```
+[
+  {MetadataPartitionType.FILES.partitionPath(), t0},
+  {MetadataPartitionType.BLOOM_FILTER.partitionPath(), t0},
+  {MetadataPartitionType.COLUMN_STATS.partitionPath(), t0}
+]
+```
+
+Further, suppose there were two inflight writers Writer1 and Writer2 (with
+inflight instants `t1` and `t2` respectively) while the indexing was requested
+or inflight. In this case, the writers will check for a pending index action and
+find a pending instant `t3`. Now, if the metadata index creation is pending,
+which means the indexer has already initialized a filegroup, then each writer
+will create log files in the same filegroup for the metadata index update. This
+will happen within the existing data table lock.
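
The writer-side check described above can be sketched as follows. This is a minimal illustration in Python, not Hudi code: `IndexRequest`, `MetadataTable`, `partitions_to_update` and `commit_metadata` are hypothetical names, the partition strings are placeholders, and a `threading.Lock` stands in for the data table lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class IndexRequest:
    """Hypothetical stand-in for an index action on the data timeline."""
    instant: str
    state: str          # "REQUESTED", "INFLIGHT" or "COMPLETED"
    partitions: tuple   # metadata partitions named in the indexing plan


@dataclass
class MetadataTable:
    """Hypothetical MDT model: partition name -> list of log entries."""
    file_groups: dict = field(default_factory=dict)


def partitions_to_update(index_requests, available_partitions, mdt):
    """Partitions a committing writer must log to: those already available
    plus those named by any pending (not yet completed) index request."""
    targets = set(available_partitions)
    for request in index_requests:
        if request.state != "COMPLETED":
            # Defensive check (an assumption of this sketch): only log to
            # partitions whose file group has already been initialized.
            targets.update(p for p in request.partitions if p in mdt.file_groups)
    return sorted(targets)


def commit_metadata(writer_instant, payload, index_requests,
                    available_partitions, mdt, table_lock):
    """Append the writer's metadata update to every target partition while
    holding the (simulated) data table lock."""
    with table_lock:
        targets = partitions_to_update(index_requests, available_partitions, mdt)
        for partition in targets:
            mdt.file_groups.setdefault(partition, []).append((writer_instant, payload))
    return targets
```

With a pending request at `t3`, a writer at `t1` would log to every initialized partition in the plan even though none of them is yet marked available in the table config.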
+
+The indexer runs in a loop until the metadata for data up to `t0` plus the data
+written due to `t1` and `t2` has been indexed, or the indexing timed out.
+Whether indexing timed out or not, table config would be updated with any MDT
+partition(s) for which indexing was complete till `t2`. In case of timeout, the
+indexer will abort. At this point, the user can trigger the index process again;
+however, this time the indexer will check for available partitions in table
+config and skip those partitions. This design ensures that the regular writers
+do not fail due to indexing.
+
+### Low Level Design
+
+#### Schedule Indexing
+
+The scheduling initializes the file groups for metadata partitions in a lock. It
+does not update any table config.
+
+```
+1 Run pre-scheduling validation (valid index requested, lock provider configured, idempotent checks)
+2 Begin transaction
+  2.a  Get the base instant
+  2.b  Start initializing file groups for each partition
+  2.c  Create index plan and save indexing.requested instant to the timeline
+3 End transaction
+```
+
+If there is a failure in any of the above steps, then we abort gracefully i.e.
+delete the metadata partition if it was initialized.
+
+#### Run Indexing
+
+This is a separate executor, which reads the plan and builds the index.
+
+```
+1 Run pre-indexing checks (lock provider configured, indexing.requested exists, idempotent checks)
+2 Read the indexing plan and if any of the requested partitions is inflight or already completed then error out and return early
+3 Transition indexing.requested to inflight
+4 Build metadata partitions
+  4.a  Build the base file in the metadata partition to index up to instant as per the plan
+  4.b  Update inflight partitions config in hoodie.properties
+5 Determine the catchup start instant based on write and non-write timeline
+6 Start indexing catchup in a separate thread (that can be interrupted upon timeout)
+  6.a  For each instant to catchup
+    6.a.i   if instant is completed and has corresponding deltacommit in metadata timeline then continue
+    6.a.ii  if instant is inflight, then reload active timeline periodically until completed or timed out
+    6.a.iii update metadata table, if needed, within a lock
+7 Build indexing commit metadata with the partition info and the caught-up-to instant
+8 Begin transaction
+  8.a  update completed metadata partitions in table config
+  8.b  save indexing commit metadata to the timeline and transition indexing.inflight to completed
+9 End transaction
+```
+
+If there is a failure in any of the above steps, then we abort gracefully i.e.
+delete the metadata partition if it exists and revert the table config updates.
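
Step 6 (the indexing catchup) can be sketched roughly as below. This is an illustrative Python model, not the Hudi implementation: `indexing_catchup` and its callback parameters are hypothetical, instant states are plain strings, and the timeout is enforced with a deadline inside the loop rather than by interrupting a separate thread.

```python
import threading
import time


def indexing_catchup(instants, reload_state, metadata_deltacommits,
                     apply_update, lock, timeout_s=60.0, poll_s=1.0):
    """Walk the instants to catch up, in order, and return the last instant
    that was caught up. Raises TimeoutError if an instant is still pending
    once the overall deadline has passed.

    instants              -- ordered instant times to reconcile
    reload_state          -- callback returning the current state of an instant
    metadata_deltacommits -- set of instants already present in the MDT timeline
    apply_update          -- callback that writes the missing MDT update
    """
    deadline = time.monotonic() + timeout_s
    caught_up_to = None
    for instant in instants:
        # 6.a.ii: poll the timeline until the instant completes or we time out.
        while reload_state(instant) != "COMPLETED":
            if time.monotonic() >= deadline:
                raise TimeoutError(f"instant {instant} still pending after {timeout_s}s")
            time.sleep(poll_s)
        # 6.a.i: nothing to do if the writer already logged its update.
        if instant not in metadata_deltacommits:
            # 6.a.iii: otherwise update the metadata table within a lock.
            with lock:
                if instant not in metadata_deltacommits:
                    apply_update(instant)
                    metadata_deltacommits.add(instant)
        caught_up_to = instant
    return caught_up_to
```

On `TimeoutError` the caller would abort the indexing action, matching the abort-and-retrigger behavior described for the indexer.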
+
+#### Configs
+
+```
+# enable metadata
+hoodie.metadata.enable=true
+# enable asynchronous metadata indexing
+hoodie.metadata.index.async=true
+# enable column stats index
+hoodie.metadata.index.column.stats.enable=true
+# set indexing catchup timeout
+hoodie.metadata.index.check.timeout.seconds=60
+# set OCC concurrency mode
+hoodie.write.concurrency.mode=optimistic_concurrency_control
+# set lock provider
+hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
+```
+
+#### Table upgrade/downgrade
+
+While upgrading from a previous version to the current version, if metadata is
+enabled and the `files` partition exists, then the completed partitions in
+`hoodie.properties` will be set to the `files` partition. While downgrading to a
+previous version, if the metadata table exists then it is deleted because the
+metadata table in the current version has a schema that is not forward
+compatible.
+
+### Error Handling
+
+**Case 1: Writer fails while indexer is inflight**
+
+This means the index update due to the writer did not complete. The indexer
+continues to build the index, ignoring the failed instant due to the writer. The
+next update by the writer will trigger a rollback of the failed instant, which
+will also roll back incomplete updates in the metadata table.
+
+**Case 2: Indexer fails while writer is inflight**
+
+The writer will commit, adding log entries to the metadata partition. However,
+table config will indicate that the partition is not ready to use. When the
+indexer is re-triggered, it will check the plan and table config to figure out
+which MDT partitions to index and start indexing for those partitions.
+
+**Case 3: Race conditions**
+
+a) Writer went inflight just after an indexing request was added but indexer has
+not yet started executing.
+
+In this case, the writer will continue to log updates in the metadata partition.
+At the time of execution, the indexer will see there are already some log files
+and ensure that the indexing catchup passes.
+
+b) Inflight writer about to commit, but indexing completed just before that.
+
+Ideally, the indexing catchup in the indexer should have failed. But this could
+happen in the following sequence of events:
+
+1. No pending data commit. Indexing check passed, indexing commit not
+   completed (table config yet to be updated).
+2. Writer went inflight knowing that the MDT partition is not ready for use.
+3. Indexing commit done, table config updated.
+
+In this case, the writer will continue to write log files under the latest base
+filegroup in the MDT partition. Even though the indexer missed the updates due
+to the writer, there is no "index loss" as such i.e. metadata due to the writer
+is still updated in the MDT partition. Async compaction on the MDT will
+eventually merge the updates into another base file.
+
+Or, we can introduce a lock for adding events to the metadata timeline.
+
+c) Inflight writer about to commit but index is still being scheduled
+
+Consider the following scenario:
+
+1. Writer is in inflight mode.
+2. Indexer is starting and creating the file-groups. Suppose there are 100
+   file-groups to be created.
+3. Writer just finished and tries to write log blocks - it only sees a subset of
+   the file-groups created so far (as step 2 above has not completed yet).
+   This will cause the writer to incorrectly write updates to fewer
+   shards.
+
+In this case, we ensure that scheduling for the metadata index always happens
+within a lock. Since the initialization of filegroups happens at the time of
+scheduling, the indexer will hold the lock until all the filegroups are created.
+
+**Case 4: Async table services**
+
+The metadata partition cannot be used if there is any pending index action
+against it. So, async compaction/cleaning/clustering will ignore the metadata
+partition for which indexing is inflight.
+
+**Case 5: Data timeline with holes**
+
+Let's say the data timeline when indexer is started looks
+like: `C1, C2,....
+C5 (inflight), C6, C7, C8`, where `C1` is a commit at
+instant `1`. In this case the latest completed instant without any hole is `C4`.
+So, the indexer will continue to index up to `C4`. Instants `C5-C8` will go
+through the indexing catchup. If `C5` does not complete before the timeout, then
+the indexer will abort. The indexer will run through the same process again when
+re-triggered.
+
+The above example contained only write commits; however, the indexer will
+consider non-write commits (such as clean/restore/rollback) as well. Let's take
+such an example:
+
+| DC   | DC   | DC   | CLEAN | DC   | DC   | COMPACT | DC   | INDEXING | DC   |
+| ---- | ---- | ---- | ----  | ---- | ---- | ----    | ---- | ----     | ---- |
+| 1    | 2    | 3    | 4     | 5    | 6    | 7       | 8    | 9        | 10   |
+| C    | C    | C    | I     | C    | C    | R       | C    | R        | I    |
+
+Here, DC indicates a deltacommit, the second row is the instant time, and the
+last row is whether the action is completed (C), inflight (I) or requested (R).
+In this case, the base instant up to which there are no holes in the write
+timeline is `DC6`. The indexer will also check the earliest pending instant in
+the non-write timeline before this base instant, which is `CLEAN4`. While the
+indexing is done up to the base instant, the remaining instants (CLEAN4,
+COMPACT7, DC8) are checked during indexing catchup to verify that they logged
+updates to the corresponding filegroup as per the index plan. Note that during
+catchup, the indexer won't move ahead unless the instants to catch up actually
+get into completed state. For instance, if CLEAN4 was inflight till the
+configured timeout, then the indexer will abort.
+
+## Summary of key proposals
+
+- New INDEXING action on data timeline.
+- Async indexer to handle state change for the new action.
+- Concept of "indexing catchup" to reconcile instants that went inflight after
+  indexer started.
+- Table config to be the source of truth for inflight and completed MDT
+  partitions.
+- Indexer will error out if lock provider not configured.
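
The base-instant and catchup selection from Case 5 (data timeline with holes) can be sketched as follows. This is one reading of the two examples in that case, written in Python for illustration: `Instant`, `WRITE_ACTIONS`, `base_instant` and `catchup_instants` are hypothetical names, and treating `DC` and `COMPACT` as the write timeline is an assumption inferred from the table.

```python
from typing import NamedTuple, Optional


class Instant(NamedTuple):
    time: int
    action: str   # "DC", "COMPACT", "CLEAN", "INDEXING" (labels from the table)
    state: str    # "C" completed, "I" inflight, "R" requested


# Assumption: deltacommits and compactions form the write timeline.
WRITE_ACTIONS = {"DC", "COMPACT"}


def base_instant(timeline) -> Optional[Instant]:
    """Latest completed write instant with no pending write instant before it."""
    base = None
    for inst in sorted(timeline):
        if inst.action not in WRITE_ACTIONS:
            continue
        if inst.state != "C":
            break  # first hole in the write timeline
        base = inst
    return base


def catchup_instants(timeline, index_time):
    """Instants older than the index request that the catchup must check:
    anything still pending before the base instant, plus everything after it."""
    base = base_instant(timeline)
    base_time = base.time if base is not None else float("-inf")
    selected = []
    for inst in sorted(timeline):
        if inst.time >= index_time or inst.action == "INDEXING":
            continue
        pending_before_base = inst.time < base_time and inst.state != "C"
        if pending_before_base or inst.time > base_time:
            selected.append(inst)
    return selected
```

Applied to the table in Case 5 with the index request at instant 9, this picks `DC6` as the base instant and selects instants 4, 7 and 8 for catchup. The inflight `DC10` started after the index request, so it is left to the writer-side check instead.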
+
+## Rollout/Adoption Plan
+
+- What impact (if any) will there be on existing users?
+
+There can be two kinds of existing users:
+
+a) Enabling metadata for the first time: There should not be any impact on such
+users. When they enable metadata, they can trigger the indexing process.
+
+b) Metadata already enabled: Such users already have a metadata table with at
+least one partition. If they trigger the indexing process, then the indexer
+should take into account the existing metadata and ignore instants up to which
+MDT is in sync with the data table.
+
+- If we are changing behavior how will we phase out the older behavior?
+
+The changes will be backward-compatible and if async indexing is disabled
+then the existing behavior of MDT creation and updates will be used.
+
+- If we need special migration tools, describe them here.
+
+Not required.
+
+- When will we remove the existing behavior?
+
+Not required.
+
+## Test Plan
+
+- Extensive unit tests to cover all scenarios including conflicts and
+  error-handling.
+- Run a long-running test on EMR cluster with async indexing enabled.