
Robust Geodesic Regression

Published in International Journal of Computer Vision

Abstract

This paper studies robust regression for data on Riemannian manifolds. Geodesic regression is the generalization of linear regression to a setting with a manifold-valued dependent variable and one or more real-valued independent variables. The existing work on geodesic regression uses the sum-of-squared errors to find the solution, but as in the classical Euclidean case, the least-squares method is highly sensitive to outliers. In this paper, we use M-type estimators, including the \(L_1\), Huber and Tukey biweight estimators, to perform robust geodesic regression, and describe how to calculate the tuning parameters for the latter two. We show that, on compact symmetric spaces, all M-type estimators are maximum likelihood estimators, and argue in favor of a general preference for the \(L_1\) estimator over the \(L_2\) and Huber estimators on high-dimensional spaces. A derivation of the Riemannian normal distribution on \(S^n\) and \(\mathbb {H}^n\) is also included. Results from numerical examples, including analysis of real neuroimaging data, demonstrate the promising empirical properties of the proposed approach.


References

  • Banerjee, M., Chakraborty, R., Ofori, E., Okun, M. S., Vaillancourt, D. E., & Vemuri, B. C. (2016). A nonlinear regression technique for manifold valued data with applications to medical image analysis. In IEEE conference on computer vision and pattern recognition (CVPR) (vol. 2016, pp. 4424–4432).

  • Cheng, G., & Vemuri, B. C. (2013). A novel dynamic system in the space of SPD matrices with applications to appearance tracking. SIAM Journal on Imaging Sciences, 6, 592–615.

  • Cornea, E., Zhu, H., Kim, P., & Ibrahim, J. G. (2017). Regression models on Riemannian symmetric spaces. Journal of the Royal Statistical Society: Series B, 79, 463–482.

  • Cotton, A., & Freeman, D. (2002). The double bubble problem in spherical and hyperbolic space. International Journal of Mathematics and Mathematical Sciences, 32, 641–699.

  • Davis, B. C., Fletcher, P. T., Bullitt, E., & Joshi, S. (2010). Population shape regression from random design data. International Journal of Computer Vision, 90, 255–266.

  • do Carmo, M. (1992). Riemannian geometry. Boston: Birkhäuser.

  • Du, J., Goh, A., Kushnarev, S., & Qiu, A. (2014). Geodesic regression on orientation distribution functions with its application to an aging study. NeuroImage, 87, 416–426.

  • Fletcher, P. T. (2013). Geodesic regression and the theory of least squares on Riemannian manifolds. International Journal of Computer Vision, 105, 171–185.

  • Fletcher, P. T., Lu, C., Pizer, S. M., & Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23, 995–1005.

  • Fletcher, T. (2020). Statistics on manifolds. In X. Pennec, S. Sommer, & T. Fletcher (Eds.), Riemannian geometric statistics in medical image analysis (pp. 39–74). London: Academic Press.

  • Hein, M. (2009). Robust nonparametric regression with metric-space valued output. In Advances in neural information processing systems 22.

  • Hinkle, J., Fletcher, P. T., & Joshi, S. (2014). Intrinsic polynomials for regression on Riemannian manifolds. Journal of Mathematical Imaging and Vision, 50, 32–52.

  • Hong, Y., Singh, N., Kwitt, R., Vasconcelos, N., & Niethammer, M. (2016). Parametric regression on the Grassmannian. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 2284–2297.

  • Kim, H. J., Adluru, N., Collins, M. D., Chung, M. K., Bendin, B. B., Johnson, S. C., et al. (2014). Multivariate general linear models (MGLM) on Riemannian manifolds with applications to statistical analysis of diffusion weighted images. IEEE Conference on Computer Vision and Pattern Recognition, 2014, 2705–2712.

  • Mortici, C. (2012). Completely monotone functions and the Wallis ratio. Applied Mathematics Letters, 25, 717–722.

  • Shin, H.-Y. (2020). Robust geodesic regression. M.S. Thesis, Seoul National University. SNU Open Repository.

  • Steinke, F., Hein, M. (2008). Non-parametric regression between manifolds. In Advances in neural information processing systems 21.

  • Steinke, F., Hein, M., & Schölkopf, B. (2010). Nonparametric regression between general Riemannian manifolds. SIAM Journal on Imaging Sciences, 3, 527–563.

  • Zhang, X., Shi, X., Sun, Y., & Cheng, L. (2019). Multivariate regression with gross errors on manifold-valued data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41, 444–458.


Acknowledgements

This research was supported by the National Research Foundation of Korea (NRF) funded by the Korea government (2020R1A4A1018207; 2021R1A2C1091357).

Author information

Corresponding author

Correspondence to Hee-Seok Oh.

Additional information

Communicated by Joachim Weickert.


Appendices

Appendix A

1.1 A.1 Proofs of Propositions 1 and 2

1.1.1 A.1.1 Proof of Proposition 1

Proof

We first note that the term in (9) is finite because \(\rho \) is bounded below and so there exists some real \(B>-\infty \) such that \(\rho (t)>B\) for all \(t\in \mathbb {R}\), which means that

$$\begin{aligned} D_M(\mu ,b,\rho )\le & {} \int _M\mathrm {exp}\bigg (-\frac{B}{b}\bigg )dy\\= & {} \mathrm {exp}\bigg (-\frac{B}{b} \bigg )\text{ Vol }(M)<\infty , \end{aligned}$$

where \(\text{ Vol }(M)\), the volume of M, is finite. So the function in (8) is a well-defined density function.

The log-likelihood of the observations \(\{(x_i,y_i)\}_{i=1,\ldots ,N}\) under the distribution in (8) is

$$\begin{aligned}&\!\!\!-\sum _{i=1}^N\mathrm {log}\{D_M(\mathrm {Exp}(p,Vx_i),b,\rho )\}\nonumber \\&\quad -\frac{1}{b}\sum _{i=1} ^N\rho (d(\mathrm {Exp}(p,Vx_i),y_i)). \end{aligned}$$
(16)

Because M is a symmetric space, it is also a homogeneous space, meaning that for any two points on the manifold, there exists an isometry which maps one to the other. Since the integrand in (9) depends only on the distance from \(\mu \) to y, and this distance is preserved by isometries, the integral is independent of \(\mu \). Therefore, the first sum in (16) is constant with respect to p and V. Comparing the second sum to (7), we find that the parameters \((p,V)\in M\times (T_pM)^k\) that minimize \(L_\rho (p,V)\) also maximize the log-likelihood. \(\square \)

1.1.2 A.1.2 Proof of Proposition 2

Proof

It is known that if M is a complete and simply-connected Riemannian manifold of constant sectional curvature, it is isometric (up to scaling of the metric) to either a sphere \(S^n\), a Euclidean space \(\mathbb {R}^n\), or a hyperbolic space \(\mathbb {H}^n\), which are all symmetric spaces. The proposition is true on \(S^n\) by Proposition 1. In \(\mathbb {R}^n\), the density corresponding to the \(L_2\) estimator is that of an isotropic n-variate Gaussian distribution with covariance \(bI_n\). For the \(L_1\) estimator,

$$\begin{aligned} D_{\mathbb {R}^n}(\mu ,b,\rho )= & {} \int _{\mathbb {R}^n}\mathrm {exp} \bigg (-\frac{\rho (d(\mu ,y))}{b}\bigg )dy\\= & {} \frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \int _0^\infty r^{n-1}\mathrm {exp}\bigg (-\frac{|r|}{b}\bigg )dr \end{aligned}$$

because the surface area of an \((n-1)\)-sphere of radius r embedded in \(\mathbb {R}^n\) is \((2\pi ^{\frac{n}{2}}/\Gamma (\frac{n}{2}))r^{n-1}\). For any \(b>0\), there exists some \(R_b\) such that \(r^{n-1}<\mathrm {exp}(r/2b)\) for all \(r>R_b\), so

$$\begin{aligned}&D_{\mathbb {R}^n}(\mu ,b,\rho )=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\Bigg (\int _0^{R_b} r^{n-1}\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\nonumber \\&\qquad +\int _{R_b}^\infty r^{n-1}\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\Bigg )\nonumber \\&\quad \le \frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\Bigg (\int _0^{R_b}R_b^{n-1}\mathrm {exp}\bigg (-\frac{0}{b}\bigg )dr\nonumber \\&\qquad +\int _{R_b}^\infty \mathrm {exp}\bigg (\frac{r}{2b}\bigg )\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\Bigg )\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\Bigg (\int _0^{R_b}R_b^{n-1}dr+\int _{R_b}^\infty \mathrm {exp}\bigg (-\frac{r}{2b}\bigg )dr\Bigg )\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\Bigg (R_b^{n-1}r\big |_0^{R_b}-2b\cdot \mathrm {exp}\bigg (-\frac{r}{2b}\bigg )\Big |_{R_b}^\infty \Bigg )\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\Bigg (R_b^n+2b\cdot \mathrm {exp}\bigg (-\frac{R_b}{2b}\bigg )\Bigg )\nonumber \\&\quad <\infty , \end{aligned}$$

so the density function is well-defined for all \(b>0\).

As noted in Remark 3.1 in Cotton and Freeman (2002), the surface area of an \((n-1)\)-sphere of radius x on \(\mathbb {H}^n\) is

$$\begin{aligned} A_{\mathbb {H}^n}(x):=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\mathrm {sinh}^{n-1}(x) \end{aligned}$$
(17)

for \(x\ge 0\). We explicitly calculate the value of the corresponding normalizing constant corresponding to the \(L_2\) estimator on \(\mathbb {H}^n\) in (48) from Proposition 5(a) in “Appendix A.3.2” with \(\sigma ^2\) replacing b. This expression is clearly finite for any \(\sigma ^2=b>0\). For the \(L_1\) estimator,

$$\begin{aligned}&D_{\mathbb {H}^n}(\mu ,b,\rho )=\int _{\mathbb {H}^n} \mathrm {exp}\bigg (-\frac{\rho (d(y,\mu ))}{b}\bigg )dy\nonumber \\&\quad =\int _0^\infty A_{\mathbb {H}^n}(r)\mathrm {exp}\bigg (-\frac{|r|}{b}\bigg )dr \nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\int _0^\infty \mathrm {sinh}^{n-1}(r)\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\int _0^\infty \bigg (\frac{\mathrm {exp}(r)-\mathrm {exp}(-r)}{2}\bigg )^{n-1}\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\int _0^\infty \frac{1}{2^{n-1}}\bigg (\sum _{j=0}^{n-1}{n-1 \atopwithdelims ()j}(-1)^j\mathrm {exp}(-jr)\,\mathrm {exp}((n-1-j)r)\bigg )\mathrm {exp}\bigg (-\frac{r}{b}\bigg )dr\nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\frac{1}{2^{n-1}}\sum _{j=0}^{n-1} {n-1 \atopwithdelims ()j}(-1)^j\int _0^\infty \mathrm {exp}\bigg (\Big (n-1-2j-\frac{1}{b}\Big )r\bigg )dr. \end{aligned}$$
(18)

Because \(\int _0^\infty \mathrm {exp}(cx)dx=(1/c)\mathrm {exp}(cx)|_0^\infty =-1/c\) is finite for any constant \(c<0\), the expression in (18) is finite if \(n-1-2j-1/b<0\) for all \(j=0,\ldots ,n-1\), the binding case being \(j=0\). So (8) is a well-defined density for any \(b\in (0,1/(n-1))\).
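As a numerical sanity check of (18), the closed-form sum can be compared with direct quadrature of \(\int _0^\infty \mathrm {sinh}^{n-1}(r)\mathrm {exp}(-r/b)dr\). The following Python/SciPy sketch (our own helper names; not part of the original derivation) assumes \(b<1/(n-1)\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb, gamma

def D_L1_hyperbolic_closed(n, b):
    """Closed form (18), valid for b < 1/(n-1)."""
    area = 2 * np.pi ** (n / 2) / gamma(n / 2)          # (n-1)-sphere surface-area coefficient
    s = sum(comb(n - 1, j) * (-1) ** j / (1 / b - (n - 1 - 2 * j)) for j in range(n))
    return area * s / 2 ** (n - 1)

def D_L1_hyperbolic_quad(n, b, upper=60.0):
    """Direct quadrature of area * int_0^inf sinh^{n-1}(r) exp(-r/b) dr (truncated tail)."""
    area = 2 * np.pi ** (n / 2) / gamma(n / 2)
    val, _ = quad(lambda r: np.sinh(r) ** (n - 1) * np.exp(-r / b), 0, upper)
    return area * val

n, b = 3, 0.3                                           # here 1/(n-1) = 0.5 > b
print(D_L1_hyperbolic_closed(n, b), D_L1_hyperbolic_quad(n, b))  # both approximately 1.06
```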

When \(\rho \) is Huber’s loss, the finiteness of \(D_M(\mu ,b,\rho )\) for some \(b>0\) on \(M=\mathbb {R}^n\) or \(\mathbb {H}^n\) easily follows from the above results and the definition of Huber’s loss as a mixture of the \(L_2\) and \(L_1\) losses. \(\square \)

1.2 A.2 Derivations for Cutoff Parameters and Efficiency of the \(L_1\) Estimator

This section expands upon Sect. 3.2, using the same notation and approximations. We make use of the beta function B(x, y), the gamma function \(\Gamma (a)\), the lower incomplete gamma function \(\gamma (a,z)\), the upper incomplete gamma function \(\Gamma (a,z)\), the lower and upper regularized gamma functions \(P(a,z)=\gamma (a,z)/\Gamma (a)\) and \(Q(a,z)=\Gamma (a,z)/\Gamma (a)\), respectively, and the inverses of the two regularized gamma functions \(P^{-1}(a,z)\) and \(Q^{-1}(a,z)\). We also require the partial derivatives of the upper and lower incomplete gamma functions with respect to their second argument: \(\frac{\partial }{\partial z}\Gamma (a,z)=-z^{a-1}e^{-z}\) and \(\frac{\partial }{\partial z}\gamma (a,z)=-\frac{\partial }{\partial z}\Gamma (a,z)=z^{a-1}e^{-z}\). We assume \(n\ge 2\). However, as mentioned in Sect. 3.2, the formulae for \(\xi \) and the approximate AREs for the Tukey biweight and \(L_1\) estimators, including their derivatives, turn out to still be valid in the \(n=1\) case, and similarly for the Huber estimator if the second summands in (28), (32), and (34) are set to zero. The main problem when \(n=1\) in these summands is that the upper incomplete gamma function \(\Gamma (a,z)\) is undefined when \(a=0\).

1.2.1 A.2.1 Identities

Before proceeding, four identities related to integrals are derived. Recall that the density of a standard n-variate Gaussian random variable is defined as \(\phi _n(y)=(2\pi )^{-\frac{n}{2}}\mathrm {exp}(-\frac{1}{2}\sum _{j=1}^n(y^j)^2)\). Using the spherical coordinate system, \(r^2=\sum _{j=1}^n(y^j)^2\), \(~y^1=r\mathrm {sin}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})\mathrm {sin}(\theta _{n-1})\) and \(~y^j=r\mathrm {sin}(\theta _1)\cdots \mathrm {sin}(\theta _{n-j})\mathrm {cos}(\theta _{n-j+1})\) for \(j=2,\ldots ,n\), so that \(dy=dy_1\cdots dy_n=r^{n-1}\mathrm {sin}^{n-2}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})d\theta _{n-1}\cdots d\theta _1\). Take a function \(g:\mathbb {R}^+\rightarrow \mathbb {R}\). Letting \(B_R\subset \mathbb {R}^n\) denote the n-ball centered at 0 of radius R, it follows that

$$\begin{aligned}&\int _{B_R}g(r)\phi _n(y)dy\nonumber \\&\quad = \int _0^R\int _0^\pi \cdots \int _0^\pi \int _0^{2\pi }g(r)\frac{1}{(2\pi )^{\frac{n}{2}}}\times \nonumber \\&\qquad {\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}r^{n-1}\sin ^{n-2}(\theta _1)\cdots \nonumber \\&\qquad \mathrm {sin}(\theta _{n-2})d\theta _{n-1}\cdots d\theta _1dr\nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}\mathrm {exp}\Big (-\frac{r^2}{2}\Big )dr\Big )\times \nonumber \\&\qquad \Big (\int _0^\pi \mathrm {sin}^{n-2}(\theta _1)d\theta _1\Big )\cdots \nonumber \\&\qquad \cdots \Big (\int _0^\pi \mathrm {sin}^2(\theta _{n-3})d\theta _{n-3}\Big )\Big (\int _0^\pi \mathrm {sin}(\theta _{n-2})d\theta _{n-2}\Big )\times \nonumber \\&\qquad \Big (\int _0^{2\pi }d\theta _{n-1}\Big ) \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}\mathrm {exp}\Big (-\frac{r^2}{2}\Big )dr\Big )\times \nonumber \\&\qquad \Big (2\int _0^{\pi /2}\mathrm {sin}^{n-2}(\theta _1)d\theta _1\Big )\cdots \nonumber \\&\qquad \cdots \Big (2\int _0^{\pi /2}\mathrm {sin}^2(\theta _{n-3})d\theta _{n-3}\Big )\Big (2\!\int _0^{\pi /2}\!\!\mathrm {sin}(\theta _{n-2})d\theta _{n-2}\Big )\times \nonumber \\&\qquad \Big (4\int _0^{\pi /2}d\theta _{n-1}\Big ) \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}\mathrm {exp}\Big (-\frac{r^2}{2}\Big )dr\Big )B\Big (\frac{n-1}{2},\frac{1}{2}\Big )\cdots \nonumber \\&\qquad B\Big (\frac{2}{2},\frac{1}{2}\Big )\cdot 2B\Big (\frac{1}{2},\frac{1}{2}\Big ) \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}\mathrm {exp}\Big (-\frac{r^2}{2}\Big )dr\Big )\frac{\Gamma (\frac{n-1}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{n}{2})}\cdots \nonumber \\&\qquad \frac{\Gamma (\frac{2}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{3}{2})}\cdot 2\frac{\Gamma (\frac{1}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{2}{2})} \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}\mathrm {exp}\Big (-\frac{r^2}{2}\Big )dr\Big )\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \nonumber \\&\quad =2^{-\frac{n}{2}}\frac{2}{\Gamma (\frac{n}{2})}\Big (\int _0^Rg(r)r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\quad =2^{-\frac{n}{2}}\cdot \frac{n}{\Gamma (\frac{n+2}{2})}\Big (\int _0^Rg(r)r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ), \end{aligned}$$
(19)

where \(\Gamma (1/2)=\pi ^{\frac{1}{2}},~\Gamma (1)=1\) and \(\Gamma (z+1)=z\Gamma (z)\). The next two identities are derived in similar fashion:

$$\begin{aligned}&\int _{B_R}g(r)(y^1)^2\phi _n(y)dy \nonumber \\&\quad =\int _0^R\int _0^\pi \cdots \int _0^\pi \int _0^{2\pi }g(r)(r\mathrm {sin}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})\times \nonumber \\&\qquad \mathrm {sin}(\theta _{n-1}))^2\frac{1}{(2\pi )^{\frac{n}{2}}}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}r^{n-1}\times \nonumber \\&\qquad \mathrm {sin}^{n-2}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})d\theta _{n-1}\cdots d\theta _1dr \nonumber \\&\quad =\int _0^R\int _0^\pi \cdots \int _0^\pi \int _0^{2\pi }g(r)\frac{1}{(2\pi )^{\frac{n}{2}}}\times \nonumber \\&\qquad {\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}r^{n+1}\mathrm {sin}^{n}(\theta _1)\cdots \nonumber \\&\qquad \mathrm {sin}^2(\theta _{n-1})d\theta _{n-1}\cdots d\theta _1dr \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n+1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big )\times \nonumber \\&\qquad \frac{\Gamma (\frac{n+1}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{n+2}{2})}\cdots \frac{\Gamma (\frac{4}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{5}{2})}\cdot 2\frac{\Gamma (\frac{3}{2})\Gamma (\frac{1}{2})}{\Gamma (\frac{4}{2})} \nonumber \\&\quad =2^{-\frac{n}{2}}\cdot \frac{1}{\Gamma (\frac{n+2}{2})}\Big (\int _0^Rg(r)r^{n+1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \end{aligned}$$
(20)

and

$$\begin{aligned}&\int _{B_R}g(r)y^1y^2\phi _n(y)dy \nonumber \\&\quad =\int _0^R\int _0^\pi \cdots \int _0^\pi \int _0^{2\pi }g(r)(r\mathrm {sin}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})\mathrm {sin}(\theta _{n-1}))\nonumber \\&\qquad \times (r\mathrm {sin}(\theta _1)\cdots \mathrm {sin}(\theta _{n-2})\mathrm {cos}(\theta _{n-1}))\nonumber \\&\qquad \times \frac{1}{(2\pi )^{\frac{n}{2}}}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}r^{n-1}\mathrm {sin}^{n-2}(\theta _1)\cdots \nonumber \\&\qquad \mathrm {sin}(\theta _{n-2})d\theta _{n-1}\cdots d\theta _1dr \nonumber \\&\quad =\frac{1}{(2\pi )^{\frac{n}{2}}}\Big (\int _0^Rg(r)r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big )\times \nonumber \\&\qquad \Big (\int _0^\pi \mathrm {sin}^{n}(\theta _1)d\theta _1\Big )\cdots \cdots \Big (\int _0^\pi \mathrm {sin}^3(\theta _{n-2})d\theta _{n-2}\Big )\times \nonumber \\&\qquad \Big (\int _0^{2\pi }\mathrm {sin}(\theta _{n-1})\mathrm {cos}(\theta _{n-1})d\theta _{n-1}\Big ) \nonumber \\&\quad =0, \end{aligned}$$
(21)

because \(\mathrm {sin}(\theta _{n-1})\mathrm {cos}(\theta _{n-1})=\mathrm {sin}(2\theta _{n-1})/2\), so the last factor is zero. The final identity uses the substitution \(r^\prime =r^2/2\) and \(dr=[(r^\prime )^{-\frac{1}{2}}/\sqrt{2}]dr^\prime \),

$$\begin{aligned} \int _0^Rr^m{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr&=\int _0^{\frac{R^2}{2}}2^{\frac{m-1}{2}}(r^\prime )^{\frac{m-1}{2}}e^{-r^\prime }dr^\prime \nonumber \\&=2^{\frac{m-1}{2}}\cdot \gamma \Big (\frac{m+1}{2},\frac{R^2}{2}\Big ) \nonumber \\&=2^{\frac{m-1}{2}}\cdot \Big [\Gamma \Big (\frac{m+1}{2}\Big )\nonumber \\&\qquad -\Gamma \Big (\frac{m+1}{2},\frac{R^2}{2}\Big )\Big ]. \end{aligned}$$
(22)
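Identity (22) is also easy to check numerically. The short Python/SciPy sketch below (illustrative only) obtains the unnormalized lower incomplete gamma function as the regularized function gammainc times \(\Gamma (a)\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, gammainc

def lhs(m, R):
    # left-hand side of (22): numerical integral of r^m exp(-r^2/2) over [0, R]
    val, _ = quad(lambda r: r ** m * np.exp(-r ** 2 / 2), 0, R)
    return val

def rhs(m, R):
    # right-hand side of (22): 2^{(m-1)/2} * gamma((m+1)/2, R^2/2),
    # with the unnormalized lower incomplete gamma = gammainc(a, z) * gamma(a)
    a = (m + 1) / 2
    return 2 ** ((m - 1) / 2) * gammainc(a, R ** 2 / 2) * gamma(a)

print(lhs(4, 1.7), rhs(4, 1.7))   # the two values should coincide
```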

1.2.2 A.2.2 Detailed Steps

The first step uses \(MAD=\mathrm {Median}(\Vert e_1\Vert ,\ldots ,\Vert e_N\Vert )\) to find a robust estimate of \(\sigma \) in (10). In the manifold case, \(e_i=\mathrm {Log}(\mathrm {Exp}(p,x_iv),y_i)\). For a random variable \(Y^*\) distributed according to \(f(y)=\phi _n(y)\), the goal is to find a factor \(\xi \) such that \(\mathrm {Pr}(\Vert Y^*\Vert <\xi )=1/2\). Letting \(g(r)=1\) in (19) and \(m=n-1\) in (22), we have

$$\begin{aligned}&\mathrm {Pr}(\Vert Y^*\Vert <\xi )=\int _{B_\xi }\phi _n(y)dy\\&\quad =2^{-\frac{n}{2}}\frac{n}{\Gamma (\frac{n+2}{2})}\Big (\int _0^\xi r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\quad =2^{-\frac{n}{2}}\frac{n}{\Gamma (\frac{n+2}{2})}\cdot 2^{\frac{n-2}{2}}\cdot \gamma \Big (\frac{n}{2},\frac{\xi ^2}{2}\Big ) \nonumber \\&\quad =2^{-1}\frac{2}{\Gamma (\frac{n}{2})}\cdot \gamma \Big (\frac{n}{2},\frac{\xi ^2}{2}\Big ) \nonumber \\&\quad =P\Big (\frac{n}{2},\frac{\xi ^2}{2}\Big )=\frac{1}{2}. \end{aligned}$$

The solution to this equation is given by (11). Finally, we obtain \(\hat{\sigma }=MAD/\xi \).
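In code, \(\xi \) and \(\hat{\sigma }\) can be obtained directly from the inverse of the regularized lower incomplete gamma function. The following Python/SciPy sketch (helper names are ours) assumes the residual norms \(\Vert e_i\Vert \) have already been computed.

```python
import numpy as np
from scipy.special import gammaincinv

def mad_scale_factor(n):
    """xi solving P(n/2, xi^2/2) = 1/2, i.e. the median of ||Y*|| for a standard
    n-variate Gaussian Y*; gammaincinv inverts the regularized lower gamma P."""
    return np.sqrt(2.0 * gammaincinv(n / 2.0, 0.5))

def robust_sigma_hat(residual_norms, n):
    """sigma_hat = MAD / xi, where MAD is the median of the residual norms ||e_i||."""
    mad = np.median(residual_norms)
    return mad / mad_scale_factor(n)

# example: for n = 1 the factor reduces to the classical value of about 0.6745
print(mad_scale_factor(1))
```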

The next step finds the multiple of \(\sigma \) that gives an ARE relative to the sample mean of 95%, assuming a normal distribution. It requires the four identities (19), (20), (21) and (22). We take a manifold-valued random variable \(W\in M\) with intrinsic mean \(\mu _W\). If \(W^*:=\mathrm {Log}(\mu _W,W)\) has an isotropic Gaussian distribution in \(\mathbb {R}^n\), i.e., its covariance \(\Sigma _W=\sigma _W^2I_n\) is a multiple of the identity matrix, then

$$\begin{aligned} \frac{1}{\sigma _W^2}\mathrm {E}(\Vert&\mathrm {Log}(\mu _W,W)\Vert ^2)=\mathrm {E}((W^*)^T\Sigma _W^{-1}W^*)\nonumber \\&\quad =n \implies \mathrm {Var}(W)=n\sigma _W^2, \end{aligned}$$
(23)

as \((W^*)^T\Sigma _W^{-1}W^*\sim \chi ^2(n)\). Recall that \(Y_i\), \(i=1,\ldots ,N\), are distributed according to (10) and \(Y_i^*:=\mathrm {Log}(\mu ,Y_i)\). Let \(\bar{Y}\) be the sample intrinsic mean of the \(Y_i\) and \(\hat{Y}\) be an M-type estimator. Then we define \(\bar{Y}^*=\mathrm {Log}(\mu ,\bar{Y})\) and \(\hat{Y}^*=\mathrm {Log}(\mu ,\hat{Y})\). Assuming the latter two converge in distribution to \(N(0,\sigma _1^2I_n)\) and \(N(0,\sigma _2^2I_n)\), respectively,

$$\begin{aligned} \mathrm {ARE}(\hat{Y},\bar{Y})\approx \frac{n\sigma _1^2}{n\sigma _2^2} =\frac{\sigma _1^2}{\sigma _2^2} \end{aligned}$$
(24)

by (23), so we just need to find \(\sigma _1^2\) and \(\sigma _2^2\).

The covariance matrix of an M-type estimator can be obtained using its related influence function. For a loss function \(\rho :\mathbb {R}\rightarrow \mathbb {R}\), define \(\Vert \rho \Vert :\mathbb {R}^n\rightarrow \mathbb {R}\) by \(\Vert \rho \Vert (y)=\rho (\Vert y\Vert )\). Then for differentiable \(\Vert \rho \Vert \), define \(\psi :\mathbb {R}^n\rightarrow \mathbb {R}^n\) by \(\psi (y)=\nabla \Vert \rho \Vert (y)\). Note that this coincides with the definition of \(\psi \) as \(\rho ^\prime \) in the \(n=1\) case for \(\rho \) symmetric around 0. If F is the distribution of the error e, and T(F), the statistical functional at F representing the M-type estimator, is the solution to \(\mathrm {E}_F[\psi (y-T(F))]=0\), then the influence function at \(y_0\in \mathbb {R}^n\) is defined as

$$\begin{aligned} IF(y_0;T,F)=\big (\mathrm {E}[J_\psi (y-T(F))]\big )^{-1}\psi (y_0-T(F)), \end{aligned}$$

where \(J_\psi \) denotes the Jacobian matrix of \(\psi \). Letting \(\hat{F_N}\) represent the empirical distribution for N independent samples from F, \(T(\hat{F_N})\) is the sample M-estimator for these data points, and it is known by the central limit theorem that

$$\begin{aligned}&\!\!\!\sqrt{N}\big (T(\hat{F_{N}})-T(F)\big )\\&\qquad \Rightarrow N\Big (0,\int IF(y;T,F)IF(y;T,F)^TdF(y)\Big ). \end{aligned}$$

Taking our M-type estimator to be either Huber’s or Tukey’s estimator and F to represent the multivariate normal distribution, \(T(F)=\mu =0\) and the covariance of the sample M-type estimator \(T(\hat{F_{N}})\) is asymptotically given by

$$\begin{aligned} \Sigma _\psi =\frac{1}{N}\big (\mathrm {E}(J_\psi (y))^{-1}\big )^2\mathrm {E} \big [\psi (y)\psi (y)^T\big ]. \end{aligned}$$
(25)

The covariance of the sample mean \(\bar{Y}^*=(1/N)\sum _{i=1}^NY_i^*\) is simply

$$\begin{aligned} \frac{1}{N}\mathrm {Cov}(Y_1^*)=\frac{1}{N}I_n, \end{aligned}$$
(26)

so \(\sigma _1^2=1/N\) in (24).

(a) Huber estimator:  In the case of the Huber estimator, we have

$$\begin{aligned} \psi _H(y)= & {} {\left\{ \begin{array}{ll} y &{}\text{ if }\ \Vert y\Vert<c \\ c\cdot \frac{y}{\Vert y\Vert } &{}\text{ otherwise }, \end{array}\right. } ~~~ \text{ and } ~~~\nonumber \\ J_{\psi _H}(y)= & {} {\left\{ \begin{array}{ll} I_n &{}\text{ if }\ \Vert y\Vert <c \\ c\big (\frac{1}{\Vert y\Vert }I_n-\frac{1}{\Vert y\Vert ^3}yy^T\big ) &{}\text{ otherwise }. \end{array}\right. } \end{aligned}$$
(27)

We first consider the first matrix term in (25). Using the identity of (21), \(\mathrm {E}(J_{\psi _H}(y))_{12}=-c\int _{B_c^c}\frac{1}{\Vert y\Vert ^3}(y^1)(y^2)\phi _n(y)dy=0\). On the other hand, using the identities (19), (20), and (22),

$$\begin{aligned}&\mathrm {E}(J_{\psi _H}(y))_{11}=\int _{B_c}\phi _n(y)dy+c\int _{B_c^c}\frac{1}{\Vert y\Vert }\phi _n(y)dy\nonumber \\&\qquad -c\int _{B_c^c}\frac{1}{\Vert y\Vert ^3}(y^1)^2\phi _n(y)dy \nonumber \\&\quad =2^{-\frac{n}{2}}\cdot \frac{n}{\Gamma \big (\frac{n+2}{2}\big )}\Big (\int _0^cr^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\qquad +c\cdot 2^{-\frac{n}{2}}\cdot \frac{n}{\Gamma \big (\frac{n+2}{2}\big )}\Big (\int _c^{\infty }\frac{1}{r}r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\qquad -c\cdot 2^{-\frac{n}{2}}\cdot \frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Big (\int _c^{\infty }\frac{1}{r^3}r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\quad =\frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg \{2^{-\frac{n}{2}}\cdot n\cdot 2^{\frac{n-2}{2}}\cdot \gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )\nonumber \\&\qquad +c\cdot 2^{-\frac{n}{2}}\cdot n\cdot 2^{\frac{n-3}{2}}\cdot \Big [\Gamma \Big (\frac{n-1}{2}\Big ) \nonumber \\&\qquad -\gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big )\Big ]-c\cdot 2^{-\frac{n}{2}}\cdot 2^{\frac{n-3}{2}}\nonumber \\&\qquad \cdot \Big [\Gamma \Big (\frac{n-1}{2}\Big )-\gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big )\Big ]\Bigg \} \nonumber \\&\quad =\frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg \{\frac{n}{2}\gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )\!+\!2^{-\frac{3}{2}}c(n-1)\Gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big )\Bigg \}. \end{aligned}$$
(28)

By symmetry, \(\mathrm {E}(J_{\psi _H}(y))_{jj}=\mathrm {E}(J_{\psi _H}(y))_{11}\) for \(j=1,\ldots ,n\), and \(\mathrm {E}(J_{\psi _H}(y))_{lj}=\mathrm {E}(J_{\psi _H}(y))_{12}\) for all \(j,~l=1,\ldots ,n\), \(l\ne j\), so \(\mathrm {E}(J_{\psi _H}(y))\) is a scalar multiple of the identity matrix; namely, \(I_n\) multiplied by the result of (28).

We now consider the second matrix term in (25). The non-diagonal terms can again be shown to be zero using identity (21) and symmetry, and the diagonal terms can be shown to be equal by symmetry. Then with \(\psi _H=(\psi _H^1,\ldots ,\psi _H^n)\) in (27), it follows that

$$\begin{aligned}&\mathrm {E}[\psi _H(y)\psi _H(y)^T]_{11}=\mathrm {E}[(\psi _H^1(y))^2] \nonumber \\&\quad =\int _{B_c}(y^1)^2\phi _n(y)dy+c^2\int _{B_c^c}\frac{1}{\Vert y\Vert ^2}(y^1)^2\phi _n(y)dy \nonumber \\&\quad =\frac{2^{-\frac{n}{2}}}{\Gamma \big (\frac{n+2}{2}\big )}\Big (\int _0^cr^{n+1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big )\nonumber \\&\qquad +c^2\cdot \frac{2^{-\frac{n}{2}}}{\Gamma \big (\frac{n+2}{2}\big )}\Big (\int _c^{\infty }r^{n-1}{\mathrm {exp}\Big (-\frac{r^2}{2}\Big )}dr\Big ) \nonumber \\&\quad =\frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg \{\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )+\frac{c^2}{2} \Gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )\Bigg \}, \end{aligned}$$
(29)

using (20) and (22). Thus, the matrix \(\mathrm {E}[\psi _H(y)\psi _H(y)^T]\) is the above expression multiplied by \(I_n\), and the variance \(\Sigma _\psi \) in (25) can be calculated using (28) and (29),

$$\begin{aligned} \Sigma _{\psi _H}=\frac{\mathrm {E}[\psi _H(y)\psi _H(y)^T]_{11}}{N(\mathrm {E} (J_{\psi _H}(y))_{11})^2} \cdot I_n, \end{aligned}$$
(30)

giving \(\sigma _2^2\) in (24). Hence, from (24), (26), (28), (29), and (30), the approximate ARE to the sample mean is given by (12)

$$\begin{aligned} \text{ ARE}_{H,L_2}(c,n)\approx A_H(c,n):=\frac{H_1^2}{\Gamma \big (\frac{n+2}{2}\big )H_2}, \end{aligned}$$
(31)

where

$$\begin{aligned} H_1= & {} \Gamma \Big (\frac{n+2}{2}\Big )\mathrm {E}(J_{\psi _H}(y))_{11}=\frac{n}{2}\gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )\nonumber \\&+2^{-\frac{3}{2}}c(n-1)\Gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big ), \end{aligned}$$
(32)
$$\begin{aligned} H_2= & {} \Gamma \Big (\frac{n+2}{2}\Big )\mathrm {E}[\psi _H(y)\psi _H(y)^T]_{11}=\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )\nonumber \\&+\frac{c^2}{2} \Gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big ). \end{aligned}$$
(33)

Lastly, we apply the Newton-Raphson method to find the value of c for which the ARE is approximately 95%, that is, the solution in c to the equation \(A_H(c,n)-0.95=0\). This requires the partial derivative of \(A_H(c,n)\) with respect to c,

$$\begin{aligned} \frac{\partial }{\partial c}A_H(c,n)=\frac{2H_1H_3H_2-H_1^2H_4}{\Gamma \big (\frac{n+2}{2}\big )H_2^2}, \end{aligned}$$

where \(H_1\) and \(H_2\) are as above and

$$\begin{aligned} H_3= & {} \frac{\partial }{\partial c}H_1\nonumber \\= & {} \frac{cn}{2}\Big (\frac{c^2}{2}\Big )^{\frac{n-2}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\nonumber \\&\quad +2^{-\frac{3}{2}}(n-1)\Gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big )\nonumber \\&\quad -2^{-\frac{3}{2}}c^2(n-1)\Big (\frac{c^2}{2}\Big )^{\frac{n-3}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )} \nonumber \\= & {} 2^{-\frac{n}{2}}c^{n-1}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}+2^{-\frac{3}{2}}(n-1)\Gamma \Big (\frac{n-1}{2},\frac{c^2}{2}\Big ), \nonumber \\ \end{aligned}$$
(34)
$$\begin{aligned} H_4= & {} \frac{\partial }{\partial c}H_2 \nonumber \\= & {} c\Big (\frac{c^2}{2}\Big )^{\frac{n}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\nonumber \\&\quad +c\Gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )-c\Big (\frac{c^2}{2}\Big )\Big (\frac{c^2}{2}\Big )^{\frac{n-2}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )} \nonumber \\= & {} c\Gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big ). \end{aligned}$$
(35)
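The cutoff search described above can be written compactly. The Python/SciPy sketch below (function names are ours; it is an illustration rather than the implementation used in the paper) codes \(A_H(c,n)\) from (31)–(33), its derivative via (34)–(35), and a few Newton–Raphson steps for \(A_H(c,n)=0.95\).

```python
import numpy as np
from scipy.special import gamma, gammainc, gammaincc

# unnormalized lower/upper incomplete gamma functions
lig = lambda a, z: gammainc(a, z) * gamma(a)    # gamma(a, z), lower
uig = lambda a, z: gammaincc(a, z) * gamma(a)   # Gamma(a, z), upper

def huber_are(c, n):
    """A_H(c, n) from (31): approximate ARE of the Huber estimator vs. the mean."""
    z = c * c / 2.0
    H1 = (n / 2.0) * lig(n / 2.0, z) + 2 ** -1.5 * c * (n - 1) * uig((n - 1) / 2.0, z)
    H2 = lig((n + 2) / 2.0, z) + z * uig(n / 2.0, z)
    return H1 ** 2 / (gamma((n + 2) / 2.0) * H2)

def huber_are_dc(c, n):
    """dA_H/dc, using H3 from (34) and H4 from (35)."""
    z = c * c / 2.0
    H1 = (n / 2.0) * lig(n / 2.0, z) + 2 ** -1.5 * c * (n - 1) * uig((n - 1) / 2.0, z)
    H2 = lig((n + 2) / 2.0, z) + z * uig(n / 2.0, z)
    H3 = 2 ** (-n / 2.0) * c ** (n - 1) * np.exp(-z) + 2 ** -1.5 * (n - 1) * uig((n - 1) / 2.0, z)
    H4 = c * uig(n / 2.0, z)
    return (2 * H1 * H3 * H2 - H1 ** 2 * H4) / (gamma((n + 2) / 2.0) * H2 ** 2)

def huber_cutoff(n, target=0.95, c0=2.0, iters=50, tol=1e-12):
    c = c0
    for _ in range(iters):
        step = (huber_are(c, n) - target) / huber_are_dc(c, n)
        c -= step
        if abs(step) < tol:
            break
    return c

# cutoff for n = 3; for n = 1 (with the second summands in (28), (32), (34)
# dropped, as noted in the text) the classical value 1.345 is recovered
print(huber_cutoff(3))
```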

(b) Tukey biweight estimator:  For this estimator, it is easy to show that

$$\begin{aligned} \psi _B(y)={\left\{ \begin{array}{ll} \Big [1-\big (\frac{\Vert y\Vert }{c}\big )^2\Big ]^2\cdot y &{}~~\text{ if }\; \Vert y\Vert <c\\ 0 &{}~~ \text{ otherwise }, \end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} J_{\psi _B}(y)={\left\{ \begin{array}{ll} \Big [1-\big (\frac{\Vert y\Vert }{c}\big )^2\Big ]^2I_n\\ \qquad -\frac{4}{c^2}\Big [1-\big (\frac{\Vert y\Vert }{c}\big )^2\Big ]yy^T &{}~~\text{ if }\;\Vert y\Vert <c\\ 0 &{}~~ \text{ otherwise }. \end{array}\right. } \end{aligned}$$

By similar arguments to the ones used for the Huber estimator, we have \(\mathrm {E}(J_{\psi _B}(y))_{12}=0\), \(\mathrm {E}[\psi _B(y)\psi _B(y)^T]_{12}=0\),

$$\begin{aligned}&\mathrm {E}(J_{\psi _B}(y))_{11}=\frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg \{\frac{2(n+4)}{c^4}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )\nonumber \\&\quad -\frac{2(n+2)}{c^2}\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )+\frac{n}{2}\gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big )\Bigg \}, \end{aligned}$$
(36)
$$\begin{aligned}&\mathrm {E}[\psi _B(y)\psi _B(y)^T]_{11}\nonumber \\&\quad =\frac{1}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg \{\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )-\frac{8}{c^2}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )\nonumber \\&\qquad +\frac{24}{c^4}\gamma \Big (\frac{n+6}{2},\frac{c^2}{2}\Big )-\frac{32}{c^6}\gamma \Big (\frac{n+8}{2},\frac{c^2}{2}\Big )\nonumber \\&\qquad +\frac{16}{c^8}\gamma \Big (\frac{n+10}{2},\frac{c^2}{2}\Big )\Bigg \}. \end{aligned}$$
(37)

Thus, the variance \(\Sigma _\psi \) in (25) can be calculated using (36) and (37),

$$\begin{aligned} \Sigma _{\psi _B}=\frac{\mathrm {E}[\psi _B(y)\psi _B(y)^T]_{11}}{N(\mathrm {E}(J_{\psi _B}(y))_{11})^2}\cdot I_n. \end{aligned}$$
(38)

giving \(\sigma _2^2\) in (24). Therefore, from (24), (26), (36), (37), and (38), the approximate ARE to the sample mean is given by (13),

$$\begin{aligned} \text{ ARE}_{T,L_2}(c,n)\approx A_T(c,n):=\frac{T_1^2}{\Gamma \big (\frac{n+2}{2}\big )T_2}, \end{aligned}$$

where

$$\begin{aligned} T_1= & {} \Gamma \Big (\frac{n+2}{2}\Big )\mathrm {E}(J_{\psi _B}(y))_{11}=\frac{2(n+4)}{c^4}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )\\&\quad -\frac{2(n+2)}{c^2}\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big ) \\&\quad +\frac{n}{2}\gamma \Big (\frac{n}{2},\frac{c^2}{2}\Big ), \\ T_2= & {} \Gamma \Big (\frac{n+2}{2}\Big )\mathrm {E}[\psi _B(y)\psi _B(y)^T]_{11}=\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )\\&\quad -\frac{8}{c^2}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )+\frac{24}{c^4}\gamma \Big (\frac{n+6}{2},\frac{c^2}{2}\Big ) \\&\quad -\frac{32}{c^6}\gamma \Big (\frac{n+8}{2},\frac{c^2}{2}\Big )+\frac{16}{c^8}\gamma \Big (\frac{n+10}{2},\frac{c^2}{2}\Big ). \end{aligned}$$

We solve for the root of the function \(A_T(c,n)-0.95\) by utilizing \(\frac{\partial }{\partial c}A_T(c,n)\) in the Newton-Raphson method,

$$\begin{aligned} \frac{\partial }{\partial c}A_T(c,n)=\frac{2T_1T_3T_2-T_1^2T_4}{\Gamma \big (\frac{n+2}{2}\big )T_2^2}, \end{aligned}$$

where \(T_1\) and \(T_2\) are as above and

$$\begin{aligned} T_3= & {} \frac{\partial }{\partial c}T_1 \\= & {} -\frac{8(n+4)}{c^5}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )+\frac{2(n+4)}{c^3}\Big (\frac{c^2}{2}\Big )^{\frac{n+2}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\\&+\frac{4(n+2)}{c^3}\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big )-\frac{2(n+2)}{c}\Big (\frac{c^2}{2}\Big )^{\frac{n}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\\&+\frac{cn}{2}\Big (\frac{c^2}{2}\Big )^{\frac{n-2}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )} \\= & {} -\frac{8(n+4)}{c^5}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )+\frac{4(n+2)}{c^3}\gamma \Big (\frac{n+2}{2},\frac{c^2}{2}\Big ), \\ T_4= & {} \frac{\partial }{\partial c}T_2 \\= & {} c\Big (\frac{c^2}{2}\Big )^{\frac{n}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}+\frac{16}{c^3}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )\\&-\frac{8}{c}\Big (\frac{c^2}{2}\Big )^{\frac{n+2}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )} \\&-\frac{96}{c^5}\gamma \Big (\frac{n+6}{2},\frac{c^2}{2}\Big )+\frac{24}{c^3}\Big (\frac{c^2}{2}\Big )^{\frac{n+4}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\\&+\frac{192}{c^7}\gamma \Big (\frac{n+8}{2},\frac{c^2}{2}\Big ) \\&-\frac{32}{c^5}\Big (\frac{c^2}{2}\Big )^{\frac{n+6}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}-\frac{128}{c^9}\gamma \Big (\frac{n+10}{2},\frac{c^2}{2}\Big )\\&+\frac{16}{c^7}\Big (\frac{c^2}{2}\Big )^{\frac{n+8}{2}}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )} \\= & {} \frac{16}{c^3}\gamma \Big (\frac{n+4}{2},\frac{c^2}{2}\Big )-\frac{96}{c^5}\gamma \Big (\frac{n+6}{2},\frac{c^2}{2}\Big )\\&+\frac{192}{c^7}\gamma \Big (\frac{n+8}{2},\frac{c^2}{2}\Big ) -\frac{128}{c^9}\gamma \Big (\frac{n+10}{2},\frac{c^2}{2}\Big ). \end{aligned}$$
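An analogous sketch for the Tukey biweight cutoff is given below; it codes \(A_T(c,n)\) and, instead of using the derivative above, brackets the root of \(A_T(c,n)-0.95\) with SciPy's brentq (again an illustration under our own naming, with the incomplete-gamma helper redefined for self-containment).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma, gammainc

lig = lambda a, z: gammainc(a, z) * gamma(a)    # unnormalized lower incomplete gamma

def tukey_are(c, n):
    """A_T(c, n): approximate ARE of the Tukey biweight estimator vs. the mean."""
    z = c * c / 2.0
    T1 = ((2 * (n + 4) / c ** 4) * lig((n + 4) / 2.0, z)
          - (2 * (n + 2) / c ** 2) * lig((n + 2) / 2.0, z)
          + (n / 2.0) * lig(n / 2.0, z))
    T2 = (lig((n + 2) / 2.0, z) - (8 / c ** 2) * lig((n + 4) / 2.0, z)
          + (24 / c ** 4) * lig((n + 6) / 2.0, z)
          - (32 / c ** 6) * lig((n + 8) / 2.0, z)
          + (16 / c ** 8) * lig((n + 10) / 2.0, z))
    return T1 ** 2 / (gamma((n + 2) / 2.0) * T2)

def tukey_cutoff(n, target=0.95):
    # A_T increases toward 1 as c grows, so a wide bracket suffices for the root
    return brentq(lambda c: tukey_are(c, n) - target, 1.0, 50.0)

print(tukey_cutoff(1))   # about 4.685, the classical univariate constant
```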

1.2.3 A.2.3 Proof of Proposition 3

Proof of Proposition 3(a)

Using (31), (32), (33), (34), and (35), and two applications of L’Hôpital’s rule, we obtain

$$\begin{aligned} \lim _{c\rightarrow 0}A_H(c,n)&=\lim _{c\rightarrow 0}\frac{H_1^2}{\Gamma \big (\frac{n+2}{2}\big )H_2} =\lim _{c\rightarrow 0}\frac{2H_1H_3}{\Gamma \big (\frac{n+2}{2}\big )H_4} =\lim _{c\rightarrow 0}\frac{2H_3^2+2H_1\frac{\partial }{\partial c}H_3}{\Gamma \big (\frac{n+2}{2}\big )\frac{\partial }{\partial c}H_4} \\&=\lim _{c\rightarrow 0}\frac{2\Big \{2^{-\frac{n}{2}}c^{n-1}{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}+2^{-\frac{3}{2}}(n-1)\Gamma \big (\frac{n-1}{2},\frac{c^2}{2}\big )\Big \}^2-2H_12^{-\frac{n}{2}}c^n{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}}{\Gamma \big (\frac{n+2}{2}\big )\Big \{\Gamma \big (\frac{n}{2},\frac{c^2}{2}\big )-2^{-\frac{n-2}{2}}c^n{\mathrm {exp}\Big (-\frac{c^2}{2}\Big )}\Big \}} \\&=\frac{2\Big \{2^{-\frac{3}{2}}(n-1)\Gamma \big (\frac{n-1}{2}\big )\Big \}^2}{\Gamma \big (\frac{n+2}{2}\big )\Gamma \big (\frac{n}{2}\big )}=\frac{\big (\frac{n-1}{2}\big )^2\Gamma ^2\big (\frac{n-1}{2}\big )}{\Gamma \big (\frac{n}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}=\frac{\Gamma ^2\big (\frac{n+1}{2}\big )}{\Gamma \big (\frac{n}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}. \end{aligned}$$

\(\square \)

Lemma 1

It follows that

$$\begin{aligned} \frac{\Gamma ^2\big (\frac{n+1}{2}\big )}{\Gamma \big (\frac{n}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}<\sqrt{\frac{n}{n+1}}. \end{aligned}$$

Proof

Theorem 3 in Mortici (2012) states that, for \(x\ge 1\),

$$\begin{aligned} \frac{1}{\sqrt{x\bigg (1+\frac{1}{4x-\frac{1}{2}+\frac{3}{16x+\frac{15}{4x}}}\bigg )}}< & {} \frac{\Gamma \big (x+\frac{1}{2}\big )}{\Gamma \big (x+1\big )}\nonumber \\< & {} \frac{1}{\sqrt{x\bigg (1+ \frac{1}{4x-\frac{1}{2}+\frac{3}{16x}}\bigg )}}. \end{aligned}$$
(39)

Because \(x\ge 1\), it follows that \(4x-\frac{1}{2}+\frac{3}{16x}\le 4x-\frac{1}{2}+\frac{3}{16}=4x-\frac{5}{16}<4x\), so we have

$$\begin{aligned} \frac{1}{\sqrt{x\bigg (1+\frac{1}{4x-\frac{1}{2}+\frac{3}{16x}}\bigg )}}<\frac{1}{\sqrt{x\Big (1+\frac{1}{4x}\Big )}}=\frac{1}{\sqrt{x+\frac{1}{4}}}. \end{aligned}$$
(40)

Therefore, using (14) and letting \(x=\frac{n}{2}\ge 1\) in (39) and (40), we obtain

$$\begin{aligned}&\frac{\Gamma ^2\big (\frac{n+1}{2}\big )}{\Gamma \big (\frac{n}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}=\frac{\Gamma ^2\big (\frac{n+1}{2}\big )}{\frac{2}{n}\Gamma \big (\frac{n+2}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}=\frac{n}{2} \Bigg (\frac{\Gamma \big (\frac{n+1}{2}\big )}{\Gamma \big (\frac{n+2}{2}\big )}\Bigg ) ^2<\frac{\frac{n}{2}}{\frac{n}{2}+\frac{1}{4}}\\&=\frac{2n}{2n+1}. \end{aligned}$$

Now, for \(n\ge 2\), it follows that \(4n^3+4n^2<4n^3+4n^2+n\). Thus, we have

$$\begin{aligned} \frac{2n}{2n+1}<\sqrt{\frac{n}{n+1}}, \end{aligned}$$

which completes the proof. \(\square \)

Proof of Proposition 3(b)

By Lemma 1 and (14), we have

$$\begin{aligned}&A_{L_1}(n+1)=\frac{\Gamma ^2\big (\frac{n+2}{2}\big )}{\Gamma \big (\frac{n+1}{2}\big )\Gamma \big (\frac{n+3}{2}\big )} =\frac{\frac{n}{2}\Gamma \big (\frac{n}{2}\big )\Gamma \big (\frac{n+2}{2}\big )}{\Gamma \big (\frac{n+1}{2}\big )\cdot \frac{n+1}{2}\Gamma \big (\frac{n+1}{2}\big )}\\&\quad =\frac{n}{n+1}\frac{1}{A_{L_1}(n)} \\&\quad>\frac{n}{n+1}\sqrt{\frac{n+1}{n}}=\sqrt{\frac{n}{n+1}} \\&\quad >A_{L_1}(n) \end{aligned}$$

for \(n\ge 2\). \(\square \)

Proof of Proposition 3(c)

We again use (39). Because \(x\ge 1>0\), it follows that \(4x-\frac{1}{2}+\frac{3}{16x+\frac{15}{4x}}>4x-\frac{1}{2}>3x\), and so we have

$$\begin{aligned} \frac{1}{\sqrt{x\bigg (1+\frac{1}{4x-\frac{1}{2}+\frac{3}{16x+\frac{15}{4x}}} \bigg )}}>\frac{1}{\sqrt{x\Big (1+\frac{1}{3x}\Big )}}=\frac{1}{\sqrt{x+\frac{1}{3}}}. \end{aligned}$$
(41)

Combining (39), (40), and (41), we obtain

$$\begin{aligned} \frac{1}{\sqrt{x+\frac{1}{3}}}<\frac{\Gamma \big (x+\frac{1}{2}\big )}{\Gamma \big (x+1\big )}<\frac{1}{\sqrt{x+\frac{1}{4}}} \end{aligned}$$
(42)

for \(x\ge 1\). Taking the reciprocal of (42) and replacing x with \(x-\frac{1}{2}\) gives

$$\begin{aligned} \sqrt{(x-\frac{1}{2})+\frac{1}{4}}<\frac{\Gamma \big ((x-\frac{1}{2})+1\big )}{\Gamma \big ((x-\frac{1}{2})+\frac{1}{2}\big )}<\sqrt{(x-\frac{1}{2})+\frac{1}{3}} \end{aligned}$$

or

$$\begin{aligned} \sqrt{x-\frac{1}{4}}<\frac{\Gamma \big (x+\frac{1}{2}\big )}{\Gamma \big (x\big )} <\sqrt{x-\frac{1}{6}} \end{aligned}$$
(43)

for \(x-\frac{1}{2}\ge 1\), or \(x\ge \frac{3}{2}\). Then multiplying (42) and (43) gives

$$\begin{aligned} \sqrt{\frac{x-\frac{1}{4}}{x+\frac{1}{3}}}<\frac{\Gamma ^2\big (x+\frac{1}{2}\big )}{\Gamma \big (x\big )\Gamma \big (x+1\big )}<\sqrt{\frac{x-\frac{1}{6}}{x+\frac{1}{4}}} \end{aligned}$$
(44)

for \(x\ge \frac{3}{2}\). The limits as \(x\rightarrow \infty \) of the left- and right-hand expressions in (44) are both 1, and letting \(x=\frac{n}{2}\), the central expression is (14), completing the proof. \(\square \)
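Proposition 3 is easy to illustrate numerically. The Python/SciPy sketch below evaluates \(A_{L_1}(n)=\Gamma ^2(\frac{n+1}{2})/\{\Gamma (\frac{n}{2})\Gamma (\frac{n+2}{2})\}\) through log-gamma functions for numerical stability and compares it with the bounds in (44) for \(x=n/2\ge 3/2\).

```python
import numpy as np
from scipy.special import gammaln

def are_L1(n):
    """A_{L1}(n): limiting ARE of the L1 estimator relative to the sample mean."""
    return np.exp(2 * gammaln((n + 1) / 2) - gammaln(n / 2) - gammaln((n + 2) / 2))

for n in (3, 4, 10, 100, 1000):
    x = n / 2
    lower = np.sqrt((x - 0.25) / (x + 1 / 3))   # lower bound from (44)
    upper = np.sqrt((x - 1 / 6) / (x + 0.25))   # upper bound from (44)
    print(n, lower, are_L1(n), upper)           # A_{L1}(n) increases toward 1
```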

1.3 A.3 Riemannian Normal Distribution on \(S^n\) and \(\mathbb {H}^n\)

Here we derive the normalizing constant for the Riemannian normal distribution on \(S^n\) and \(\mathbb {H}^n\), which leads to a full description of the Riemannian normal density on those manifolds. We further describe how to randomly generate points from this distribution.

1.3.1 A.3.1 Riemannian Normal Distribution on \(S^n\)

For \(0\le R\le \pi \) and integer \(m\ge 0\), define

$$\begin{aligned}&G_{m, \sigma ^2}(R) \nonumber \\&\quad :=\bigg (\frac{i}{2}\bigg )^m\sqrt{\frac{\pi \sigma ^2}{2}}\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j\times \nonumber \\&\qquad \mathrm {exp}\Big (-\frac{(m-2j)^2\sigma ^2}{2}\Big )\mathrm {erf}\Big (\frac{R}{\sqrt{2\sigma ^2}}\nonumber \\&\quad +\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )i\Big ), \end{aligned}$$
(45)

where \(i=\sqrt{-1}\) and \(\mathrm {erf}(z):=(2/\sqrt{\pi })\int _0^z e^{-t^2} dt\) is the error function for complex z.

Proposition 4

(a) When \(M=S^n\), the normalizing constant in (10) is given by

$$\begin{aligned} C_{S^n}(\mu ,\sigma ^2)=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (G_{n-1,\sigma ^2}(\pi )-G_{n-1,\sigma ^2}(0)\big ). \end{aligned}$$

Substituting this normalizing constant into (10) gives the Riemannian normal density on \(S^n\). (b) The distribution function of \(d(y,\mu )\) is

$$\begin{aligned} F_{S^n}(R)&:&=\mathrm {Pr}(d(y,\mu )\le R)\nonumber \\= & {} \left\{ \begin{array}{ll} 0 &{}~~ \mathrm{if}\ R<0\\ \frac{G_{n-1,\sigma ^2}(R)-G_{n-1,\sigma ^2}(0)}{G_{n-1,\sigma ^2}(\pi )-G_{n-1,\sigma ^2}(0)} &{}~~ \mathrm{if}\ 0\le R\le \pi \\ 1 &{}~~ \mathrm{if}\ R>\pi . \end{array} \right. \end{aligned}$$
(46)

Lemma 2

$$\begin{aligned} G_{m, \sigma ^2}(R)-G_{m, \sigma ^2}(0)=\int _0^R \mathrm {sin}^m(r) \mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr \end{aligned}$$

Proof

$$\begin{aligned}&\frac{d}{dr} \Bigg (\bigg (\frac{i}{2}\bigg )^m\sqrt{\frac{\pi \sigma ^2}{2}} \sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j\times \nonumber \\&\quad \mathrm {exp}\Big (-\frac{(m-2j)^2\sigma ^2}{2}\Big )\mathrm {erf}\Big (\frac{r}{\sqrt{2\sigma ^2}}+\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )i\Big )\Bigg ) \nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m\sqrt{\frac{\pi \sigma ^2}{2}} \sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j\times \nonumber \\&\quad \mathrm {exp}\Big (-\frac{(m-2j)^2\sigma ^2}{2}\Big ) \frac{d}{dr} \Bigg (\mathrm {erf}\Big (\frac{r}{\sqrt{2\sigma ^2}}+\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )i\Big )\Bigg )\nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m\sqrt{\frac{\pi \sigma ^2}{2}} \sum _{j=0}^{m}{m \atopwithdelims ()j}(-1)^j\times \nonumber \\&\quad \mathrm {exp}\Big (2\sigma ^2mj-2\sigma ^2j^2-\frac{\sigma ^2 m^2}{2}\Big ) \frac{1}{\sqrt{2\sigma ^2}}\frac{2}{\sqrt{\pi }}\mathrm {exp}\Big (-\Big (\frac{r}{\sqrt{2\sigma ^2}}\nonumber \\&\qquad +\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )i\Big )^2\Big )\nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \mathrm {exp}\Big (2\sigma ^2mj-2\sigma ^2j^2\nonumber \\&\qquad -\frac{\sigma ^2 m^2}{2}-\bigg (\frac{r^2}{2\sigma ^2} + r(m-2j)i\nonumber \\&\qquad -2\sigma ^2\big (\frac{m^2}{4}-mj+j^2\big )\bigg )\Big )\nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \mathrm {exp}\Big (2\sigma ^2mj-2\sigma ^2j^2-\frac{\sigma ^2 m^2}{2}\nonumber \\&\quad -\frac{r^2}{2\sigma ^2}-r(m-2j)i+\frac{\sigma ^2m^2}{2}\nonumber \\&\qquad -2\sigma ^2mj+2\sigma ^2j^2\Big )\nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}-r(m-2j)i\Big )\nonumber \\&\quad = \bigg (\frac{i}{2}\bigg )^m \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big )\mathrm {exp}(mri)\times \nonumber \\&\qquad \sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \mathrm {exp}(-r(2m-2j)i)\nonumber \\&\quad = \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big )\bigg (\frac{\mathrm {exp}(ri)i}{2}\bigg )^m\times \nonumber \\&\qquad \sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \big (\mathrm {exp}(-2ri)\big )^{m-j}\nonumber \\&\quad = \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big )\bigg (\frac{\mathrm {exp}(ri)i}{2}\bigg )^m \big (\mathrm {exp}(-2ri)-1\big )^m\nonumber \\&\quad = \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big )\bigg (\frac{i(\mathrm {exp}(-ri)-\mathrm {exp}(ri))}{2}\bigg )^m\nonumber \\&\quad = \mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big )\bigg (\frac{i(\mathrm {cos}(r)-i\mathrm {sin}(r)-\mathrm {cos}(r)-i\mathrm {sin}(r))}{2}\bigg )^m\nonumber \\&\quad = \mathrm {sin}^m(r)\mathrm {exp}\Big (-\frac{r^2}{2\sigma ^2}\Big ) \end{aligned}$$

\(\square \)

Lemma 2 implies that despite the presence of imaginary terms in (45), \(G_{m, \sigma ^2}(R)\) is always real.
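Because (45) involves the error function of a complex argument, a direct numerical check of Lemma 2 is reassuring. The sketch below assumes a SciPy build whose scipy.special.erf accepts complex input (mpmath.erf can be substituted otherwise) and compares \(G_{m,\sigma ^2}(R)-G_{m,\sigma ^2}(0)\) with quadrature of the right-hand side.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb, erf

def G(m, sigma2, R):
    """G_{m, sigma^2}(R) from (45); the imaginary parts cancel in the sum."""
    total = 0j
    for j in range(m + 1):
        total += (comb(m, j) * (-1) ** j
                  * np.exp(-(m - 2 * j) ** 2 * sigma2 / 2)
                  * erf(R / np.sqrt(2 * sigma2) + 1j * np.sqrt(sigma2 / 2) * (m - 2 * j)))
    return ((1j / 2) ** m * np.sqrt(np.pi * sigma2 / 2) * total).real

def rhs(m, sigma2, R):
    # right-hand side of Lemma 2, by numerical quadrature
    val, _ = quad(lambda r: np.sin(r) ** m * np.exp(-r ** 2 / (2 * sigma2)), 0, R)
    return val

m, sigma2, R = 4, 0.7, 2.5
print(G(m, sigma2, R) - G(m, sigma2, 0), rhs(m, sigma2, R))   # should agree
```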

Proof of Proposition 4(a)

As noted in Remark 3.1 in Cotton and Freeman (2002), the surface area of an \((n-1)\)-sphere of radius x on \(S^n\) is

$$\begin{aligned} A_{S^n}(x):=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\mathrm {sin}^{n-1}(x) \end{aligned}$$

for \(0\le x\le \pi \). Then

$$\begin{aligned} C_{S^n}(\mu ,\sigma ^2)&=\int _{S^n} \mathrm {exp}\bigg (-\frac{d(y,\mu )^2}{2\sigma ^2}\bigg )dy\\&=\int _0^\pi A_{S^n}(r)\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr \nonumber \\&=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\int _0^\pi \mathrm {sin}^{n-1}(r)\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr\\&=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (G_{n-1,\sigma ^2}(\pi )-G_{n-1,\sigma ^2}(0)\big ), \end{aligned}$$

where the last equality comes from setting \(m=n-1\) in Lemma 2. \(\square \)

Proof of Proposition 4(b)

In a similar vein to the above, it is clear that the distribution function of \(d(y,\mu )\) is

$$\begin{aligned} F_{S^n}(R):= & {} \mathrm {Pr}(d(y,\mu )\le R)\\= & {} \frac{\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (G_{n-1,\sigma ^2}(R)-G_{n-1,\sigma ^2}(0)\big )}{\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (G_{n-1,\sigma ^2}(\pi )-G_{n-1,\sigma ^2}(0)\big )}\\= & {} \frac{G_{n-1,\sigma ^2}(R)-G_{n-1,\sigma ^2}(0)}{G_{n-1,\sigma ^2}(\pi )-G_{n-1,\sigma ^2}(0)}, \end{aligned}$$

for \(R\in [0,\pi ]\). \(\square \)

1.3.2 A.3.2 Riemannian Normal Distribution on \(\mathbb {H}^n\)

For \(R\ge 0\) and integer \(m\ge 0\), define

$$\begin{aligned}&H_{m, \sigma ^2}(R):=\frac{1}{2^m}\sqrt{\frac{\pi \sigma ^2}{2}}\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j\mathrm {exp}\Big (\frac{(m-2j)^2\sigma ^2}{2}\Big )\times \nonumber \\&\mathrm {erf}\Big (\frac{R}{\sqrt{2\sigma ^2}}-\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )\Big ), \end{aligned}$$
(47)

with \(\mathrm {erf}\) defined as before.

Proposition 5

  1. (a)

    When \(M=\mathbb {H}^n\), the normalizing constant in (10) is given by

    $$\begin{aligned} C_{\mathbb {H}^n}(\mu ,\sigma ^2)=\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (\lim _{R^\prime \rightarrow \infty } H_{n-1,\sigma ^2}(R^\prime )-H_{n-1,\sigma ^2}(0)\big ), \end{aligned}$$
    (48)

    where

    $$\begin{aligned} \lim _{R\rightarrow \infty }&H_{n-1,\sigma ^2}(R)\nonumber \\&\quad =\frac{1}{2^{n-1}}\sqrt{\frac{\pi \sigma ^2}{2}}\sum _{j=0}^{n-1} {n-1 \atopwithdelims ()j}(-1)^j\mathrm {exp}\Big (\frac{(n-1-2j)^2\sigma ^2}{2}\Big ).\nonumber \!\!\!\!\!\\ \end{aligned}$$
    (49)

    Substituting this normalizing constant into (10) gives the Riemannian normal density on \(\mathbb {H}^n\).

  2. (b)

    The distribution function of \(d(y,\mu )\) is

    $$\begin{aligned} F_{\mathbb {H}^n}(R):= & {} \mathrm {Pr}(d(y,\mu )\le R)\nonumber \\= & {} \left\{ \begin{array}{ll} 0 &{}~~ \text{ if } R<0\\ \frac{H_{n-1,\sigma ^2}(R)-H_{n-1,\sigma ^2}(0)}{\lim _{R^\prime \rightarrow \infty } H_{n-1,\sigma ^2}(R^\prime )-H_{n-1,\sigma ^2}(0)} &{}~~ \text{ if } R\ge 0. \end{array} \right. \nonumber \\ \end{aligned}$$
    (50)

Lemma 3

$$\begin{aligned} \int _0^R \!\mathrm {exp}(-ar^2+br)dr \!= & {} \!\sqrt{\frac{\pi }{4a}} \mathrm {exp}\Big (\frac{b^2}{4a}\Big )\Big [\mathrm {erf}\Big (\sqrt{a}R-\frac{b}{2\sqrt{a}}\Big )-\mathrm {erf}\Big (-\frac{b}{2\sqrt{a}}\Big )\Big ], \end{aligned}$$

for \(a>0\), \(b\in \mathbb {R}\).

Proof

$$\begin{aligned}&\frac{d}{dr} \Bigg (\sqrt{\frac{\pi }{4a}}\mathrm {exp}\Big (\frac{b^2}{4a}\Big )\mathrm {erf}\Big (\sqrt{a}r-\frac{b}{2\sqrt{a}}\Big )\Bigg ) \nonumber \\&\quad = \sqrt{\frac{\pi }{4a}}\mathrm {exp}\Big (\frac{b^2}{4a}\Big )\frac{d}{dr}\Bigg (\mathrm {erf}\Big (\sqrt{a}r-\frac{b}{2\sqrt{a}}\Big )\Bigg ) \nonumber \\&\quad = \sqrt{\frac{\pi }{4a}}\mathrm {exp}\Big (\frac{b^2}{4a}\Big )\sqrt{a}\frac{2}{\sqrt{\pi }}\mathrm {exp}\Big (-\Big (\sqrt{a}r-\frac{b}{2\sqrt{a}}\Big )^2\Big ) \nonumber \\&\quad = \mathrm {exp}\Big (\frac{b^2}{4a}\Big )\mathrm {exp}\Big (-ar^2+br-\frac{b^2}{4a}\Big ) \nonumber \\&\quad = \mathrm {exp}(-ar^2+br). \end{aligned}$$

Evaluating this antiderivative between 0 and R gives the stated identity. \(\square \)

Lemma 4

$$\begin{aligned} H_{m, \sigma ^2}(R)-H_{m,\sigma ^2}(0)=\int _0^R \mathrm {sinh}^m(r)\mathrm {exp} \bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr. \end{aligned}$$

Proof

$$\begin{aligned}&\int _0^R \mathrm {sinh}^m(r)\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr\nonumber \\&\quad = \int _0^R \bigg (\frac{\mathrm {exp}(r)-\mathrm {exp}(-r)}{2}\bigg )^m\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr\nonumber \\&\quad = \int _0^R \frac{1}{2^m}\bigg (\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j\mathrm {exp}(-jr)\,\mathrm {exp}((m-j)r)\bigg )\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr\nonumber \\&\quad = \frac{1}{2^m}\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \int _0^R \mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}+(m-2j)r\bigg )dr\nonumber \\&\quad = \frac{1}{2^m}\sum _{j=0}^{m} {m \atopwithdelims ()j}(-1)^j \sqrt{\frac{\pi \sigma ^2}{2}}\,\mathrm {exp}\Big (\frac{(m-2j)^2\sigma ^2}{2}\Big )\bigg [\mathrm {erf}\Big (\frac{R}{\sqrt{2\sigma ^2}}-\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )\Big )-\mathrm {erf}\Big (-\sqrt{\frac{\sigma ^2}{2}}\big (m-2j\big )\Big )\bigg ]\nonumber \\&\quad = H_{m, \sigma ^2}(R)-H_{m, \sigma ^2}(0), \end{aligned}$$

where the second-to-last equality comes from letting \(a=\frac{1}{2\sigma ^2}\) and \(b=m-2j\) in Lemma 3. \(\square \)

Proof of Proposition 5(a)

Using equation (17) for the surface area of an \((n-1)\)-sphere in \(\mathbb {H}^n\),

$$\begin{aligned}&C_{\mathbb {H}^n}(\mu ,\sigma ^2)\\&\quad =\int _{\mathbb {H}^n} \mathrm {exp}\bigg (-\frac{d(y,\mu )^2}{2\sigma ^2}\bigg )dy\nonumber \\&\quad =\int _0^\infty A_{\mathbb {H}^n}(r)\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr \nonumber \\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})}\int _0^\infty \mathrm {sinh}^{n-1}(r)\mathrm {exp}\bigg (-\frac{r^2}{2\sigma ^2}\bigg )dr\\&\quad =\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (\lim _{R^\prime \rightarrow \infty }H_{n-1,\sigma ^2}(R^\prime )-H_{n-1,\sigma ^2}(0)\big ), \end{aligned}$$

where the last equality comes from setting \(m=n-1\) in Lemma 4. (49) follows from (47) and the fact that \(\lim _{x\rightarrow \infty } \mathrm {erf}(x)=1\). \(\square \)

Proof of Proposition 5(b)

In a similar vein to the above, it is clear that the distribution function of \(d(y,\mu )\) is

$$\begin{aligned} F_{\mathbb {H}^n}(R):= & {} \mathrm {Pr}(d(y,\mu )\le R)=\frac{\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (H_{n-1,\sigma ^2}(R)-H_{n-1,\sigma ^2}(0)\big )}{\frac{2\pi ^{\frac{n}{2}}}{\Gamma (\frac{n}{2})} \big (\lim _{R^\prime \rightarrow \infty }H_{n-1,\sigma ^2}(R^\prime )-H_{n-1,\sigma ^2}(0)\big )}\\= & {} \frac{H_{n-1,\sigma ^2}(R)-H_{n-1,\sigma ^2}(0)}{ \lim _{R^\prime \rightarrow \infty }H_{n-1,\sigma ^2}(R^\prime )-H_{n-1,\sigma ^2}(0)}, \end{aligned}$$

for \(R\ge 0\). \(\square \)

1.3.3 A.3.3 Generating Random Points from the Riemannian Normal Distribution

To generate a random y from the Riemannian normal distribution on \(M=S^n\) or \(\mathbb {H}^n\) (a code sketch illustrating the three steps follows the list):

  1. 1.

If \(M=S^n\), draw a random \(R\in [0,\pi ]\) from the distribution function \(F_{S^n}\) in (46) by inverse transform sampling: draw t from the uniform distribution U(0, 1) and set \(R=F_{S^n}^{-1}(t)\in [0,\pi ]\). R is the distance from \(\mu \). Similarly, if \(M=\mathbb {H}^n\), draw a random \(R\ge 0\) by setting \(R=F_{\mathbb {H}^n}^{-1}(t)\) with \(F_{\mathbb {H}^n}\) as in (50), where t is again drawn from U(0, 1).

  2. 2.

    Draw a random unit vector \(u\in T_\mu M\) from the uniform distribution on the unit \((n-1)\)-sphere in \(T_\mu M\cong \mathbb {R}^n\) by drawing a vector from an isotropic n-variate Gaussian distribution and dividing it by its magnitude. This works because all points on M that are a fixed distance from \(\mu \) are equally likely.

  3. 3.

    Multiply the randomly drawn R (magnitude) from step 1 and unit vector u (direction) from step 2 to give \(Ru\in T_\mu M\), a tangent vector at \(\mu \), and finally \(y=\mathrm {Exp}(\mu ,Ru)\).
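A minimal sketch of this sampling procedure on \(S^n\) is given below (Python/SciPy/NumPy; all function names are ours). It evaluates \(F_{S^n}\) in (46) through a helper for (45), inverts it numerically with brentq for step 1, and uses a projected Gaussian vector for step 2.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import comb, erf

def G(m, sigma2, R):
    """G_{m, sigma^2}(R) from (45) (same helper as in the sketch after Lemma 2)."""
    total = 0j
    for j in range(m + 1):
        total += (comb(m, j) * (-1) ** j
                  * np.exp(-(m - 2 * j) ** 2 * sigma2 / 2)
                  * erf(R / np.sqrt(2 * sigma2) + 1j * np.sqrt(sigma2 / 2) * (m - 2 * j)))
    return ((1j / 2) ** m * np.sqrt(np.pi * sigma2 / 2) * total).real

def sample_riemannian_normal_sphere(mu, sigma2, rng):
    """Draw one point from the Riemannian normal distribution on S^n centered at mu."""
    n = mu.shape[0] - 1                                     # S^n embedded in R^{n+1}
    G0, Gpi = G(n - 1, sigma2, 0.0), G(n - 1, sigma2, np.pi)
    F = lambda R: (G(n - 1, sigma2, R) - G0) / (Gpi - G0)   # distribution function (46)
    t = rng.uniform()
    R = brentq(lambda r: F(r) - t, 0.0, np.pi)              # step 1: inverse transform
    v = rng.standard_normal(n + 1)                          # step 2: Gaussian vector ...
    v -= np.dot(v, mu) * mu                                 # ... projected onto T_mu S^n
    u = v / np.linalg.norm(v)
    return np.cos(R) * mu + np.sin(R) * u                   # step 3: y = Exp(mu, R*u)

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])                              # a point on S^2
y = sample_riemannian_normal_sphere(mu, 0.3, rng)
print(y, np.linalg.norm(y))                                 # y lies on the unit sphere
```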

Appendix B

1.1 B.1 Details About the Sphere, \(S^n\)

The n-sphere can be represented as the unit sphere embedded in \((n+1)\)-dimensional Euclidean space:

$$\begin{aligned} S^n=\{p\in \mathbb {R}^{n+1}\big |\Vert p\Vert =1\}, \end{aligned}$$

where \(\Vert p\Vert =\sqrt{\langle p,p\rangle }\) and \(\langle \cdot ,\cdot \rangle \) is the usual dot product defined by \( \langle p,q\rangle =\sum _{j=1}^{n+1}p^jq^j \) with \(p=(p^1,\ldots ,p^{n+1}),q=(q^1,\ldots ,q^{n+1})\in \mathbb {R}^{n+1}\). The tangent space at \(p\in S^n\) then consists of the vectors in \(\mathbb {R}^{n+1}\) orthogonal to p with respect to the dot product:

$$\begin{aligned} T_pS^n=\{v\in \mathbb {R}^{n+1}|\langle p,v\rangle =0\}. \end{aligned}$$

The exponential map for \(S^n\) is given by

$$\begin{aligned} \mathrm {Exp}(p,v)=\mathrm {cos}(\Vert v\Vert )p+\mathrm {sin}(\Vert v\Vert )\frac{v}{\Vert v\Vert } \end{aligned}$$

for \(p\in S^n,~v\in T_pS^n\). For \(p_1,~p_2\in S^n,~p_1\ne -p_2\), the logarithmic map is given by

$$\begin{aligned} \mathrm {Log}(p_1,p_2)=\mathrm {cos}^{-1}( \langle p_1,p_2\rangle )\frac{p_2-\langle p_1,p_2\rangle p_1}{\Vert p_2-\langle p_1,p_2\rangle p_1\Vert }, \end{aligned}$$

meaning \(d(p_1,p_2)=\mathrm {cos}^{-1}( \langle p_1,p_2\rangle )\), and the parallel transport of a vector \(v\in T_{p_1}S^n\) along the unique minimizing geodesic from \(p_1\) to \(p_2\) (provided \(p_2\ne -p_1\)) is given by

$$\begin{aligned} \Gamma _{p_1\rightarrow p_2}(v)= & {} v-\frac{\langle \mathrm {Log}(p_1,p_2),v\rangle }{(\Vert \mathrm {Log}(p_1,p_2)\Vert )^2}\nonumber \\&\times \big (\mathrm {Log}(p_1,p_2)+\mathrm {Log}(p_2,p_1)\big ), \end{aligned}$$
(51)

or equivalently

$$\begin{aligned}&\Gamma _{p_1\rightarrow p_2}(v)=v^\perp \\&\quad +\Big \langle v,\frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert }\Big \rangle \Big (\mathrm {cos}(\Vert \mathrm {Log}(p_1,p_2)\Vert )\\&\quad \frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert }-\mathrm {sin}(\Vert \mathrm {Log}(p_1,p_2)\Vert )p_1\Big ), \end{aligned}$$

where

$$\begin{aligned} v^\top= & {} \Big \langle v,\frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert }\Big \rangle \frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert },~\text{ and }~~\\ v^\perp= & {} v-v^\top , \end{aligned}$$

which denote the parts of v that are parallel and orthogonal to \(\mathrm {Log}(p_1,p_2)\), respectively. The gradients with respect to p and each \(v^j\), calculated using Jacobi fields, are

$$\begin{aligned} \nabla _pE_\rho= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }d_p\mathrm {Exp}(p,Vx_i)^\dag e_i \\= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }\times \\&\quad \Big (\mathrm {cos}(\Vert Vx_i\Vert )\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)+\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\Big ),~\text{ and }\\ \nabla _{v^j}E_\rho= & {} -\sum _{i=1}^Nx_i^j\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }d_v\mathrm {Exp}(p,Vx_i)^\dag e_i\\= & {} -\sum _{i=1}^Nx_i^j\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }\times \\&\Big (\frac{\mathrm {sin}(\Vert Vx_i\Vert )}{\Vert Vx_i\Vert }\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)+\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\Big ), \end{aligned}$$

where \(e_i=\mathrm {Log}(\hat{y}_i,y_i)\), and \(\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\) and \(\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)\) are defined by

$$\begin{aligned} \Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)= & {} \Big \langle \Gamma _{\hat{y}_i\rightarrow p}(e_i),\frac{v}{\Vert v\Vert }\Big \rangle \frac{v}{\Vert v\Vert },~\text{ and }\\ \Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)= & {} \Gamma _{\hat{y}_i\rightarrow p}(e_i)-\Gamma _{\hat{y}_i\rightarrow p}(e_i)^\top . \end{aligned}$$
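For reference, a minimal NumPy sketch of the basic sphere operations above (Exp, Log and the parallel transport in (51); helper names are ours, with no special handling of antipodal or coincident points) is as follows.

```python
import numpy as np

def sphere_exp(p, v):
    """Exp(p, v) on S^n for p on the unit sphere and v in T_p S^n."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def sphere_log(p1, p2):
    """Log(p1, p2) on S^n, assuming p2 != -p1."""
    ip = np.clip(np.dot(p1, p2), -1.0, 1.0)
    w = p2 - ip * p1
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros_like(p1)
    return np.arccos(ip) * w / nw

def sphere_transport(p1, p2, v):
    """Parallel transport of v in T_{p1}S^n to T_{p2}S^n along the minimizing geodesic, eq. (51)."""
    log12, log21 = sphere_log(p1, p2), sphere_log(p2, p1)
    d2 = np.dot(log12, log12)
    if d2 < 1e-24:
        return v
    return v - np.dot(log12, v) / d2 * (log12 + log21)

p1 = np.array([1.0, 0.0, 0.0]); p2 = np.array([0.0, 1.0, 0.0])
v = np.array([0.0, 0.0, 0.5])                      # tangent vector at p1
print(sphere_exp(p1, sphere_log(p1, p2)))          # recovers p2
print(np.dot(sphere_transport(p1, p2, v), p2))     # transported vector is tangent at p2 (about 0)
```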

1.2 B.2 Details About the Hyperbolic Space, \(\mathbb {H}^n\)

Unlike \(S^n\), hyperbolic space cannot be embedded in Euclidean space without distortion, so there exist several equivalent models for visualizing and performing calculations on this manifold. We will consider the hyperboloid model and the Poincaré ball model.

The hyperboloid model is particularly convenient for use in our gradient descent algorithm because several formulae are simple and analogous to the spherical case, as we will see. In this model, \(\mathbb {H}^n\) is embedded in the pseudo-Euclidean \((n+1)\)-dimensional Minkowski space. This space, originally used to model four-dimensional spacetime in the special theory of relativity, is equipped with the Minkowski (pseudo-)inner product, defined by

$$\begin{aligned} \langle p,q\rangle _\mathbf{M }=-p^1q^1+\sum _{j=2}^{n+1}p^jq^j \end{aligned}$$

with \(p=(p^1,\ldots ,p^{n+1}),q=(q^1,\ldots ,q^{n+1})\in \mathbb {R}^{n+1}\), instead of the usual dot product. This symmetric bilinear form is a pseudo-inner product because while it is non-degenerate, it is not positive definite. \(\mathbb {H}^n\) is represented as the upper sheet of a two-sheeted n-dimensional hyperboloid embedded in \(\mathbb {R}^{n+1}\):

$$\begin{aligned} \mathbb {H}^n=\left\{ p=\left( p^1,\ldots ,p^{n+1}\right) \in \mathbb {R}^{n+1}\big |\langle p,p\rangle _\mathbf{M }=-1,\ p^1>0\right\} . \end{aligned}$$

The tangent space at \(p\in \mathbb {H}^n\) then consists of the vectors in \(\mathbb {R}^{n+1}\) orthogonal to p with respect to the Minkowski inner product:

$$\begin{aligned} T_p\mathbb {H}^n=\{v\in \mathbb {R}^{n+1}|\langle p,v\rangle _\mathbf{M }=0\}. \end{aligned}$$

Even though \(\langle \cdot ,\cdot \rangle _\mathbf{M }\) is not positive-definite, its restriction to \(T_p\mathbb {H}^n\) is, so \(\mathbb {H}^n\) is a Riemannian manifold embedded in the pseudo-Riemannian Minkowski space and we can define a norm on the tangent space by \(\Vert v\Vert _\mathbf{M }=\sqrt{\langle v,v\rangle _\mathbf{M }}\) for \(v\in T_p\mathbb {H}^n\). The exponential map is then given by

$$\begin{aligned} \mathrm {Exp}(p,v)=\mathrm {cosh}(\Vert v\Vert _\mathbf{M })p+\mathrm {sinh}(\Vert v\Vert _\mathbf{M })\frac{v}{\Vert v\Vert _\mathbf{M }}. \end{aligned}$$

For \(p_1,~p_2\in \mathbb {H}^n\), the logarithmic map is given by

$$\begin{aligned} \mathrm {Log}(p_1,p_2)=\mathrm {cosh}^{-1}( -\langle p_1,p_2\rangle _\mathbf{M })\frac{p_2+\langle p_1,p_2\rangle _\mathbf{M } p_1}{\Vert p_2+\langle p_1,p_2\rangle _\mathbf{M } p_1\Vert _\mathbf{M }}, \end{aligned}$$

meaning \(d(p_1,p_2)=\mathrm {cosh}^{-1}(-\langle p_1,p_2\rangle _\mathbf{M })\), and the parallel transport of a vector \(v\in T_{p_1}\mathbb {H}^n\) along the unique minimizing geodesic from \(p_1\) to \(p_2\) is given by

$$\begin{aligned} \Gamma _{p_1\rightarrow p_2}(v)= & {} v-\frac{\langle \mathrm {Log}(p_1,p_2),v\rangle _\mathbf{M }}{(\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M })^2}\times \nonumber \\&\big (\mathrm {Log}(p_1,p_2)+\mathrm {Log}(p_2,p_1) \big ), \end{aligned}$$
(52)

or equivalently

$$\begin{aligned}&\Gamma _{p_1\rightarrow p_2}(v)=v^\perp +\Big \langle v,\frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M }}\Big \rangle _\mathbf{M }\\&\quad \times \Big (\mathrm {cosh}(\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M })\frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M }}\\&\quad +\mathrm {sinh}(\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M })p_1\Big ), \end{aligned}$$

where

$$\begin{aligned}&\!\!\!v^\top =\Big \langle v,\frac{\mathrm {Log}(p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M }}\Big \rangle _\mathbf{M }\frac{\mathrm {Log} (p_1,p_2)}{\Vert \mathrm {Log}(p_1,p_2)\Vert _\mathbf{M }},\\&\!\!\!\text{ and }~~v^\perp =v-v^\top . \end{aligned}$$

The gradients with respect to p and each \(v^j\), calculated using Jacobi fields, are

$$\begin{aligned} \nabla _pE_\rho= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert _\mathbf{M })}{\Vert e_i\Vert _\mathbf{M }}d_p\mathrm {Exp}(p,Vx_i)^\dag e_i \\= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert _\mathbf{M })}{\Vert e_i\Vert _\mathbf{M }}\times \\&\Big (\mathrm {cosh}(\Vert Vx_i\Vert _\mathbf{M })\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)+\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\Big ),~\text{ and }\\ \nabla _{v^j}E_\rho= & {} -\sum _{i=1}^Nx_i^j\frac{\rho ^\prime (\Vert e_i\Vert _\mathbf{M })}{\Vert e_i\Vert _\mathbf{M }}d_v\mathrm {Exp}(p,Vx_i)^\dag e_i\\= & {} -\sum _{i=1}^Nx_i^j\frac{\rho ^\prime (\Vert e_i\Vert _\mathbf{M })}{\Vert e_i\Vert _\mathbf{M }}\times \\&\Big (\frac{\mathrm {sinh}(\Vert Vx_i\Vert _\mathbf{M })}{\Vert Vx_i\Vert _\mathbf{M }}\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)+\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\Big ), \end{aligned}$$

where \(e_i=\mathrm {Log}(\hat{y}_i,y_i)\), and \(\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)\) and \(\Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)\) are defined by

$$\begin{aligned} \Gamma _{\hat{y}_i\rightarrow p}^\top (e_i)= & {} \Big \langle \Gamma _{\hat{y}_i\rightarrow p}(e_i),\frac{v}{\Vert v\Vert _\mathbf{M }}\Big \rangle _\mathbf{M }\frac{v}{\Vert v\Vert _\mathbf{M }},~\text{ and }\\ \Gamma _{\hat{y}_i\rightarrow p}^\perp (e_i)= & {} \Gamma _{\hat{y}_i\rightarrow p}(e_i)-\Gamma _{\hat{y}_i\rightarrow p}^\top (e_i). \end{aligned}$$
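The hyperboloid-model formulas above translate almost verbatim into code. The following is a hedged numpy sketch of the Minkowski inner product and the corresponding exponential map, logarithmic map and parallel transport; the function names and numerical guards are ours.

```python
import numpy as np

def mink_inner(p, q):
    """Minkowski pseudo-inner product: -p^1 q^1 + sum_{j>=2} p^j q^j."""
    return -p[0] * q[0] + np.dot(p[1:], q[1:])

def mink_norm(v):
    """||v||_M for tangent vectors, on which <.,.>_M is positive definite."""
    return np.sqrt(max(mink_inner(v, v), 0.0))

def hyp_exp(p, v):
    """Exp(p, v) = cosh(||v||_M) p + sinh(||v||_M) v / ||v||_M."""
    nv = mink_norm(v)
    if nv < 1e-12:
        return p
    return np.cosh(nv) * p + np.sinh(nv) * v / nv

def hyp_log(p1, p2):
    """Log(p1, p2) on the hyperboloid; d(p1, p2) = arccosh(-<p1, p2>_M)."""
    ip = mink_inner(p1, p2)                       # <= -1 for points on H^n
    w = p2 + ip * p1
    nw = mink_norm(w)
    if nw < 1e-12:
        return np.zeros_like(p1)
    return np.arccosh(max(-ip, 1.0)) * w / nw

def hyp_transport(p1, p2, v):
    """Parallel transport along the geodesic from p1 to p2, Eq. (52)."""
    log12 = hyp_log(p1, p2)
    sq = mink_inner(log12, log12)
    if sq < 1e-24:
        return v
    return v - mink_inner(log12, v) / sq * (log12 + hyp_log(p2, p1))
```

The gradients \(\nabla _pE_\rho \) and \(\nabla _{v^j}E_\rho \) can then be assembled exactly as in the spherical sketch, with \(\mathrm {cos}\) and \(\mathrm {sin}\) replaced by \(\mathrm {cosh}\) and \(\mathrm {sinh}\) and all norms and inner products taken with respect to \(\langle \cdot ,\cdot \rangle _\mathbf{M }\).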

The Poincaré ball model, along with the so-called Beltrami–Klein model, is useful for visualization. In it, hyperbolic space is represented as the interior of the unit ball in \(\mathbb {R}^n\):

$$\begin{aligned} \mathbb {P}^n=\{q=(q^1,\ldots ,q^n)\in \mathbb {R}^n\big |\Vert q\Vert <1\}, \end{aligned}$$

and a geodesic is represented as either an arc of a circle that is orthogonal to the boundary of the unit ball or a diameter of the ball. The distance between two points \(q_1,q_2\in \mathbb {P}^n\) grows without bound as they approach the boundary:

$$\begin{aligned} d(q_1,q_2)=\mathrm {cosh}^{-1}\Bigg (1+2\frac{\Vert q_1-q_2\Vert ^2}{(1-\Vert q_1\Vert ^2)(1-\Vert q_2\Vert ^2)}\Bigg ). \end{aligned}$$

The Poincaré ball can be constructed from the hyperboloid model via the function \(g:\mathbb {H}^n\rightarrow \mathbb {P}^n\) defined by

$$\begin{aligned} g((p^1,p^2,\ldots ,p^{n+1}))=\frac{(p^2,\ldots ,p^{n+1})}{p^1+1}. \end{aligned}$$

That is, a point \(p=(p^1,\ldots ,p^{n+1})\) in the hyperboloid model is projected onto the interior of the unit ball in the hyperplane \(x^1=0\) through the line connecting that point to \((-1,0,\ldots ,0)\). The inverse of this function, \(g^{-1}:\mathbb {P}^n\rightarrow \mathbb {H}^n\), mapping the open ball back to the hyperboloid model is

$$\begin{aligned} g^{-1}((q^1,q^2,\ldots ,q^n))=\frac{(1+\sum _{j=1}^n (q^j)^2, 2q^1,\ldots ,2q^n)}{1-\sum _{j=1}^n (q^j)^2}. \end{aligned}$$

This easy conversion between the two models allows one to take advantage of the strengths of both.
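As a sketch of this conversion (function names are ours), the maps \(g\) and \(g^{-1}\) and the Poincaré distance can be implemented directly:

```python
import numpy as np

def hyperboloid_to_poincare(p):
    """g: H^n -> P^n, projection through the point (-1, 0, ..., 0)."""
    return p[1:] / (p[0] + 1.0)

def poincare_to_hyperboloid(q):
    """g^{-1}: P^n -> H^n."""
    s = np.dot(q, q)                              # ||q||^2 < 1
    return np.concatenate(([1.0 + s], 2.0 * q)) / (1.0 - s)

def poincare_dist(q1, q2):
    """Geodesic distance in the Poincare ball model."""
    num = np.dot(q1 - q2, q1 - q2)
    den = (1.0 - np.dot(q1, q1)) * (1.0 - np.dot(q2, q2))
    return np.arccosh(1.0 + 2.0 * num / den)
```

A quick consistency check is that applying `poincare_dist` to the images of two hyperboloid points under \(g\) should reproduce \(\mathrm {cosh}^{-1}(-\langle p_1,p_2\rangle _\mathbf{M })\).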

B.3 Details About Kendall’s 2-Dimensional Shape Space, \(\Sigma _2^K\)

Much of this section has been written with reference to Sect. 3.11 of the online supplementary document of Cornea et al. (2017) and Sect. 5.2.1 of Fletcher (2013).

As mentioned in Sect. 4.2, a shape is the geometry of an object after the effects of translation, scaling and rotation have been removed. A K-configuration in the two-dimensional plane can be expressed as a K-by-2 matrix, or equivalently as a complex K-vector \(z=(z^1,\ldots ,z^K)\in \mathbb {C}^K\). Translation is removed by subtracting the centroid \(\frac{1}{K}\sum _{m=1}^K z^m\) from each element of z, and scaling is removed by dividing z by its norm \(\Vert z\Vert = \sqrt{\langle z,z\rangle }\); recall that the standard complex inner product is given by \(\langle z_1,z_2\rangle = \overline{z_2}^T z_1 = \sum _{m=1}^K z_1^m \overline{z_2^m}\). In this way, we limit our consideration to \(D^K=\{z\in \mathbb {C}^K | \sum _{m=1}^K z^m=0\), \(\sum _{m=1}^K z^m \overline{z^m}=1\}\), which can be thought of as a unit sphere of real dimension \(2K-3\). This set is called the pre-shape space, and its elements pre-shapes.
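For illustration, the centering-and-scaling step can be written as the following short numpy sketch; the function name is ours.

```python
import numpy as np

def preshape(z):
    """Map a K-configuration (complex K-vector) to the pre-shape sphere D^K:
    subtract the centroid, then divide by the norm sqrt(<z, z>)."""
    z = np.asarray(z, dtype=complex)
    z = z - z.mean()
    return z / np.sqrt(np.vdot(z, z).real)
```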

As only rotation remains, pre-shapes have the same shape if they are planar rotations of each other. We define an equivalence relation on \(D^K\) such that all pre-shapes of the same shape are equivalent. Then two pre-shapes \(z_1, z_2\in D^K\) are equivalent (\(z_1\sim z_2\)) if \(z_1 = z_2e^{i\theta }\) for some angle \(\theta \), as rotation in the complex plane is performed by multiplication by \(e^{i\theta }\). So a shape is the equivalence class \(p=[z_p]_\sim =\{z^\prime =z_pe^{i\theta }|\theta \in [0,2\pi )\}\subset D^K\), the set of all rotations of a pre-shape \(z_p\), and is an element of the quotient space \(\Sigma _2^K=D^K/S^1\), a Riemannian manifold of real dimension \((2K-4)\). This space is equivalent to \(\mathbb {C}P^{K-2}\), the set of complex lines through the origin in \(\mathbb {C}^{K-1}\), as the space of centered K-configurations is equivalent to \(\mathbb {C}^{K-1}\), and scaling and rotation together are equivalent to multiplication by a complex number \(re^{i\theta }\).

The manifold is endowed with the complex inner product and the tangent space at \(y=[z_y]_\sim \in \Sigma _2^K\) is given by

$$\begin{aligned} T_y \Sigma _2^K= & {} \Big \{v=(v^1,\ldots ,v^K)\,\Big |\,\frac{1}{K}\sum _{m=1}^K v^m=0~\text{ and }~\text{ Re }(\langle z_ye^{i\theta },v\rangle )=0,~\forall \theta \in [0,2\pi )\Big \}\\= & {} \Big \{v=(v^1,\ldots ,v^K)\,\Big |\,\sum _{m=1}^K v^m=0,~\langle z^\prime ,v\rangle =0~\text{ for } \text{ any } z^\prime \in [z_y]_\sim \Big \}, \end{aligned}$$

where Re(\(\langle \cdot ,\cdot \rangle \)) gives the real inner product when the complex K-vectors are instead conceptualized as real 2K-vectors.

All calculations in shape space are done using representatives in pre-shape space. Given \(z_{p_1},z_{p_2}\in D^K\), \(z_{p_2}^*={{\,\mathrm{arg\!\min }\,}}_{z_{p_2}^\prime \in [z_{p_2}]_\sim } d_{D^K}(z_{p_1},z_{p_2}^\prime )\), where \(d_{D^K}\) is the spherical geodesic distance on \(D^K\), is the optimal rotational alignment of \(z_{p_2}\) to \(z_{p_1}\). It can be shown that

$$\begin{aligned} z_{p_2}^*=z_{p_2}e^{i\theta ^*} \text{, } \text{ where } e^{i\theta ^*}=\frac{\langle z_{p_1}, z_{p_2}\rangle }{|\langle z_{p_1}, z_{p_2}\rangle |}, \end{aligned}$$
(53)

so that \(\theta ^*\) is the argument of \(\langle z_{p_1}, z_{p_2}\rangle \); note that this means \(\langle z_{p_1}, z_{p_2}^*\rangle =|\langle z_{p_1}, z_{p_2}\rangle |\) is real and positive. Then the geodesic distance \(d_{\Sigma _2^K}\) between \(p_1=[z_{p_1}]_\sim \) and \(p_2=[z_{p_2}]_\sim \) on \(\Sigma _2^K\) is

$$\begin{aligned}&d_{\Sigma _2^K}(p_1,p_2)=\min _{z_{p_2}^\prime \in [z_{p_2}]} d_{D^K}(z_{p_1},z_{p_2}^\prime )=d_{D^K}(z_{p_1},z_{p_2}^*)\\&\quad =\mathrm {cos}^{-1}(\langle z_{p_1},z_{p_2}^* \rangle )\\&\quad = \mathrm {cos}^{-1}(|\langle z_{p_1},z_{p_2} \rangle |), \end{aligned}$$

where \(z_{p_2}\) can be any element of \([z_{p_2}]_\sim \) and the geodesic distance does not depend on the choice of the representative pre-shapes. The exponential map for \(\Sigma _2^K\) is given by

$$\begin{aligned} \mathrm {Exp}(p,v)=\Big [\mathrm {cos}(\Vert v\Vert )z_p+\mathrm {sin}(\Vert v\Vert )\frac{v}{\Vert v\Vert }\Big ]_\sim , \end{aligned}$$

where \(p=[z_p]_\sim \in \Sigma _2^K\) and \(v\in T_p\Sigma _2^K\). This has the same form as the exponential map for the sphere. Note that the resulting pre-shape in the square brackets is optimally aligned to the representative pre-shape \(z_p\). The logarithmic map is given by

$$\begin{aligned}&\mathrm {Log}(p_1,p_2)=\mathrm {cos}^{-1}(\langle z_{p_1},z_{p_2}^*\rangle )\frac{z_{p_2}^*-\langle z_{p_1},z_{p_2}^*\rangle z_{p_1}}{\Vert z_{p_2}^*-\langle z_{p_1},z_{p_2}^*\rangle z_{p_1}\Vert }\\&\quad =\mathrm {cos}^{-1}( |\langle z_{p_1},z_{p_2}\rangle |)\frac{z_{p_2}^*-|\langle z_{p_1},z_{p_2}\rangle |z_{p_1}}{\Vert z_{p_2}^*-|\langle z_{p_1},z_{p_2}\rangle |z_{p_1}\Vert }, \end{aligned}$$

where \(p_1=[z_{p_1}]_\sim \) and \(p_2=[z_{p_2}]_\sim \) are in \(\Sigma _2^K\) and \(z_{p_2}^*\) is as defined in (53). Note that this depends on the choice of \(z_{p_1}\) but not \(z_{p_2}\), and so is only valid for this particular representative of \(p_1\). Parallel transport of \(v\in T_{p_1}\Sigma _2^K\) along the geodesic from \(p_1=[z_{p_1}]_\sim \) to \(p_2=[z_{p_2}]_\sim \) is

$$\begin{aligned}&\Gamma _{p_1\rightarrow p_2}(v)=e^{-i\theta ^*}\Bigg \{v-\langle v,z_{p_1}\rangle z_{p_1} - \langle v,\tilde{z_{p_2}^*}\rangle \tilde{z_{p_2}^*}\\&\qquad + \Big (\langle z_{p_2}^*,z_{p_1}\rangle \langle v,z_{p_1}\rangle - \sqrt{1-|\langle z_{p_2}^*,z_{p_1}\rangle |^2}\langle v,\tilde{z_{p_2}^*}\rangle \Big )z_{p_1} \nonumber \\&\qquad +\Big (\sqrt{1-|\langle z_{p_2}^*,z_{p_1}\rangle |^2} \langle v,z_{p_1}\rangle - \overline{\langle z_{p_2}^*,z_{p_1}\rangle }\langle v,\tilde{z_{p_2}^*}\rangle \Big )\tilde{z_{p_2}^*}\Bigg \}\nonumber \\&\quad =\frac{\overline{\langle z_{p_1}, z_{p_2}\rangle }}{|\langle z_{p_1}, z_{p_2}\rangle |}\Bigg \{v-\langle v,z_{p_1}\rangle z_{p_1} - \langle v,\tilde{z_{p_2}^*}\rangle \tilde{z_{p_2}^*}\nonumber \\&\qquad + \Big (|\langle z_{p_1}, z_{p_2}\rangle |\langle v,z_{p_1}\rangle \nonumber \\&\qquad - \sqrt{1-|\langle z_{p_1}, z_{p_2}\rangle |^2}\langle v,\tilde{z_{p_2}^*}\rangle \Big )z_{p_1}\nonumber \\&\qquad +\Big (\sqrt{1-|\langle z_{p_1}, z_{p_2}\rangle |^2} \langle v,z_{p_1}\rangle \\&\qquad - |\langle z_{p_1}, z_{p_2}\rangle |\langle v,\tilde{z_{p_2}^*}\rangle \Big )\tilde{z_{p_2}^*}\Bigg \}, \end{aligned}$$

where \(\tilde{z_{p_2}^*}=(z_{p_2}^*-\langle z_{p_2}^*, z_{p_1}\rangle z_{p_1})/\sqrt{1-\langle z_{p_2}^*, z_{p_1}\rangle ^2}=(z_{p_2}^*-|\langle z_{p_1}, z_{p_2}\rangle |z_{p_1})/\sqrt{1-|\langle z_{p_1}, z_{p_2}\rangle |^2}\) and \(z_{p_2}^*\), \(\theta ^*\) are as defined in (53). This expression for parallel transport is obtained via the action of the special unitary group. Note that it depends on the choice of both \(z_{p_1}\) and \(z_{p_2}\), so care must be taken when selecting representative pre-shapes.
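The alignment, distance, exponential map and logarithmic map above admit a compact implementation on representative pre-shapes. The following numpy sketch uses the complex inner product convention \(\langle z_1,z_2\rangle =\sum _m z_1^m\overline{z_2^m}\) stated above; function names and numerical guards are ours.

```python
import numpy as np

def herm(z1, z2):
    """Complex inner product <z1, z2> = sum_m z1^m conj(z2^m)."""
    return np.sum(z1 * np.conj(z2))

def align(z1, z2):
    """Optimal rotational alignment of the pre-shape z2 to z1, Eq. (53): z2* = z2 e^{i theta*}."""
    ip = herm(z1, z2)
    if np.abs(ip) < 1e-12:
        return z2                                 # alignment is arbitrary in this degenerate case
    return z2 * ip / np.abs(ip)

def shape_dist(z1, z2):
    """Geodesic distance on Sigma_2^K between the shapes of the pre-shapes z1 and z2."""
    return np.arccos(np.clip(np.abs(herm(z1, z2)), 0.0, 1.0))

def shape_exp(z, v):
    """Representative pre-shape of Exp([z], v); same form as the sphere's exponential map."""
    nv = np.sqrt(herm(v, v).real)
    if nv < 1e-12:
        return z
    return np.cos(nv) * z + np.sin(nv) * v / nv

def shape_log(z1, z2):
    """Log([z1], [z2]) expressed in the tangent space at the representative z1."""
    z2s = align(z1, z2)
    c = np.abs(herm(z1, z2))                      # = <z1, z2*>, real and positive
    w = z2s - c * z1
    nw = np.sqrt(herm(w, w).real)
    if nw < 1e-12:
        return np.zeros_like(z1)
    return np.arccos(np.clip(c, 0.0, 1.0)) * w / nw
```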

The gradients with respect to p and each \(v^j\), calculated using Jacobi fields, are

$$\begin{aligned} \nabla _pE_\rho= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }d_p\mathrm {Exp}(p,Vx_i)^\dag e_i \\= & {} -\sum _{i=1}^N\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }\times \\&\Big (\mathrm {cos}(\Vert Vx_i\Vert )u_i^\perp +\mathrm {cos}(\Vert 2 Vx_i\Vert )w_i^\perp +u_i^\top +w_i^\top \Big ),\\ \nabla _{v^j}E_\rho= & {} -\sum _{i=1}^Nx_i^{j}\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }d_{v}\mathrm {Exp}(p,Vx_i)^\dag e_i\\= & {} -\sum _{i=1}^Nx_i^{j}\frac{\rho ^\prime (\Vert e_i\Vert )}{\Vert e_i\Vert }\times \\&\Big (\frac{\mathrm {sin}(\Vert Vx_i\Vert )}{\Vert Vx_i\Vert }u_i^\perp +\frac{\mathrm {sin}(\Vert 2Vx_i\Vert )}{\Vert 2Vx_i\Vert }w_i^\perp +u_i^\top +w_i^\top \Big ), \end{aligned}$$

where \(e_i=\mathrm {Log}(\hat{y}_i,y_i)\), and \(u_i\) and \(w_i\) are defined as follows. Define a function \(j:\mathbb {C}^K\rightarrow \mathbb {C}^K\) by \(j(v)=iv\), where \(i=\sqrt{-1}\) is the imaginary unit, not the index. Separate \(\Gamma _{\hat{y}_i\rightarrow p}(e_i)\) into components \(u_i\) and \(w_i\) that are orthogonal and parallel to \(j(Vx_i)\), respectively, where all of these vectors are conceptualized as real 2K-vectors rather than complex K-vectors, i.e.,

$$\begin{aligned} w_i= & {} \mathrm {Re}\Big (\Big \langle \Gamma _{\hat{y}_i\rightarrow p}(e_i),\frac{j(Vx_i)}{\Vert j(Vx_i)\Vert }\Big \rangle \Big ) \frac{j(Vx_i)}{\Vert j(Vx_i)\Vert },~\text{ and }\\ u_i= & {} \Gamma _{\hat{y}_i\rightarrow p}(e_i)-w_i. \end{aligned}$$

Then \(u_i^\perp \) and \(u_i^\top \) are defined by

$$\begin{aligned} u_i^\top =\mathrm {Re}\Big (\Big \langle u_i,\frac{v}{\Vert v\Vert }\Big \rangle \Big ) \frac{v}{\Vert v\Vert },~\text{ and }~~u_i^\perp =u_i-u_i^\top , \end{aligned}$$

again treating the complex K-vectors as real 2K-vectors, and \(w_i^\perp \) and \(w_i^\top \) are defined similarly.
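As an illustration of this decomposition for a single covariate, so that \(Vx_i=x_iv\), the following sketch computes \(w_i\), \(u_i\) and their parts parallel and orthogonal to v using the real inner product; the variable and function names are ours.

```python
import numpy as np

def real_inner(a, b):
    """Re<a, b>: complex K-vectors viewed as real 2K-vectors."""
    return np.sum(a * np.conj(b)).real

def gradient_components(gamma_e, v, x_i):
    """Split gamma_e = Gamma_{yhat_i -> p}(e_i) into w_i (parallel to j(V x_i) = i x_i v)
    and u_i (orthogonal to it), then split each into parts parallel/orthogonal to v."""
    jv = 1j * x_i * v
    njv = np.sqrt(real_inner(jv, jv))
    if njv < 1e-12:                               # degenerate case x_i = 0
        w_i, u_i = np.zeros_like(gamma_e), gamma_e
    else:
        jv_unit = jv / njv
        w_i = real_inner(gamma_e, jv_unit) * jv_unit
        u_i = gamma_e - w_i
    v_unit = v / np.sqrt(real_inner(v, v))
    u_top = real_inner(u_i, v_unit) * v_unit
    w_top = real_inner(w_i, v_unit) * v_unit
    return u_i - u_top, u_top, w_i - w_top, w_top   # u_perp, u_top, w_perp, w_top
```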
