9-2 Weighted median

For nn distinct elements x1,x2,,xnx_1, x_2, \ldots, x_n with positive weights w1,w2,,wnw_1, w_2, \ldots, w_n such that i=1nwi=1\sum_{i = 1}^n w_i = 1, the weighted (lower) median is the element xkx_k satisfying

xi<xkwi<12\sum_{x_i < x_k} w_i < \frac{1}{2}

and

xi>xkwi12.\sum_{x_i > x_k} w_i \le \frac{1}{2}.

For example, if the elements are 0.1,0.35,0.05,0.1,0.15,0.05,0.20.1, 0.35, 0.05, 0.1, 0.15, 0.05, 0.2 and each element equals its weight (that is, wi=xiw_i = x_i for i=1,2,,7i = 1, 2, \ldots, 7), then the median is 0.10.1, but the weighted median is 0.20.2.

a. Argue that the median of x1,x2,,xnx_1, x_2, \ldots, x_n is the weighted median of the xix_i with weights wi=1/nw_i = 1 / n for i=1,2,,ni = 1, 2, \ldots, n.

b. Show how to compute the weighted median of nn elements in O(nlgn)O(n\lg n) worstcase time using sorting.

c. Show how to compute the weighted median in Θ(n)\Theta(n) worst-case time using a linear-time median algorithm such as SELECT\text{SELECT} from Section 9.3.

The post-office location problem is defined as follows. We are given nn points p1,p2,,pnp_1, p_2, \ldots, p_n with associated weights w1,w2,,wnw_1, w_2, \ldots, w_n. We wish to find a point pp (not necessarily one of the input points) that minimizes the sum i=1nwid(p,pi)\sum_{i = 1}^n w_i d(p, p_i), where d(a,b)d(a, b) is the distance between points aa and bb.

d. Argue that the weighted median is a best solution for the 11-dimensional postoffice location problem, in which points are simply real numbers and the distance between points aa and bb is d(a,b)=abd(a, b) = |a - b|.

e. Find the best solution for the 22-dimensional post-office location problem, in which the points are (x,y)(x,y) coordinate pairs and the distance between points a=(x1,y1)a = (x_1, y_1) and b=(x2,y2)b = (x_2, y_2) is the Manhattan distance given by d(a,b)=x1x2+y1y2d(a, b) = |x_1 - x_2| + |y_1 - y_2|.

a. Let mkm_k be the number of xix_i smaller than xkx_k. When weights of 1/n1 / n are assigned to each xix_i, we have xi<xkwi=mk/n\sum_{x_i < x_k} w_i = m_k / n and xi>xkwi=(nmk1)/2\sum_{x_i > x_k} w_i = (n - m_k - 1) / 2. The only value of mkm_k which makes these sums <1/2< 1 / 2 and 1/2\le 1 / 2 respectively is when n/21\lceil n / 2 \rceil - 1, and this value xx must be the median since it has equal numbers of xisx_i's which are larger and smaller than it.

b. First use mergesort to sort the xix_i's by value in O(nlgn)O(n\lg n) time. Let SiS_i be the sum of the weights of the first ii elements of this sorted array and note that it is O(1)O(1) to update SiS_i. Compute S1,S2,S_1, S_2, \dots until you reach kk such that Sk1<1/2S_{k − 1} < 1 / 2 and Sk1/2S_k \ge 1 / 2. The weighted median is xkx_k.

c. We modify SELECT\text{SELECT} to do this in linear time. Let xx be the median of medians. Compute xi<xwi\sum_{x_i < x} w_i and xi>xwi\sum_{x_i > x} w_i and check if either of these is larger than 1/21 / 2. If not, stop. If so, recurse on the collection of smaller or larger elements known to contain the weighted median. This doesn't change the runtime, so it is Θ(n)\Theta(n).

d. Let pp be the minimizer, and suppose that pp is not the weighted median. Let ϵ\epsilon be small enough such that ϵ<mini(ppi)\epsilon < \min_i(|p − p_i|), where we don't include kk if p=pkp = p_k. If pmp_m is the weighted median and p<pmp < p_m, choose ϵ>0\epsilon > 0. Otherwise choose ϵ<0\epsilon < 0. Thus, we have

i=1nwid(p+ϵ,pi)=i=1nwid(p,pi)+ϵ(pi<pwipi>pwi)<i=1nwid(p,pi),\sum_{i = 1}^n w_id(p + \epsilon, p_i) = \sum_{i = 1}^n w_id(p, p_i) + \epsilon\left(\sum_{p_i < p} w_i - \sum_{p_i > p} w_i \right) < \sum_{i = 1}^n w_id(p, p_i),

the difference in sums will take the opposite sign of epsilon.

e. Observe that

i=1nwid(p,pi)=i=1nwipx(pi)x+i=1nwipy(pi)y.\sum_{i = 1}^n w_id(p, p_i) = \sum_{i = 1}^n w_i |p_x - (p_i)_x| + \sum_{i = 1}^n w_i|p_y - (p_i)_y|.

It will suffice to minimize each sum separately, which we can do since we choose pxp_x and pyp_y individually. By part (e), we simply take p=(px,py)p = (p_x, p_y) to be such that pxp_x is the weighted median of the xx-coordinates of the pip_i's and py is the weighted medain of the yy-coordiantes of the pip_i's.