## References galore

The last few lectures have covered algorithms for which the best reference remains the original research paper describing the algorithm. Here are the relevant pointers, collected. If I have forgotten to add a reference for a result recently covered in class, please notify me and I’ll update this post.

- The multi-pass exact median algorithm is due to Munro and Paterson.
- The approximate median algorithm for randomly ordered streams is due to Guha and McGregor.
- The quantiles algorithm is due to Greenwald and Khanna.
- The one-pass algorithm for empirical entropy is due to Chakrabarti, Cormode and McGregor.
- State-of-the-art results for spanner construction were obtained by Elkin.

For future work (including homework problems), do not be scared to dig into these papers if you need to understand one of these algorithms at a deeper level than we managed to cover in class.

## New homework is up

Homework 2 is ready and is now available on the course website. This one is 100% about designing algorithms: it is meant to get you thinking more deeply about what you have learned so far and to come up with creative solutions.

Start brainstorming ASAP, for these problems will not be solved in one sitting. You’ll need to sleep on them a couple of times and let the necessary ideas come to you over time.

## X-hour update

Just a heads-up that we will be using the X-hour, 3:00-4:10pm, next Wednesday (Oct 28). Also, Amit will be out of town next Tuesday (Oct 27), so we will not have regular class. Instead, you can use the opportunity to discuss HW2 with Chrisil.

## Reference for today’s lecture

Today’s lecture was more technically heavy than usual. I hope it didn’t bring back any “calculus nightmares” for anyone! In any case, here is Indyk’s paper, which includes the algorithms and ideas we discussed today, plus quite a bit more.

We’re not quite done discussing all the higher-level issues raised by this, and some other recent algorithms we have seen. Next Tuesday, I want to spend approximately the first half discussing these things, and then the second half talking about the MEDIAN problem. As you’ll note, we’re continuing to tweak the lecture plan slightly as we go. For instance, today we ended up discussing stable distributions earlier than planned.

## More lecture notes

I have added a new chapter to the lecture notes, which covers the Flajolet-Martin DISTINCT-ELEMENTS algorithm. I also made small tweaks to the earlier parts. As always, the link from the course website will get you to the latest version of the notes.
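If you want to experiment with the idea before (or while) reading the new chapter, here is a minimal Python sketch of the basic Flajolet-Martin estimator. This is my own illustrative rendering, not the notes’ pseudocode: the function names are made up, SHA-1 truncated to 32 bits stands in for the random hash function, and with a single hash and no averaging the estimate is only accurate up to constant factors.

```python
import hashlib

def trailing_zeros(x, width=32):
    """Number of trailing zero bits in x (width if x == 0)."""
    if x == 0:
        return width
    z = 0
    while x & 1 == 0:
        x >>= 1
        z += 1
    return z

def flajolet_martin(stream):
    """One-pass distinct-elements estimate: hash each item to 32 bits,
    track the largest number of trailing zeros rho seen so far, and
    return 2^rho as the estimate of the number of distinct items."""
    max_z = 0
    for item in stream:
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:4], "big")
        max_z = max(max_z, trailing_zeros(h))
    return 2 ** max_z
```

Note that repeating items cannot change the answer, since duplicates hash to the same value; that is exactly why the sketch measures *distinct* elements.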

Please notify me if you find any typos or errors in the notes. With 20-odd eagle-eyed readers going through these, it would be a shame if an error got away!

The LaTeX sources to these notes have been distributed (by email) to all registered students, who can use them while scribing. If you are a registered student and did not get my email about this, please notify me at once.

## Permuted

As you must have noticed, I did permute the list of topics as mentioned below. I have also made this change on the course website.

## Possible permutation

I am considering permuting the lecture plan a bit, so we cover the F_{2} algorithm and Johnson-Lindenstrauss next, and then return to the MEDIAN and SELECTION problems afterwards. Just a heads-up.

## Lecture notes

The first set of lecture notes, covering the basics and the first algorithm we saw (Misra-Gries; scanned PDF here) is now ready. This is also linked from the course website. I will add to these notes soon, to include the Flajolet-Martin and AMS algorithms for DISTINCT-ELEMENTS. Also, Radhika and Andrew will add to the notes to include Count-Min Sketch, CountSketch and the BJKST algorithm.
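For reference, the Misra-Gries summary is short enough to sketch in a few lines of Python. This is my own illustrative version, not the notes’ pseudocode (the function name and interface are made up for the example):

```python
def misra_gries(stream, k):
    """One-pass Misra-Gries summary using at most k - 1 counters.
    Every item whose true frequency exceeds len(stream)/k is
    guaranteed to survive, and each surviving counter undercounts
    the item's true frequency by at most len(stream)/k."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Decrement every counter; evict those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The error guarantee follows from the observation that each decrement step destroys k counts at once (k - 1 stored counters plus the arriving item), so there can be at most len(stream)/k such steps.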

I’ll announce each significant update to the lecture notes on this blog.

Please notify me, or write a comment here, if you spot an error or a typo in the notes, or wish to say something related to all this. Let this blog be at least somewhat informal!

## The first homework set

It is ready at last! I have posted it to the course website (scroll down all the way). Future homework sets will be announced here as and when they get posted.

Problem 2 on this set refers to a certain finite field. If you are unfamiliar with this object, read this Wikipedia article. That already gives you enough information for now. If you want to learn more about finite fields, stop by my office sometime or pick up a suitable Mathematics textbook.

## Probability theory reference

By now, we have introduced the idea of randomized algorithms, and to analyse their performance, we have used some basic probability theory. To recap, the important concepts are:

- Probability spaces, events, union bound
- Random variables, expectation, linearity of expectation
- Independent events, independent random variables
- Variance, standard deviation

Most importantly, we have introduced the notion of *large deviation* inequalities, or *tail inequalities*. Briefly, these inequalities assert that a certain random variable is not too likely to deviate much from its expected value. The important inequalities (so far) are:

- Markov’s inequality: If $X \ge 0$ is a random variable and $t > 0$ a constant, then
$\Pr[X \ge t] \le \mathbb{E}[X]/t$.
- Chebyshev’s inequality: If $X$ is a random variable with mean $\mu$ and variance $\sigma^2$, then for all $t > 0$,
$\Pr[|X - \mu| \ge t] \le \sigma^2/t^2$.
- Chernoff bound: If $X = X_1 + \cdots + X_n$, where $X_1, \ldots, X_n$ are independent and identically distributed indicator random variables with $\mu = \mathbb{E}[X]$, then for $0 < \delta < 1$,
$\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}$; $\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}$.
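To get a feel for how loose these bounds can be in practice, here is a small experiment (my own sketch, with arbitrary parameter choices) comparing Chebyshev’s bound to the empirical tail probability for the number of heads in a run of fair coin flips:

```python
import random

def chebyshev_check(n=400, t=40, trials=2000, seed=0):
    """Compare the empirical value of Pr[|X - mu| >= t], where X is
    the number of heads in n fair coin flips, against Chebyshev's
    bound sigma^2 / t^2."""
    random.seed(seed)
    mu = n / 2
    var = n / 4  # variance of a Binomial(n, 1/2) random variable
    hits = 0
    for _ in range(trials):
        x = sum(random.getrandbits(1) for _ in range(n))
        if abs(x - mu) >= t:
            hits += 1
    return hits / trials, var / (t * t)
```

With these parameters the Chebyshev bound is $100/1600 = 0.0625$, while the empirical tail probability is far smaller: Chebyshev uses only the variance, whereas the Chernoff bound exploits independence and is exponentially stronger.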

Going forward, it will be vital to have these inequalities at your fingertips, ready for use. An excellent reference covering all of this material is the early chapters of the book *Randomized Algorithms*, by Motwani and Raghavan.
