Streams Course Blog

Notes further updated

Posted in Uncategorized by amitc1 on November 12, 2009

The lecture notes have been updated some more. Thanks to all who have provided me timely scribe notes. Please read what other scribes have written and if this gives you ideas for improving your own notes, please send me a new version of your notes. It would be most welcome.

Tagged with:

Lecture notes

Posted in Uncategorized by amitc1 on October 11, 2009

The first set of lecture notes, covering the basics and the first algorithm we saw (Misra-Gries; scanned PDF here) is now ready. This is also linked from the course website. I will add to these notes soon, to include the Flajolet-Martin and AMS algorithms for DISTINCT-ELEMENTS. Also, Radhika and Andrew will add to the notes to include Count-Min Sketch, CountSketch and the BJKST algorithm.

I’ll announce each significant update to the lecture notes on this blog.

Please notify me, or write a comment here, if you spot an error or a typo in the notes, or wish to say something related to all this. Let this blog be at least somewhat informal!

Tagged with:

Probability theory reference

Posted in Uncategorized by amitc1 on October 6, 2009

By now, we have introduced the idea of randomized algorithms, and to analyse their performance, we have used some basic probability theory. To recap, the important concepts are:

  • Probability spaces, events, union bound
  • Random variables, expectation, linearity of expectation
  • Independent events, independent random variables
  • Variance, standard deviation

Most importantly, we have introduced the notion of large deviation inequalities, or tail inequalities. Briefly, these inequalities assert that a certain random variable is not too likely to deviate much from its expected value. The important inequalities (so far) are:

  1. Markov’s inequality: If X \ge 0 is a random variable and t \ge 0 a constant, then
    \Pr[X \ge t] ~\le~ \mathbb{E}[X]/t.
  2. Chebyshev’s inequality: If \mu = \mathbb{E}[X], \sigma = \sqrt{\mathrm{Var}[X]}, then
    \Pr[|X - \mu| \ge t\sigma] ~\le~ 1/t^2.
  3. Chernoff bound: If X = X_1 + \cdots + X_n, \mu = \mathbb{E}[X] and \{X_i\} are independent and identically distributed indicator random varibles, then
    \Pr[X - \mu \ge \varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}; \Pr[X - \mu \le -\varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}.

Going forward, it will be vital to have these inequalities at your fingertips, ready for use. An excellent reference covering all of this material is the early chapters of the book Randomized Algorithms, by Motwani and Raghavan.

Tagged with: