Streams Course Blog


Posted in Uncategorized by amitc1 on October 15, 2009

As you must have noticed, I have permuted the list of topics, as proposed in the post below. I have also made this change on the course website.


Possible permutation

Posted in Uncategorized by amitc1 on October 12, 2009

I am considering permuting the lecture plan a bit, so we cover the F2 algorithm and Johnson-Lindenstrauss next, and then return to the MEDIAN and SELECTION problems afterwards. Just a heads-up.

Lecture notes

Posted in Uncategorized by amitc1 on October 11, 2009

The first set of lecture notes, covering the basics and the first algorithm we saw (Misra-Gries; scanned PDF here) is now ready. This is also linked from the course website. I will add to these notes soon, to include the Flajolet-Martin and AMS algorithms for DISTINCT-ELEMENTS. Also, Radhika and Andrew will add to the notes to include Count-Min Sketch, CountSketch and the BJKST algorithm.
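For those who want to experiment while the notes are being written up, here is a minimal Python sketch of the Misra-Gries algorithm (the function name and interface are my own, chosen for illustration; consult the scanned notes for the authoritative description):

```python
def misra_gries(stream, k):
    """One-pass Misra-Gries summary using at most k - 1 counters.

    Any element occurring more than m/k times in a stream of length m
    is guaranteed to survive in the returned dict (possibly alongside
    some false positives); each surviving count underestimates the
    true frequency by at most m/k.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement every counter and drop the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# 'a' occurs 5 times in a stream of length 9, so with k = 2 it must survive.
summary = misra_gries(list("aabacadaa"), k=2)
```

Note that the decrement step is what keeps the space bounded: a decrement charges one occurrence of the new element against one occurrence of each stored element, which is why a sufficiently frequent element can never be wiped out.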

I’ll announce each significant update to the lecture notes on this blog.

Please notify me, or write a comment here, if you spot an error or a typo in the notes, or wish to say something related to all this. Let this blog be at least somewhat informal!


The first homework set

Posted in Uncategorized by amitc1 on October 10, 2009

It is ready at last! I have posted it to the course website (scroll down all the way). Future homework sets will be announced here as and when they get posted.

Problem 2 on this set refers to the finite field GF(2^n). If you are unfamiliar with this object, read this Wikipedia article. That already gives you enough information for now. If you want to learn more about finite fields, stop by my office sometime or pick up a suitable Mathematics textbook.
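If you just want to see GF(2^n) arithmetic in action before tackling the problem, here is an illustrative Python sketch of multiplication in the small field GF(2^3). The choice of irreducible polynomial (x^3 + x + 1) and the function name are mine, purely for illustration; the homework problem may use a different representation:

```python
def gf_mul(a, b, mod=0b1011, n=3):
    """Multiply a and b in GF(2^n), represented as n-bit integers.

    Addition in GF(2^n) is XOR; multiplication is polynomial
    multiplication over GF(2), reduced modulo the irreducible
    polynomial `mod` (here x^3 + x + 1, so the field is GF(8)).
    """
    result = 0
    while b:
        if b & 1:           # add the current multiple of a if this bit of b is set
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << n):    # reduce: a grew past degree n - 1
            a ^= mod
    return result

# x * x^2 = x^3 = x + 1 modulo x^3 + x + 1:
product = gf_mul(0b010, 0b100)   # equals 0b011
```

The key point for us is that every nonzero element has a multiplicative inverse, which is exactly what makes GF(2^n) a field rather than just a ring.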


Probability theory reference

Posted in Uncategorized by amitc1 on October 6, 2009

By now, we have introduced the idea of randomized algorithms, and to analyse their performance, we have used some basic probability theory. To recap, the important concepts are:

  • Probability spaces, events, union bound
  • Random variables, expectation, linearity of expectation
  • Independent events, independent random variables
  • Variance, standard deviation

Most importantly, we have introduced the notion of large deviation inequalities, or tail inequalities. Briefly, these inequalities assert that a certain random variable is not too likely to deviate much from its expected value. The important inequalities (so far) are:

  1. Markov’s inequality: If X \ge 0 is a random variable and t > 0 a constant, then
    \Pr[X \ge t] ~\le~ \mathbb{E}[X]/t.
  2. Chebyshev’s inequality: If \mu = \mathbb{E}[X], \sigma = \sqrt{\mathrm{Var}[X]}, then
    \Pr[|X - \mu| \ge t\sigma] ~\le~ 1/t^2.
  3. Chernoff bound: If X = X_1 + \cdots + X_n, \mu = \mathbb{E}[X], the \{X_i\} are independent and identically distributed indicator random variables, and 0 < \varepsilon \le 1, then
    \Pr[X - \mu \ge \varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}; \Pr[X - \mu \le -\varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}.
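To get a feel for how these bounds behave, here is a small, purely illustrative Python simulation (all parameter choices are mine) comparing the empirical upper tail of a sum of fair coin flips against the Chernoff bound above:

```python
import math
import random

random.seed(0)  # reproducible illustration

n, p = 1000, 0.5   # n fair coin flips, so X ~ Binomial(n, p)
trials = 2000      # number of independent experiments
mu = n * p         # E[X] = 500
eps = 0.1          # deviation threshold: X - mu >= eps * mu, i.e. X >= 550

# Empirical frequency of the upper-tail event over many trials.
exceed = sum(
    sum(random.random() < p for _ in range(n)) - mu >= eps * mu
    for _ in range(trials)
) / trials

# The Chernoff upper-tail bound from item 3 above.
chernoff = math.exp(-mu * eps ** 2 / 3)

print(f"empirical tail: {exceed:.4f}, Chernoff bound: {chernoff:.4f}")
```

You should find the empirical tail well below the bound; Chernoff is not tight here, but unlike Markov and Chebyshev it decays exponentially in \mu, which is what our analyses will exploit.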

Going forward, it will be vital to have these inequalities at your fingertips, ready for use. An excellent reference covering all of this material is the early chapters of the book Randomized Algorithms, by Motwani and Raghavan.


Hello, world

Posted in Uncategorized by amitc1 on October 6, 2009

I shall be posting information and updates regarding the course “CS 85/185: Data Stream Algorithms” to this blog. Students registered for the course are required to keep up with it, because important announcements will be made on it from time to time. Apart from announcements, I will try to use this blog to supplement the content of the lectures. Use the comments to initiate discussions on course topics. All students (including auditors) are welcome to comment.

In case something (such as commenting) does not work right, please bear with me; I am new to WordPress. You can always contact me via good old email.
