Streams Course Blog

References galore

Posted in Uncategorized by amitc1 on October 30, 2009

The last few lectures have covered algorithms for which the best reference remains the original research paper describing the algorithm. Here are the relevant pointers, collected. If I have forgotten to add a reference for a result recently covered in class, please notify me and I’ll update this post.

For future work (including homework problems), do not be scared to dig into these papers, if you need to understand one of these algorithms at a deeper level than we managed to cover in class.

Tagged with:

New homework is up

Posted in Uncategorized by amitc1 on October 22, 2009

Homework 2 is ready and is now available on the course website. This one is 100% about designing algorithms, and it is designed to get you thinking more deeply about what you have learned so far and coming up with creative solutions.

Start brainstorming ASAP, for these problems will not be solved in one sitting. You’ll need to sleep over them a couple of times and let the necessary ideas come to you over time.

X-hour update

Posted in Uncategorized by amitc1 on October 20, 2009

Just a heads-up that we will be using the X-hour, 3:00-4:10pm, next Wednesday (Oct 28). Also, Amit will be out of town next Tuesday (Oct 27), so we will not have regular class. Instead, you can use the opportunity to discuss HW2 with Chrisil.

Reference for today’s lecture

Posted in Uncategorized by amitc1 on October 15, 2009

Today’s lecture was more technically heavy than usual. I hope it didn’t bring back any “calculus nightmares” for anyone! In any case, here is Indyk’s paper, which includes the algorithms and ideas we discussed today, plus quite a bit more.

We’re not quite done discussing all the higher-level issues raised by this, and some other recent algorithms we have seen. Next Tuesday, I want to spend approximately the first half discussing these things, and then the second half talking about the MEDIAN problem. As you’ll note, we’re continuing to tweak the lecture plan slightly as we go. For instance, today we ended up discussing stable distributions earlier than planned.

More lecture notes

Posted in Uncategorized by amitc1 on October 15, 2009

I have added a new chapter to the lecture notes, which covers the Flajolet-Martin DISTINCT-ELEMENTS algorithm. I also made small tweaks to the earlier parts. As always, the link from the course website will get you to the latest version of the notes.

Please notify me if you find any typos or errors in the notes. With 20-odd eagle-eyed readers reading these, it would be a shame if something got away!

The LaTeX sources to these notes have been distributed (by email) to all registered students, who can use them while scribing. If you are a registered student and did not get my email about this, please notify me at once.


Posted in Uncategorized by amitc1 on October 15, 2009

As you must have noticed, I did permute the list of topics as mentioned below. I have also made this change on the course website.

Possible permutation

Posted in Uncategorized by amitc1 on October 12, 2009

I am considering permuting the lecture plan a bit, so we cover the F2 algorithm and Johnson-Lindenstrauss next, and then return to the MEDIAN and SELECTION problems afterwards. Just a heads-up.

Lecture notes

Posted in Uncategorized by amitc1 on October 11, 2009

The first set of lecture notes, covering the basics and the first algorithm we saw (Misra-Gries; scanned PDF here) is now ready. This is also linked from the course website. I will add to these notes soon, to include the Flajolet-Martin and AMS algorithms for DISTINCT-ELEMENTS. Also, Radhika and Andrew will add to the notes to include Count-Min Sketch, CountSketch and the BJKST algorithm.

I’ll announce each significant update to the lecture notes on this blog.

Please notify me, or write a comment here, if you spot an error or a typo in the notes, or wish to say something related to all this. Let this blog be at least somewhat informal!

Tagged with:

The first homework set

Posted in Uncategorized by amitc1 on October 10, 2009

It is ready at last! I have posted it to the course website (scroll down all the way). Future homework sets will be announced here as and when they get posted.

Problem 2 on this set refers to the finite field GF(2^n). If you are unfamiliar with this object, read this Wikipedia article. That already gives you enough information for now. If you want to learn more about finite fields, stop by my office sometime or pick up a suitable Mathematics textbook.

Tagged with:

Probability theory reference

Posted in Uncategorized by amitc1 on October 6, 2009

By now, we have introduced the idea of randomized algorithms, and to analyse their performance, we have used some basic probability theory. To recap, the important concepts are:

  • Probability spaces, events, union bound
  • Random variables, expectation, linearity of expectation
  • Independent events, independent random variables
  • Variance, standard deviation

Most importantly, we have introduced the notion of large deviation inequalities, or tail inequalities. Briefly, these inequalities assert that a certain random variable is not too likely to deviate much from its expected value. The important inequalities (so far) are:

  1. Markov’s inequality: If X \ge 0 is a random variable and t \ge 0 a constant, then
    \Pr[X \ge t] ~\le~ \mathbb{E}[X]/t.
  2. Chebyshev’s inequality: If \mu = \mathbb{E}[X], \sigma = \sqrt{\mathrm{Var}[X]}, then
    \Pr[|X - \mu| \ge t\sigma] ~\le~ 1/t^2.
  3. Chernoff bound: If X = X_1 + \cdots + X_n, \mu = \mathbb{E}[X] and \{X_i\} are independent and identically distributed indicator random varibles, then
    \Pr[X - \mu \ge \varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}; \Pr[X - \mu \le -\varepsilon\mu] ~\le~ e^{-\mu\varepsilon^2/3}.

Going forward, it will be vital to have these inequalities at your fingertips, ready for use. An excellent reference covering all of this material is the early chapters of the book Randomized Algorithms, by Motwani and Raghavan.

Tagged with: