Different notions of complexity
- worst case complexity – the number of steps in the worst possible scenario
- average case complexity – complexity of one operation on a random input (assuming some distribution of inputs)
- expected complexity – complexity of one operation averaged over random decisions of the algorithm
- amortized complexity – average complexity of one operation in any sequence of operations
- expected amortized, … :-)
These notions give different guarantees:
- worst case complexity – every operation takes at most \(O(T(n))\) time
- amortized complexity – some operations may take more time, but any sequence of \(n\) operations takes \(O(n\cdot T(n))\) time
- expected complexity – every operation takes \(O(T(n))\) time on average; when the algorithm is randomized and uses coin flips to make some decisions, a single operation may take different times depending on the results of the coin flips. Thus, \(n\) operations take \(O(n\cdot T(n))\) expected time
- average case complexity – one operation takes \(O(T(n))\) time and \(n\) operations take \(O(n\cdot T(n))\) time on average input; there may be (viciously constructed) inputs that take much longer
Worst case vs. average case complexity
Example 1: Imagine that we have an unsorted array \(A\) with \(n\) numbers and we would like to find a given number \(x\) in \(A\). In the worst case, we have to access all \(n\) elements, but if \(A\) is a random permutation and \(x\) is in \(A\), we only access about \(n / 2\) elements on average.
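A minimal Python sketch of this linear search; the function name find and the little averaging experiment are only illustrative:

```python
import random

def find(A, x):
    """Linear search: scan A left to right and return the index of x (or -1)."""
    for i, value in enumerate(A):
        if value == x:
            return i
    return -1  # worst case: all n elements were accessed

# Rough experiment: when A is a random permutation and x is present,
# the number of accessed elements (index + 1) averages out to about n / 2.
n, trials, total = 1000, 2000, 0
for _ in range(trials):
    A = random.sample(range(n), n)      # a random permutation of 0..n-1
    total += find(A, random.randrange(n)) + 1
print(total / trials)                   # close to 500
```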
Example 2: We want to sort an array of \(n\) numbers. A simple way to do that is insertion sort (in the \(i\)-th step, we have the first \(i\) items already sorted and we insert the next item into this sequence). What is the number of comparisons this algorithm makes? In the best case, the array is already sorted and insertion sort makes \(n - 1\) comparisons to check that. In the worst case, the array is sorted in the opposite order. Inserting the \(i\)-th item then takes \(i - 1\) comparisons, which sums to approximately \(n^2 / 2\) comparisons. In the average case, it is approximately \(n^2 / 4\).
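A short sketch of insertion sort that counts its comparisons, matching the best and worst cases above (the counter is only for illustration):

```python
def insertion_sort(A):
    """Sort A in place and return the number of comparisons made."""
    comparisons = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        # Shift larger elements one position to the right, then place the key.
        while j >= 0:
            comparisons += 1
            if A[j] > key:
                A[j + 1] = A[j]
                j -= 1
            else:
                break
        A[j + 1] = key
    return comparisons

print(insertion_sort(list(range(10))))         # already sorted: 9 = n - 1 comparisons
print(insertion_sort(list(range(10, 0, -1))))  # reversed: 45 = n(n-1)/2, roughly n^2 / 2
```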
Example 3: In the quicksort algorithm, we take the first number in the array as a pivot and divide the whole array into two parts: numbers smaller than the pivot and numbers bigger than the pivot. This can be done easily in \(O(n)\) time. Then we apply the same procedure to the two smaller parts. In the best case, the pivot divides the array exactly into halves and the algorithm runs in \(O(n\log n)\) time. However, in the worst case, when the array is already sorted, the pivot is always the smallest number, so all the numbers end up on one side, leading to a terrible \(\Theta(n^2)\) performance. Therefore, a better choice for a pivot is the middle number: at least when the input is sorted, this variant runs in \(O(n\log n)\) time. It is a little harder to construct the worst case input, but still, on some permutations the algorithm runs in \(\Theta(n^2)\) time. On the other hand, on most inputs the quicksort algorithm is very fast, and the average case complexity is \(O(n\log n)\).
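A simplified sketch of this quicksort (it builds new lists instead of partitioning in place, to keep the code short; the pivot parameter is only for illustration):

```python
def quicksort(A, pivot="middle"):
    """Quicksort from Example 3. pivot="first" degrades to Theta(n^2) on a
    sorted input; pivot="middle" handles sorted input in O(n log n), but some
    specially constructed permutations still force Theta(n^2)."""
    if len(A) <= 1:
        return A
    p = A[0] if pivot == "first" else A[len(A) // 2]
    smaller = [x for x in A if x < p]
    equal   = [x for x in A if x == p]
    bigger  = [x for x in A if x > p]
    return quicksort(smaller, pivot) + equal + quicksort(bigger, pivot)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))   # [1, 1, 2, 3, 4, 5, 6, 9]
```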
Average case vs. expected complexity
Example 4: Let us continue with the quicksort algorithm from Example 3, but this time, shuffle the array before sorting it (alternatively, instead of picking the middle element as a pivot, pick a random one). We claim that this modification has expected \(O(n\log n)\) running time. What is the difference? In Example 3, the algorithm was deterministic – if we ran it twice on the same input, it would work exactly the same way and take the same amount of time. The average was taken over all possible inputs – over all \(n!\) permutations (provided that all permutations were equally likely) – but an adversary, after inspecting our implementation, could come up with a “killer” sequence on which our algorithm runs in \(\Theta(n^2)\). On the other hand, our new algorithm is randomized. If we run it twice on the same input, it will most probably not run the same way, due to the random decisions. No adversary can come up with a killer sequence now – by shuffling the input, we have, so to speak, turned the worst case into an average case. The previous algorithm ran fast on the vast majority of inputs (only a little slower on some inputs and catastrophically slowly on some specially constructed ones); this algorithm runs fast on all inputs (only sometimes slower, and very slowly only with negligible probability). Here, the average is over the random choices, not over the inputs.
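A sketch of the randomized variant, here using a random pivot (the other option from the example, shuffling the whole input first, would work equally well):

```python
import random

def randomized_quicksort(A):
    """Expected O(n log n) on every input: the average is over the algorithm's
    own random choices, so no fixed "killer" input exists."""
    if len(A) <= 1:
        return A
    p = random.choice(A)
    smaller = [x for x in A if x < p]
    equal   = [x for x in A if x == p]
    bigger  = [x for x in A if x > p]
    return randomized_quicksort(smaller) + equal + randomized_quicksort(bigger)

# A sorted input is no longer a problem for this algorithm.
print(randomized_quicksort(list(range(1000))) == list(range(1000)))  # True
```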
Worst case vs. amortized complexity
Example 5: Imagine we would like to implement a stack of objects with operations push(x) and pop(). This can be easily done with an array and a pointer to the top item. Both operations take \(O(1)\) time in the worst case. Now imagine that we would like to add a clear() operation that removes all the items from the stack. One solution is to simply move the top pointer to the beginning of the array (all the elements are kept in memory, but will be overwritten by subsequent pushes). Another possibility is to pop the items one by one until the stack is empty. What is the complexity of this solution? Clearly, the time to pop \(n\) items is \(O(n)\). However, imagine a sequence of \(n\) operations push(x), pop(), and clear(). If clear() runs in \(O(n)\) time, does it mean that \(n\) operations take \(\Theta(n^2)\) time? No. We can only pop items that were pushed onto the stack, and thus the time for all pops and clears cannot be (asymptotically) bigger than the time for all the pushes. Any sequence of \(n\) operations runs in \(O(n)\) total time. So even though one clear operation is slow, if we compute the average time as [total time] / [number of operations], we find that one operation takes \(O(1)\) time on average. Therefore, we say that the clear operation runs in amortized \(O(1)\) time – averaged over any sequence of operations.
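A minimal sketch of such a stack, using a Python list as the underlying array (the class and method names are only for illustration):

```python
class Stack:
    """Array-based stack from Example 5."""
    def __init__(self):
        self._items = []

    def push(self, x):
        self._items.append(x)      # O(1) amortized (see Example 7)

    def pop(self):
        return self._items.pop()   # O(1)

    def clear(self):
        # O(size) for this single call, but every popped item was pushed
        # exactly once, so any sequence of n operations still takes O(n)
        # in total: clear() runs in amortized O(1) time.
        while self._items:
            self._items.pop()
```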
Example 6: Imagine that we have an array of length \(n\) with operations set(i,x), which sets the \(i\)-th item to value \(x\), and init(), which sets all the \(n\) items in the array to zero. Does init() run in amortized \(O(1)\) time? Surely not. If we used init only once at the beginning and then \(n - 1\) set operations followed, the average time per operation would be \(O(1)\). However, if we only call init operations (and no set operations), the average time per operation is no better than the worst case time, i.e., \(O(n)\). (Contrast this with calling the clear operation from the previous example \(n\) times.)
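A sketch of this array; the names ZeroArray, set, and init are only illustrative:

```python
class ZeroArray:
    """Fixed-length array from Example 6."""
    def __init__(self, n):
        self.a = [0] * n

    def set(self, i, x):
        self.a[i] = x              # O(1)

    def init(self):
        for i in range(len(self.a)):
            self.a[i] = 0          # O(n): every call touches every cell

# One init() followed by n - 1 set() calls averages out to O(1) per operation,
# but a sequence of n init() calls costs Theta(n) each: init is not amortized O(1).
```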
Example 7: Nowadays, modern programming languages usually include an implementation of a dynamic (resizable) array, such as vector in C++, ArrayList in Java, or the built-in list data type in Python. How do these data structures work? For this example, let us keep things simple: we want a data structure supporting random access (get the \(i\)-th item) and an operation push(x) that inserts element \(x\) at the end of the array. The problem is that we don’t know in advance how big an array we need, and we don’t want to waste memory. A simple solution is to keep the array exactly as large as the number of elements it contains. For each push(x) operation, we resize the array (so that one more element fits in) and add \(x\) at the end. However, think about what resizing really means: since the memory right after our array may already be taken, we have to allocate a new, bigger array, copy all the elements, and finally dispose of the old array. This takes \(O(n)\) time, which means that (due to resizing) simply pushing \(n\) items into the array takes \(O(n^2)\) time. This is not acceptable.
Instead of increasing the size by one, we could increase it by, say, 100. This would make the resizing 100 times less frequent, but the worst case complexity of pushing \(n\) items would still be \(O(n^2 / 100) = O(n^2)\). A better solution is to always double the array capacity. This way, for every resize operation increasing the capacity from \(c\) to \(2c\), which takes \(O(c)\) time, there must have been \(c / 2\) pushes which took only constant time. If we push \(n = 2^k\) items, the total time spent on resizing is only \(1 + 2 + 4 + 8 + \cdots + 2^k < 2^{k+1} = O(n)\), so the average (amortized) time for one push operation is \(O(1)\).
Average case vs. amortized complexity
Example 8: Let us continue with Example 7 and extend our dynamic array with an operation pop() that removes the last element. So this time, the number of elements in the array may both increase and decrease, and we still don’t want to waste much memory. After going through Example 7, an obvious (and wrong) solution suggests itself: when the number of elements drops to half the capacity (or below), resize the array to half its size. Why is this solution slow? (Try to figure it out by yourself first.) We show a sequence of operations that takes quadratic time: First, push \(n = 2^k\) items so that the array is full. Now alternately push and pop one item, \(n\) times. Since the array is full, it gets doubled by the push, but with the next pop, the number of elements drops to one half and the array is halved (and is full again). So these \(2n\) operations take \(O(n^2)\) time.
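A sketch of the flawed shrinking policy, reusing the DynamicArray sketch from Example 7; the comment at the end spells out the killer sequence:

```python
class ShrinkingArray(DynamicArray):
    """Example 8: shrink as soon as the array is at most half full (flawed)."""
    def pop(self):
        self.size -= 1
        x = self.data[self.size]
        if self.size <= self.capacity // 2 and self.capacity > 1:
            self._resize(self.capacity // 2)   # too eager, see below
        return x

# Killer sequence: push n = 2**k items (the array is now full), then alternate
# push and pop n times. Every push doubles the capacity and the very next pop
# halves it again, so each of these 2n operations copies Theta(n) elements.
```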
Example 9: Hashing