Formal Definition

We have, so far, taken an informal approach to the application of the Big Oh notation. If I tell you the running time of a function is $T_1(n) = 3n + 5$, for instance, you know it is $O(n)$. You know this because you learned to suppress constant factors and ignore lower-order terms in $T_1(n)$. As such, you know the running times $T_2(n)=0.34n$ and $T_3(n) = 234234n$ are also $O(n)$.

It seems that $O(n)$ describes a group of functions that all share a characteristic: their running time grows linearly as the size of the input increases.

The second algorithm ($T_2$) is obviously the fastest of the three, but the differences between them are negligible in comparison to another algorithm with, e.g., a runtime of $O(n^2)$. Why is that?

The function $g(n) = n^2$ grows much faster than $f(n) = cn$ as $n$ gets arbitrarily large (for any constant $c > 0$).
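As a quick sanity check, here is a small Python sketch (the function names and sample sizes are my own, not from the text): even with the huge constant from $T_3$, the quadratic eventually overtakes the linear function, because the ratio $g(n)/f(n) = n/c$ grows without bound.

```python
# Compare a linear runtime with a huge constant against a quadratic one.
# (These helper names and sample points are illustrative choices.)
def f(n, c=234234):
    return c * n       # linear: f(n) = c * n

def g(n):
    return n * n       # quadratic: g(n) = n^2

# The ratio g(n)/f(n) = n/c keeps growing as n grows:
for n in (10, 1_000, 234_234, 1_000_000):
    print(n, g(n) / f(n))

# g is smaller at first, equal at n = c, and larger ever after.
```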

We can, more formally, define Big Oh notation as follows.

$O(f(n))$ is a set (or class) of functions that grow no faster than $f(n)$, up to a constant factor.

So when we say the running time of a program is $O(n)$, for instance, we are saying its running time is one of many functions in a set of functions that all share a linear rate of growth.

A function $T(n)$ is in the set $O(f(n))$ if the ratio $\frac{T(n)}{f(n)}$ is bounded above by some constant $c$ as $n$ gets arbitrarily large.

  • The term "bounded above" means the value of $\frac{T(n)}{f(n)}$, as you increase $n$, eventually drops to and stays at or below some constant $c$.

  • The term "constant" means it does not depend on $n$.

For example, $T_1(n) = 3n + 5$ is in $O(n)$ because the ratio between $T_1(n)$ and $f(n) = n$ stays at or below $c = 4$ as long as $n \ge 5$:

$$ \frac{3n+5}{n} = 3 + \frac{5}{n} \le 4 \quad \text{for all } n \ge 5 $$

So we write $T_1(n) \in O(n)$.
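We can verify this numerically with a few lines of Python (the sample range is my own choice): the ratio $T_1(n)/n$ starts above $4$ for small $n$, then settles at or below it.

```python
# Numeric check of the bound T1(n)/n <= 4.
def T1(n):
    return 3 * n + 5

# At n = 1 the ratio is 8 -- above the bound -- which is why the
# definition only asks that the bound hold *eventually*:
print(T1(1) / 1)   # prints 8.0

# From n = 5 onward the ratio stays at or below c = 4:
assert all(T1(n) / n <= 4 for n in range(5, 10_000))
```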

Many references write $T_1(n) = O(n)$ which (in this context) means the same thing.

The constant $c$ in the definition of Big Oh is not describing the number that the ratio approaches! It's describing an upper bound on the ratio. That's why in the example above I used $c=4$ (and not $c=3$).

The constant $c$ is not necessarily unique. In the example above, I could have chosen any $c \ge 4$.

Case in point!

Big Oh notation groups functions into classes: all functions in a particular class are considered to have the same efficiency.

This means we consider $T_1(n)$, $T_2(n)$, and $T_3(n)$ as all having the same efficiency!
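One way to make this grouping concrete is a rough empirical check. The helper below (`looks_big_o`, its sampling scheme, and the chosen constants are all my own invention, not a formal proof) samples $T(n) \le c \cdot f(n)$ at increasingly large $n$:

```python
# Rough empirical membership test: does T(n) <= c * f(n) hold at
# sampled values of n from n0 up to n_max? (A sketch, not a proof.)
def looks_big_o(T, f, c, n0=1, n_max=10**6, step=9973):
    return all(T(n) <= c * f(n) for n in range(n0, n_max, step))

linear = lambda n: n

# The three linear runtimes all fit in O(n) with suitable constants...
print(looks_big_o(lambda n: 3 * n + 5, linear, c=4, n0=5))   # True
print(looks_big_o(lambda n: 0.34 * n, linear, c=1))          # True
print(looks_big_o(lambda n: 234234 * n, linear, c=234234))   # True

# ...but even a generous constant cannot make a quadratic fit:
print(looks_big_o(lambda n: n * n, linear, c=1000))          # False
```

Of course, a finite sample can never prove membership; the definition is a statement about all sufficiently large $n$. The sketch is only meant to build intuition for how the constant $c$ and the "eventually" threshold interact.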

Aside: Why are we obsessed with analyzing runtime when the input gets arbitrarily large?!

When it comes to computational problems, the input usually gets bigger over time. Think about it! Your social network app is likely to gain more users and more cat pictures as time goes on!!