Opened 17 years ago
#295 new enhancement
Rework stats tables to avoid the need to store totals
Reported by: | rjl | Owned by: | rjl |
---|---|---|---|
Priority: | normal | Milestone: | 1.1.0 |
Component: | SQL scripts | Version: | 1.0.0 |
Severity: | normal | Keywords: | stats average totals maia_stats maia_stats_history |
Cc: |
Description
In the maia_stats and maia_stats_history tables, we currently store item counts as well as a "total" column that holds a running sum of all the items of a given type processed to date. Since this is an ever-growing figure, the total column is doomed to eventually overflow whatever storage container we use for it (currently BIGINT and UNSIGNED BIGINT). It would be best to avoid storing the total figure at all, particularly since we can always compute it by other means.
Specifically, I propose that instead of storing totals we store averages. Typically we think of averages as:
avg = sum / count
but we also know that
sum = avg * count
so if we only store averages and counts, we can always obtain the sum figure when we need it. When a new item gets processed, the average gets updated like so:
newavg = ((oldavg * count) + newitem) / (count + 1)
By computing the sum instead of storing it, we avoid having to deal with an ever-growing figure (and hence the use of disgustingly large data types like BIGINTs). Instead of:
total_*_size
we would store:
average_*_size
The same principle can be applied to other stats for which we store totals, particularly when we're only storing the totals in order to compute averages.