How I log - part 1
Now that I've lamented the amount of data I have, I might as well mention how I record it all. I'll start with an overview of logging and then go deeper.
There are basically three types of site logging: webserver logging, remote logging and programmatic logging.
Webserver logging is the easiest to use but, in my opinion, the least useful. Every request to the webserver is written to a log. Maybe you have some say over which fields are written, maybe not, but the bottom line is a 'glob' of data. The main problem is that you can't shape that data as it's recorded. If you have 1,000,000 hits from Google, you'll have 1,000,000 entries of the Google user agent. This leads to huge reams of data that can't be used unless they're run through a log reader like WebTrends. Personally, I NEVER log through the webserver.
Remote logging was once viewed with suspicion, as you're giving your visitor statistics to some third party and not always gaining anything from it other than a hit count or the like. For this reason I avoided remote logging until Google Analytics came along. If you've ever used Google Analytics, you know that your data is put to great effect by that service. You get so much data that you can devote your entire day to reviewing it and seeing how it can be used. Statistics, charts and more are nicely laid out. I'll have to blog about that sometime in the future.
As a side note on Google Analytics data, I was looking over the screen resolution of the visitors to House of Fusion and over 39% of the people have a resolution of 1024 wide or more. This tells me that I can add an extra column of useful data (latest CF-Talk, CF-Jobs, Fusion Authority, important blog posts, etc.) to the site and it'll be used by people. But this is all something for a future post. :)
The third type of logging is what I actually do. This involves some sort of web-based programming language. You might know of one of them. Something called Cold Fusion or ColdFusion or something like that. Basically, the language has access to all of the CGI information about the visitor. This means you can store it all in a nice clean database. But wait, there's more. Because you have control over the whole thing, you can do data manipulation. Let's say you have 1,000,000 Google hits today. Rather than recording Google's user agent 1,000,000 times, all you have to do is have an agent lookup table and record the user agent once. Then the log only needs the ID of the Google agent. A single number takes far less database space than a text value of up to 100 characters.
The same can be done with other pieces of data, which means you end up with a database that's made up of 1-3 text fields with the rest being numbers. This is perfect for those who like tight database structures or have limited space. But beware of success, as a good site can still get gigs of data even with a tight database structure.
So now that we've decided on the way we're going to log, we need the how. I'll discuss that a little later after I deal with all the flak from dissing webserver logging. :)
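To make the lookup table idea concrete, here is a minimal sketch. It's in Python with an in-memory SQLite database rather than ColdFusion, purely so it can be run anywhere. The table names (agent, hit), the column names and the helper functions are all made up for illustration; this is not the actual House of Fusion schema.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one lookup table for user agents,
    # one log table that stores only the agent's numeric ID.
    conn.executescript("""
        CREATE TABLE agent (
            agent_id   INTEGER PRIMARY KEY,
            user_agent TEXT NOT NULL UNIQUE
        );
        CREATE TABLE hit (
            hit_id   INTEGER PRIMARY KEY,
            agent_id INTEGER NOT NULL REFERENCES agent(agent_id),
            hit_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)


def agent_id_for(conn, user_agent):
    # Store the user agent text only the first time it is seen,
    # then return its ID on every call.
    conn.execute(
        "INSERT OR IGNORE INTO agent (user_agent) VALUES (?)",
        (user_agent,),
    )
    row = conn.execute(
        "SELECT agent_id FROM agent WHERE user_agent = ?",
        (user_agent,),
    ).fetchone()
    return row[0]


def log_hit(conn, user_agent):
    # The log row carries a number, not the full text.
    conn.execute(
        "INSERT INTO hit (agent_id) VALUES (?)",
        (agent_id_for(conn, user_agent),),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Sample user agent strings, used here only as test data.
    googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    for _ in range(1000):
        log_hit(conn, googlebot)
    log_hit(conn, "ExampleBrowser/1.0")
    agents = conn.execute("SELECT COUNT(*) FROM agent").fetchone()[0]
    hits = conn.execute("SELECT COUNT(*) FROM hit").fetchone()[0]
    return agents, hits
```

Calling demo() logs 1,001 hits but stores only two user agent strings. The same pattern should work for referrers, pages or any other text value that repeats a lot.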


"This involves some sort of web based programming language. "
But what about content that isn't requested via the application server? Documents, static .html pages, images, etc.
For example, an HTML page can include JavaScript that does either external or internal logging, or an image ('web bug') that allows for external logging (even when the "external" site is one of your own). On the other hand, images, PDF files, and other material of that sort can only be logged through webserver logging.
Just curious. Thanks.