pseudo-memory leak
For the last few weeks I've been having some problems with House of Fusion. The memory for the JRun.exe has been going through the roof and I didn't know why. The code was tight, nothing had really changed on the site, so what was up? The answer was Yahoo.
In the last 3 weeks Yahoo has ramped up their indexing of sites. For a site as large as House of Fusion, this can take quite a bit of time. I've logged 2-4 Yahoo bot hits per second at some times.
So how was Yahoo the problem? Because of client variables. Not DB client variables and not even the dreaded registry client variables. Just simple cookie-based client variables. It seems that when a client variable is set, a memory structure is also set for CF. Now each bot hit is assumed to be its own session, as the bot does not accept cookies. This means each bot hit generates a memory structure of about 1k. Now this is not really a lot, but when you have tens of thousands of hits from bots a day, it adds up: at roughly 1k each, that is tens of megabytes a day held by sessions that will never be used again.
I'm still waiting on word from Macromedia as to when a client memory structure times out, but this seems to be the issue.
So what's the solution? There are 4.
1. Increase your RAM. If you can do this, then ramp up your memory as high as you can. This is not a perfect solution but it saves throwing time at the problem and gives you a 'buffer' against problems of this sort.
2. Add a Crawl-delay setting to your robots.txt. Mine is set to 1 second but you can set yours to something higher. Keep in mind that Crawl-delay is honored by Yahoo's Slurp but not by every crawler.
3. Set a different cfapplication for the most common bots. I use a simple regular expression to find key words that only exist in bot user agents:
<CFIF REFindNoCase('Slurp|Googlebot|BecomeBot|msnbot|Mediapartners-Google|ZyBorg|RufusBot|EMonitor', cgi.http_user_agent)>
<CFAPPLICATION name="FusionA" clientmanagement="no" sessionmanagement="no" setclientcookies="no" setdomaincookies="no" clientstorage="Cookie">
<CFELSE>
<CFAPPLICATION name="FusionA" clientmanagement="yes" sessionmanagement="no" setclientcookies="yes" setdomaincookies="no" clientstorage="Cookie">
</CFIF>
This will make sure that a client structure is NOT created for one of these bots.
4. Use the same regex to clean out the client structure after the bot finishes the page. Call structclear(client) in OnRequestEnd.cfm, in the onRequestEnd method of Application.cfc, or at the end of the template itself.
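As a rough sketch of solution 4, an OnRequestEnd.cfm could look something like the following. It assumes client management is still switched on for bot requests (that is, it is used instead of solution 3, not together with it, since the client scope would not exist otherwise). The variable name botPattern is made up for this example, and whether clearing the scope releases the memory right away is something I have not confirmed.

```cfml
<!--- OnRequestEnd.cfm: runs after each page in the application --->
<!--- botPattern is a hypothetical name; the pattern is the one from solution 3 --->
<cfset botPattern = "Slurp|Googlebot|BecomeBot|msnbot|Mediapartners-Google|ZyBorg|RufusBot|EMonitor">

<!--- REFindNoCase returns 0 when there is no match, so any hit is treated as true --->
<cfif REFindNoCase(botPattern, cgi.http_user_agent)>
    <!--- Bot request: empty the client scope (assumes client management is enabled) --->
    <cfset StructClear(client)>
</cfif>
```

Keeping the pattern in one variable makes it easier to reuse the same list in the cfapplication check and here.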
Bottom line is that while bots are great for indexing your content, they can wreak havoc on your system when a lot of memory is assigned to what are essentially 'dead sessions'.


With reference to solution 3, if you change the cfapplication attributes with each request isn't this going to cause problems? One minute the FusionA application uses client management, and the next it doesn't. Aren't these settings application wide? Won't it cause problems if a request starts off with the application set one way, and ends up with it set another?
Cheers
Bert
When I make use of a client var, I always check that it exists first so there's no real problem.
<cfif (it's a bot)>
<cfapplication name="FusionA_ForBots" clientmanagement="no" (etc)>
<cfelse>
<cfapplication name="FusionA" clientmanagement="yes" (etc)>
</cfif>