Code Snippet: Keep Track of Hits from your Incoming Links
When I ran internet lotto games, I had an affiliate program, and I had to keep track of hits coming in from my affiliates' websites so that I could credit them with their commissions.
One of my clients recently asked me if there was any way she could tell whether she was getting any hits to her website from her reciprocal links. The 'lightbulb' lit over my head! Why not do it just the way I used to do it for my affiliates? This code snippet is the result.
Using this technique requires a bit of advance planning, since your incoming links must have an ID attached to the URL in order to identify where the hit to your website came from. This means that every time you arrange to have your link posted on another website, it must contain a unique identifier in the querystring, like this:
http://www.yourwebsite.com/index.shtml?ID=729
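If you have more than a handful of link partners, it may help to generate the tagged links from a list rather than typing them by hand. The following is only a minimal sketch: the '%partners' table, the second site name, and the 'tagged_link' subroutine are hypothetical names made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical partner list (an assumption for this example):
# unique ID => site that will carry your link.
my %partners = (
    729 => 'www.referringsite.com',
    704 => 'www.anotherpartner.example',
);

# Build the tagged URL that a given partner should link to.
sub tagged_link {
    my ($id) = @_;
    return "http://www.yourwebsite.com/index.shtml?ID=$id";
}

# Print one line per partner: site name, then the link to give them.
for my $id (sort { $a <=> $b } keys %partners) {
    printf "%-30s %s\n", $partners{$id}, tagged_link($id);
}
```

Keeping the list in one place also gives you a record of which ID belongs to which website when you read the log later.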
When someone clicks on this link, the ID is picked up by a Perl script included on the 'index.shtml' page, and saved to a file on your site. Notice the 's' in the name of the page. Most UNIX web servers, when server-side includes (SSI) are enabled, will allow pages with the '.shtml' extension to 'include' a Perl script on that page, which is then executed each time the page is requested, before the page is output to the browser.
How to use the code: Put the following line somewhere in your 'index.shtml' page. (I usually put SSI includes like this one in the <head> section, just to keep them organized.)
<!--#exec cgi="/cgi-bin/enterlink.cgi"-->
Here is the code for the 'enterlink.cgi' script, which should be placed in your 'cgi-bin' folder, of course.
1 #!/usr/bin/perl
3 ### enterlink.cgi
5 print "Content-type: text/html\n\n";
7 ### Get the date/time and environmental variables
8 my $dt = &dt;
9 my $ID = $ENV{'QUERY_STRING'};
10 my $ref = $ENV{'HTTP_REFERER'};
12 ### Get rid of everything before and after the ID key
13 $ID =~ s/.*ID=//;
14 $ID =~ s/&.*//;
16 ### Write link information to the linklog.txt file
17 if ($ID ne "")
18 {
19 open (FILE, ">>../files/linklog.txt") || die "Error Opening Logfile for Write";
20 print FILE "$dt | $ID | $ref |\r\n";
21 close FILE;
22 }
24 sub dt
25 {
26 my($sec,$min,$hour,$mday,$mon,$year) = gmtime();
27 sprintf("%04d-%02d-%02d %02d:%02d:%02d GMT",
$year+1900, $mon+1, $mday, $hour, $min, $sec);
28 } # end sub dt
Explanation: Line 1 may need to be changed to what your server needs for the shebang line. Nothing else need be changed - the script should work as is.
Line 8 retrieves the date/time using the subroutine 'dt' in lines 24-28. This is formatted as YYYY-MM-DD HH:MM:SS GMT, which may be changed if you wish. I have used Greenwich Mean Time rather than local time. (I can never seem to remember the time zone in which my servers are located.)
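If you do want to change the date format, one option is the 'strftime' function from Perl's standard POSIX module. This is a sketch only, not part of the script above; the subroutine names 'dt_gmt' and 'dt_local' are made up for the example, and the optional epoch argument is there just to make the subroutines easy to try out.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Same layout as the 'dt' subroutine, built with strftime.
# Takes an optional epoch time; defaults to the current time.
sub dt_gmt {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime("%Y-%m-%d %H:%M:%S GMT", gmtime($epoch));
}

# Variation: the server's local time instead of GMT.
sub dt_local {
    my ($epoch) = @_;
    $epoch = time unless defined $epoch;
    return strftime("%Y-%m-%d %H:%M:%S", localtime($epoch));
}

print dt_gmt(), "\n";
print dt_local(), "\n";
```

Changing the format string is then enough to change the layout; for example, "%d/%m/%Y %H:%M" would give a day-first date without seconds.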
In lines 13-14, everything up to and including the 'ID=' part of the querystring is removed, as is anything that may be there after the ID number (from the next '&' onward).
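One thing to be aware of: if a querystring arrives with no 'ID=' in it at all, the two substitutions leave it untouched, so whatever was there gets logged as if it were an ID. A stricter variation, sketched below, accepts only a numeric ID. The subroutine name 'extract_id' is hypothetical, and the digits-only rule is an assumption that all of your IDs are numbers like the 729 in the example.

```perl
use strict;
use warnings;

# Return the numeric ID from a querystring, or '' if there is none.
sub extract_id {
    my ($qs) = @_;
    return '' unless defined $qs;
    # 'ID=' must start the querystring or follow a separator,
    # and its value must consist of digits only.
    return $1 if $qs =~ /(?:^|[&;])ID=(\d+)(?:[&;]|$)/;
    return '';
}

print extract_id('ID=729'), "\n";             # prints 729
print extract_id('page=2&ID=704&x=1'), "\n";  # prints 704
print extract_id('foo=bar'), "\n";            # prints an empty line
```

Because the result is empty whenever no valid ID is found, the existing test in line 17 would then skip the log entry for such hits.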
Finally, in lines 17-22 a new line is added to the 'linklog.txt' file, which should NOT be located in the 'cgi-bin' folder on your website. I keep my logs in a separate folder called 'files'.
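On a busy site, two visitors may arrive at the same moment, and two copies of the script could try to append to the log at once. One possible safeguard is Perl's built-in 'flock', sketched below. The subroutine name 'append_entry' and the demo filename are assumptions for illustration only, and the lock is advisory, so it only helps if every program that writes to the file uses it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Append one entry to the log while holding an exclusive lock.
# Returns 1 on success, 0 on failure.
sub append_entry {
    my ($logfile, $dt, $id, $ref) = @_;
    $ref = '' unless defined $ref;          # referrer may be missing
    open(my $fh, '>>', $logfile) or return 0;
    flock($fh, LOCK_EX) or do { close $fh; return 0 };
    print {$fh} "$dt | $id | $ref |\r\n";
    close($fh) or return 0;                 # closing releases the lock
    return 1;
}

# Demo call with a made-up filename in the current folder.
my $demo_log = 'linklog-demo.txt';
append_entry($demo_log, '2006-01-14 03:43:55 GMT', 729,
             'http://www.referringsite.com/')
    or warn "Could not write to $demo_log\n";
```

Returning 0 instead of calling 'die' is a design choice made for this sketch, so that a logging problem does not interrupt the page that includes the script.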
How to view the log: All you need to do is type the URL of your logfile into the address bar of your browser, like:
http://www.yourwebsite.com/files/linklog.txt
The entries will look something like the following:
2006-01-14 03:43:55 GMT | 729 | http://www.referringsite.com/ |
2006-01-16 20:11:08 GMT | 704 | |
When there is an entry with a blank HTTP_REFERER, it means either that the visitor's browser did not send the referring page's address (the browser, a proxy, or the referring site's own settings can suppress it, a not uncommon occurrence nowadays), or that the hit was from a spider that is checking the referring site's links.
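Once the log has a few entries, you will probably want totals per ID rather than the raw lines. Here is a minimal sketch of a tally, assuming the 'date | ID | referrer |' layout shown above. The subroutine name 'count_hits' is hypothetical, and the sample lines are simply the two entries from the example.

```perl
use strict;
use warnings;

# Count hits per ID from log lines in "date | ID | referrer |" layout.
sub count_hits {
    my (@lines) = @_;
    my %hits;
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                        # strip the line ending
        my (undef, $id) = split /\s*\|\s*/, $line;   # second field is the ID
        next unless defined $id && $id ne '';
        $hits{$id}++;
    }
    return %hits;
}

my @sample = (
    "2006-01-14 03:43:55 GMT | 729 | http://www.referringsite.com/ |\r\n",
    "2006-01-16 20:11:08 GMT | 704 | |\r\n",
);
my %hits = count_hits(@sample);
print "$_: $hits{$_}\n" for sort keys %hits;
```

To run it against the real log, you could read the lines from 'linklog.txt' into an array and pass that array to the subroutine instead of the sample.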
I also have a script that I use to keep track of search engine hits and the keywords that were used to find the website, and another script to record hits by search engine spiders. (Update: I don't use these other scripts any more, since Google Analytics does a much better job of tracking these items.)