SB Log Analyzer

(c) 1997-2001 SB-Software and Scott M Baker, smbaker@sb-software.com


What does it do?

SB Log processes HTTP log files generated by most web providers. It tabulates the data from the log files and generates a statistical report in the form of HTML documents.

What log files does it work with?

Log file formats tend to vary from one provider to another. SBLog works with the most common log file formats, including those of the Apache web server and Microsoft's web server. I have tested it with a few simple providers and it is working fine. I really need user input here on what works and what doesn't. Please let me know how it handles things with your provider. The format that I process looks like this:

194.163.250.33 - - [12/Jun/1997:00:02:07 -0700] "GET /images/bcurprj.jpg HTTP/1.0" 200 1988
194.163.250.65 - - [12/Jun/1997:00:02:07 -0700] "GET /images/bastro.jpg HTTP/1.0" 200 1957
194.163.250.65 - - [12/Jun/1997:00:02:07 -0700] "GET /images/baz.jpg HTTP/1.0" 200 1975

If your log files look like that, then it ought to work fine. If your log format doesn't work, then feel free to send me a chunk of it via email at smbaker@primenet.com -- I'll analyze it and add support in the next release of SBLog.
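
As a rough illustration of how a line in the format above breaks down into fields, here is a minimal Python sketch. It is not SBLog's own parser; the pattern, the parse_line function, and the field names are assumptions made for this example only.

```python
import re
from datetime import datetime

# Pattern for the sample format shown above:
# host ident user [timestamp] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The request is usually "METHOD path PROTOCOL"; path is None otherwise.
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    fields["status"] = int(fields["status"])
    # A "-" size means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields

sample = ('194.163.250.33 - - [12/Jun/1997:00:02:07 -0700] '
          '"GET /images/bcurprj.jpg HTTP/1.0" 200 1988')
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Lines that do not match return None, which is a simple way to spot a log format that differs from the sample.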

Where do I get log files from?

The location of the log files depends on your service provider. You will probably have to contact your provider for this information. For example, my provider -- simplenet.com -- stores the log files in /Logs. Linux boxes running Apache typically store their log files in /var/log/httpd. The log files are usually named "access", and depending on the service provider, may be renamed on a daily basis (e.g. access-19990101). If all of this makes little sense to you, then remember this simple suggestion -- ask your provider how to get to the log files!

How do I get log files?

You can usually download them with FTP. I plan on eventually implementing FTP auto-downloading in SBLog, which will automate the process entirely.


Configuration

There are several "tabs" which select configuration options. In general, input options are located under the <Log Setup> tab and output options are located under the <Report Setup> tab.

Input log file (path) name:

This is the location of your log files. You can use a filename here, such as "c:\logs\access.log". You can specify a wildcard specification, such as "c:\logs\access*.*". You can also just specify a directory and SBLog will look for all files located there, such as "c:\logs\".
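
The three kinds of input (single file, wildcard, directory) can be sketched in Python as follows. This only illustrates the idea and is not how SBLog itself resolves the setting; the expand_input_spec function is a hypothetical name.

```python
import glob
import os

def expand_input_spec(spec):
    """Return a sorted list of files matched by a file, wildcard, or directory."""
    if os.path.isdir(spec):
        # A bare directory means "every file located there".
        pattern = os.path.join(spec, "*")
    else:
        # A plain filename is a pattern that matches only itself.
        pattern = spec
    # Keep regular files only; subdirectories are skipped.
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Note that Python's glob treats "access*.*" literally, so the name must contain a dot to match. Windows wildcard matching is more lenient about this.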

Output file (path) name:

This is the output filename and/or pathname of the file that will be the root document of the statistics. If you leave it blank, SBLog will output "index.htm" in the current directory. As above, you may specify a filename, or a complete path, and SBLog will understand it. There are quite a few files emitted by SBLog, so I suggest you place the output files in a directory dedicated to them.

Item Counts:

These are the number of items to include in the "Top" lists. You may set them to whatever you want.

IP Address Resolution:

SBLog can translate the addresses of the people who access your website into descriptive domain/host names. However, this process does take a considerable amount of time, so you may wish to bypass it by selecting "Do not resolve addresses". Selecting the middle option "Only resolve top lists" will only look up addresses that are actually listed in the top users lists of the report.

SBLog will remember host names from session to session, so once it has looked up one address, it will not need to do it again.
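
The idea of remembering lookups between sessions can be sketched in Python like this. It is an illustration under assumptions, not SBLog's actual storage: the HostCache class, the JSON file, and the fallback to the raw address are all choices made for this example.

```python
import json
import os
import socket

class HostCache:
    """Remember address -> host name lookups in a JSON file between runs."""

    def __init__(self, path, resolver=None):
        self.path = path
        # The resolver is injectable so the cache can be exercised offline.
        self.resolver = resolver or self._reverse_dns
        self.names = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.names = json.load(f)

    @staticmethod
    def _reverse_dns(address):
        try:
            return socket.gethostbyaddr(address)[0]
        except OSError:
            # No reverse record (or no network): fall back to the raw address.
            return address

    def lookup(self, address):
        # Only addresses not seen before trigger a (slow) resolver call.
        if address not in self.names:
            self.names[address] = self.resolver(address)
        return self.names[address]

    def save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.names, f)
```

Calling save() at the end of a run and constructing HostCache with the same path on the next run means each address is resolved at most once.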

Bar Color:

This allows you to choose what color you want the bars in the graphs to be. It's simply a matter of personal preference. The colors use the files "redbar.bmp", "bluebar.bmp", and "greenbar.bmp", so you can overwrite one of those with your own file if you would like a custom color choice.


Using the program

These are the normal steps involved in using SBLog:

  1. Download the logfile(s) you want from your server using your favorite FTP program.
  2. Run SBLog (press the "Run" button).
  3. Load up the main file (what you specified in "Output file (path) name" in the setup dialog) in your favorite WWW browser. You can use the <View> button to automatically launch your browser. Everything should be self explanatory from there.

If you would like to automate use of SBLog, then you can use the "-auto" command line option. This will cause SBLog to automatically start processing log files and automatically exit the program after the report has been generated. Command line parameters may be entered in a variety of ways, depending on the method you are using to launch SBLog. For example, in a Windows shortcut, add "-auto" to the end of the Target string (e.g. Target = "c:\Program Files\SBLog\log.exe" -auto).

Interpretation:

Interpreting web statistics is more of an art than a science -- you need to have some idea of what trends you're looking for. I'll define a few of the common concepts that are used in SBLog below:

A hit is what happens when a given file is served by the web server. Each hit applies to exactly one file, which may be a web page, image, sound, or other object. When a user retrieves a web page, several hits may be generated. For example, assume your website has a page, "index.html", and it includes three graphics: "pic1.gif", "pic2.gif" and "pic3.gif". Each time a user views this page, four hits are generated.

Web objects are cached by the user's web browser. This means that the browser will remember what it has downloaded for an extended period of time. For example, if your website has a common logo image which is included in many pages, then the logo image will probably only be downloaded with the first page the user downloads, and no hits will be generated for the logo on subsequent pages.

Distinct users are referred to as Clients. The distinction is most prominent in SBLog's directory statistics. In a given directory with several pages and images, a single client may generate several hits for different files in the directory, but will only be counted once in the client count.

Statistics are also ordered by cumulative bytes. For example, you may have two files, "big.gif" and "small.gif". In terms of hits, these files are treated equally -- each time they are accessed, one hit is created. However, in terms of byte count the files may differ considerably. A single hit of "big.gif" may produce more load on the webserver than several hits to "small.gif".
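
To make the difference between hits, clients, and bytes concrete, here is a small Python sketch that tallies all three. The tabulate function, the client addresses, and the file sizes are invented for illustration and do not reflect SBLog's internals.

```python
from collections import Counter, defaultdict
import posixpath

def tabulate(records):
    """records: iterable of (client, path, size) tuples, one per hit."""
    hits = Counter()                 # hits per file
    byte_totals = Counter()          # cumulative bytes per file
    dir_hits = Counter()             # hits per directory
    dir_clients = defaultdict(set)   # distinct clients per directory
    for client, path, size in records:
        hits[path] += 1
        byte_totals[path] += size
        directory = posixpath.dirname(path) or "/"
        dir_hits[directory] += 1
        # A set counts each client once, however many hits it generates.
        dir_clients[directory].add(client)
    client_counts = {d: len(c) for d, c in dir_clients.items()}
    return hits, byte_totals, dir_hits, client_counts

# First client loads the page and its three graphics (four hits);
# second client loads only the page, e.g. because the graphics are cached.
records = [
    ("10.0.0.1", "/index.html", 4000),
    ("10.0.0.1", "/pic1.gif", 1500),
    ("10.0.0.1", "/pic2.gif", 1500),
    ("10.0.0.1", "/pic3.gif", 1500),
    ("10.0.0.2", "/index.html", 4000),
]
hits, byte_totals, dir_hits, client_counts = tabulate(records)
print(dir_hits["/"], client_counts["/"])   # 5 hits from 2 clients
print(byte_totals.most_common(1))          # the file with the largest byte total
```

Sorting the same data by hits and by bytes gives the two orderings described above, and most_common(n) yields a "Top n" list.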


Command Line Options

SBLog supports several command line options to support automated operation:

Option               Description
-autostart           Automatically start SBLog and close when completed
-input <pathname>    Use <pathname> as the input logfile (can use wildcard)
-output <pathname>   Use <pathname> as the output directory/filename
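
For scheduled or scripted runs, the options in the table above could be assembled from a Python script along these lines. This is only a sketch: it assumes the options can be combined in this order and separated from their values by spaces, the executable path is the one used in the shortcut example earlier, and the output path is hypothetical.

```python
import subprocess

def build_command(exe, input_spec, output_path):
    """Assemble the documented options into an argument list."""
    return [exe, "-autostart", "-input", input_spec, "-output", output_path]

cmd = build_command(r"c:\Program Files\SBLog\log.exe",   # path from the shortcut example
                    r"c:\logs\access*.*",                # wildcard input
                    r"c:\reports\index.htm")             # hypothetical output location

# Show the command line as Windows would receive it (paths with spaces get quoted).
print(subprocess.list2cmdline(cmd))

# To actually launch it on a machine where SBLog is installed:
# subprocess.run(cmd)
```

Passing the arguments as a list avoids quoting mistakes when a path contains spaces.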

Registration:

Please see the accompanying file register.htm.


Contacting Me:

Full details are present in register.htm, but here is a summary:

email: smbaker@sb-software.com

www homepage: http://www.sb-software.com/

SBLog page: http://www.sb-software.com/sblog/sblog.html

US Mail:

Scott M Baker
2241 W Labriego
Tucson, Az 85741


Revision History:

  • Version 1.0
    • Initial public release
  • Version 1.1
    • Added hourglass cursor while writing reports
    • Copied bar images to destination directory
    • Added <View> button to automatically view results
    • Added prompt for setup on first load up
    • Added error message if no input files found
  • Version 1.2
    • Fixed setup button
    • Removed extra text on hourly display
    • Changed string table default hash size
  • Version 1.3
    • Added support for mustang format log files
    • Made all filename references lower case
  • Version 1.4
    • Fixed non-lowercase part of filenames
    • Added support for Microsoft Internet Server (IIS) log files
    • Added hyperlink to actual file in detailed file stats
  • Version 1.5
    • Converted project to Borland C++ Builder
    • Added registration page
    • Added non-registered nag to report if unregistered
    • Added custom bar selection
    • Better handling of long filenames
    • Multithreaded implementation
    • Quick start instructions
    • New user interface
  • Version 1.6
    • Added stats for referer and browser fields
    • Added skip-ip-resolution button
  • Version 1.7
    • Fixed Y2K issue
  • Version 1.8
    • Fixed bug with really long browser/referer lines causing buffer overrun
  • Version 1.9
    • Added command line options
    • Fixed view button not always working
    • Support for extended log format files