Analyzing Squid proxy server logs: installation and configuration of SquidView, SqStat and SARG, plus viewing logs with MultiTail

Everyone who sets up a proxy sooner or later wants to see who is using it and who is downloading how much. Sometimes it is also very useful to see in real time who is downloading what. The following programs are discussed in this topic:
SqStat - real-time statistics via the web
SARG - Squid log analyzer with HTML report generation
SquidView - interactive console monitor of Squid logs

0. Introduction

I won't explain how to configure Apache here - there are plenty of manuals on that topic on the Internet, so go look them up; I will talk about the features I implemented at home.
Note that I will use Debian Etch as an example; your paths may differ, keep that in mind...
Let's go...

1. SquidView

This program runs in the console, and displays everything that Squid does there.
Installation:

aptitude install squidview

Wait a couple of seconds (if you have fast internet) and that's it - now we can see who is downloading what. If you have not changed the location of the logs and left most of the Squid parameters at their defaults, you only need to run it - but with root rights, because the Squid logs are owned by root...

sudo squidview

I think that will be enough for you, but here are a few very useful keys - press them and watch:

  • h - help, where you can learn even more ;)
  • l, then Enter - report generation; additional settings can be configured here
  • T - start recording download-size statistics
  • O - view who downloaded what, per user (after pressing T)

That seems to be everything about SquidView; if I missed something, write to me and I'll add it!

2. SqStat

This is a script that allows you to view active connections, channel load, and average channel load.
I assume that you already have Apache configured.
Download the latest version:

wget -c samm.kiev.ua/sqstat/sqstat-1.20.tar.gz
tar xvfz sqstat-1.20.tar.gz
cd ./sqstat-1.20
mkdir /var/www/squid-stat
cp -R * /var/www/squid-stat/

That's it, now we need to configure squid-cgi, also known as cachemgr.cgi. Install it:
aptitude install squid-cgi

Now you need to configure access...

nano /etc/squid/squid.conf

Add:
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager
# This line sets the password "secret" and allows everything
cachemgr_passwd secret all
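
Once Squid has been reconfigured (see below), you can quickly check that the cache manager accepts the password with squidclient, which ships with Squid. A minimal sketch, assuming the default port:

squidclient -p 3128 mgr:menu@secret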

Now you need to fix /etc/squid/cachemgr.conf:
echo "*" >> /etc/squid/cachemgr.conf
Instead of *, you can put the network address that Squid listens on.

For some reason I couldn't get it to work at 127.0.0.1, so I entered 192.168.0.1 and everything worked. Now enter the address Squid listens on in the Cache Host field and its port in the Port field. If you did everything according to this manual, you don't have to enter anything in the login field; write secret in the password field. If everything went well, you will see a list of available parameters... Take a look, and then we move on to setting up SqStat...

nano /var/www/squid-stat/config.inc.php
// This is the address where your Squid listens
$squidhost="192.168.0.1";
$squidport=3128;
$cachemgr_passwd="secret";
// Whether to resolve names via the records in your system
$resolveip=false;
// This file maps IPs to computer names; Cyrillic works too :)
$hosts_file="hosts";
$group_by="host";

In principle, the config itself is well documented - study it; fortunately, there is not much to study there))
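
For reference, the hosts file mentioned above is a plain-text map of IP addresses to display names. A sketch with made-up entries (the exact format is described in the config comments; one "IP name" pair per line):

192.168.0.10 accounting-pc
192.168.0.20 boss-laptop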

Now let's make a subdomain - it's much more convenient)

nano /etc/apache2/sites-enabled/sqstat

<VirtualHost *:80>
    ServerAdmin [email protected]
    DocumentRoot /var/www/squid-stat/
    ServerName proxy.server.local
</VirtualHost>

So that the name resolves, add it to /etc/hosts:

nano /etc/hosts
192.168.0.1 proxy.server.local

That's all :) well, almost everything:

squid -k reconfigure
/etc/init.d/apache2 reload

3. Sarg

This program generates HTML reports, draws graphs, and so on...
Install:

aptitude install sarg

nano /etc/squid/sarg.conf
language Russian_koi8
graphs yes
title "Squid User Access Reports"
temporary_dir /tmp
output_dir /var/www/sarg
max_elapsed 28800000
charset Koi8-r

Of course, no one is stopping you from tweaking the display style of this whole facility - the config comes with very detailed comments.

crontab -u root -e
00 08-18/1 * * * /usr/sbin/sarg-reports today
00 00 * * * /usr/sbin/sarg-reports daily
00 01 * * 1 /usr/sbin/sarg-reports weekly
00 02 1 * * /usr/sbin/sarg-reports monthly

Epilogue

That's it :)) If you want, you can create a subdomain for it too - that has already been described above...
I use all three programs myself and am satisfied.

UPD. To make SquidView work with Squid version 3, create a soft link:

ln -s /var/log/squid3/access.log /root/.squidview/log1

UPD.2. The next article will talk about delay pools

I think many people know the tail utility, which lets you view the last lines of a text file - extremely handy when viewing log files. On top of that, tail can work in so-called follow mode: the utility "watches" the file for changes and prints new lines to the output stream in real time as they appear. Quite often you need to monitor more than one file at a time, and you will agree that switching between several terminals is not the most convenient way. Fortunately, there is MultiTail, a utility whose main purpose is to display the contents of more than one file simultaneously in one window.

The list of MultiTail features is impressive. Calling it merely an improved tail would be disrespectful, to put it mildly. Take a look at some of the main features of the utility:

  • output of more than one file to a terminal divided into so-called windows;
  • the terminal can be divided into windows both horizontally and vertically;
  • windows can be created, moved, closed, merged, and temporarily hidden;
  • more than one file can be displayed in a single window;
  • search both in one window and in all windows at once;
  • filtering of lines before output using regular expressions;
  • screen flashing or a sound alert when certain text is detected;
  • text color highlighting based on regular expressions;
  • redirecting output to text files (works like tee);
  • operation in syslog server mode;
  • suppression of duplicate lines in the output;
  • changing the MultiTail configuration on the fly in response to changes in the monitored file;
  • monitoring the standard input stream;
  • converting IP addresses to hostnames, errno values to text descriptions, etc.;
  • various options for trimming long lines: on the right, on the left, or keeping a certain part;
  • horizontal and vertical scrolling, long-line wrapping mode;
  • and many, many others!

Impressive? Well, let's install it and try monitoring the log files on the system. MultiTail is present in the repositories of many systems, so you can install it by the usual means. In Ubuntu this command is enough:

$ sudo apt-get install multitail

In FreeBSD it can be installed from ports like this:

# portsnap fetch update
# cd /usr/ports/sysutils/multitail
# make install clean

After installation, you can start working immediately. This is what the Squid log file looks like (note the automatic reverse hostname resolution):

It comes out somewhat ugly given the line wrapping. You can ask multitail to trim lines on the right (keeping only the beginning of each line):

or on the left (keeping only the end of each line):
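
From memory, line truncation is controlled by the -p option - check man multitail for which letter cuts which side; something like:

$ multitail -p r /var/log/squid/access.log
$ multitail -p l /var/log/squid/access.log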

We monitor the ping of google.com:

$ multitail -l "ping google.com"

Let's make it more complicated. We monitor the ping and watch the Squid log:

$ multitail -l "ping google.com" -i /var/log/squid/access.log

We observe two pings and a log file:

$ multitail -l "ping ubuntu.com" -l "ping yandex.ru" -i /var/log/squid/access.log

A variant using vertical areas (the "-s 2" option divides the terminal into two vertical areas):

$ multitail -s 2 -i /var/log/squid/access.log -l "ping ubuntu.com" -l "ping yandex.ru"

Outputting two commands in one window (the "-L" option adds the command's output to the previous window) while viewing two log files:

$ multitail -l "ping ubuntu.com" -L "ping yandex.ru" -p l -i /var/log/squid/access.log -i /var/log/dmesg

Viewing the Apache log. With the "-cS apache" option we tell multitail to use the "apache" color scheme when displaying the log file. The full list of supplied color schemes can be found in /etc/multitail.conf.
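
In its simplest form:

$ multitail -cS apache /var/log/apache2/access.log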

Search and highlighting using regular expressions:

$ multitail -ec '"GET[^"]+"' -cS apache /var/log/apache2/access.log

That was a small demonstration of what are, in my opinion, multitail's main capabilities. I repeat once again: the utility is very powerful and flexible. Just look at the ability to execute a specified command when a line matches a regular expression! In general: install it, try it, read the documentation, and you will have at your disposal a powerful tool for monitoring the system and for notifications about the events you care about, recorded by many applications in ordinary text log files. Good luck!

Recently, our company needed to migrate a proxy server from MS ISA Server to free software. Choosing a proxy server didn't take long (squid). Using several practical recommendations, I configured the proxy to suit our needs. Some difficulties arose when choosing a program for traffic accounting.

The requirements were:

1) free software
2) the ability to process logs from different proxies on one server
3) the ability to create standard reports with sending by mail, or a link on a web server
4) building reports for individual departments and distributing such reports to department heads, or providing access via a link on a web server

The developers provide very little information about traffic accounting programs: a laconic description of the program's purpose plus, as an optional bonus, a couple of screenshots. Yes, it is clear that any of them will calculate the amount of traffic per day/week/month, but the additional interesting features that distinguish one program from another go undescribed.

I decided to write this post to describe the capabilities and disadvantages of such programs, as well as some of their key features, to help those who have to make this choice.

Our candidates:

SARG
free-sa
lightsquid
SquidAnalyzer
ScreenSquid

A digression

Information about a program's "age" and its latest release is not a comparison criterion and is given for reference only. I will try to compare strictly the programs' functionality. I also deliberately did not consider programs that are too old and have not been updated for many years.

Logs are fed to the analyzer in the form in which squid created them, without any pre-processing or changes. Handling of malformed records and any transformations of log fields must be done by the analyzer itself and appear only in the report. This article is not a setup guide; configuration and usage issues can be covered in separate articles.

So let's get started.

SARG - Squid Analysis Report Generator

The oldest of the supported programs of this class (development started in 1998; the former name was sqmgrlog). The latest release (version 2.3.10) is from April 2015. Since then there have been several improvements and fixes, which are available in the master version (it can be downloaded with git from sourceforge).

The program is launched manually or via cron. You can run it without parameters (then everything is taken from the sarg.conf configuration file), or you can pass parameters on the command line or in a script, for example the dates for which the report is generated.

Reports are created as HTML pages and stored in the /var/www/html/squid-reports directory (by default). You can set a parameter that limits the number of reports kept in the directory - for example, 10 daily and 20 weekly - and older ones are deleted automatically.

It is possible to use several config files with different parameters for different report variants (for example, for daily reports you can create a separate config in which graph creation is disabled and a different output directory is specified).
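
To tie this together, a one-off run for a specific period might look like this (paths are illustrative; -l is the input log, -o the output directory, -d the date range; if I remember correctly, an alternate config can be passed with -f):

sarg -l /var/log/squid/access.log -o /var/www/html/squid-reports -d 01/04/2015-30/04/2015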

Details

On the main page with reports, for each report we can see the period it covers (defined by the report creation parameters), the date it was created, the number of unique users, the total traffic for the period, and the average amount of traffic per user.

Selecting one of the periods brings up the topusers report for that period. Below I give descriptions and examples of all the report types SARG can produce.

1) topusers - total traffic by users. A user is either the name of the host to which Internet access is granted, or the user's login. Sample report:

IP addresses are displayed here; if the corresponding option is enabled, they are resolved to domain names.

Using authentication? Accounts are converted to real names:

The appearance can be customized in a css file. The displayed columns are also customizable, and unnecessary ones can be removed. Column sorting is supported (sorttable.js).

When you click on the graph icon on the left, you will see a graph like this:

When you click on the icon on the right, we get report 5.

2) topsites - a report on the most popular sites. By default the 100 most popular sites are shown (the value is adjustable). Using regular expressions or aliases, you can merge the traffic of 3rd-level and higher domains into the 2nd-level domain (as in the screenshot), or set any other rule - separately per domain if needed, for example merging yandex.ru and mail.ru only up to the 3rd level. The meaning of the fields is quite obvious.

3) sites_users - a report on who visited a specific site. Everything is simple here: the domain name and who accessed it. Traffic is not shown here.

4) users_sites - a report on sites visited by each user.

Everything is clear here too. Clicking the icon in the first column brings up report 8).

5) date_time - distribution of user traffic by day and hour.

6) denied - requests blocked by squid. This displays who, when and where access was denied. The number of entries is configurable (default is 10).

7) auth_failures - authentication failures. HTTP/407.
The number of entries is configurable (default is 10).

8) site_user_time_date - shows what time the user visited which site and from which machine.

9) downloads - list of downloads.

10) useragent - report on the programs used

The first part of the report displays each IP address and the useragents it used.

The second part contains a general list of useragents with a percentage breakdown, taking versions into account.

11) redirector - this report shows whose access was blocked by a redirector. SquidGuard, DansGuardian and Rejik are supported; the log format is customizable.

SARG has more than 120 configuration parameters: language support (100% of messages are translated into Russian), regular expression support, LDAP integration, the ability to give users access only to their own reports on the web server (via .htaccess), conversion of logs into its own format to save space, export of reports to a text file for subsequent loading into a database, and tools for working with squid log files (splitting one or more log files by day).

It is possible to create reports for a specific set of groups - for example, if you need a separate report for a department. Access to the web page with the department's reports can then be given, say, to its managers via the web server.

Reports can be sent by e-mail, although for now only the topusers report is supported, and the letter itself is plain text without HTML.

You can exclude certain users or hosts from processing. You can set aliases for users, combining the traffic of several accounts into one, for example for all contractors. You can also set aliases for sites, for example combining several social networks into one; in that case all parameters for the specified domains (number of connections, traffic volume, processing time) are summed. Or, using a regular expression, you can "cut off" domain levels above the 3rd.
It is also possible to export to separate files the lists of users who exceeded certain volumes over the period. The output is several files, for example userlimit_1G.txt for those exceeding 1 GB, userlimit_5G.txt for 5 GB, and so on - 16 limits in total.

SARG also has a couple of PHP pages in its arsenal: one for viewing current connections to squid and one for adding domain names to squidguard block lists.

Overall, this is a very flexible and powerful tool that is easy to learn. All parameters are described in the default configuration file; the project wiki on sourceforge has a more detailed description of all parameters, divided into groups, with examples of their use.

free-sa

A Russian development. There have been no new versions since November 2013. It claims faster report generation than competing programs and less space required for finished reports. Let's check!

In terms of operating logic this program is closest to SARG (the author himself compares it against SARG), so that is what we will compare it with.

I was pleased that there are several design themes; a theme consists of 3 css files and 4 matching png icons.

Reports are indeed generated faster: the daily report was created in 4 minutes 30 seconds versus 12 minutes for SARG. The claim about disk space, however, did not hold up: the reports occupy 440 MB (free-sa) versus 336 MB (SARG).

Let's try a more difficult task: processing a 3.2 GB log file covering 10 days and containing 26.3 million lines.

Free-sa again generated the report faster - 46 minutes, with the report taking up 3.7 GB of disk space. SARG took 1 hour 10 minutes, and its report takes up 2.5 GB.

But both of these reports are awkward to read. Who, for example, wants to manually figure out which domain is more popular - vk.com or googlevideo.com - and manually add up the traffic of all their subdomains? If you keep only 2nd-level domains in the SARG settings, creating the report takes about the same time, but the report itself now takes up 1.5 GB on disk (the daily one shrank from 336 MB to 192 MB).

Details

When entering the main page we see something like the following (the blues theme is selected):

To be honest, the purpose of displaying the year and months is unclear: clicking on them does nothing. You can type something into the search field, but again nothing happens. You can, however, select the period of interest.

List of blocked URLs:

CONNECT method report:

PUT/POST method report:

Popular sites:

The report on the effectiveness of the proxy server seemed interesting:

User report:

When you click on the graph icon in the second column, we get a graph of Internet usage by a specific user:

When you click on the second icon, we get a table of Internet channel loading by hour:

When you select an IP address, we get a list of sites by user in descending order of traffic:

All statistics are displayed in bytes. To switch to megabytes you need to set the parameter

reports_bytes_divisor="M"

The program does not accept compressed log files, does not accept more than one file via the -l parameter, and does not support filtering files by mask. The author suggests working around these limitations by creating named pipes.
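
A sketch of the named-pipe workaround (paths are illustrative; free-sa's remaining options come from its config):

mkfifo /tmp/squid_pipe
zcat /var/log/squid/access.log.*.gz > /tmp/squid_pipe &
free-sa -l /tmp/squid_pipe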

An annoying glitch came up: when a log line is too long, timestamps end up in place of addresses:

When viewing the traffic of this “user” you can see the domain with the source of the error:

Thus, the number of users has increased several times.

Comparing these two programs: free-sa creates reports somewhat faster, but I could not detect the 20-fold speedup claimed by the author; perhaps it shows up under certain conditions. I don't think it matters much whether a weekly report built overnight takes 30 minutes or 50. And in terms of disk space occupied by reports, free-sa has no advantage.

lightsquid

Perhaps the most popular traffic counter. It works quickly and its reports do not take up much disk space. Although the program has not been updated for a long time, I decided to cover its capabilities in this article anyway.

The logic here is different: the program reads the log and creates a set of data files, from which the web pages are then generated on the fly; there are no pre-generated reports. The advantage of this approach is obvious: to get a report it is not necessary to re-parse all the logs for the period - it is enough to "feed" the accumulated log to lightsquid once a day. You can do this via cron, even several times a day, to add fresh information quickly, as the sketch below shows.
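
For example, a crontab entry along these lines (assuming lightsquid is installed in /var/www/lightsquid; lightparser.pl is the parser it ships with, and "today" tells it to pick up the current log):

*/30 * * * * /var/www/lightsquid/lightparser.pl today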

There are also drawbacks: it is impossible to process logs from different servers and collect statistics in one place - when a day's log from another server is processed, the existing statistics for that day are erased.

There is a strange limitation: lightsquid accepts both uncompressed log files and compressed ones (gz, at least), but in the latter case the file name must have the format access.log.X.gz; it will not accept files named like access.log-YYYYMMDD.gz.

With some simple manipulation we can work around this limitation and see what happens.
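
For instance, a throwaway shell loop (file names are illustrative) that symlinks date-stamped logs to the naming pattern lightsquid expects:

i=1
for f in access.log-*.gz; do
    ln -s "$f" "access.log.$i.gz"
    i=$((i+1))
done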

Details

The report for the month (total traffic 3 TB, 110 million lines) took up 1 GB of disk space.

On the home page we see traffic by day for the current month.

When you select a day, we see a report for the day for all users:

If groups are defined, the right column shows the group a user belongs to. Users who are not members of any group end up in the group "00 no in group" (they are marked with a question mark in this report).

Selecting grp on the home page for the corresponding date takes you to the report page with users divided into groups. Those not in any group are listed first, then the groups in order.

When you click on the name of the group in the table on the right, we go below to the place on the page where the report for this group begins:

When you click on “Top sites report” we get a report on popular sites for the day:

Big files report:

Let's move on to the table on the right.
Here you can get a list of top sites for the month and for the whole year (they look the same, so no screenshot), general statistics for the year and month, as well as statistics for the year and month by group.

Statistics for the month:

By clicking on the clock icon we can see a table of sites, access time and traffic consumed per hour:

Statistics for a day are shown here; for a month and for a year it looks approximately the same, with the hourly statistics per domain summed up.

When you click on the graph icon, we can see the user’s traffic consumption during the month:

The graph columns are clickable: when you click on a column, you go to the user’s statistics for another day.

By clicking on [M], we will receive a report on the user’s traffic consumption during the month, indicating the volume for each day and for the full week.

When you click on the user's name, we get a list of sites that the user visited in descending order of traffic:

Well, that seems to be all. Everything is simple and concise. IP addresses can be converted to domain names. Using regular expressions, domain names can be combined into 2nd-level domains; just in case, here is the regular expression:

$url =~ s/([a-z]+:\/\/)??([a-z0-9\-]+\.){0,}([a-z0-9\-]+\.){1}([a-z]+)(.*)/$3$4/o;

If you have skills in perl, you can customize it to suit your needs.

SquidAnalyzer

A program similar to lightsquid and also written in Perl, with a prettier design. The latest version, 6.4, was released in mid-December of this year; many improvements have been made. Program website: squidanalyzer.darold.net.

SquidAnalyzer can use several processor cores (the -j option), which speeds up report generation, but this only applies to uncompressed files; packed ones (gz is supported) are processed on a single core.
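
A typical run might look like this (paths are illustrative; -j sets the number of parallel jobs mentioned above):

squid-analyzer -j 4 /var/log/squid/access.log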

And one more comparison with lightsquid: the same report on the same server took about a day to build and takes up 3.7 GB on disk.

Just like lightsquid, SquidAnalyzer cannot merge two or more log files from different servers for the same period.

More details

Home page - you can select the year of the report.

When you select any period (year, month, week, day), the appearance of the web pages is similar: at the top there is a menu with the following reports: MIME types, Networks, Users, Top Denied, Top URLs, Top Domains. Below it are proxy statistics for the selected period: Requests (Hit/Miss/Denied), Megabytes (Hit/Miss/Denied), Total (Requests/Megabytes/Users/Sites/Domains). Below that is a graph of the number of requests and traffic for the period.

There is a calendar in the upper right corner. When you select a month, you can see brief statistics and a download graph by day:

The calendar allows you to select a week. When selected, we will see similar statistics:

When you select a day, you see statistics by hour:

Content Type Report:

Networks report.

User report.

When you select a user, we get his statistics for the period.

Prohibited resources:

Report on 2nd level domains.

For my part, I would note that the program becomes very slow as information accumulates: with each new log, statistics for the week, month and year are recalculated. So I would not recommend this program for processing logs from a server with a lot of traffic.

ScreenSquid

This program's logic is different again: the log is imported into a MySQL database, and the web interface then queries data from it. The database with the processed ten-day log mentioned earlier occupies 1.5 GB.
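
Since every report is just a query against this database, the web interface's workload looks roughly like the following (a purely hypothetical sketch - ScreenSquid's actual table and column names are not documented here):

mysql -u screensquid -p screensquid -e "SELECT user_login, SUM(bytes) AS total_bytes FROM access_log WHERE logdate BETWEEN '2015-04-01' AND '2015-04-10' GROUP BY user_login ORDER BY total_bytes DESC LIMIT 10;"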

More details

The program cannot import log files with arbitrary names; it is tied to access.log only.

Home page:

Brief statistics:

You can create aliases for IP addresses:

... and then they can be combined into groups:

Let's move on to the main thing - reports.

On the left is a menu with report types:

User traffic logins
IP address user traffic
Website traffic
Top sites
Top users
Top IP addresses
By time of day
User traffic logins expanded
IP address user traffic extended
IP address traffic with resolution
Popular sites
Who downloaded large files
Traffic by period (days)
Traffic by period (day name)
Traffic by period (months)
HTTP statuses
Login IP addresses
Logins from IP addresses

Examples of reports.

IP address user traffic:

Website traffic:

Top sites:

... further, to be honest, I didn't have enough patience to study the remaining capabilities, since pages began taking 3-5 minutes to generate. The "By time of day" report for a day whose log had not been imported at all took more than 30 seconds to create; for a day with traffic, 4 minutes:

That's all. I hope this material is useful to someone. Thank you all for your attention.
