Apache Tomcat logs analysis with ELK and Elassandra

In this post, we will set up Filebeat, Logstash, Elassandra and Kibana to continuously store and analyse Apache Tomcat access logs.

[Diagram: the Filebeat → Logstash → Elassandra → Kibana (ELLK) pipeline]

By using a cassandra output plugin based on the cassandra driver, logstash sends log records directly to your elassandra nodes, with load balancing, failover and retry ensuring logs flow continuously into the Elassandra cluster. See the cassandra driver documentation for more details.

Step 1 - Filebeat

For Filebeat installation, please see the installation instructions. Then, in /etc/filebeat/filebeat.yml, configure Filebeat to send Tomcat access logs to the Logstash instance running locally on TCP port 5044.

filebeat.prospectors:
- input_type: log
  paths:
    - /opt/apache-tomcat-8.5.11/logs/*.txt

output.logstash:
  hosts: ["localhost:5044"]

Check your filebeat configuration:

/usr/share/filebeat/bin/filebeat -e -v -path.config /etc/filebeat/ -path.home /usr/share/filebeat/ -path.data /var/lib/filebeat
2017/09/12 12:20:26.920042 beat.go:297: INFO Home path: [/usr/share/filebeat/] Config path: [/etc/filebeat/] Data path: [/usr/share/filebeat//data] Logs path: [/usr/share/filebeat//logs]
2017/09/12 12:20:26.920080 beat.go:192: INFO Setup Beat: filebeat; Version: 5.6.0
2017/09/12 12:20:26.920180 logstash.go:90: INFO Max Retries set to: 3
2017/09/12 12:20:26.920246 metrics.go:23: INFO Metrics logging every 30s
2017/09/12 12:20:26.920303 outputs.go:108: INFO Activated logstash as output plugin.
2017/09/12 12:20:26.920662 publish.go:300: INFO Publisher name: strapdata-3
2017/09/12 12:20:26.922242 async.go:63: INFO Flush Interval set to: 1s
2017/09/12 12:20:26.922254 async.go:64: INFO Max Bulk Size set to: 2048
Config OK

If you want to re-process your files, don't forget to remove the filebeat registry file located in the /var/lib/filebeat directory, as shown below.
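For example, a minimal sketch assuming the standard systemd service name installed by the Filebeat package:

sudo systemctl stop filebeat
sudo rm /var/lib/filebeat/registry
sudo systemctl start filebeat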

Step 2 - Logstash

For logstash installation, please see the installation instructions. Install the logstash cassandra output plugin to send logs directly to cassandra:

sudo /usr/share/logstash/bin/logstash-plugin install logstash-output-cassandra

Configure the logstash input in /etc/logstash/conf.d/10-input-tomcat-logs.conf to receive Filebeat messages.

input {
    beats {
        host => "127.0.0.1"
        port => "5044"
    }
}
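Once logstash is running, you can verify that the beats input is listening on TCP port 5044 (a quick check, assuming ss is available on your system):

sudo ss -tlnp | grep 5044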

Add your logstash filter configuration in /etc/logstash/conf.d/20-filter-tomcat-logs.conf to parse tomcat access logs. It basically removes non-stored fields and sets the target cassandra table name in the logstash metadata.

filter {
    # access.log
    if ([source] =~ /.*\.txt$/) {
        grok {
            # Access log pattern is %a %{waffle.servlet.NegotiateSecurityFilter.PRINCIPAL}s %t %m %U%q %s %B %T "%{Referer}i" "%{User-Agent}i"
            # 10.0.0.7 - - [03/Sep/2017:10:58:19 +0000] "GET /pki/scep/pkiclient.exe?operation=GetCACaps&message= HTTP/1.1" 200 39
            match => [ "message" , "%{IPV4:clientIP} - %{NOTSPACE:user} \[%{DATA:timestamp}\] \"%{WORD:method} %{NOTSPACE:request} HTTP/1.1\" %{NUMBER:status} %{NUMBER:bytesSent}" ]
            remove_field => [ "message" ]
            add_field => { "[@metadata][cassandra_table]" => "tomcat_access" }
        }
        grok {
            match => [ "request", "/%{USERNAME:app}/" ]
            tag_on_failure => [ ]
        }
        date {
            match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
            remove_field => [ "timestamp" ]
        }
        ruby {
            code => "event.set('ts', event.get('@timestamp'))"
        }
        mutate {
            lowercase => [ "user" ]
            convert => [ "bytesSent", "integer", "duration", "float" ]
            update =>  { "host" => "%{[beat][hostname]}" }
            remove_field => [ "beat","type","geoip","input_type","tags" ]
        }
        if [user] == "-" {
            mutate {
                remove_field => [ "user" ]
            }
        }
        # drop unmatched events (like IPv6 requests): when grok fails, the message field is not removed
        if [message] =~ /(.+)/  {
            drop { }
        }
    }
}
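To quickly test the grok pattern against a sample access log line, you can run logstash with a minimal configuration (a hypothetical test-grok.conf, sketched below) that reads from stdin and prints the parsed event:

input { stdin { } }
filter {
    grok {
        match => [ "message" , "%{IPV4:clientIP} - %{NOTSPACE:user} \[%{DATA:timestamp}\] \"%{WORD:method} %{NOTSPACE:request} HTTP/1.1\" %{NUMBER:status} %{NUMBER:bytesSent}" ]
    }
}
output { stdout { codec => rubydebug } }

Paste the sample log line from the comment above and check that the clientIP, method, request, status and bytesSent fields are extracted as expected.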

Finally, in /etc/logstash/conf.d/80-output-cassandra.conf, configure the cassandra output plugin to send logstash fields to your Elassandra cluster. Special attention should be given to field types, which must match your cassandra schema; here the fields ts, clientIP and status (case sensitive) are mapped to the CQL types timestamp, inet and int.

output {
    cassandra {
        hosts => [ "localhost" ]
        port => 9042
        protocol_version => 4
        consistency => 'one'
        keyspace => "logs"
        table => "%{[@metadata][cassandra_table]}"
        username => "cassandra"
        password => "cassandra"
        # Cast fields to the right cassandra driver types
        hints => {
            ts => "timestamp"
            clientIP => "inet"
            status => "int"
        }

        retry_policy => { "type" => "default" }
        request_timeout => 1
        ignore_bad_values => false
        flush_size => 500
        idle_flush_time => 1
    }
}

You can test your logstash configuration by running the following command, and check for any error messages in the logstash logs, located in /var/log/logstash:

/usr/share/logstash/bin/logstash --debug --path.config /etc/logstash/conf.d/ --path.settings /etc/logstash --path.logs /var/log/logstash/

Step 3 - Elassandra

With cqlsh, create a keyspace and a Cassandra table to store your tomcat logs. To ensure uniqueness of a log entry, the Cassandra primary key is based on the host, source and offset provided by filebeat. Of course, there is a cassandra column for each logstash field.

CREATE KEYSPACE logs WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1'}  AND durable_writes = true;
CREATE TABLE logs.tomcat_access (
    source text,
    offset bigint,
    app text,
    bytessent bigint,
    clientip inet,
    host text,
    method text,
    request text,
    status int,
    ts timestamp,
    user text,
    PRIMARY KEY ((host, source, offset))
);
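Once logstash starts writing, a simple cqlsh query lets you verify that rows are inserted (note that unquoted CQL column names are lowercased):

SELECT ts, clientip, method, request, status FROM logs.tomcat_access LIMIT 5;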

Create an elasticsearch index with automatic mapping discovery.

curl -XPUT "http://localhost:9200/logs/" -d '{ "mappings" : { "tomcat_access":{ "discover":".*"}}}'

Then you should see your Elasticsearch index in your Elassandra cluster:

$ indices
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana fYbnnE6fTB6WxykDme1uIw 4 1 12 0 63.4kb 63.4kb
green open logs UFLug38ZToGsFNQ70QTaiw 4 0 0 0 412b 412b
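Once documents are indexed, a basic search confirms that the discovered mapping works, for example:

curl -XGET "http://localhost:9200/logs/_search?q=status:200&pretty"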

Step 4 - Kibana

In Kibana, create a new time-based index pattern logs, using the time filter field ts as shown below.
[Screenshot: Kibana new index pattern]

Then visualise your data and create kibana dashboards at your convenience.

[Screenshot: Kibana Discover]

Conclusion

Apache Cassandra, built with native multi-datacenter replication in mind, ensures continuous logstash writes without any additional software, and our tight Elasticsearch integration provides powerful search and analytics features. Moreover, this design scales smoothly by adding more nodes, without the need to overshard or reshard your Elasticsearch indices.