Tuesday, 17 December 2013

Track and trace with Camel and Splunk

We have been using Camel for system integration for some years now, and are very pleased with the flexibility it has provided us.
Some time ago I built a Camel component for Splunk, which is now on Camel master and will be released with the upcoming 2.13.0 release, and I think a blog post about how it came about is in order.


The big picture

We are running a JMS hub and spokes architecture with a central message hub consisting of topics and queues.

On each side of the hub we have Camel routes that act as integration points to other systems.
Some routes act as consumers that collect data from provider systems. These integrations are typically pull based, e.g. databases of various flavors, files, FTP, S3 or JMS.
The data collected is transformed to a common format (XML) and published to a queue or topic on the hub.
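A consumer route of this kind can be sketched in Camel's Spring XML DSL roughly as follows (the table, bean and topic names are made up for illustration):

```xml
<!-- Hypothetical consumer route: poll a provider database,
     transform rows to our common XML format and publish to the hub. -->
<route id="orders-consumer">
  <!-- poll unprocessed rows every 5 seconds (camel-sql consumer) -->
  <from uri="sql:select * from orders where processed = 0?consumer.delay=5000"/>
  <!-- transform to the common XML format -->
  <to uri="bean:orderToCommonXml"/>
  <!-- publish to a topic on the JMS hub -->
  <to uri="jms:topic:ORDERS_HUB"/>
</route>
```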

On the other side of the hub we have Camel routes that act as event providers. These routes consume the messages from the hub, transform them to a target-system-specific format, and send them on to the destination system using a variety of protocols such as SOAP, HTTP, database tables, stored procedures, JMS, file, FTP and raw sockets.
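An event provider route could, as a sketch (the XSLT file, URL and topic name are hypothetical), look like:

```xml
<!-- Hypothetical provider route: consume from the hub, transform to the
     target system's format and deliver over HTTP. -->
<route id="orders-provider">
  <from uri="jms:topic:ORDERS_HUB"/>
  <!-- target-specific transformation -->
  <to uri="xslt:xslt/common-to-target.xsl"/>
  <!-- deliver to the destination system -->
  <to uri="http://target-system.example/orders"/>
</route>
```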

All in all we are very pleased with this architecture since it has provided us with the desired flexibility and robustness.

Tracing messages

Early in the process we discovered that integration can often be something of a black box, and that you have to think about insight and traceability from the start. We needed insight into what was going on when we routed messages between systems, and we also needed to keep a history of the message flow.
Therefore every integration adapter publishes audits (a copy of the original payload received or published), with some additional metadata about the integration in the header, to an audit queue on the hub.

Example of a route with a custom Camel component that creates an audit trail.
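Since the audit component itself is custom, here is a rough equivalent using a plain wire tap instead (endpoint and header names are placeholders):

```xml
<!-- Sketch: copy each payload, with integration metadata in a header,
     to the audit queue without disturbing the main flow. -->
<route id="audited-consumer">
  <from uri="file:/data/inbox"/>
  <to uri="bean:toCommonXml"/>
  <!-- tag the audit copy with the integration it came from -->
  <setHeader headerName="AuditIntegrationId">
    <constant>file-inbox</constant>
  </setHeader>
  <!-- fire-and-forget copy to the audit queue -->
  <wireTap uri="jms:queue:AUDIT_HUB"/>
  <to uri="jms:topic:ORDERS_HUB"/>
</route>
```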

Audit trail adapter

The audit trail adapter consumes the messages from the audit queue (AUDIT_HUB) and stores the message payload in a database for short-term storage. This is usually enough to answer questions like "what happened yesterday between 9 and 10 on integration x, what did the message contain, and did the target system receive the message?"
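As a sketch (the DAO bean is hypothetical), the adapter boils down to:

```xml
<!-- Consume audits from the hub and store payload plus metadata
     in a database table for short-term storage. -->
<route id="audit-trail-adapter">
  <from uri="jms:queue:AUDIT_HUB"/>
  <to uri="bean:auditDao?method=store"/>
</route>
```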


There is also an Angular app that makes it possible for users to search and view events passing through the integration platform.


This has made it possible to gain fine-grained insight over a short period of time, but for a more holistic and proactive approach something else was needed.

Splunk

That was when I stumbled upon Splunk. It has a lot of features for ingesting data of any kind, awesome search capabilities on big data, alerting, and a really easy way to build dashboards with real-time data if needed.
To get data into Splunk I created the Splunk component, and with that in place it's quite easy to get data into Splunk, as this example illustrates.
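A minimal sketch of such a route (host, credentials and index names are placeholders; the options follow the camel-splunk producer):

```xml
<!-- Submit every audit message to a Splunk index over its REST API. -->
<route id="audit-to-splunk">
  <from uri="jms:queue:AUDIT_HUB"/>
  <to uri="splunk://submit?host=localhost&amp;port=8089&amp;username=admin&amp;password=secret&amp;index=audit-trail&amp;sourceType=audit"/>
</route>
```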


With data in a Splunk index the fun begins!
Now we can use Splunk web to search the data, and to build a dashboard with panels for a display in our office, both for insight and proactive alerting.
First up is the data we have ingested into the audit-trail index. We want to display a real-time graph of events flowing through the platform, broken down by the most active adapters.
This is done in Splunk web by creating a search query and, once happy with the result, choosing a way to visualise it. The end result is a panel in XML format:
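As a hedged illustration of what such a panel could look like (index, field and option names are placeholders and may differ from our real dashboard):

```xml
<!-- Real-time chart: events per adapter over the last 15 minutes. -->
<chart>
  <title>Events by adapter (real time)</title>
  <searchString>index=audit-trail | timechart count by adapter limit=10</searchString>
  <earliestTime>rt-15m</earliestTime>
  <latestTime>rt</latestTime>
  <option name="charting.chart">line</option>
</chart>
```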


Panels can be combined to build a dashboard page like this one.

Splunk also lets you install apps from a Splunk "App Store". The middle two panels are built using the Splunk JMX app. With this app you can connect to running JVMs and ingest JMX data into a Splunk index.
The app has a configuration file where you configure which MBeans and attributes should be ingested.
Since Camel exposes a lot of JMX statistics, you can ingest those into Splunk as well, as this sample config snippet illustrates.
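A hedged sketch of such a config snippet (element and attribute names may differ between versions of the JMX app; the MBean attributes are standard Camel route statistics):

```xml
<!-- Poll Camel route MBeans and ingest selected attributes. -->
<jmxserver host="camelhost" jmxport="7090" jvmDescription="camelhost">
  <mbean domain="org.apache.camel" properties="type=routes,*">
    <attribute name="ExchangesCompleted"/>
    <attribute name="ExchangesFailed"/>
    <attribute name="MeanProcessingTime"/>
  </mbean>
</jmxserver>
```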

The Camel statistics can then be used in Splunk for ad hoc reporting, dashboards and alerting as needed.

Search and view Camel JMX attributes in Splunk


We had cases where event processing took too long, which mattered since we were dealing with recording of live streams. With the Camel statistics we could build a report showing which integrations (routes) were involved and at which times processing was slow. With that information at hand it is easier to drill down and decide where to fix a problem.



The final piece of the dashboard is a platform health indicator (OK, Warn and Error).
Since our integration platform already has a REST endpoint that our monitoring system collects status information from, we can use that to ingest status data into Splunk.

For that we installed another Splunk app, rest_ta. The app calls the REST endpoint and ingests status information into a surveillance-status index.
The dashboard panel uses a range component from Splunk to indicate the status:
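The rest_ta input feeding the surveillance-status index can be sketched roughly like this (the stanza and parameter names are assumptions about the app's inputs.conf, and the URL is a placeholder):

```ini
# Hypothetical rest_ta input: poll the platform's status endpoint
# once a minute and index the response.
[rest://platform-status]
endpoint = http://integration-platform.example/status
http_method = GET
response_type = json
polling_interval = 60
index = surveillance-status
```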


Final dashboard on our office wall, with the health status at the top.



I nearly forgot to mention that we also created alerts for when certain events happen, e.g. when no data is ingested on a given integration.
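Such a "no data" alert can be driven by a simple scheduled search like this one (index and adapter names are placeholders), with the alert configured to fire when the count is zero:

```
index=audit-trail adapter="orders-consumer" earliest=-1h | stats count
```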

My final words on Splunk would be that it's a Swiss Army knife for analyzing, understanding and using data, and that I'm only starting to get a grip on the possibilities, because there are so many.

If you want to try out the Camel and Splunk combo, there is a small Twitter sample hosted in my GitHub repo.


Comments:

  1. Great article. I love Splunk and Camel (this component suits my needs perfectly as well). I did notice many online webinars promoting how Splunk utilizes MuleSoft to connect Salesforce and NetSuite. How do these integrations co-exist (Mule and Camel)? Do you run Mule as an iPaaS service for one integration and Camel embedded in an on-premise application? I use Camel currently but I am being pressured to use Mule as well. I'm trying to find a way to make the two applications co-exist without duplicating my efforts.

  2. Thanks.
    I haven't used Mule so I cannot comment on how it compares.

    Currently we are running our integrations as plain WAR files deployed on a couple of WebLogic nodes.

    One of Camel's strengths is that it is flexible about how you want to run it, since it's 'only' a framework that can run standalone, in JEE servers, or in OSGi containers.

    Kai Wähner has written a couple of comparisons that might be helpful for you, e.g.
    http://www.kai-waehner.de/blog/2012/01/10/spoilt-for-choice-which-integration-framework-to-use-spring-integration-mule-esb-or-apache-camel/

    I tend to agree with the conclusions.

    Also the latest Camel release includes a Camel Salesforce component http://camel.apache.org/salesforce.html

  3. I appreciate the quick reply. I do enjoy the flexibility of Camel and currently will continue to utilize it as our singular integration solution.
    I have read Kai Wähner's article and his suggestions have been extremely helpful with my decision-making process so far (it seems he is all over the web with helpful advice). I was just hoping to see if anyone was using multiple integration architectures (frameworks and ESB's) in production and possibly in a collaborative manner. I will continue to dig. Thanks again.

  4. Preben,

    Do you have any instructions for the Camel/JMX connection piece? I'm currently using Splunk and the JMX monitor on another application just fine. However, I am having issues connecting the JMX app to my Camel fabric.

    Great work!

  5. It seems there is a remote JMX connection problem.
    Is Fabric set up to grant remote JMX access, with or without credentials?
    Can you connect to Fabric remotely using JConsole?

  6. Java opts set on our JVM running Camel:

    JAVA_OPTIONS="${JAVA_OPTIONS} -Djava.rmi.server.hostname=xxxhost"
    JAVA_OPTIONS="${JAVA_OPTIONS} -Xmanagement:ssl=false,authenticate=false,autodiscovery=true,port=7090"

    The last one, Xmanagement, is proprietary. You should probably look at something like http://stackoverflow.com/questions/834581/remote-jmx-connection

    The corresponding Splunk JMX config in the cluster section:

    jmxserver host="xxxhost" jvmDescription="xxxhost" jmxport="7090"

    Hope it helps.

  7. Wouldn't it be better if we just log whatever we are interested in and have Splunk monitor the log files/folders? We may have logs from any layer (web layer, services layer and so on), not just routes.

    Your approach seems to be one way, but it is still tightly coupled to Camel and Splunk.

    It would become difficult to make changes if we later decide to switch to another integration framework or another log analysis tool.

    So I feel log files can serve that purpose very well with a standard log4j format. We just log what we want and have Splunk or another tool monitor it.


    1. Sure - there are a lot of ways to achieve auditing depending on your use case.
      Log forwarding works fine, but requires a forwarder on each box.

      The solution I describe is actually quite decoupled from Camel/Splunk, since all auditing from the different components goes to audit queues, and only the audit queue consumer component knows about or depends on Camel/Splunk.

      If we decide to switch to another technology, there is only one place that needs to be changed.

  8. I do agree that forwarders can do the work, except for the fact that they need to be on all required nodes. But that is an Ops concern.

    Devs should only need to focus on the actual logic rather than logging, which will clutter the code if added in many places, so it is better handled as a separate aspect/concern. Also, if we push to the Splunk indexer directly, it is synchronous, which might slow down execution.

    So I feel we may have to leverage messaging when pushing to the Splunk indexer using the camel-splunk component.

    1. Agreed - and I would only do that if messaging is already part of your system design.
      Otherwise I would use aspects as you mentioned, log to files, and then have a Splunk forwarder index them.

      Speaking of logging, we actually regard audit and logging as separate concerns.

      Audit is more about business transactions (system A sent message 1 to system B; system B received message 1, enriched/transformed it to X and sent it to C, and so on).
      In our case it is a deliberate developer choice where to audit business transactions, not something hidden away in an aspect.
      This article is about that concern.

      At the same time we do traditional logging to log files, with technical details about errors and whatnot.

      As a side note, since it has been a long time since I wrote this article, I can say that Splunk has been a real joy. Not one hiccup in the years running it, and the insight you get takes much of the guesswork out of your daily tasks and enables you to solve problems proactively.

      Today, though, there are some strong alternatives, and I would certainly also look at e.g. Kibana as a strong OSS alternative.

  9. Hello Preben-
    Thanks for all your pointers! I am now able to pull in the needed JMX data. Now we will move into dashboard creation. :-)
