How to monitor and verify QRadar log ingestion queues

Log ingestion is the foundation of QRadar collection and, by extension, of detection, so it is worth knowing some key commands to monitor what is going on.

The first step of log collection in QRadar is handled by two main services:

  • ecs-ec-ingress
  • ecs-ec

That's why it is very important to monitor these queues when you have collection issues, and in general to check some key values regularly. Indeed, an abnormal growth of these queues can be caused by several known issues, such as:

  • communication disruption between collector appliance and processor appliance
  • parsing issues with incoming logs
  • performance issues during the parsing process (tl;dr: the appliance collects more logs than it can parse in the time available, given every property extracted from each log)
  • ...

You have two ways to obtain this data:

  • connect to the JMX beans
  • use OS tools to check directory sizes

Using JMX beans

/opt/qradar/support/jmx.sh -p 7787 -b 'com.q1labs.sem:application=ecs-ec-ingress.ecs-ec-ingress,type=sources,name=Source Monitor'

Here is a sample of the output:

com.q1labs.sem:application=ecs-ec-ingress.ecs-ec-ingress,type=sources,name=Source Monitor
-----------------------------------------------------------------------------------------
LongWindowLengthInSecs: 900
EventImmediateWindowAverage: 146.66997754832747
FlowRate: 0.0
FlowImmediateWindowAverage: 0.0
FlowLongWindowAverage: 0.0
ImmediateWindowLengthInSecs: 300
MaximumFlowRateSinceStartup: 0.0
EPSThreshold: 100.0
EventLongWindowAverage: 146.5133252461116
FPSThreshold: 0.0
EventRate: 143.3984350068397
MaximumEventRateSinceStartup: 6300.482630829263

As you can see, you get neither the total size available for the queue nor the space currently taken by logs being processed. However, you do get very useful information such as:

  • EventRate: can be read as EPS (Events Per Second); it is important that this value stays below your license limit to avoid dropping events. Moreover, you have to verify that your appliance can handle this kind of volume; for that, check the prerequisites on the IBM website.
  • EventLongWindowAverage + LongWindowLengthInSecs: combined, these values let you check the "average speed" of your EPS, whereas EventRate is the "instant speed". For example, here are some situations:
    • EventLongWindowAverage >> EventRate:
      • you had a big spike of logs and the situation is going back to normal
      • you are losing a significant part of your logs compared to your usual collection
    • EventLongWindowAverage << EventRate:
      • you are facing a spike of logs at the moment of the check
      • you had a major collection issue and you are now catching up on high-volume log sources
    • EventLongWindowAverage ~ EventRate:
      • your event rate is stable over time, which is a good sign
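As a quick sanity check, the comparison above can be scripted. The sketch below parses jmx.sh-style output (a hard-coded sample copied from above, so it runs anywhere; in practice you would pipe the real jmx.sh command into the awk instead) and classifies the trend. The 20% tolerance is an arbitrary illustrative threshold, not an IBM recommendation.

```shell
#!/bin/sh
# Classify the EPS trend from jmx.sh-style "Key: value" output.
# Sample values copied from the output shown above:
sample='EventLongWindowAverage: 146.5133252461116
EventRate: 143.3984350068397'

echo "$sample" | awk -F': ' '
  $1 == "EventLongWindowAverage" { avg  = $2 }
  $1 == "EventRate"              { rate = $2 }
  END {
    # 1.2 = arbitrary 20% tolerance for "roughly equal"
    if (avg > rate * 1.2)      print "average >> instant: spike is over, or you are losing logs"
    else if (rate > avg * 1.2) print "instant >> average: ongoing spike, or catching up after an outage"
    else                       print "stable: event rate is steady over time"
  }'
```

With the sample values the two rates are within 20% of each other, so the script reports the stable case.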

You can get the same values for the ecs-ec service with this command:

/opt/qradar/support/jmx.sh -p 7777 -b 'com.q1labs.sem:application=ecs-ec.ecs-ec,type=sources,name=Source Monitor'

Using Linux basics

du -sh /store/persistent_queue/*

Here is a sample of the output:

80M     /store/persistent_queue/ecs-ec.ecs-ec
52M     /store/persistent_queue/ecs-ec-ingress.ecs-ec-ingress

With this command, you get the amount of space taken by logs in the queues. I am using the -h option to get human-readable output, but you can remove it to get a more precise value. In addition, you will need to run the command below.

# df -Th | grep '/store$'
/dev/mapper/storerhel-store      xfs       146G   62G   84G  43% /store

You then obtain the total size of the partition where the queues are located. As with the previous command, you can toggle human-readable output by adding or removing -h.
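To put the two commands together, here is a minimal sketch that computes the share of /store consumed by the queues. The byte counts are hard-coded from the samples above so the snippet is self-contained; the commented-out du -sb / df -B1 lines show how you might fill them in on a live appliance.

```shell
#!/bin/sh
# Illustrative sketch: what share of /store do the persistent queues use?
# Byte counts are hard-coded from the samples above so this runs anywhere;
# on a live appliance you might fill them with something like:
#   queue_bytes=$(du -sb /store/persistent_queue | awk '{print $1}')
#   store_bytes=$(df -B1 --output=size /store | tail -1)
queue_bytes=$(( (80 + 52) * 1024 * 1024 ))    # 80M + 52M from the du sample
store_bytes=$(( 146 * 1024 * 1024 * 1024 ))   # 146G from the df sample

# awk for floating-point division (shell arithmetic is integer-only)
pct=$(awk -v q="$queue_bytes" -v s="$store_bytes" 'BEGIN { printf "%.2f", q * 100 / s }')
echo "queues use ${pct}% of /store"
```

Here the queues take well under 1% of the partition, which is the healthy state; a value climbing toward double digits deserves attention.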

With this information, you have to check:

  • whether the size of the queues is constantly increasing; you can combine the above commands with watch to observe the evolution of the output every n seconds
    • check the logs of the associated service with journalctl -fu <my service> and take the appropriate actions
  • whether the size of the queues is high and steady (around a gigabyte or more)
    • check the logs of the associated service as well, and verify that you do not have EPS performance issues. You can also check QRadar's audit and health logs.

Among these indications, the most important thing to keep in mind is to have enough space available for logs to be stored in the queues during disruptions. Knowing how long you can store collected data without processing it is the key.
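A back-of-the-envelope way to estimate that buffering window is to divide the free space on /store by your ingest rate in bytes per second. Every input below is an assumption used for illustration: the 84G free comes from the df sample, the ~150 EPS from the jmx.sh sample, and the 500-byte average event size is a pure guess that you should replace with a measurement from your own environment.

```shell
#!/bin/sh
# Illustrative sketch: rough buffering headroom for /store during an outage.
# All inputs are assumptions taken from the samples above or guessed:
free_bytes=$(( 84 * 1024 * 1024 * 1024 ))   # 84G available (df sample)
eps=150                                     # ~EventRate (jmx.sh sample, rounded)
avg_event_bytes=500                         # guessed average raw event size

seconds=$(( free_bytes / (eps * avg_event_bytes) ))
echo "~$(( seconds / 3600 )) hours of buffering headroom"
```

Keep in mind this is an upper bound: /store also holds other data, and a disruption often coincides with a spike in EPS once collection resumes.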

If you want to dive into the history of these services, I suggest reading this comment by @JonathanP_QRadar on Reddit, which describes what collection used to look like, how it evolved, and some basics.


You have reached the end of this article, huge thanks for reading 🫢

πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦ To become a more friendly guru of QRadar, join the community and subscribe to the newsletter.

πŸ€— To become a nicer guru of QRadar, leave a comment, your feedback will always be welcome (when constructive of course).

