Here is a quick tutorial on writing a syslog FlexConnector using the ArcSight SmartConnector framework. As a sample I am using the filter logs generated by the open-source pfSense firewall. We will cover the basics, include some best practices, and finish by creating categorization for the events, following the standard set by other firewalls already parsed by the ArcSight SmartConnector framework.
Understanding The Log Format
pfSense provides a nice description of its filter log format, and the firewall can easily be configured to send these logs to a syslog receiver. The log format is essentially a string of comma-separated values whose fields vary depending on IP version and protocol.
The page describing the Filter Log Format can be found here - Filter_Log_Format_for_pfSense_2.2
There are a set of common fields:
- Rule Number
- Sub rule number
- Tracker - a unique ID per rule; for user-added rules the tracker ID is stored with the rule in config.xml, or check /tmp/rules.debug
- Real interface (e.g. em0)
- Reason for the log entry (e.g. match)
- Action taken that resulted in the log entry (e.g. block, pass)
- Direction of the traffic (in/out)
- IP version (4 for IPv4, 6 for IPv6)
The next fields depend on the IP version. For IPv4 they include:
- Protocol ID
- Protocol text (tcp, udp, etc)
- Length
and for IPv6:
- Class
- Flow Label
- Hop Limit
- Protocol text
- Protocol ID
- Length
For both IPv4 and IPv6 these are followed by:
- Source IP
- Destination IP
There is then a UDP, TCP, or ICMP sub-message depending on the protocol.
For UDP (Protocol ID 17):
- Source Port
- Destination Port
- Data Length
For TCP (Protocol ID 6), the same three fields followed by TCP-specific fields, including:
- TCP Flags
- Sequence Number
For ICMP:
- ICMP Type, used to choose between a number of type-specific sub-formats
Below are a few sample logs of differing formats:
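The original sample logs were provided as attachments; as illustrative stand-ins, IPv4 lines following the documented field order might look something like this (all values are made up):

```
222,,1000000103,igb1,match,block,in,4,0x0,,64,31385,0,none,17,udp,60,192.168.1.10,8.8.8.8,55056,53,40
222,,1000000103,igb1,match,block,in,4,0x0,,64,31385,0,none,6,tcp,60,192.168.1.10,8.8.8.8,55056,443,0,S,1234567890,,65535,,mss
222,,1000000103,igb1,match,block,in,4,0x0,,64,31385,0,none,1,icmp,84,192.168.1.10,8.8.8.8,request,1234,5678
```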
We can see the “common field” portion of each log type below:
Writing the FlexConnector Parser
Although this log is predominantly made up of comma-separated values, it will actually serve us better to use a regex processor, which opens up the ability to define different sub-message formats to account for the non-common CSV fields.
I have attached the finished parser with comments, as it’s probably a good idea to see the full structure before I attempt to break it out and describe its individual sections.
Because we have an idea of the set of common CSV fields, a good starting point is to create the initial regex and begin tokenizing it into known fields. The initial regex must be unique enough to ensure we capture the pfSense logs, but not any other logs which may also be sent to the syslog connector.
A sensible looking regex to capture this would be
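As a sketch, assuming the eight common fields above followed by a catch-all group (remember that in a .properties file each backslash in the regex must itself be escaped):

```properties
# Common fields: rule number, sub-rule number, tracker, interface,
# reason, action, direction, IP version, then a catch-all for the rest
regex=(\\d*),(\\d*),(\\d+),([\\w.]+),(\\w+),(\\w+),(\\w+),(\\d),(.*)
```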
We are using a catch all at the end of the regex for the section of the message which will vary depending on Transport Protocol and IP version. This will be passed off to a sub-message which will be discussed later.
As we have a good description of the pfSense log format, we can use it to tokenize each of the extracted fields in the parser. I have tokenized these as follows:
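A sketch of the tokenization; apart from IPVersion and Message, which are referenced later in this article, the token names are my own choices, and all types are kept as String for simplicity (numeric fields could use Integer instead):

```properties
token.count=9

token[0].name=RuleNumber
token[0].type=String
token[1].name=SubRuleNumber
token[1].type=String
token[2].name=Tracker
token[2].type=String
token[3].name=Interface
token[3].type=String
token[4].name=Reason
token[4].type=String
token[5].name=Action
token[5].type=String
token[6].name=Direction
token[6].type=String
token[7].name=IPVersion
token[7].type=String
token[8].name=Message
token[8].type=String
```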
After tokenizing, it is worthwhile mapping these tokens to event fields. I have chosen the following, which are hopefully self-explanatory:
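A sketch of such a mapping block (the token names and the choice of ArcSight event fields here are illustrative assumptions, not the original file):

```properties
event.deviceAction=Action
event.deviceDirection=__getDeviceDirection(Direction)
event.deviceInboundInterface=Interface
event.deviceCustomNumber1=RuleNumber
event.deviceCustomNumber1Label=__stringConstant(Rule Number)
event.name=Action
```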
As you can see, some of the fields are a direct mapping; others, however, utilise what is known as a “double underscore operator”. There are plenty of these, all detailed in the FlexConnector Developer’s Guide. The two double underscore operators used above are:
__stringConstant = Sets the field to a fixed string value.
__getDeviceDirection = Changes the text string “in” or “out” to the 0 or 1 understood by ArcSight.
There are many more, one of which is __regexToken, which provides a further level of regex parsing should one ever be needed.
The full list of these operators is documented on page 171 of the attached FlexConnector Developer’s Guide
Based on our previous analysis we know that there are a few different log formats which, with our current regex and tokenization, will all end up in the Message field. To deal with this we use a powerful feature of the ArcSight FlexConnector framework: submessages. Apologies if I over-emphasise this feature, but it has proved useful in every log format I have ever needed to parse. Essentially you select one field to act as the submessage identifier and another field to act as the submessage token which needs further parsing; having selected them, you can define one or more submessage sections. In our example we select IPVersion as the submessage identifier, because we know we have IPv4 and IPv6, and Message as the submessage token field, since it contains the differing message formats under the IPv4 or IPv6 identifiers, namely the TCP, UDP and ICMP variations.
This is done by adding the following two lines:
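Using the token names from this article (IPVersion and Message), the two lines would be:

```properties
submessage.messageid.token=IPVersion
submessage.token=Message
```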
For each of the sample events provided above, we can break out what would be extracted by the regex, tokenized into the Message field, and passed through to the relevant submessage for further parsing by the submessage pattern regex. As an example I am only concerned with IPv4, so the three Message payloads we get are as follows:
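As an illustration (values made up), the three IPv4 Message payloads, i.e. everything captured by the catch-all group after the IP version field, would look something like:

```
0x0,,64,31385,0,none,17,udp,60,192.168.1.10,8.8.8.8,55056,53,40
0x0,,64,31385,0,none,6,tcp,60,192.168.1.10,8.8.8.8,55056,443,0,S,1234567890,,65535,,mss
0x0,,64,31385,0,none,1,icmp,84,192.168.1.10,8.8.8.8,request,1234,5678
```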
Note the only difference is the number of comma-separated values extracted, and the key udp/tcp/icmp field.
Completing the Submessage Parsers
I am going to use the IPv4 UDP submessage as an example by stepping through each line and explaining its purpose and how it works. The full submessage section for IPv4 UDP is:
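Reconstructed as a sketch: the property keys follow the submessage[x].pattern[x] scheme discussed in this article, but treat the exact regex, field list, and separators as illustrative assumptions rather than the original file:

```properties
submessage[1].messageid=4
submessage[1].pattern.count=3

# Pattern 0 - UDP. Groups: tos, ecn, ttl, id, offset, flags, protocol id,
# protocol text, length, source IP, destination IP, source port,
# destination port, data length
submessage[1].pattern[0].regex=([^,]*),([^,]*),(\\d+),(\\d+),(\\d+),([^,]*),(\\d+),(udp),(\\d+),([\\d.]+),([\\d.]+),(\\d+),(\\d+),(\\d+)
submessage[1].pattern[0].fields=event.transportProtocol,event.sourcePort,event.destinationPort,event.bytesIn
submessage[1].pattern[0].mappings=$8|$12|$13|$14
submessage[1].pattern[0].extramappings=event.name\=__stringConstant(UDP Packet)|event.sourceAddress\=__regexTokenAsAddress($10)|event.destinationAddress\=__regexTokenAsAddress($11)
```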
This is the second submessage messageid (they begin at 0); this submessage is looking for 4 in the IPVersion field (which we previously defined as the submessage.messageid.token). Anything with 4 in the IPVersion field will have its Message field passed through to this submessage section for additional parsing based on the submessage[x].pattern[x].regex detailed above.
This states there are 3 different possible patterns, we know these to be UDP, TCP and ICMP.
This is the first regex pattern for the submessage. As with submessages the pattern count also starts at 0.
These two lines should be treated together in this configuration. The fields and mappings lines are not strictly required, but only if you tokenize all of the fields extracted by the regex in the correct order. In this case we are not doing that, and only care about a subset of the fields.
When mapping a subset of fields you must provide a fields and mappings configuration so that the framework knows where to place each field. In our case:
- event.transportProtocol = $8
Extramappings are incredibly useful when you want to add arbitrary string fields, or perform additional functions on an extracted field before mapping it. In the example above I am making the event.name field more readable, and parsing the IP addresses, which are currently strings, into the format required by ArcSight; this is handled by the __regexTokenAsAddress operator, which is documented fully in the FlexConnector Developer’s Guide attached earlier.
Applying the Parser
At this stage we are ready to apply the parser to a syslog daemon SmartConnector. To do this we place the pfsense.subagent.sdkrfilereader.properties file under /opt/arcsight/connectors/syslog/current/user/agent/flexagent/syslog/, changing /opt/arcsight/connectors/syslog depending on where you have installed your connector.
Execute the following commands to restart the connector so that it picks up the new parser:
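A sketch of the usual approach, assuming the install path above; running the connector in standalone mode from a terminal is the easiest way to watch the parser load:

```shell
cd /opt/arcsight/connectors/syslog/current/bin
# Run the connector in standalone mode so output goes to the terminal.
# If the connector is installed as a service, restart that service instead.
./arcsight agents
```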
Tail the agent.out.wrapper.log file and look for the following lines: