Parser Generator

Parser Generator is a tool for creating parsers that normalize collected logs. When you find a log that failed normalization in real-time monitoring and run Parser Generator on it, the original log is loaded as the sample, and you can create a parser policy in XML format using Separator or Regex and apply it to the linked equipment.

| Code | Contents | Explanation |
| --- | --- | --- |
| 000 | Normalize Success | |
| 001 | log is null | Occurs when a blank line is logged. |
| 002 | Header is null | Header information is missing. |
| 003 | Header Invalid | A header element is missing. |
| 004 | Can not Find ParserInfo | Occurs when no log parser corresponds to the header information. Check the agent IP and equipment IP in equipment management and the parser information; if no parser exists, create and register one. |
| 005 | Can not Find Category | Occurs when no category corresponds to the parser. Check the parser classification method and classification value in parser management. |
| 006 | Can not Find ParserFormat | Occurs when the parser XML cannot be found. Restart the Normalizer process. |
| 007 | Invalid Log Format | Occurs when the log does not match the parser. Check the parser XML. |
| 008 | Field Info Error | Occurs when field information cannot be found. Check the parser XML. |
| 009 | Time Over | Occurs when the equipment time in the collected log exceeds the allowed collection time. Adjust the Time-Over setting of the normalization process (Normalizer) or correct the time on the linked equipment. |
| 010 | Etc Error | Occurs for any other error. |
| 011 | Regex Parser Timeout | Occurs when the regular expression parser times out. Check the regular expression and try again. |
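For quick triage of failed logs, the result codes above can be kept in a small lookup table. The following Python sketch is illustrative only: `RESULT_CODES` and `describe_result` are hypothetical names and are not part of the product.

```python
# Hypothetical lookup of normalization result codes, copied from the table above.
RESULT_CODES = {
    "000": "Normalize Success",
    "001": "log is null",
    "002": "Header is null",
    "003": "Header Invalid",
    "004": "Can not Find ParserInfo",
    "005": "Can not Find Category",
    "006": "Can not Find ParserFormat",
    "007": "Invalid Log Format",
    "008": "Field Info Error",
    "009": "Time Over",
    "010": "Etc Error",
    "011": "Regex Parser Timeout",
}


def describe_result(code):
    """Return the Contents text for a result code, or a fallback for unknown codes."""
    return RESULT_CODES.get(code, "Unknown code")


print(describe_result("004"))
```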

Log Normalization Procedure

Right-click on unnormalized logs
String conversion: Convert to Base64/URL/HEX/MD5/SHA values
Open XML Parser: A tool for generating parser XML code by analyzing sample (raw) logs, breaking them down, and mapping to user-defined fields in order to create normalization policies for search and analysis

Selecting Open XML Parser runs the Parser Generator
Depending on the log format, two types of parsers are supported: Separator and Regex
Separator: Used when logs are clearly divided by consistent delimiters
Regex: Used when it's difficult to separate logs using delimiters

Separator

Field Extraction Rules

For the following log formats, field values are automatically extracted:

Key-Value

JSON

CEF (Common Event Format)

For all other formats (e.g., pipe-delimited, space-delimited, CSV), field values must be manually defined during the parsing process.

Regardless of format, the following two fields are mandatory and must be included in every parsed entry:

original_log:

The full raw log string, exactly as received. Used for validation and traceability.

eqp_type:

Identifies the source device type (e.g., FW, WAF, IDS). Used for log classification.

Without these fields, logs may be rejected or not processed correctly.
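The mandatory-field rule can be expressed as a simple check on a parsed entry. This is a minimal Python sketch under the assumption that a parsed entry is represented as a dictionary; `MANDATORY_FIELDS` and `missing_mandatory_fields` are hypothetical names used only for illustration.

```python
# The two mandatory field names come from the rule above; everything else is illustrative.
MANDATORY_FIELDS = ("original_log", "eqp_type")


def missing_mandatory_fields(entry):
    """Return the mandatory field names that are absent or empty in a parsed entry."""
    return [name for name in MANDATORY_FIELDS if not entry.get(name)]


raw = "192.168.0.1|192.168.10.1|tcp|22|443"
complete = {"original_log": raw, "eqp_type": "FW", "prtc": "tcp"}
incomplete = {"prtc": "tcp"}

print(missing_mandatory_fields(complete))
print(missing_mandatory_fields(incomplete))
```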

Separator

Description and Examples

Pipe [\|]

Pipe (|) Delimiter

Example: 192.168.0.1|192.168.10.1|tcp|22|443

Description: Fields are clearly separated by the pipe (|) character. Commonly used for simple log parsing.

Space [\s]

Space Delimiter

Example: 192.168.0.1 192.168.10.1 tcp 22 443

Description: Fields are separated by space. Easy to read but problematic if field values contain spaces.

Tab [\t]

Tab Delimiter

Example: 192.168.0.1 192.168.10.1 tcp 22 443

(Note: Tabs may not be visually distinct.)

Description: Uses tab character as delimiter. Often used in structured text logs.

Comma [,]

Comma (,) Delimiter

Example: 192.168.0.1,192.168.10.1,tcp,22,443

Description: Standard CSV-style delimiter. May cause issues if fields contain commas.

CSV [,] [\"]

CSV Format with Double Quotes

Example: "192.168.0.1","192.168.10.1","tcp","22","443"

Description: Each field is enclosed in double quotes to ensure proper parsing, especially when commas are included in field values.

JSON

Syslog JSON Format

Example: {"timestamp":"2018-01-02 14:42:23","event_type":"alert","src_ip":"192.168.0.1"}

Description: Log is in JSON structure. Widely used in modern log processing systems.

Key&Value(2)

Key-Value Format

Example: sc_ip="192.168.0.1",dstn_ip="192.168.10.1",prtc="tcp",src_port="22"

Description: Fields are shown as key-value pairs. Easy to parse and widely used in security logs.

CEF

Common Event Format (CEF)

Example:

fenotify-20252856.warning: CEF:0|FireEye|CMS|7.8.1.468932|DM|domainmatch|1|rt=Oct 19 2016 01:04:40 UTC cn3Label=cncPort cn3=53 cn2Label=sid cn2=80448589 shost=dns.example.com proto=udp spt=23619

Description: Used for integration with SIEM and ESM systems. Fields separated by pipe (|), with additional key-value fields at the end.

LEEF

LEEF (Log Extended Event Format)

Example:

LEEF:1.0|Microsoft|MSExchange|4.0 SP1|15345

Description: Used for IBM QRadar integration. Pipe (|) is used as the delimiter, followed by optional key-value pairs.

etc

Custom Delimiters

Example:

Backtick (`) as delimiter → 192.168.0.1`tcp`22`

Equal sign (=) as delimiter → src_ip\=192.168.0.1

Description: When using a custom delimiter not listed in predefined options, prefix it with a backslash (\) for recognition and correct parsing.
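To make the separator types above concrete, the following Python sketch splits the sample logs with standard-library tools. It only illustrates how each format breaks into fields; it is not how Parser Generator itself is implemented, and the function names are hypothetical.

```python
import csv
import io
import json
import re


def split_delimited(line, delimiter):
    """Split a log line on a fixed delimiter such as a pipe, space, tab, or comma."""
    return line.split(delimiter)


def split_csv(line):
    """Split a CSV line, honouring double-quoted fields that contain commas."""
    return next(csv.reader(io.StringIO(line)))


def parse_json(line):
    """Parse a JSON log into a dict of field names and values."""
    return json.loads(line)


def parse_key_value(line):
    """Collect key="value" pairs into a dict."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def split_cef(line):
    """Return the seven pipe-separated CEF header fields and the raw extension string."""
    cef = line[line.index("CEF:"):]
    *header, extension = cef.split("|", 7)
    return header, extension


print(split_delimited("192.168.0.1|192.168.10.1|tcp|22|443", "|"))
print(split_csv('"192.168.0.1","192.168.10.1","tcp","22","443"'))
print(parse_key_value('sc_ip="192.168.0.1",dstn_ip="192.168.10.1",prtc="tcp"'))
```

The CSV case shows why quoting matters: a plain comma split would break a quoted value that itself contains a comma, while a CSV-aware reader keeps it as one field.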

Regex

Regular expression

Explanation

^

Start of string

$

End of string

.

Any single character

?

Zero or one occurrence of the preceding character

+

One or more occurrences of the preceding character

*

Zero or more occurrences of the preceding character

[]

Used when specifying a character class

Indicates a set or range of characters; a range between two characters is written with the '-' symbol. If ^ appears as the first character inside [], it negates the class (matches any character not listed).

{}

Number of times or range of occurrences of the preceding character

a{3} → a is repeated 3 times

a{3,} → a is repeated 3 or more times

a{3,5} → a is repeated at least 3 and at most 5 times

(?: )

(?: ) Treats the characters enclosed in the group as a single unit, without capturing them as a field

gu(?:gg){2}le → matches if the string guggggle is included

|

Used for an OR operation in a pattern

hi|hello → matches a string that contains hi or hello

\w

A letter, a digit, or an underscore

\W

Any character other than a letter, a digit, or an underscore

\d

A digit; same as [0-9]

\D

Any character except a digit

(?i)

Adding the (?i) option (question mark + lowercase i) at the front of the pattern makes matching case-insensitive.
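The behaviour of these symbols can be checked quickly with any regex engine. The sketch below uses Python's `re` module, which is an assumption made for illustration; the Normalizer's own engine may differ in details, but the constructs listed above behave the same way in common engines.

```python
import re


def found(pattern, text):
    """Return True when the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


checks = {
    "^ anchors the start": found(r"^192", "192.168.0.1"),
    "$ anchors the end": found(r"443$", "tcp 22 443"),
    "a{3,5} accepts 5 repeats": re.fullmatch(r"a{3,5}", "aaaaa") is not None,
    "a{3,5} rejects 6 repeats": re.fullmatch(r"a{3,5}", "aaaaaa") is None,
    "(?:gg){2} needs gg twice": found(r"gu(?:gg){2}le", "guggggle"),
    "one gg is not enough": not found(r"gu(?:gg){2}le", "guggle"),
    "hi|hello matches either word": found(r"hi|hello", "say hello"),
    "[^0-9] negates the class": re.findall(r"[^0-9]+", "ab12cd") == ["ab", "cd"],
    "word class includes underscore": re.fullmatch(r"\w+", "src_ip") is not None,
    "(?i) ignores case": found(r"(?i)tcp", "TCP"),
}

for label, ok in checks.items():
    print(label, ok)
```

Every entry evaluates to True, which mirrors the explanations in the table.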

Frequently Used Regular Expression Examples in the Solution

Regular expression

Explanation

(?:String)?

The string may or may not be present

\s+

One or more whitespace characters

[^\s]+

One or more non-whitespace characters

[\d]+

One or more digits

[^\d]+

One or more non-digit characters

\d{1,20}

At least 1 and at most 20 digits

[a-zA-Z0-9]

One character, either an English letter or a digit

[a-zA-Z0-9]{5}

Exactly 5 characters, each an uppercase/lowercase English letter or a digit

ex> test1, t1234, 01234, testt

.+

One or more of any character

(?:.+)?

Optionally, one or more of any character

Avoid using (.+) in normalization patterns where possible, as it can place a heavy load on the Normalizer (the normalization process).
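One way to see why narrow character classes are preferable to (.+) is to compare them on the same log. The following Python sketch is an illustration under the assumption of a space-delimited sample; the pattern names are hypothetical. With (.+), the engine has to try many ways of dividing the line between groups, and it can silently absorb unexpected content into a field.

```python
import re

SAMPLE = "192.168.0.1 192.168.10.1 tcp 22 443"

# Narrow classes: each group can only match one kind of token.
SPECIFIC = re.compile(
    r"^([^\s]+)\s+([^\s]+)\s+([a-zA-Z0-9]+)\s+(\d{1,20})\s+(\d{1,20})$"
)

# Loose version built from (.+): every group can match almost anything.
LOOSE = re.compile(r"^(.+)\s+(.+)\s+(.+)\s+(.+)\s+(.+)$")

print(SPECIFIC.match(SAMPLE).groups())
# ('192.168.0.1', '192.168.10.1', 'tcp', '22', '443')

# A line with one extra token: the specific pattern rejects it,
# while the loose pattern matches and merges two values into the first field.
UNEXPECTED = SAMPLE + " extra"
print(SPECIFIC.match(UNEXPECTED))
print(LOOSE.match(UNEXPECTED).group(1))
```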

To extract certain parts of the original log as searchable fields, wrap the desired values in parentheses ( ) to define them as fields.
You can also define fields using the "Use Field" checkbox.
When a regular expression is entered, matched text in the Sample Text will be highlighted in yellow, and selected field values will be highlighted in blue.
Once fields are properly separated using either delimiters or regular expressions, you must assign a field name.
If no field name is assigned, the separated data will not be normalized, even if it has been matched.
<Select Field>
Field Name Background Colors:
Red: Default fields used by eyeCloudSIM. These cannot be modified or deleted.
Blue: User-defined fields. These can be modified or deleted, except when currently in use.
If the field you want doesn't exist, you can create a new one by entering a Field Name & Alias as needed.
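The relationship between parenthesized groups and field names can be sketched with named groups in Python. This is only an analogy: in Parser Generator the name is assigned through the field selection, not inside the pattern, and the field names used here (`src_ip`, `prtc`, `src_port`) are assumptions for illustration. The second group is deliberately left unnamed to mirror the rule that matched data without a field name is not normalized.

```python
import re

# Named groups stand in for fields that were given a name; the second group has none.
PATTERN = re.compile(
    r"^(?P<src_ip>[^\s]+)\s+([^\s]+)\s+(?P<prtc>[a-zA-Z0-9]+)\s+(?P<src_port>\d{1,20})"
)


def extract_fields(raw):
    """Return only the named groups; unnamed groups match but are not kept."""
    match = PATTERN.match(raw)
    return match.groupdict() if match else {}


print(extract_fields("192.168.0.1 192.168.10.1 tcp 22 443"))
```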

Convert Option

Convert Option > <No Change>

Convert Option

Explanation

replace('arg1', 'arg2')

String replacement function

ex> middle → medium

replaceAll('arg1', 'arg2')

String replacement function using regular expressions

replaceGet('arg1')

String extraction function using regular expressions; extracts the part matched by the group enclosed in ( )

ex> replaceGet('[a-z]+_(\d+)_[a-z]+') : prefix_123_tail → 123

substr('arg1')

String cutting function

ex> substr('8') : 20130830100711 → 100711

date('arg1', 'arg2')

Date format conversion function

ex> date('yyyy/MM/dd HH:mm:ss', 'yyyyMMddHHmmss') : 2013/08/30 10:07:11 → 20130830100711

How to apply Timezone and Locale:

Used when converting the time of a log recorded in a different time zone (e.g., 2013-08-30 AM 10:07:11)

arg1 : DateFormat,Timezone,Locale.Language,Locale.Country

ex> yyyy-MM-dd a HH:mm:ss,America/Los_Angeles,en,US

arg2 : DateFormat[,Timezone,Locale.Language,Locale.Country]

ex> yyyyMMddHHmmss,America/Los_Angeles,en,US

ex> if left empty, the default is yyyyMMddHHmmss,Asia/Seoul,ko,KR

Example input: jun 3 14:00:00

ex> MMM dd HH:mm:ss,America/Los_Angeles,en,US → yyyy-MM-dd HH:mm:ss,Asia/Seoul,ko,KR

unixTimestamp()

Function to convert Unix time format to yyyyMMddHHmmss

ex> 1351239803 → 20121026172323

hexRemoveHeader()

Convert Payload Hex value to String excluding header

hexToString()

Function to convert Payload Hex value to String

stringToMD5()

Convert a string to its MD5 hash string

ex) 192.168.0.1 → f0fdb4c3f58e3e3f8e77162d893d3055

decodeBase64()

Convert a Base64-encoded value to a decoded string

longToIP()

Convert IP values in Long format to general IP format

toLowerCase()

Convert string to lowercase

ex) Tcp → tcp, TCP → tcp

toUpperCase()

Convert string to uppercase

ex) Tcp → TCP, tcp → TCP

extractDomain()

Function to extract only the domain from a string

ex) If the value is in IP format, it is extracted as is

ex) www.google.com → google.com

ex) www.google.co.kr → google.co.kr

ifNull('arg1')

If the value is null, convert it to the entered string.

trim()

A function that removes spaces from the front and back of a string.
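To show what several of these Convert Options compute, the following Python sketch reproduces the examples from the table with standard-library calls. These are stand-ins written for illustration, not the product's functions: the Python function names are hypothetical, and the fixed UTC+9 offset is an assumption standing in for the Asia/Seoul default.

```python
import base64
import ipaddress
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset used here in place of the Asia/Seoul time zone (assumption).
KST = timezone(timedelta(hours=9))


def substr(value, start):
    """Drop the first `start` characters, like substr('8') in the table."""
    return value[start:]


def unix_timestamp(value, tz=KST):
    """Convert Unix time in seconds to a yyyyMMddHHmmss string."""
    return datetime.fromtimestamp(int(value), tz).strftime("%Y%m%d%H%M%S")


def long_to_ip(value):
    """Convert an IPv4 address stored as an integer to dotted notation."""
    return str(ipaddress.IPv4Address(int(value)))


def hex_to_string(value):
    """Decode a hex payload to text."""
    return bytes.fromhex(value).decode("utf-8", errors="replace")


def decode_base64(value):
    """Decode a Base64-encoded value to text."""
    return base64.b64decode(value).decode("utf-8", errors="replace")


print(substr("20130830100711", 8))     # 100711
print(unix_timestamp("1351239803"))    # 20121026172323
print(long_to_ip("3232235521"))        # 192.168.0.1
```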

Apply Convert Option for Time Value

Fields that hold time values arrive in different formats, so use Convert Option to unify them.

Example: May 14, 2020 1:28:39 PM KST → yyyyMMddHHmmss

[[@replaceAll(',','')]] : Removes the "," character

[[@replaceAll('\w{1,3}$','')]] : Removes the trailing "KST" string

[[@date('MMM dd yyyy HH:mm:ss a,Asia/Seoul,en,KR','yyyyMMddHHmmss')]] : Converts to yyyyMMddHHmmss format
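The same three-step chain can be traced in Python to see the intermediate values. This is a sketch under stated assumptions: it uses Python's `strptime` directives rather than the Java-style pattern in the tag (the 12-hour `%I` is paired with `%p`), it assumes an English locale for month and AM/PM names, and `normalize_time` is a hypothetical helper name.

```python
import re
from datetime import datetime


def normalize_time(raw):
    """Apply the comma removal, suffix removal, and date conversion in order."""
    no_comma = raw.replace(",", "")                  # mirrors replaceAll(',','')
    no_zone = re.sub(r"\w{1,3}$", "", no_comma)      # mirrors replaceAll('\w{1,3}$','')
    cleaned = no_zone.strip()                        # drop the space left before the removed suffix
    parsed = datetime.strptime(cleaned, "%b %d %Y %I:%M:%S %p")
    return parsed.strftime("%Y%m%d%H%M%S")


print(normalize_time("May 14, 2020 1:28:39 PM KST"))  # 20200514132839
```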

Source: Contents of the currently selected field

Result: Shows in advance the final output value produced by applying the Convert Tag in Tag Generation.

Convert: Provides conversion functions, based on functions provided by Java, as options for field conversion.

Parameter1: The value entered in Parameter1 is used as the arg1 value of the Convert Option. (Input range: no limit is enforced, but 1 to 65,535 characters is recommended)

Parameter2: The value entered in Parameter2 is used as the arg2 value of the Convert Option. (Input range: no limit is enforced, but 1 to 65,535 characters is recommended)

+ symbol: Adds the selected Convert Option and its values to the Tag Generation value.

- symbol: Deletes the selected Convert Option and its values from the Tag Generation value.

Tag Generation: Lets you check and edit the syntax of the Convert Options to be applied, in tag form. You can build it by selecting a Convert Option, or by entering a tag directly.

✔ replace('arg1','arg2'): Replaces the Parameter1 value with the Parameter2 value once.

✔ replaceAll('arg1','arg2'): Replaces all occurrences of the Parameter1 value with the Parameter2 value.

✔ regexGet('arg1'): Extracts a specific value using a regular expression.

✔ substr('arg1'): Outputs the string that starts after the number of characters entered in Parameter1, counted from the first character.

✔ substr('arg1','arg2'): Outputs the string from the position after the number entered in Parameter1 up to the position given by the number entered in Parameter2.

✔ date('arg1','arg2'): Converts the date format.

· Example 1) To change 2014-06-30 to 20140630, enter yyyy-MM-dd in Parameter1 and yyyyMMdd in Parameter2.

· Example 2) To change Jan 14 2014 to 20140114, enter MMM dd yyyy,Seoul,en,US in Parameter1 and yyyyMMdd,Seoul,ko,KR in Parameter2. (yyyy is the year, MM the month, and dd the day; the letters are case sensitive. Seoul can be replaced by any city name but cannot be omitted, and the en,US and ko,KR syntax must be followed.)

✔ unixTimestamp(): Converts Unix time, which is counted in seconds since January 1, 1970, to a date-time string in yyyyMMddHHmmss format.

✔ hexRemoveHeader(): Converts the value output as HEX code to a general string by excluding the header.

✔ hexToString(): Converts the value output as HEX code to a general string.

✔ stringToMD5(): Converts the field value to its MD5 hash string.

✔ decodeBase64(): Converts a Base64-encoded value to a decoded string.

✔ longToIP(): Converts an IP value in Long format to a general IP format.

✔ toLowerCase(): Converts a string to lowercase.

✔ toUpperCase(): Converts a string to uppercase.

✔ ifNull(): Converts a null value to the entered string.

✔ trim(): Removes spaces from the front and back of a string.
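The difference between replacing once and replacing all occurrences, and the two date examples, can be sketched in Python as follows. These helpers are hypothetical stand-ins that follow the descriptions in the list above; they use Python's `strptime` directives instead of the yyyy/MM/dd letters and assume an English locale for month names.

```python
import re
from datetime import datetime


def replace_once(value, old, new):
    """Replace only the first occurrence of a literal string."""
    return value.replace(old, new, 1)


def replace_all(value, pattern, new):
    """Replace every match of a regular expression."""
    return re.sub(pattern, new, value)


def convert_date(value, in_format, out_format):
    """Parse a date with one format and write it out with another."""
    return datetime.strptime(value, in_format).strftime(out_format)


print(replace_once("2014-06-30", "-", ""))                  # 201406-30
print(replace_all("2014-06-30", "-", ""))                   # 20140630
print(convert_date("2014-06-30", "%Y-%m-%d", "%Y%m%d"))     # 20140630
print(convert_date("Jan 14 2014", "%b %d %Y", "%Y%m%d"))    # 20140114
```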