Parsing ClamAV signatures into a human-readable form, part 8
Introduction
Last time I parsed the Byte Comparison Subsignature.
Next up is parsing the PCRE Subsignature.
Special Subsignature Types
PCRE Subsignature
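As the name suggests, this one uses PCRE.
The format is Trigger/PCRE/[Flags], where Trigger is a LogicalExpression.
Also, a ; inside the PCRE has to be written as \x3B for backward compatibility. That is a relief: thanks to this rule, a plain split(';') is enough to separate the subsignatures.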
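To make the point about split(';') concrete, here is a minimal sketch using only the standard library. The function name `split_subsigs` and the sample subsignature strings are made up for illustration; they are not taken from a real ClamAV database.

```rust
/// Split the subsignature part of a logical signature on ';'.
///
/// Hypothetical helper: it relies on the rule that a literal ';' inside a
/// PCRE subsignature is always written as `\x3B`, so every ';' seen here
/// really is a separator.
fn split_subsigs(subsigs: &str) -> Vec<&str> {
    subsigs.split(';').collect()
}
```

For an input such as `68656c6c6f;0/foo\x3Bbar/i`, this yields two pieces, and the PCRE subsignature `0/foo\x3Bbar/i` stays in one piece because its semicolon is hex-escaped.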
The documentation describes Flags as follows:
Flags are a series of characters which affect the compilation and execution of PCRE within the PCRE compiler and the ClamAV engine. This field is optional.
g [CLAMAV_GLOBAL] specifies to search for ALL matches of PCRE (default is to search for first match). NOTE: INCREASES the time needed to run the PCRE.
r [CLAMAV_ROLLING] specifies to use the given offset as the starting location to search for a match as opposed to the only location; applies to subsigs without maxshifts. By default, in order to facilitate normal ClamAV offset behavior, PCREs are auto-anchored (only attempt match on first offset); using the rolling option disables the auto-anchoring.
e [CLAMAV_ENCOMPASS] specifies to CONFINE matching between the specified offset and maxshift; applies only when maxshift is specified.
Note: DECREASES time needed to run the PCRE.
i [PCRE_CASELESS]
s [PCRE_DOTALL]
m [PCRE_MULTILINE]
x [PCRE_EXTENDED]
A [PCRE_ANCHORED]
E [PCRE_DOLLAR_ENDONLY]
U [PCRE_UNGREEDY]
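Since the goal of this series is human-readable output, the flag list above can be turned into a small lookup table. This is only a sketch: the names `describe_flag` and `describe_flags` are hypothetical, and the one-line descriptions are my own summaries of the quoted text and of the usual meaning of the PCRE options.

```rust
/// Map a single flag character to a short description.
/// Flag letters are case-sensitive: 'e' and 'E' mean different things.
fn describe_flag(flag: char) -> Option<&'static str> {
    match flag {
        'g' => Some("CLAMAV_GLOBAL: search for all matches"),
        'r' => Some("CLAMAV_ROLLING: offset is the start of the search, not the only location"),
        'e' => Some("CLAMAV_ENCOMPASS: confine matching between offset and maxshift"),
        'i' => Some("PCRE_CASELESS: case-insensitive matching"),
        's' => Some("PCRE_DOTALL: '.' also matches newlines"),
        'm' => Some("PCRE_MULTILINE: '^' and '$' match at line breaks"),
        'x' => Some("PCRE_EXTENDED: ignore whitespace and comments in the pattern"),
        'A' => Some("PCRE_ANCHORED: match only at the start position"),
        'E' => Some("PCRE_DOLLAR_ENDONLY: '$' matches only at the very end"),
        'U' => Some("PCRE_UNGREEDY: invert the greediness of quantifiers"),
        _ => None,
    }
}

/// Describe every flag in `flags`, or return `None` if any character
/// is not a known flag.
fn describe_flags(flags: &str) -> Option<Vec<&'static str>> {
    flags.chars().map(describe_flag).collect()
}
```

Returning `None` for an unknown character is a design choice of this sketch; silently skipping unknown flags would be the other obvious option.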
The slightly tricky part is a PCRE that contains \/.
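To see why \/ is awkward, consider a subsignature like `0&1/\/bin\/clamav/g`: a naive split on '/' cuts it into five pieces instead of three. The sketch below splits the three fields by scanning for the first unescaped slash, using only the standard library. The function name `split_pcre_subsig` is hypothetical, and the sketch assumes that the trigger itself never contains a '/'.

```rust
/// Split `Trigger/PCRE/[Flags]` into its three fields, treating `\/`
/// inside the PCRE as part of the pattern rather than as a delimiter.
fn split_pcre_subsig(input: &str) -> Option<(&str, &str, &str)> {
    // Assumption: the trigger is a logical expression without '/',
    // so the first slash opens the PCRE.
    let (trigger, rest) = input.split_once('/')?;

    // Walk the rest and remember whether the previous character was
    // an unescaped backslash.
    let mut escaped = false;
    for (i, c) in rest.char_indices() {
        if escaped {
            // This character is escaped, whatever it is.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '/' {
            // First unescaped slash: end of the pattern.
            // '/' is one byte long, so `i + 1` is a valid boundary.
            return Some((trigger, &rest[..i], &rest[i + 1..]));
        }
    }
    // No closing slash was found.
    None
}
```

The flags field comes back as a raw string here; turning it into typed values is a separate step.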
Let's parse it.
use nom::{
    branch::alt,
    bytes::complete::take_while,
    character::complete::{char, none_of},
    combinator::{map, recognize},
    multi::many0,
    sequence::preceded,
    IResult,
};

use crate::parser::logical::expression::{LogicalExpression, parse_expression};

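/// One flag character from the optional Flags field.
#[derive(Debug, Eq, PartialEq)]
pub enum Flag {
    Global,
    Rolling,
    Encompass,
    Caseless,
    Dotall,
    Multiline,
    Extended,
    Anchored,
    DollarEnodnly,
    Ungreedy,
}

/// A parsed PCRE subsignature: Trigger/PCRE/[Flags].
#[derive(Debug, Eq, PartialEq)]
pub struct PCRE<'p> {
    /// Logical expression that must hold before the PCRE is run.
    pub trigger: LogicalExpression,
    /// The raw pattern between the two delimiting slashes, escapes included.
    pub pcre: &'p str,
    /// Flags in the order they appear in the signature.
    pub flag: Vec<Flag>,
}
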
fn parse_flags(input: &str) -> IResult<&str, Vec<Flag>> {
    // Flag letters follow the ClamAV documentation quoted above.
    // They are case-sensitive: 'e' is ENCOMPASS, 'E' is DOLLAR_ENDONLY.
    many0(alt((
        map(char('g'), |_| Flag::Global),
        map(char('r'), |_| Flag::Rolling),
        map(char('e'), |_| Flag::Encompass),
        map(char('i'), |_| Flag::Caseless),
        map(char('s'), |_| Flag::Dotall),
        map(char('m'), |_| Flag::Multiline),
        map(char('x'), |_| Flag::Extended),
        map(char('A'), |_| Flag::Anchored),
        map(char('E'), |_| Flag::DollarEnodnly),
        map(char('U'), |_| Flag::Ungreedy),
    )))(input)
}

fn parse_pcre<'p>(input: &'p str) -> IResult<&'p str, PCRE<'p>> {
    // Everything before the first '/' is the trigger expression.
    let (input, trigger) = take_while(|c: char| c != '/')(input)?;
    let trigger = parse_expression(trigger)?.1;
    let (input, _) = char('/')(input)?;
    // The pattern runs up to the first unescaped '/'.
    let (input, pattern) = recognize(many0(alt((
        // A backslash followed by any character (none_of("") excludes nothing),
        // so an escaped slash stays inside the pattern.
        preceded(char('\\'), none_of("")),
        // Any character other than a backslash or the closing slash.
        none_of("\\/"),
    ))))(input)?;
    let (input, _) = char('/')(input)?;
    let (input, flags) = parse_flags(input)?;
    Ok((input, PCRE {
        trigger,
        pcre: pattern,
        flag: flags,
    }))
}

#[cfg(test)]
mod test {
    use super::*;
    use crate::parser::logical::LogicalExpression::*;

    #[test]
    fn parse_pcre_sample() {
        let parsed = parse_pcre("0&1&2/\\/bin\\/clamav/ge").unwrap();
        assert_eq!(
            PCRE {
                trigger: And(vec![SubExpression(0), SubExpression(1), SubExpression(2)]),
                pcre: "\\/bin\\/clamav",
                flag: vec![Flag::Global, Flag::Encompass]
            },
            parsed.1,
        );
        let parsed = parse_pcre("0/^\\x2e(only|lowerBound|upperBound|bound)\\x28.*?\\x29.*?\\x2e(lower|upper|lowerOpen|upperOpen)/smi").unwrap();
        assert_eq!(
            PCRE {
                trigger: SubExpression(0),
                pcre: "^\\x2e(only|lowerBound|upperBound|bound)\\x28.*?\\x29.*?\\x2e(lower|upper|lowerOpen|upperOpen)",
                flag: vec![Flag::Dotall, Flag::Multiline, Flag::Caseless]
            },
            parsed.1
        );
    }

    #[test]
    fn parse_flags_is_case_sensitive() {
        assert_eq!(
            parse_flags("xAEU"),
            Ok(("", vec![Flag::Extended, Flag::Anchored, Flag::DollarEnodnly, Flag::Ungreedy]))
        );
    }
}
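Done. Thanks to AI and some careful documentation, nom is gradually starting to make sense.
Next, I will combine the parsers for these subsigs.
Conclusion
This article is the day 15 entry of the n01e0 Advent Calendar 2024.
M@STER EXPO was fantastic.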