Introduction

Last time, I parsed the Byte Comparison Subsignature.

Next up is the PCRE Subsignature.

Special Subsignature Types

PCRE Subsignature

As the name suggests, this is the subsignature type that uses PCRE.

The format is Trigger/PCRE/[Flags], where Trigger is a LogicalExpression.

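Also, for backward compatibility, a ; inside the PCRE must be written as \x3B. That's a relief: thanks to this rule, a plain split(';') is enough to separate the fields of a signature line, because a ; can never appear inside the pattern itself.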
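To make the point about split(';') concrete, here is a minimal sketch using only the standard library. The function name `split_fields` and the sample line in `SAMPLE` are made up for illustration and do not come from a real signature database.

```rust
/// Splits a signature line into its ';'-separated fields.
/// This is only safe for PCRE subsignatures because a literal ';'
/// inside the pattern has to be written as \x3B.
pub fn split_fields(line: &str) -> Vec<&str> {
    line.split(';').collect()
}

/// Hypothetical sample line (an assumption, not a real signature).
/// The last field is a PCRE subsignature whose pattern contains an
/// escaped semicolon.
pub const SAMPLE: &str = r"Example.Sig;Target:0;0&1;41424344;0/foo\x3Bbar/i";
```

Calling `split_fields(SAMPLE)` leaves the PCRE subsignature `0/foo\x3Bbar/i` intact as the final field, since the only semicolon in the pattern is spelled as \x3B.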

The documentation describes Flags as follows:

Flags are a series of characters which affect the compilation and execution of PCRE within the PCRE compiler and the ClamAV engine. This field is optional.

  • g [CLAMAV_GLOBAL] specifies to search for ALL matches of PCRE (default is to search for first match). NOTE: INCREASES the time needed to run the PCRE.

  • r [CLAMAV_ROLLING] specifies to use the given offset as the starting location to search for a match as opposed to the only location; applies to subsigs without maxshifts. By default, in order to facilitate normal ClamAV offset behavior, PCREs are auto-anchored (only attempt match on first offset); using the rolling option disables the auto-anchoring.

  • e [CLAMAV_ENCOMPASS] specifies to CONFINE matching between the specified offset and maxshift; applies only when maxshift is specified.

    Note: DECREASES time needed to run the PCRE.
    
  • i [PCRE_CASELESS]

  • s [PCRE_DOTALL]

  • m [PCRE_MULTILINE]

  • x [PCRE_EXTENDED]

  • A [PCRE_ANCHORED]

  • E [PCRE_DOLLAR_ENDONLY]

  • U [PCRE_UNGREEDY]
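The flag characters are case-sensitive (e and E are different options, and A and U exist only in upper case), which is easy to get wrong when writing the parser. As a rough sketch, the list above can be written down as a lookup function; the name `flag_name` is hypothetical and only serves this illustration.

```rust
/// Maps a flag character to the option name given in the list above.
/// Returns None for characters that are not listed.
pub fn flag_name(c: char) -> Option<&'static str> {
    match c {
        'g' => Some("CLAMAV_GLOBAL"),
        'r' => Some("CLAMAV_ROLLING"),
        'e' => Some("CLAMAV_ENCOMPASS"),
        'i' => Some("PCRE_CASELESS"),
        's' => Some("PCRE_DOTALL"),
        'm' => Some("PCRE_MULTILINE"),
        'x' => Some("PCRE_EXTENDED"),
        'A' => Some("PCRE_ANCHORED"),
        'E' => Some("PCRE_DOLLAR_ENDONLY"),
        'U' => Some("PCRE_UNGREEDY"),
        _ => None,
    }
}
```

With this mapping, `flag_name('e')` and `flag_name('E')` give different options, and a lower-case `'a'` is not a flag at all.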

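The slightly tricky part is when the PCRE itself contains \/, because the closing delimiter is the first / that is not escaped.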
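The idea can be sketched without any parser library: walk the text after the opening /, let a backslash always take the following character with it, and stop at the first / that is not escaped. The helper name `split_at_closing_slash` is an assumption for this sketch and is not part of the parser in this article.

```rust
/// Takes the text that follows the opening '/' and splits it at the
/// first unescaped '/'. Returns (pattern, text after the delimiter),
/// or None if no closing delimiter exists.
pub fn split_at_closing_slash(body: &str) -> Option<(&str, &str)> {
    let mut escaped = false;
    for (i, c) in body.char_indices() {
        if escaped {
            // The previous character was a backslash, so this one is
            // part of the pattern no matter what it is.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '/' {
            // '/' is a single byte, so i + 1 is a valid boundary.
            return Some((&body[..i], &body[i + 1..]));
        }
    }
    None
}
```

For the input `\/bin\/clamav/ge` this yields the pattern `\/bin\/clamav` and the remaining flags `ge`; an input whose only slash is escaped yields None.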

Let's parse it.

use nom::{
    branch::alt,
    bytes::complete::take_while,
    character::complete::{anychar, char, none_of},
    combinator::{map, recognize},
    multi::many0,
    sequence::preceded,
    IResult,
};

use crate::parser::logical::expression::{LogicalExpression, parse_expression};

// One variant per flag character listed in the documentation.
#[derive(Debug, Eq, PartialEq)]
pub enum Flag {
    Global,        // g
    Rolling,       // r
    Encompass,     // e
    Caseless,      // i
    Dotall,        // s
    Multiline,     // m
    Extended,      // x
    Anchored,      // A
    DollarEndonly, // E
    Ungreedy,      // U
}

#[derive(Debug, Eq, PartialEq)]
pub struct PCRE<'p> {
    pub trigger: LogicalExpression,
    pub pcre: &'p str,
    pub flag: Vec<Flag>,
}

// Flags are optional, so zero matches is fine. The characters are
// case-sensitive: 'e' is Encompass while 'E' is DollarEndonly.
fn parse_flags(input: &str) -> IResult<&str, Vec<Flag>> {
    many0(alt((
        map(char('g'), |_| Flag::Global),
        map(char('r'), |_| Flag::Rolling),
        map(char('e'), |_| Flag::Encompass),
        map(char('i'), |_| Flag::Caseless),
        map(char('s'), |_| Flag::Dotall),
        map(char('m'), |_| Flag::Multiline),
        map(char('x'), |_| Flag::Extended),
        map(char('A'), |_| Flag::Anchored),
        map(char('E'), |_| Flag::DollarEndonly),
        map(char('U'), |_| Flag::Ungreedy),
    )))(input)
}

fn parse_pcre<'p>(input: &'p str) -> IResult<&'p str, PCRE<'p>> {
    // The trigger is everything before the first '/'.
    let (input, trigger) = take_while(|c: char| c != '/')(input)?;
    let trigger = parse_expression(trigger)?.1;
    let (input, _) = char('/')(input)?;
    // The pattern runs up to the first unescaped '/'. A backslash always
    // consumes the character after it, so "\/" stays inside the pattern.
    let (input, pattern) = recognize(many0(alt((
        preceded(char('\\'), anychar),
        none_of("\\/"),
    ))))(input)?;
    let (input, _) = char('/')(input)?;
    let (input, flags) = parse_flags(input)?;

    Ok((input, PCRE {
        trigger,
        pcre: pattern,
        flag: flags,
    }))
}

#[cfg(test)]
mod test {
    use super::*;
    use crate::parser::logical::LogicalExpression::*;

    #[test]
    fn parse_pcre_sample() {
        // Escaped slashes stay in the pattern; "ge" is Global + Encompass.
        let parsed = parse_pcre("0&1&2/\\/bin\\/clamav/ge").unwrap();
        assert_eq!(
            PCRE {
                trigger: And(vec![SubExpression(0), SubExpression(1), SubExpression(2)]),
                pcre: "\\/bin\\/clamav",
                flag: vec![Flag::Global, Flag::Encompass]
            },
            parsed.1,
        );

        let parsed = parse_pcre("0/^\\x2e(only|lowerBound|upperBound|bound)\\x28.*?\\x29.*?\\x2e(lower|upper|lowerOpen|upperOpen)/smi").unwrap();
        assert_eq!(
            PCRE {
                trigger: SubExpression(0),
                pcre: "^\\x2e(only|lowerBound|upperBound|bound)\\x28.*?\\x29.*?\\x2e(lower|upper|lowerOpen|upperOpen)",
                flag: vec![Flag::Dotall, Flag::Multiline, Flag::Caseless]
            },
            parsed.1
        );
    }
}

Done. Thanks to AI and thorough documentation, I'm gradually getting the hang of nom.

Next, I'll combine the parsers for these subsigs.

Conclusion

This article is the day 15 entry of the n01e0 Advent Calendar 2024.

M@STER EXPO was fantastic.