Hi Ralf, thanks for the green light on the PR.
While testing the IMAP search patch in production we noticed the same user-facing limitation also applies to the Mail filter rules. A user who learns to write invoice +overdue in the search box naturally tries the same in the “Subject contains” field of a filter rule, and the filter doesn’t fire on messages where the words are not adjacent.
From the end-user perspective, having the search box and the filter rules of the same module follow two different syntaxes is hard to justify: it forces them to keep two mental models for what looks like the same operation. Aligning both — the search-side patch you’ve already greenlit, plus this filter-side one — gives a single, consistent search/filter syntax across the Mail module, in line with what Addressbook, Calendar and InfoLog already do.
The fix is structurally identical to the search-side one, just applied to a different code path. I have implemented and tested it. Posting it here so we can discuss it before I open two PRs — one for each.
The problem on filter rules
EGroupware Mail filters end up as Sieve scripts on Dovecot, generated by api/src/Mail/Sieve/Script.php and uploaded via managesieve. For a filter “Subject contains invoice overdue”, the script emits:
if allof (
    header :contains "subject" "invoice overdue"
) { ... }
:contains is a literal contiguous-substring test, so a message with subject “the invoice from 02/2026 is overdue” is not matched: the same gotcha as on the search side. Users currently work around this by typing one word in the special “header” row and the other in the standard “subject” row, then setting “match all of”, which is clunky.
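To make the gotcha concrete, here is a minimal Python sketch of the two matching strategies on the subject quoted above. It is only an illustration: the `contains` helper is hypothetical, and lower-casing merely approximates Sieve's default case-insensitive comparator.

```python
def contains(haystack: str, needle: str) -> bool:
    # Approximation of Sieve's ":contains" with the default comparator,
    # which ignores ASCII case: a plain contiguous-substring check.
    return needle.lower() in haystack.lower()

subject = "the invoice from 02/2026 is overdue"

# Historical behaviour: the whole value is one literal substring.
literal = contains(subject, "invoice overdue")

# Tokenised behaviour: every word must appear somewhere in the subject.
tokenised = all(contains(subject, word) for word in ("invoice", "overdue"))

print(literal, tokenised)
```

The literal check fails because the two words are not adjacent, while the per-word check succeeds.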
The fix: same tokenisation, applied to Sieve generation
Two new static helpers on the Script class, buildTokenizedSieveTest() and parseSieveTokens(), and a tokenised branch in five cases of the rule generator (FROM / TO / SUBJECT / custom header / body), gated on plain :contains mode (so wildcards and the regex checkbox are untouched).
A filter value invoice +overdue on Subject now emits:
if allof (
    allof (
        header :contains "subject" "invoice",
        header :contains "subject" "overdue"
    )
) { ... }
A single-token value (e.g. invoice only) still produces byte-identical output to the historical generator — no regression on existing filters.
User-facing syntax matches the search patch exactly, so there’s a single mental model across the whole Mail module:
| Input in any filter value | Meaning |
|---|---|
| invoice | substring “invoice” anywhere |
| invoice overdue | “invoice” OR “overdue” (default) |
| invoice and overdue | “invoice” AND “overdue” |
| invoice +overdue | “invoice” AND “overdue” |
| invoice -spam | “invoice” AND NOT “spam” |
| "invoice overdue" | literal phrase (preserves legacy behaviour) |
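For readers who want the shape of the logic without opening the gist, the table can be approximated by the Python sketch below. This is not the PHP from the patch: `parse_tokens`, `sieve_quote` and `build_test` are hypothetical names, and the rule "any `and`, `+` or `-` switches the whole group to allof" is my simplifying assumption for mixed inputs. It also does not handle a prefix directly in front of a quoted phrase.

```python
import re

# Assumed token shape: a double-quoted phrase, or a run of non-space characters.
_TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')

def parse_tokens(value):
    """Return ([(text, negated), ...], join) where join is 'allof' or 'anyof'."""
    terms, use_and = [], False
    for match in _TOKEN_RE.finditer(value):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            if phrase:  # ignore an empty "" pair
                terms.append((phrase, False))
        elif word.lower() == "and":
            use_and = True  # the keyword only changes the join mode
        elif word[0] in "+-" and len(word) > 1:
            use_and = True
            terms.append((word[1:], word[0] == "-"))
        else:
            terms.append((word, False))
    return terms, ("allof" if use_and else "anyof")

def sieve_quote(text):
    # Sieve quoted strings escape backslash and double quote.
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_test(header, value):
    """Build one Sieve test expression for a header filter value."""
    terms, join = parse_tokens(value)
    if not terms:  # nothing usable: fall back to the literal value
        return f"header :contains {sieve_quote(header)} {sieve_quote(value)}"
    tests = []
    for text, negated in terms:
        test = f"header :contains {sieve_quote(header)} {sieve_quote(text)}"
        tests.append("not " + test if negated else test)
    if len(tests) == 1:
        return tests[0]  # single token: same test as before, no wrapper
    return f"{join} (" + ", ".join(tests) + ")"

print(build_test("subject", "invoice +overdue"))
print(build_test("subject", '"invoice overdue"'))
```

The sketch emits the test on a single line and leaves out the outer `if allof ( ... ) { ... }` wrapper, so its whitespace differs from what the real generator writes.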
Code and unified diff
Second gist (the Sieve one): https://gist.github.com/CActor/82621b8df78662e6117cb3beeedb94e0
It contains the full patched Script.php, the unified diff (applicable with patch -p0, ~286 lines), and a worked example of the generated Sieve script.
Tested
EGroupware 26.1 (26.5.20260507), official Docker image, Dovecot with fts_flatcurve enabled. The patched file is bind-mounted source-side to survive Watchtower image updates. Eight functional test cases (prova +filtro +nuovo on Subject, sent via SMTP from a separate account, observed routing to INBOX/TEST vs INBOX):
- five positives matched as expected, including the case where words are far apart in the subject and the case where one term is a substring of a larger word (“prova” inside “approvazione”)
- three negatives stayed in INBOX as expected (missing token / declension differences / split word)
Breaking change for existing filter rules with multi-word values
Since the tokeniser kicks in by default on whitespace (consistent with the documented EGroupware search syntax — “A B: contains A or contains B”), filter rules that today rely on whitespace being part of a contiguous-substring match will change behaviour after this patch is deployed.
Three example patterns we observed on a real-world deployment:
| Filter value (before) | Old generated Sieve | New generated Sieve |
|---|---|---|
| Subject contains Project 70 | header :contains "subject" "Project 70" (only matches contiguous) | anyof (... "Project", ... "70") (matches “Project” OR “70”, much broader) |
| Subject contains urgent meeting | only contiguous match | anyof(...), broader |
| body :text :contains "J. Smith" | only contiguous match | anyof(...), much broader; “J.” is a substring of many words |
Filters that use wildcards (*/?) or the per-rule regex checkbox are unaffected (the tokeniser branch is gated off in those modes), so any rule that already disambiguates via *phrase* or regex stays as-is.
Migration for end-users: review existing filter rule values; for any with whitespace that you intend as a literal phrase, wrap the value in double quotes (e.g. change Project 70 to "Project 70"). The quoted-phrase token preserves the historical contiguous-substring behaviour.
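An audit along these lines could be scripted. The Python sketch below is only an illustration of the check, not part of the patch: `needs_review` is a hypothetical helper, it assumes that a `*` or `?` in the value means the rule is in wildcard mode, and how the filter values are exported from EGroupware is left out entirely.

```python
def needs_review(value: str, uses_regex: bool = False) -> bool:
    """True if a filter value would change meaning once tokenisation is on."""
    v = value.strip()
    # Assumption: regex rules and values with wildcards skip the tokeniser.
    if uses_regex or "*" in v or "?" in v:
        return False
    # Already a single quoted phrase: behaviour is preserved.
    if len(v) >= 2 and v.startswith('"') and v.endswith('"') and '"' not in v[1:-1]:
        return False
    # More than one whitespace-separated word: now split into tokens.
    return len(v.split()) > 1

# Values taken from the examples in this post.
values = ["Project 70", "urgent meeting", "J. Smith",
          "invoice", "*Project 70*", '"Project 70"']

flagged = [v for v in values if needs_review(v)]
for v in flagged:
    print(f'review: {v} -> suggested: "{v}"')
```

Only the three multi-word, unquoted, non-wildcard values are flagged; wrapping them in double quotes restores the old contiguous match.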
Plan for the PR
For now I’ll open only the search PR (api/src/Mail.php, as discussed in the previous post). The filter patch (api/src/Mail/Sieve/Script.php) is implemented, tested in our staging environment, and ready as a second PR — but I’d like to wait for your feedback on the search PR first, and confirmation that you want the filter side merged too, before opening a second one. The two share the same tokeniser shape; if you’d prefer it factored into a small trait (e.g. Mail/SearchTokeniserTrait) used by both, happy to refactor either at PR-review time.
Just let me know on this thread (or directly on the search PR) whether you want me to open the filter PR as a follow-up, and I’ll do so within the day — with the BREAKING CHANGE note in the description and a changelog entry suggesting end-users audit their multi-word filter values.
Best regards,
Gabriele