[Bug 7729] New: body rules to match body only, not including the Subject

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] New: body rules to match body only, not including the Subject

bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

            Bug ID: 7729
           Summary: body rules to match body only, not including the
                    Subject
           Product: Spamassassin
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: Undefined

The current body rule does not only match the actual body, but also includes
the Subject as pseudo-part of the body.

While this behaviour is clearly documented, people do get caught be this (meta
rule to match some phrases in the body AND subject). It is non-trivial to write
a rule matching the body only, or actually matching both the body and the
Subject.

M:SA:Conf
  "The message Subject header is considered part of the body
   and becomes the first paragraph when running the rules."

We should implement a tflags test flag for body rules to exclude the Subject
from matching.

Along with the next major branch 4.x we may consider making this the default,
with Subject prepending a tflags option.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Karsten Br├Ąckelmann <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Hardware|PC                          |All
   Target Milestone|Undefined                   |4.0.0
                 OS|Linux                       |All

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #1 from Henrik Krohns <[hidden email]> ---
Yes it seems quite pointless shortcut to "maybe" reduce the amount of duplicate
rules.

It makes no sense to add separate tflag for this. If one wants to match
Subject, then it should be a separate header rule!

I vote +1 to remove legacy subject matching from body.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #2 from Henrik Krohns <[hidden email]> ---
Only thing required to fix after that would probably be the schemeless uri
parser, which currently grabs uris and emails from the Subject. It's a one line
fix to code to add it back to it.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #3 from Henrik Krohns <[hidden email]> ---
Ok vetoing my vote. Tested which rules hit subject, and there are maaaany.
Thinking further, some people will probably stick with 3.4 for years and years,
it would be nightmare to just remove Subject and try to maintain compatibility
for all..

Instead here is the extremely simple code required to implement a "nosubject"
tflag. Just add a small documentation blurt, and it can be committed to 3.4.3
very safely, please vote, +1 me.

--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -827,6 +827,9 @@
       $loopid++;
       my ($max) = $conf->{tflags}->{$rulename} =~ /\bmaxhits=(\d+)\b/;
       $max = untaint_var($max);
+      if ($conf->{tflags}->{$rulename} =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
@@ -844,6 +847,9 @@
     else {
       # omitting the "pos" call, "body_loopid" label, use of while()
       # instead of if() etc., shaves off 8 perl OPs.
+      if (($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/) {
+        $sub .= "\n".'shift @_;'; # remove subject line
+      }
       $sub .= '
       foreach my $l (@_) {
         '.$self->hash_line_for_rule($pms, $rulename).'

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #4 from Henrik Krohns <[hidden email]> ---
Slightly revised patch, should not modify @_ as it's used elsewhere..

Eval string will include two extra nosubj codelines if flagged

    if ($scoresptr->{q{FOO}}) {
      my $nosubj = 1; ### IF FLAGGED
      foreach my $l (@_) {
        if ($nosubj) { $nosubj = 0; next; } ### IF FLAGGED
        if ($l =~ /$qrptr->{q{FOO}}/o) {
          $self->got_hit(q{FOO}, "BODY: ", ruletype => "body");
          last;
        }
      }
    }


--- lib/Mail/SpamAssassin/Plugin/Check.pm (revision 1864688)
+++ lib/Mail/SpamAssassin/Plugin/Check.pm (working copy)
@@ -821,6 +821,12 @@
       dbg("rules-all: running body rule %s", q{'.$rulename.'});
       ';
     }
+    my $nosubject = ($conf->{tflags}->{$rulename}||'') =~ /\bnosubject\b/;
+    if ($nosubject) {
+      $sub .= '
+      my $nosubj = 1;
+      ';
+    }
     if (($conf->{tflags}->{$rulename}||'') =~ /\bmultiple\b/)
     {
       # support multiple matches
@@ -830,6 +836,13 @@
       $sub .= '
       $hits = 0;
       body_'.$loopid.': foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         pos $l = 0;
         '.$self->hash_line_for_rule($pms, $rulename).'
         while ($l =~ /$qrptr->{q{'.$rulename.'}}/go'. ($max? ' && $hits++ <
'.$max:'') .') {
@@ -846,6 +859,13 @@
       # instead of if() etc., shaves off 8 perl OPs.
       $sub .= '
       foreach my $l (@_) {
+      ';
+      if ($nosubject) {
+        $sub .= '
+        if ($nosubj) { $nosubj = 0; next; }
+        ';
+      }
+      $sub .= '
         '.$self->hash_line_for_rule($pms, $rulename).'
         if ($l =~ /$qrptr->{q{'.$rulename.'}}/o) {
           $self->got_hit(q{'.$rulename.'}, "BODY: ", ruletype => "body");

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #5 from Henrik Krohns <[hidden email]> ---
Created attachment 5666
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5666&action=edit
Complete tflags nosubject patch

Here complete patch along with documentation and
M::SA::Conf::has_tflags_nosubject.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|body rules to match body    |[review] body rules to
                   |only, not including the     |match body only, not
                   |Subject                     |including the Subject

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Henrik Krohns <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.0.0                       |3.4.3

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

Kevin A. McGrail <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]
   Target Milestone|3.4.3                       |4.0.0

--- Comment #6 from Kevin A. McGrail <[hidden email]> ---
There is no way this should go into 3.4 to be clear. Changing the milestone and
I will -1 vote for a 3.4.  TOTALLY +1 for 4.0 though I need to test the patch
more and I don't have 4.0 setup at the moment, just 3.4 RC's.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #7 from Henrik Krohns <[hidden email]> ---
Kevin, are you allowed to decide things with only 2 votes? It's currently +1
-1.

The latest patch is 100% working, guaranteed.

Please someone else vote.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #8 from Henrik Krohns <[hidden email]> ---
Created attachment 5667
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5667&action=edit
Nosubject patch for 3.4

Patch for 3.4 attached. It 100% identical, just the sub has_tflags_nosubject {
1 } had to be put in place manually.

3.4 and trunk code are 100% identical the way the eval stuff works. There is
nothing this patch does differently between them.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #9 from Henrik Krohns <[hidden email]> ---
Honestly I don't care if 3.4 is nuked, but this patch is super simple. If we
leave 3.4 without the feature, we might as well never use it, since 3.4 will be
around forever.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #10 from Kevin A. McGrail <[hidden email]> ---
Thanks for the clarification.  I rescind my veto if the patch is different in
that 3.4 requires a tflag not to eval the subject as part of the body and 4.0
by default requires a tlag to eval the subject, that's ok.

How do you predict that each will handle the tflags for the other?  Assuming
harmlessly ignored?

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

RW <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #11 from RW <[hidden email]> ---
(In reply to Kevin A. McGrail from comment #10)

> 4.0 by default requires a tlag to eval the subject, that's ok.

I think that's a bad idea. Few body rules need the subject excluded, most
benefit from it, so practically all body rules will end-up needing a tflags
line.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #12 from Henrik Krohns <[hidden email]> ---
(In reply to Kevin A. McGrail from comment #10)
> Thanks for the clarification.  I rescind my veto if the patch is different
> in that 3.4 requires a tflag not to eval the subject as part of the body and
> 4.0 by default requires a tlag to eval the subject, that's ok.

No, I think I will always vote -1 for 4.0 if you are suggesting to drop subject
by default.

People are still using even 3.3, this subject stuff has been in so long there
that it would be probably silly to change now. And even sillier to commit 3.4
with a "nosubject" flag and 4.0 with a "subject" flag.

I think very few rules actually would NOT want to match subject. So it makes
sense to have a "nosubject" flag, instead of "subject" flag.

But I'm working on so many things right now, it's hard to concentrate, I
reserve the right to change my mind later. :-)

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

John Hardin <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[hidden email]

--- Comment #13 from John Hardin <[hidden email]> ---
-1 for removing body-matches-subject-too behavior entirely from any version

+1 for adding "tflags nosubject" to 4.0

Undecided about adding it to 3.4 as well; if it breaks 3.3 and earlier then
definitely -1, but if it's just ignored then the concern becomes the effects of
a rules getting interpreted (slightly) differently by different versions.

If we feel we want it in 3.4 then something like
"can(Mail::SpamAssassin::Conf::body_tflag_nosubject)" might be a good idea.

Since rules updates are generated from trunk, then that can() might be a good
idea regardless...

so: +1 for adding it to 3.4 along with
can(Mail::SpamAssassin::Conf::body_tflag_nosubject)

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #14 from Henrik Krohns <[hidden email]> ---
Already has the has_tflags_nosubject as mentioned.

No SpamAssassin version lints tflags, you can put any crap you want in them and
it won't croak.

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #15 from Henrik Krohns <[hidden email]> ---
It seems "nosubject" is accepted for trunk, committing to get it out of the
way.

Sending        trunk/lib/Mail/SpamAssassin/Conf.pm
Sending        trunk/lib/Mail/SpamAssassin/Plugin/Check.pm
Transmitting file data ..done
Committing transaction...
Committed revision 1864791.

For 3.4 it's already +2 I think, I'll let Kevin say his final words, but I
think it's accepted regardless?

--
You are receiving this mail because:
You are the assignee for the bug.
Reply | Threaded
Open this post in threaded view
|

[Bug 7729] [review] body rules to match body only, not including the Subject

bugzilla-daemon-2
In reply to this post by bugzilla-daemon-2
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7729

--- Comment #16 from Henrik Krohns <[hidden email]> ---
Forgot docs..

Sending        trunk/UPGRADE
Transmitting file data .done
Committing transaction...
Committed revision 1864792.

--
You are receiving this mail because:
You are the assignee for the bug.
12