Remove SA tagging when learning as ham


Remove SA tagging when learning as ham

@lbutlr
I have a script that runs when a mail is moved out of the Junk folder to pass the mail through sa-learn --ham, but it doesn’t remove the subject tagging (Spam: 05.5), nor does it remove the X-Spam-Flag header.

What would I need to do in the script to remove the SA tags on messages that are processed by this script?

--
Stone circles were common enough everywhere in the mountains. Druids
built them as weather computers, and since it was always cheaper to
build a new 33-Megalith circle than to upgrade an old slow one, there
were generally plenty of ancient ones around --Lords and Ladies


Re: Remove SA tagging when learning as ham

Kevin A. McGrail-5
I'd look at https://serverfault.com/questions/817928/procmailrc-change-email-subject

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project

On Mon, Jun 18, 2018 at 8:13 AM, @lbutlr <[hidden email]> wrote:
I have a script that runs when a mail is moved out of the Junk folder to pass the mail through sa-learn --ham, but it doesn’t remove the subject tagging (Spam: 05.5), nor does it remove the X-Spam-Flag header.

What would I need to do in the script to remove the SA tags on messages that are processed by this script?

--
Stone circles were common enough everywhere in the mountains. Druids
built them as weather computers, and since it was always cheaper to
build a new 33-Megalith circle than to upgrade an old slow one, there
were generally plenty of ancient ones around --Lords and Ladies



Re: Remove SA tagging when learning as ham

Tom Hendrikx
Hi,

"Moving out of the Junk folder" definitely sounds like IMAP. In the IMAP
standard, messages can't be changed after delivery. To alter the message
(change subject, remove headers), you'll need to delete the old message,
and create a new, altered message. This is bad for caching, and could
mess up your MUA because you might delete a message serverside when the
client is interacting with that same message.

When you don't want to see the result of Junk filtering in your MUA,
don't tag the subject, and do everything based on the SA headers. The
message is Spam when it ends up in your Spam folder, otherwise it's not.
And when moving around the message, you don't end up with non-spam
messages that have a spam tag in the subject (because you never added one).
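As a rough illustration of "do everything based on the SA headers", here is a minimal shell sketch that decides whether a message is flagged by looking only at the X-Spam-Flag header, never at the subject. The function name is_flagged_spam is made up for this example, and it assumes the message arrives on stdin with the header written as "X-Spam-Flag: YES".

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the header section of the
# message on stdin contains "X-Spam-Flag: YES", fails otherwise.
# Only the headers are inspected; scanning stops at the first blank line.
is_flagged_spam() {
    awk '
    /^$/                     { exit }             # end of headers, stop looking
    /^X-Spam-Flag:[ \t]*YES/ { found = 1; exit }  # flag header seen
    END                      { exit !found }      # 0 if flagged, 1 if not
    '
}
```

It could be used as `is_flagged_spam < message.eml && echo "flagged"`, where message.eml is a placeholder file name.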

Kind regards,

        Tom

On 18-06-18 14:22, Kevin A. McGrail wrote:

> I'd look
> at https://serverfault.com/questions/817928/procmailrc-change-email-subject
>
> --
> Kevin A. McGrail
> VP Fundraising, Apache Software Foundation
> Chair Emeritus Apache SpamAssassin Project
> https://www.linkedin.com/in/kmcgrail - 703.798.0171
>
> On Mon, Jun 18, 2018 at 8:13 AM, @lbutlr <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     I have a script that runs when a mail is moved out of the Junk
>     folder to pass the mail through sa-learn --ham, but it doesn’t
>     remove the subject tagging (Spam: 05.5) nor does it remove the
>     X-Spam-Flag header.
>
>     What would I need to do in the script to remove the SA tags on
>     messages that are processed by this script?
>
>     --
>     Stone circles were common enough everywhere in the mountains. Druids
>     built them as weather computers, and since it was always cheaper to
>     build a new 33-Megalith circle than to upgrade an old slow one, there
>     were generally plenty of ancient ones around --Lords and Ladies
>
>

Re: Remove SA tagging when learning as ham

RW-15
On Mon, 18 Jun 2018 14:42:39 +0200
Tom Hendrikx wrote:

> When you don't want to see the result of Junk filtering in your MUA,
> don't tag the subject, and do everything based on the SA headers.

And X-Spam-* headers are ignored by sa-learn.



Re: Remove SA tagging when learning as ham

Martin Gregorie-2
In reply to this post by @lbutlr
On Mon, 2018-06-18 at 06:13 -0600, @lbutlr wrote:
> I have a script that runs when a mail is moved out of the Junk folder
> to pass the mail through sa-learn --ham, but it doesn’t remove the
> subject tagging (Spam: 05.5) nor does it remove the X-Spam-Flag
> header.
>
> What would I need to do in the script to remove the SA tags on
> messages that are processed by this script?
>
I normally use an awk script for this sort of job because they are
short, easy to write and run fast.

Here's one I use to remove SA headers from the messages I keep as an SA
test corpus. It's part of a larger, site-specific shell script. Much of
its apparent complexity is because it's capable of deleting multi-line
SA headers and because it recognises the blank line following the
headers and switches into a pure copy mode at that point.

====================================================================
#!/bin/bash
# First argument of this shell script is the file containing the
# message to be cleaned
# Second argument is the file the cleaned message is to be written to
#
awk '
BEGIN           { act = "copy";
                  body = "no";
                }
/^[A-Za-z]/     { act = "copy"    }
/^X-Spam/       { act = "skip"    }
/^$/            { body = "yes"; }
                {  
                  if (act == "copy" || body == "yes")
                  { print }
                }
' <"$1" >"$2"
====================================================================

If you don't know awk, and the mere fact of you asking this question
suggests you don't, I strongly suggest you get hold of the O'Reilly
"Sed & Awk" book and learn how to write awk scripts because you're
likely to find all sorts of uses for awk once you know it.

Martin
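For the subject tag the original question also asks about, a variation on the same idea might look like the sketch below. It is not Martin's script: it works as a stdin-to-stdout filter, and it assumes the tag appears as "Spam: NN.N " at the start of the Subject header, which depends on the local rewrite_header setting. The name strip_sa_tags is made up. It produces a cleaned copy; as noted earlier in the thread, the stored IMAP message itself cannot be edited in place.

```shell
#!/bin/sh
# Hypothetical filter: reads a message on stdin, writes a cleaned copy to
# stdout with X-Spam-* headers removed and the subject tag stripped.
# ASSUMPTION: the tag has the form "Spam: NN.N " at the start of Subject.
strip_sa_tags() {
    awk '
    BEGIN      { in_body = 0; skip = 0 }
    in_body    { print; next }                  # body lines are copied as-is
    /^$/       { in_body = 1; print; next }     # blank line ends the headers
    /^[ \t]/   { if (!skip) print; next }       # continuation of previous header
    /^X-Spam-/ { skip = 1; next }               # drop SA header and its continuations
    {
        skip = 0
        if ($0 ~ /^Subject:/)
            sub(/^Subject: *Spam: [0-9.]+ */, "Subject: ")
        print
    }'
}
```

Usage would be something like `strip_sa_tags < tagged.eml > clean.eml`, with both file names being placeholders.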


Re: Remove SA tagging when learning as ham

RW-15
On Mon, 18 Jun 2018 14:11:16 +0100
Martin Gregorie wrote:

   
> I normally use an awk script for this sort of job because they are
> short, easy to write and run fast.

There's no point in the OP doing this, since sa-learn ignores these
headers.

Re: Remove SA tagging when learning as ham

Martin Gregorie-2
On Mon, 2018-06-18 at 14:26 +0100, RW wrote:

> On Mon, 18 Jun 2018 14:11:16 +0100
> Martin Gregorie wrote:
>
>    
> > I normally use an awk script for this sort of job because they are
> > short, easy to write and run fast.
>
> There's no point in the OP doing this, since sa-learn ignores these
> headers.
>
Sure, but other external tools are often easier to write and/or work
better if there's only the one set of SA headers in a message. Plus I
find it's easier to check the operation of a new or modified rule if the
test message doesn't have SA headers in it.

That's why I clean SA headers out of my spam collection, but of course
YMMV.

Martin


Re: Remove SA tagging when learning as ham

RW-15
In reply to this post by @lbutlr
On Mon, 18 Jun 2018 06:13:06 -0600
@lbutlr wrote:

> I have a script that runs when a mail is moved out of the Junk folder
> to pass the mail through sa-learn --ham,



Whether this is the Dovecot plugin or something local it's a poor way
of training Bayes. You're training on SA errors not Bayes errors. Most
imperfect Bayes results don't translate into misclassifications.

Re: Remove SA tagging when learning as ham

RW-15
In reply to this post by Martin Gregorie-2
On Mon, 18 Jun 2018 15:44:15 +0100
Martin Gregorie wrote:

> On Mon, 2018-06-18 at 14:26 +0100, RW wrote:
> > On Mon, 18 Jun 2018 14:11:16 +0100
> > Martin Gregorie wrote:
> >
> >      
> > > I normally use an awk script for this sort of job because they are
> > > short, easy to write and run fast.  
> >
> > There's no point in the OP doing this, since sa-learn ignores these
> > headers.
> >  
> Sure, but other external tools are often easier to write and/or work
> better if there's only the one set of SA headers in a message.

If you have seen that happen it's a bug.

Re: Remove SA tagging when learning as ham

@lbutlr
In reply to this post by RW-15

On 18 Jun 2018, at 08:47, RW <[hidden email]> wrote:

> On Mon, 18 Jun 2018 06:13:06 -0600
> @lbutlr wrote:
>
>> I have a script that runs when a mail is moved out of the Junk folder
>> to pass the mail through sa-learn --ham,
>
>
> Whether this is the Dovecot plugin or something local it's a poor way
> of training Bayes. You're training on SA errors not Bayes errors. Most
> imperfect Bayes results don't translate into misclassifications.

I’m not sure what you’re trying to say here. Certainly SA does misclassify mail as spam at times, usually bulk mail that the user wants (for example, it marks Comixology mails as spam for me). Training the messages as ham is useful.

The script runs out of Dovecot, so procmail is not an option. This is what I have currently, but it doesn’t work well and I’m considering abandoning it entirely:

#!/bin/sh
exec /usr/local/bin/spamassassin -d ${1} && /usr/local/bin/sa-learn -u ${1} --ham


Re: Remove SA tagging when learning as ham

@lbutlr
On 18 Jun 2018, at 10:13, @lbutlr <[hidden email]> wrote:
> #!/bin/sh
> exec /usr/local/bin/spamassassin -d ${1} && /usr/local/bin/sa-learn -u ${1} --ham

Sorry, tyop from memory.

#!/bin/sh
exec /usr/local/bin/spamassassin -d && /usr/local/bin/sa-learn -u ${1} --ham


I think what I am going to do is enable report_safe 1 and remove the subject tagging and see how that goes.
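One likely reason the script above misbehaves: exec replaces the shell with spamassassin, so nothing after the && ever runs, and the cleaned message written by spamassassin -d is not passed on to sa-learn. A minimal sketch of a piped version follows, assuming the message arrives on stdin and the user name is the first argument. The SPAMASSASSIN and SA_LEARN variables and the learn_ham name are made up for the example; the default paths are the ones from the post. Note that this only cleans the copy handed to sa-learn (which, per the replies above, ignores the X-Spam-* headers anyway); it does not change the message stored in the mailbox.

```shell
#!/bin/sh
# Hypothetical override variables; defaults are the paths used in the post.
SPAMASSASSIN=${SPAMASSASSIN:-/usr/local/bin/spamassassin}
SA_LEARN=${SA_LEARN:-/usr/local/bin/sa-learn}

# learn_ham USER
# Reads one message on stdin, removes SA markup with "spamassassin -d",
# and pipes the result to "sa-learn --ham" for the given user.
# No exec here: exec would replace the shell and end the script.
learn_ham() {
    "$SPAMASSASSIN" -d | "$SA_LEARN" -u "$1" --ham
}
```

A wrapper script would then end with a line such as `learn_ham "$1"`.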


Re: Remove SA tagging when learning as ham

RW-15
In reply to this post by @lbutlr
On Mon, 18 Jun 2018 10:13:04 -0600
@lbutlr wrote:

> On 18 Jun 2018, at 08:47, RW <[hidden email]> wrote:
> > On Mon, 18 Jun 2018 06:13:06 -0600
> > @lbutlr wrote:
> >  
> >> I have a script that runs when a mail is moved out of the Junk
> >> folder to pass the mail through sa-learn --ham,  
> >
> >
> > Whether this is the Dovecot plugin or something local it's a poor
> > way of training Bayes. You're training on SA errors not Bayes
> > errors. Most imperfect Bayes results don't translate into
> > misclassifications.  
>
> I’m not sure what you’re trying to say here. Certainly SA does
> misclassify mail as spam at times, ...
> Training the messages as ham is useful.

The problem is that, unless there is something badly wrong, a typical
single-user account won't generate enough FPs and FNs for a properly
trained database. I found that Bayes's identification of ham improved
until I'd trained about 1500 ham, but I wouldn't expect to get anything
like 1500 SpamAssassin FPs in a lifetime.

It's not even proper train-on-error because it's training on
SpamAssassin misclassifications and not correcting Bayes's own
errors. It allows Bayes to go uncorrected until it results
in an FP or FN.

You can work around the plugin's deficiencies by using autotraining or
doing some additional training, but then the plugin is of limited
relevance.

IMO the plugin is best left to statistical filters like DSPAM.


Re: Remove SA tagging when learning as ham

Matus UHLAR - fantomas
>> > On Mon, 18 Jun 2018 06:13:06 -0600 @lbutlr wrote:
>> >> I have a script that runs when a mail is moved out of the Junk
>> >> folder to pass the mail through sa-learn --ham,

I think this is what Dovecot's Antispam plugin does:

https://wiki2.dovecot.org/Plugins/Antispam

and maybe ImapSieve:
https://wiki2.dovecot.org/HowTo/AntispamWithSieve

>> On 18 Jun 2018, at 08:47, RW <[hidden email]> wrote:
>> > Whether this is the Dovecot plugin or something local it's a poor
>> > way of training Bayes. You're training on SA errors not Bayes
>> > errors. Most imperfect Bayes results don't translate into
>> > misclassifications.

still better than nothing. And it helps us solve the main problem -
misclassifications.

>On Mon, 18 Jun 2018 10:13:04 -0600 @lbutlr wrote:
>> I’m not sure what you’re trying to say here. Certainly SA does
>> misclassify mail as spam at times, ...
>> Training the messages as ham is useful.

On 18.06.18 22:58, RW wrote:
>The problem is that, unless there is something badly wrong, a typical
>single-user account won't generate enough FPs and FNs for a properly
>trained database. I found that Bayes's identification of ham improved
>until I'd trained about 1500 ham, but I wouldn't expect to get anything
>like 1500 SpamAssassin FPs in a lifetime.

>It's not even proper train-on-error because it's training on
>SpamAssassin misclassifications  and not correcting Bayes's own
>errors. It allows Bayes to go uncorrected until it results
>in an FP or FN.

Of course, training BAYES_999 as spam and BAYES_00 as ham won't help change
their score, but still can push possible BAYES_20 to BAYES_00 and BAYES_99 to
BAYES_999.

>You can work around the plugin's deficiencies by using autotraining or
>doing some additional training, but then the plugin is of limited
>relevance.

Of course, both autotraining AND the fixing errors are required to
work properly.

Unfortunately I have seen spam repeatedly trained as ham, because of some
negative-scoring rules and a too-high autolearn threshold.

The same can happen the opposite way. Having a way to fix those manually
helps users.

>IMO the plugin is best left to statistical filters like DSPAM.

isn't dspam dead?

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I intend to live forever - so far so good.

Re: Remove SA tagging when learning as ham

RW-15
On Tue, 19 Jun 2018 10:41:51 +0200
Matus UHLAR - fantomas wrote:

> >> > On Mon, 18 Jun 2018 06:13:06 -0600 @lbutlr wrote:  
> >> >> I have a script that runs when a mail is moved out of the Junk
> >> >> folder to pass the mail through sa-learn --ham,  
>

> >You can work around the plugin's deficiencies by using autotraining
> >or doing some additional training, but then the plugin is of limited
> >relevance.  
>
> Of course, both autotraining AND the fixing errors are required to
> work properly.

Then you have the worst of both worlds. I'm not saying the plugin is
completely useless for Bayes, but 'not completely useless' is not
much of a recommendation.


Re: Remove SA tagging when learning as ham

Matus UHLAR - fantomas
>> >> > On Mon, 18 Jun 2018 06:13:06 -0600 @lbutlr wrote:
>> >> >> I have a script that runs when a mail is moved out of the Junk
>> >> >> folder to pass the mail through sa-learn --ham,

>> >You can work around the plugin's deficiencies by using autotraining
>> >or doing some additional training, but then the plugin is of limited
>> >relevance.

>On Tue, 19 Jun 2018 10:41:51 +0200 Matus UHLAR - fantomas wrote:
>> Of course, both autotraining AND the fixing errors are required to
>> work properly.

On 19.06.18 22:27, RW wrote:
>Then you have worst of both worlds. I'm not saying the plugin is
>completely useless for Bayes, but 'not completely useless' is not
>much of a recommendation.

I'd say the best, or nearly the best:

- autolearning works
- user can correct mistakes.

One downside is that users will correct only in case of a score mismatch, not
a Bayes mismatch (so even BAYES_999 won't be reported when not causing an FP).

Do you know of a better way than manually reviewing all BAYES scores for all
mail?

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
My mind is like a steel trap - rusty and illegal in 37 states.

Re: Remove SA tagging when learning as ham

RW-15
On Wed, 20 Jun 2018 09:20:56 +0200
Matus UHLAR - fantomas wrote:

> >> >> > On Mon, 18 Jun 2018 06:13:06 -0600 @lbutlr wrote:  
> >> >> >> I have a script that runs when a mail is moved out of the
> >> >> >> Junk folder to pass the mail through sa-learn --ham,  
>
> >> >You can work around the plugin's deficiencies by using
> >> >autotraining or doing some additional training, but then the
> >> >plugin is of limited relevance.  
>
> >On Tue, 19 Jun 2018 10:41:51 +0200 Matus UHLAR - fantomas wrote:  
> >> Of course, both autotraining AND the fixing errors are required to
> >> work properly.  
>
> On 19.06.18 22:27, RW wrote:
> >Then you have worst of both worlds. I'm not saying the plugin is
> >completely useless for Bayes, but 'not completely useless' is not
> >much of a recommendation.  
>
> I'd say the best, or nearly the best:
>
> - autolearning works
> - user can correct mistakes.

SA autotraining can be too selective, and both the plugin and
autotraining are poor at learning ham. And many users won't correct all
mistakes.

It seems inferior to simple manual IMAP training folders, or webmail
training.
 
> Do you know of a better way than manually reviewing all BAYES scores
> for all mail?

I do, but I wouldn't recommend it for general users.

I use training folders and have a sieve script that does something
like this:

if score >= 15 && sanity-checks {

    # definitely spam (zero FPs)
    file into <high-scoring spam folder>

    if needs-training-as-spam  {
       file into <train spam folder>
    }

}
elsif score >= 5 {

    # low-scoring spam or spam that needs inspection
    file into <low-scoring spam folder>

}
else {
   
    if needs-training-as-ham  {

       file a copy into <unsure ham folder>

    }
    # start of filing rules
    ...
}


Anything in <low-scoring spam folder> or <unsure ham folder> gets
manually moved to a training folder. I occasionally copy some manually
selected ham as well, to keep up the numbers.


Almost all my ham hits BAYES_00 these days, and with local rules >99%
of spam is over the 15 points needed for automated handling. It
requires very little effort.
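To make the routing in the pseudo-sieve above concrete, here is a small shell sketch of the same decision logic. It is only an illustration, not the actual sieve script: the folder names are invented, the sanity checks are left out, and the two training tests are guesses (spam "needs training" when BAYES_99 did not hit, ham "needs training" when BAYES_00 did not hit). The thresholds 15 and 5 are the ones from the post.

```shell
#!/bin/sh
# score_ge A B: succeeds if A >= B (awk handles the decimal comparison).
score_ge() { awk -v s="$1" -v t="$2" 'BEGIN { exit !(s + 0 >= t + 0) }'; }

# has_rule "RULE1 RULE2 ..." NAME: succeeds if NAME is among the rules hit.
has_rule() { case " $1 " in *" $2 "*) return 0 ;; *) return 1 ;; esac; }

# route_message SCORE "RULES": prints one destination folder per line.
# ASSUMPTIONS: folder names and the BAYES_99 / BAYES_00 training tests are
# hypothetical stand-ins; the sanity checks from the original are omitted.
route_message() {
    score=$1
    rules=$2
    if score_ge "$score" 15; then
        echo "Spam/high"                                   # definitely spam
        has_rule "$rules" BAYES_99 || echo "Train/spam"    # Bayes missed it
    elif score_ge "$score" 5; then
        echo "Spam/low"                                    # needs manual inspection
    else
        has_rule "$rules" BAYES_00 || echo "Unsure-ham"    # copy for later training
        echo "INBOX"                                       # normal filing rules follow
    fi
}
```

For example, `route_message 17.3 "BAYES_50 URIBL_BLACK"` would list both the high-scoring spam folder and the spam training folder.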