kam corpus

kam corpus

Rupert Gallagher

Is this the "official" version of kam.cf? 


http://www.pccc.com/downloads/SpamAssassin/contrib/


The file is huge, and consists of ad-hoc rules against spammy keywords. 


We use a completely different approach, resulting in few general rules and a short whitelist. We hardly see any kam-esque spam, but we are wise enough to verify. Is there an open corpus of kam-spam that we can process? 

Re: kam corpus

Kevin A. McGrail-2
On 1/24/2018 6:08 AM, Rupert Gallagher wrote:

> Is this the "official" version of kam.cf?
>
> http://www.pccc.com/downloads/SpamAssassin/contrib/

Yes.  Are there unofficial versions?

> The file is huge, and consists of ad-hoc rules against spammy keywords.
>
> We use a completely different approach, resulting in few general rules and a short whitelist. We hardly see any kam-esque spam, but we are wise enough to verify. Is there an open corpus of kam-spam that we can process?

Sorry, no, we do not provide spam or ham corpora for verification.  I can tell you that we get about 2 problem reports a week on average, with hundreds of millions of mailboxes using our cf.

Re: kam corpus

Rupert Gallagher
We had three spam messages in about 8 months? I lost count. Our clients are so used to having a clean inbox that they spot a spam message like the proverbial white fly.


Re: kam corpus

@lbutlr
In reply to this post by Rupert Gallagher
On 24 Jan 2018, at 04:08, Rupert Gallagher <[hidden email]> wrote:
> Is this the "official" version of kam.cf?
>
> http://www.pccc.com/downloads/SpamAssassin/contrib/
>
> The file is huge, and consists of ad-hoc rules against spammy keywords.

Is less than 300K huge?

That does remind me, though: does SpamAssassin automatically load *.cf in /usr/local/etc/mail/SpamAssassin, or do extra cf files like KAM need to be added somewhere to be loaded?

I seem to recall having to do something, but it's been a long time since I did anything outside of local.cf.

--
...but the senator, while insisting he was not intoxicated, could not
explain his nudity.

Re: kam corpus

Kevin A. McGrail-2
On 1/24/2018 6:48 PM, @lbutlr wrote:
>> The file is huge, and consists of ad-hoc rules against spammy keywords.
>
> Is less than 300K huge?
>
> That does remind me, though, does SpamAssassin automatically load *.cf in /usr/local/etc/mail/SpamAssassin or do extra cf files like KAM need to be added somewhere to be loaded?
>
> I seem to recall having to do something, but it's been a long time since I did anything outside of local.cf

It's a huge file and I need to bring our automation tools to bear on it to streamline it.

Any cf file including KAM.cf works if it is placed wherever your local.cf goes.
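
As a rough illustration of "place it wherever your local.cf goes", here is a minimal Python sketch that looks for the directory holding local.cf and copies a rules file next to it. The candidate directory list and both function names are assumptions made for this example (only the first path comes from this thread); adjust them to your installation.

```python
import shutil
from pathlib import Path

# Assumed candidate locations; only the first one is taken from this thread.
CANDIDATE_DIRS = [
    Path("/usr/local/etc/mail/SpamAssassin"),
    Path("/etc/mail/spamassassin"),
]


def find_site_config_dir(candidates):
    """Return the first directory that contains local.cf, or None."""
    for directory in candidates:
        directory = Path(directory)
        if (directory / "local.cf").is_file():
            return directory
    return None


def install_rules(cf_file, candidates=CANDIDATE_DIRS):
    """Copy cf_file next to local.cf and return the destination path."""
    target_dir = find_site_config_dir(candidates)
    if target_dir is None:
        raise FileNotFoundError("no directory containing local.cf was found")
    destination = target_dir / Path(cf_file).name
    shutil.copy2(cf_file, destination)  # keeps the file's modification time
    return destination
```

Calling `install_rules("KAM.cf")` would copy a downloaded file into place. Running `spamassassin --lint` afterwards is a reasonable way to confirm that the configuration still parses.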

Regards,
KAM

Re: kam corpus

Nix-15
In reply to this post by Kevin A. McGrail-2
On 24 Jan 2018, Kevin A. McGrail uttered the following:

> On 1/24/2018 6:08 AM, Rupert Gallagher wrote:
>>
>> Is this the "official" version of kam.cf?
>>
>> http://www.pccc.com/downloads/SpamAssassin/contrib/
>>
> Yes.  Are there unofficial versions?

I've long wondered whether there's an sa-update channel for KAM. It
seems... inelegant and impolite to your site to do a curl for it at
intervals and use the last-modified header (though it does work), when
sa-update's cheaper DNS lookups could do the same job with less overhead.
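
For reference, the polling approach described above can be made a little kinder to the server with a conditional GET. This is a minimal Python sketch using only the standard library: it sends the previously seen Last-Modified value back as If-Modified-Since, so an unchanged file costs a 304 response instead of a full download. The function names are hypothetical, and the exact file URL (the contrib directory from this thread plus KAM.cf) is an assumption.

```python
import urllib.error
import urllib.request

# Assumed file URL: the contrib directory from this thread plus "KAM.cf".
KAM_URL = "http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf"


def build_conditional_request(url, last_modified=None):
    """Build a GET request that lets the server skip unchanged content."""
    request = urllib.request.Request(url)
    if last_modified:
        # Echo back the Last-Modified value saved from the previous fetch.
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, last_modified=None, timeout=30):
    """Return (body, new_last_modified), or (None, last_modified) on HTTP 304."""
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the copy we already have
            return None, last_modified
        raise
```

A cron job could call `fetch_if_changed(KAM_URL, saved_value)`, write the body to disk only when it is not `None`, and store the returned Last-Modified string for the next run. An sa-update channel would still be the lighter option, as noted above.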

--
NULL && (void)

Re: kam corpus

Kevin A. McGrail-5
I've considered it. I even run channels for others but just haven't ever set it up.  Focused on 3.4.2 right now.

--
Kevin A. McGrail
VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project


Re: kam corpus

Benny Pedersen-2
Kevin A. McGrail wrote on 2018-06-06 18:41:
> I've considered it. I even run channels for others but just haven't
> ever set it up.  Focused on 3.4.2 right now.

3.4.2 is long awaited.

I would like to see a wiki page on how to build your own rescores for
local-only tags. That could, IMHO, speed up the arrival of new, very
good tags that catch spam more generally. As it is now, we all miss
more spam, and rescoring is thus biased incorrectly in how it keeps
Bayes learned. On the other hand, I reject based on RBL at the MTA
stage, and I have no plan to limit that RBL testing, but it neutralises
Bayes learning, which is IMHO good :=)

As long as it works.