Help with blocking Chinese Spam

Help with blocking Chinese Spam

Jenny Lee-2


Dear SA Users,
 
I am getting this Chinese spam every hour. I tried ok_locales and ok_languages with the TextCat plugin, and I tried matching the subject, but these messages always get through.
 
http://www.pastebin.ca/2127622
 
What rules/modifications do I need to do to get rid of this?
 
J    

Re: Help with blocking Chinese Spam

Martin Gregorie-2
On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:

>
> Dear SA Users,
>  
> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject... but
> these people are always getting through.
>  
> http://www.pastebin.ca/2127622
>  
> What rules/modifications do I need to do to get rid of this?
>  
If that UTF-8 prefix - =?utf-8?B? - is specific for Chinese, then a rule
something like:

header __FC1  From:raw =~ /=\?utf-8\?B\?/i
header __FC2  From =~ /\.cn>/i
meta   FAKE_CHINESE  (__FC1 && !__FC2)

might do it.
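
A side note on the pattern syntax: in a Perl-style regex an unescaped "?" is a quantifier, not a literal character, so a pattern written as /=?utf-8?B?/ does not look for the encoded-word prefix at all. Here is a minimal Python sketch of the difference (Python's re module treats "?" the same way for this pattern; both sample header values are made up for illustration):

```python
import re

# Unescaped: each "?" makes the preceding character optional, so this
# effectively matches "utf-" followed by an optional "8" and optional "B".
loose = re.compile(r"=?utf-8?B?")

# Escaped: matches the literal RFC 2047 prefix "=?utf-8?B?".
strict = re.compile(r"=\?utf-8\?B\?", re.IGNORECASE)

# Hypothetical header values, not taken from the spam sample.
plain = "notes about utf-8 encoding <someone@example.com>"
encoded = "=?utf-8?B?5Lit5paH?= <someone@example.com>"

loose_hits_plain = loose.search(plain) is not None
strict_hits_plain = strict.search(plain) is not None
strict_hits_encoded = strict.search(encoded) is not None

print(loose_hits_plain, strict_hits_plain, strict_hits_encoded)
```

The loose pattern fires on any text that merely mentions "utf-8", while the escaped one only fires on the encoded-word marker itself.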

Equally obviously, if all the spam is coming from Argentina, or
pretending to come from there, and your users never correspond with
anybody from that country, simply deep-six anything with that TLD in the
sender's address. I use a modification of that to treat all mail from
Russia as spam unless it comes from one of the three people I know
there:

describe MG_CYRILLIC  Russian cyrillic spam
header   __MG_CY1 From =~ /\.ru>/
header   __MG_CY2 From =~ /person1\@mail\.example1\.ru/
header   __MG_CY3 From =~ /(person2\@example2|person3\@example3)\.ru/
meta     MG_CYRILLIC  (__MG_CY1 && !(__MG_CY2 || __MG_CY3))
score    MG_CYRILLIC  12.5

This works well for me and could be trivially adapted to any country,
but ymmv.    
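
For readers who want to check the logic outside SpamAssassin, the meta rule above boils down to "sender is in the TLD AND is NOT a known correspondent". A rough Python sketch of that logic, assuming the same placeholder addresses as in the rule (the function name is hypothetical):

```python
import re

# Same idea as __MG_CY1: the From header ends the address with ".ru>".
FROM_RU = re.compile(r"\.ru>")

# Same idea as __MG_CY2 / __MG_CY3: the known correspondents (placeholders).
KNOWN = re.compile(
    r"(person1@mail\.example1|person2@example2|person3@example3)\.ru"
)


def is_unwanted_ru(from_header: str) -> bool:
    """True when the sender is in .ru and is not a known correspondent."""
    return bool(FROM_RU.search(from_header)) and not KNOWN.search(from_header)


print(is_unwanted_ru("Ivan <ivan@spam.ru>"))            # stranger in .ru
print(is_unwanted_ru("P1 <person1@mail.example1.ru>"))  # known correspondent
print(is_unwanted_ru("Bob <bob@example.com>"))          # other TLD
```

Note that, like the rule, this only matches addresses written inside angle brackets; a bare "ivan@spam.ru" with no closing ">" would slip past the first pattern.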


Martin



Re: Help with blocking Chinese Spam

Robert Schetterer
Am 13.03.2012 13:09, schrieb Martin Gregorie:

> On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:
>>
>> Dear SA Users,
>>  
>> I am getting this chinese spam every hour. I tried, ok_locales,
>> ok_languages with texcat plugin... I tried matching the subject... but
>> these people are always getting through.
>>  
>> http://www.pastebin.ca/2127622
>>  
>> What rules/modifications do I need to do to get rid of this?
>>  
> If that UTF-8 prefix - =?utf-8?B? - is specific for Chinese, then a rule
> something like:
>
> header __FC1  From:raw =~ /=\?utf-8\?B\?/i
> header __FC2  From =~ /\.cn>/i
> meta   FAKE_CHINESE  (__FC1 && !__FC2)
>
> might do it.
>
> Equally obviously, if all the spam is coming from Argentina, or
> pretending to come from there, and your users never correspond with
> anybody from that country, simply deep-six anything with that TLD in the
> sender's address. I use a modification of that to treay all mail from
> Russia as spam unless it comes from one of the three people I know
> there:
>
> describe MG_CYRILLIC  Russian cyrillic spam
> header   __MG_CY1 From =~ /\.ru>/
> header   __MG_CY2 From =~ /person1\@mail\.example1\.ru/
> header   __MG_CY3 From =~ /(person2\@example2|person3\@example3)\.ru/
> meta     MG_CYRILLIC  (__MG_CY1 && !(__MG_CY2 || __MG_CY3))
> score    MG_CYRILLIC  12.5
>
> This works well for me and could be trivially adapted to any country,
> but ymmv.    
>
>
> Martin
>
>

More trivially: if the sender address is always the same, reject it at
the MTA level. If you aren't afraid of losing other mail from that mail
server, reject its IP address entirely.

--
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria

Re: Help with blocking Chinese Spam

RW-15
In reply to this post by Jenny Lee-2
On Tue, 13 Mar 2012 09:48:37 +0000
Jenny Lee wrote:

>
>
> Dear SA Users,
>  
> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject...
> but these people are always getting through.
> http://www.pastebin.ca/2127622 
> What rules/modifications do I need to do to get rid of this?
>  
> J  


You can enable the TextCat plugin in v310.pre and set
ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a lot.

Re: Help with blocking Chinese Spam

RW-15
On Tue, 13 Mar 2012 12:14:36 +0000
RW wrote:

> On Tue, 13 Mar 2012 09:48:37 +0000
> Jenny Lee wrote:
>
> >
> >
> > Dear SA Users,
> >  
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject...
> > but these people are always getting through.
> > http://www.pastebin.ca/2127622 
> > What rules/modifications do I need to do to get rid of this?
> >  
> > J  
>
>
> You can enable the TextCat plugin in v310.pre and set
> ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a
> lot.

Sorry, I missed that you'd tried textcat, but I ran the example through
spamassassin and it did hit UNWANTED_LANGUAGE_BODY which is absent in
your headers. Are you sure you actually turned it on?

Re: Help with blocking Chinese Spam

Dianne Skoll
In reply to this post by Jenny Lee-2
On Tue, 13 Mar 2012 09:48:37 +0000
Jenny Lee <[hidden email]> wrote:

> I am getting this chinese spam every hour. I tried, ok_locales,
> ok_languages with texcat plugin... I tried matching the subject...
> but these people are always getting through.
> http://www.pastebin.ca/2127622 
> What rules/modifications do I need to do to get rid of this?

We use this rule, but it's aggressive.  It will block any Chinese message
with a Word or Excel attachment.  For our user-base, that's fine, but YMMV.

Regards,

David.

# Chinese spam
# Raw (still-encoded) Subject uses a base64 encoded-word in UTF-8 or GB2312
header __RP_SUBJ_UTF8    Subject:raw =~ /=\?utf-8\?B/i
header __RP_SUBJ_GB2312  Subject:raw =~ /=\?gb2312\?B/i
# Decoded Subject contains a UTF-8 lead byte typical of CJK ideographs
header __RP_SUBJ_CJK     Subject =~ /[\xe4-\xe9]/
# Attachment has an 8-bit or GB2312-encoded filename, or is Excel/Word
full   __RP_8BIT_FNAME   /name=.{0,30}[\x80-\xff]/
full   __RP_EXCEL        /application\/vnd\.ms-excel/i
full   __RP_DOC          /application\/msword/i
full   __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
meta     RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
describe RP_D_00032 Looks like a Chinese spam
score    RP_D_00032 5.0


RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by Jenny Lee-2
> Dear SA Users,
>
> I am getting this chinese spam every hour. I tried, ok_locales, ok_languages with texcat plugin... I tried matching the subject... but these people are always getting through.
>
> http://www.pastebin.ca/2127622
>
> What rules/modifications do I need to do to get rid of this?
>
> J

 
My mistake for omitting this: the spam comes from a botnet, so the IP address, sender address, country, etc. are all random.
 
J

RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by Dianne Skoll
> Date: Tue, 13 Mar 2012 08:25:21 -0400

> From: [hidden email]
> To: [hidden email]
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 09:48:37 +0000
> Jenny Lee <[hidden email]> wrote:
>
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject...
> > but these people are always getting through.
> > http://www.pastebin.ca/2127622
> > What rules/modifications do I need to do to get rid of this?
>
> We use this rule, but it's aggressive. It will block any Chinese message
> with a Word or Excel attachment. For our user-base, that's fine, but YMMV.
>
> Regards,
>
> David.
>
> # Chinese spams
> header __RP_SUBJ_UTF8 Subject:raw =~/=\?utf-8\?B/i
> header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
> full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
> full __RP_EXCEL /application\/vnd.ms-excel/i
> full __RP_DOC /application\/msword/i
> full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
> meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
> describe RP_D_00032 Looks like a Chinese spam
> score RP_D_00032 5.0
>

Thank you David.
 
I will give this a go. What I don't understand is why the rule below does not catch the 'utf' that is in the subject.
 
I use it for testing purposes. It catches other botnet subjects such as 'Experian', etc.
 
header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on file today|into your account today|video|clip|movie| vid|episode|utf/i
score XX_CUSTOM_HEADER 8.0
describe XX_CUSTOM_HEADER XX Custom Rules - Header
 
J

RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by RW-15
> Date: Tue, 13 Mar 2012 12:19:38 +0000

> From: [hidden email]
> To: [hidden email]
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:14:36 +0000
> RW wrote:
>
> > On Tue, 13 Mar 2012 09:48:37 +0000
> > Jenny Lee wrote:
> >
> > >
> > >
> > > Dear SA Users,
> > >
> > > I am getting this chinese spam every hour. I tried, ok_locales,
> > > ok_languages with texcat plugin... I tried matching the subject...
> > > but these people are always getting through.
> > > http://www.pastebin.ca/2127622
> > > What rules/modifications do I need to do to get rid of this?
> > >
> > > J
> >
> >
> > You can enable the TextCat plugin in v310.pre and set
> > ok_languages. UNWANTED_LANGUAGE_BODY scores 2.8 which should help a
> > lot.
>
> Sorry, I missed that you'd tried textcat, but I ran the example through
> spamassassin and it did hit UNWANTED_LANGUAGE_BODY which is absent in
> your headers. Are you sure you actually turned it on?

I did turn it on in the .pre. It is also supposed to add a header, but it does not. How can I check whether it is working or not?
 
I have:
 
ok_locales en
ok_languages en
 
Jenny

RE: Help with blocking Chinese Spam

Daniel Lemke
Jenny Lee-2 wrote:
> I did turn it on in the .pre. It is also supposed to add a header, but it does not. How can I check if it is working or not?
>
> I have:
>
> ok_locales en
> ok_languages en
>
> Jenny

Add this to your config file:

add_header all Language _LANGUAGES_

RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by Martin Gregorie-2


> Subject: Re: Help with blocking Chinese Spam
> From: [hidden email]
> To: [hidden email]
> Date: Tue, 13 Mar 2012 12:09:19 +0000
>
> On Tue, 2012-03-13 at 09:48 +0000, Jenny Lee wrote:
> >
> > Dear SA Users,
> >
> > I am getting this chinese spam every hour. I tried, ok_locales,
> > ok_languages with texcat plugin... I tried matching the subject... but
> > these people are always getting through.
> >
> > http://www.pastebin.ca/2127622
> >
> > What rules/modifications do I need to do to get rid of this?
> >
> If that UTF-8 prefix - =?utf-8?B? - is specific for Chinese, then a rule
> something like:
>
> header __FC1 From:raw =~ /=\?utf-8\?B\?/i
> header __FC2 From =~ /\.cn>/i
> meta FAKE_CHINESE (__FC1 && !__FC2)
>
> might do it.

 
Dear Martin,
 
Thank you for your input.
 
The Subject is always UTF-8 encoded. The From header is UTF-8 encoded about half of the time.
 
I checked the last two months of our regular mail and we never received a UTF-8-encoded subject from anyone.
 
Can an expert advise on blocking based on this UTF-8 encoding in the subject?
 
 

> Equally obviously, if all the spam is coming from Argentina,
 
It is a botnet, so the country is not relevant here.
 
Jenny    

Re: Help with blocking Chinese Spam

Jared Hall-2
In reply to this post by Jenny Lee-2
> Thank you David.
>
> Will give this a go. What I don't understand is that... Why is this not catching this 'utf' which is on the subject?
>
> I used this for testing purposes. It catches other botnet headers like 'Experian', etc.
>
> header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on file today|into your account today|video|clip|movie| vid|episode|utf/i
> score XX_CUSTOM_HEADER 8.0
> describe XX_CUSTOM_HEADER XX Custom Rules - Header
>
> J

Try: Subject:raw

From the manual:

Appending :raw to the header name will inhibit decoding of quoted-printable or base-64 encoded strings.


Regards,

Jared Hall


Re: Help with blocking Chinese Spam

Jari Fredriksson
In reply to this post by Jenny Lee-2
13.3.2012 14:40, Jenny Lee kirjoitti:

>> Date: Tue, 13 Mar 2012 08:25:21 -0400
>> From: [hidden email]
>> To: [hidden email]
>> Subject: Re: Help with blocking Chinese Spam
>>
>> On Tue, 13 Mar 2012 09:48:37 +0000
>> Jenny Lee <[hidden email]> wrote:
>>
>> > I am getting this chinese spam every hour. I tried, ok_locales,
>> > ok_languages with texcat plugin... I tried matching the subject...
>> > but these people are always getting through.
>> > http://www.pastebin.ca/2127622
>> > What rules/modifications do I need to do to get rid of this?
>>
>> We use this rule, but it's aggressive. It will block any Chinese message
>> with a Word or Excel attachment. For our user-base, that's fine, but YMMV.
>>
>> Regards,
>>
>> David.
>>
>> # Chinese spams
>> header __RP_SUBJ_UTF8 Subject:raw =~/=\?utf-8\?B/i
>> header __RP_SUBJ_GB2312 Subject:raw =~ /=\?gb2312\?B/i
>> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>> full __RP_8BIT_FNAME /name=.{0,30}[\x80-\xff]/
>> full __RP_EXCEL /application\/vnd.ms-excel/i
>> full __RP_DOC /application\/msword/i
>> full __RP_GB2312_FNAME /name=.?=\?gb2312\?/i
>> meta RP_D_00032 (__RP_SUBJ_UTF8 && __RP_SUBJ_CJK && (__RP_EXCEL ||
> __RP_DOC || __RP_8BIT_FNAME)) || (__RP_SUBJ_GB2312 && (__RP_GB2312_FNAME
> || __RP_EXCEL || __RP_DOC || __RP_8BIT_FNAME))
>> describe RP_D_00032 Looks like a Chinese spam
>> score RP_D_00032 5.0
>>
>
> Thank you David.
>  
> Will give this a go. What I don't understand is that... Why is this not
> catching this 'utf' which is on the subject?
>  
> I used this for testing purposes. It catches other botnet headers like
> 'Experian', etc.
>  
> header XX_CUSTOM_HEADER Subject =~ /Experian|\$1500|to your account on
> file today|into your account today|video|clip|movie| vid|episode|utf/i
> score XX_CUSTOM_HEADER 8.0
> describe XX_CUSTOM_HEADER XX Custom Rules - Header
>  
> J
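
Subject:raw matches the raw, still-encoded header, so it sees the
"=?utf-8?B?" marker. Plain Subject matches the decoded text, so /utf/
only hits a subject that literally contains the text "utf".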
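
To make the raw-versus-decoded distinction concrete, here is a small Python sketch. It uses Python's own email.header decoder as a stand-in for what SpamAssassin does internally, which is an assumption for illustration only; the subject text is a made-up example, not taken from the spam sample.

```python
import base64
import re
from email.header import decode_header, make_header

# Build a base64 encoded-word subject the way a mail client would.
text = "中文"  # hypothetical subject text
b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
raw_subject = "=?utf-8?B?" + b64 + "?="

# Roughly what a plain "Subject" rule is matched against: the decoded text.
decoded_subject = str(make_header(decode_header(raw_subject)))

# Roughly what a "Subject:raw" rule is matched against: the header as sent.
raw_has_marker = re.search(r"=\?utf-8\?B\?", raw_subject, re.I) is not None
decoded_has_utf = re.search(r"utf", decoded_subject, re.I) is not None

print(raw_subject)
print(decoded_subject)
print(raw_has_marker, decoded_has_utf)
```

The string "utf" only exists in the wire format of the header; once the encoded-word is decoded, nothing is left for a /utf/ pattern to match.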



--

Today's weirdness is tomorrow's reason why.
                -- Hunter S. Thompson



RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by Daniel Lemke
> Date: Tue, 13 Mar 2012 05:47:03 -0700

> From: [hidden email]
> To: [hidden email]
> Subject: RE: Help with blocking Chinese Spam
>
>
>
> Jenny Lee-2 wrote:
> >
> > I did turn it on in the .pre. It is also supposed to add a header, but it
> > does not. How can I check if it is working or not?
> >
> > I have:
> >
> > ok_locales en
> > ok_languages en
> >
> > Jenny
> >
>
>
> Add this to your config file:
>
> add_header all Language _LANGUAGES_
 
This adds the header. Thank you.
 
However, running "spamassassin -D < chinesespam" shows that the message is still not caught: TextCat cannot determine the language (debug output below).
 
Jenny
 
Mar 13 17:06:36.294 [27011] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements 'extract_metadata', priority 0
Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
Mar 13 17:06:36.295 [27011] dbg: message: found part of type multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.296 [27011] dbg: message: added part, type: multipart/alternative
Mar 13 17:06:36.299 [27011] dbg: message: found part of type application/vndms-excel, boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
Mar 13 17:06:36.299 [27011] dbg: message: added part, type: application/vndms-excel
Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy eo fy gd lt
Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language uniquely enough
Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "", X-Languages-Length: 671

Re: Help with blocking Chinese Spam

Dianne Skoll
In reply to this post by Jenny Lee-2
On Tue, 13 Mar 2012 12:40:16 +0000
Jenny Lee <[hidden email]> wrote:

> Will give this a go. What I don't understand is that... Why is this
> not catching this 'utf' which is on the subject?

You need the :raw modifier to see the raw, still-encoded header.  The header rule:

    header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/

attempts to limit matches on UTF-8 subjects to Chinese characters
because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
ideographs.  It's not a perfect filter, but blocking all UTF-8-encoded
subjects would yield way too many FPs for us.
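
The lead-byte claim is easy to check by hand. Below is a minimal Python sketch, assuming the decoded subject is handled as UTF-8 bytes; the helper names are hypothetical and the sample characters are chosen only for illustration.

```python
import re

# Byte-level equivalent of the /[\xe4-\xe9]/ character class.
CJK_LEAD = re.compile(rb"[\xe4-\xe9]")


def utf8_lead_byte(ch: str) -> int:
    """Return the first byte of the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")[0]


def subject_looks_cjk(subject: str) -> bool:
    """True when the UTF-8 bytes of the subject contain a byte in E4..E9."""
    return CJK_LEAD.search(subject.encode("utf-8")) is not None


# U+4E00..U+9FFF (CJK Unified Ideographs) encode with lead bytes E4..E9.
print(hex(utf8_lead_byte("\u4e00")), hex(utf8_lead_byte("\u9fff")))

# Hiragana (lead byte E3), Hangul (EA..ED) and accented Latin (C3) fall outside.
print(hex(utf8_lead_byte("\u3042")), hex(utf8_lead_byte("\ud55c")), hex(utf8_lead_byte("\u00e9")))

print(subject_looks_cjk("中文"), subject_looks_cjk("Résumé"))
```

This also shows why the filter is only "mostly" accurate: E4 additionally covers U+4000..U+4DFF, which lies outside the main ideograph block, and Japanese text that mixes kanji with kana will match on the kanji.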

Regards,

David.

PS: I haven't looked at SA's Bayes implementation.  Can it handle
words in non-western character sets properly?

RE: Help with blocking Chinese Spam

Daniel Lemke
In reply to this post by Jenny Lee-2

Jenny Lee-2 wrote
> Date: Tue, 13 Mar 2012 05:47:03 -0700
> From: lemke@jam-software.com
> To: users@spamassassin.apache.org
> Subject: RE: Help with blocking Chinese Spam
>
>
>
> Jenny Lee-2 wrote:
> >
> > I did turn it on in the .pre. It is also supposed to add a header, but it
> > does not. How can I check if it is working or not?
> >
> > I have:
> >
> > ok_locales en
> > ok_languages en
> >
> > Jenny
> >
>
>
> Add this to your config file:
>
> add_header all Language _LANGUAGES_
>
> This adds the header. Thank you.
>
> However, running: spamassassin -D < chinesespam
>
> Does not catch this.
>
> Jenny
>
> Mar 13 17:06:36.294 [27011] dbg: plugin: Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements 'extract_metadata', priority 0
> Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
> Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
> Mar 13 17:06:36.295 [27011] dbg: message: found part of type multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.296 [27011] dbg: message: added part, type: multipart/alternative
> Mar 13 17:06:36.299 [27011] dbg: message: found part of type application/vndms-excel, boundary: ----=_NextPart_000_004F_0181A2CA.182A5CF0
> Mar 13 17:06:36.299 [27011] dbg: message: added part, type: application/vndms-excel
> Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
> Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
> Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
> Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv is bs sl la ga sa eu et rm cy eo fy gd lt
> Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language uniquely enough
> Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "", X-Languages-Length: 671


It looks like TextCat does not work reliably when the message is encoded. For the mail you posted on pastebin, TextCat guessed "ja.shift-jis", which then triggered UNWANTED_LANGUAGE_BODY.

However, for other Chinese spam that got through recently, it was either unable to guess the language or it even guessed "en".

Is this a general problem with SpamAssassin being unable to decode this sort of mail?

Daniel

RE: Help with blocking Chinese Spam

Jenny Lee-2
In reply to this post by Dianne Skoll
> Date: Tue, 13 Mar 2012 09:14:10 -0400

> From: [hidden email]
> To: [hidden email]
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012 12:40:16 +0000
> Jenny Lee <[hidden email]> wrote:
>
> > Will give this a go. What I don't understand is that... Why is this
> > not catching this 'utf' which is on the subject?
>
> You need the :raw tag to see the raw, unencoded header. The meta-rule:
>
> header __RP_SUBJ_CJK Subject =~ /[\xe4-\xe9]/
>
> attempts to limit matches on UTF-8 subjects to Chinese characters
> because the leading bytes e4-e9 in UTF-8 (mostly) cover CJK
> ideographs. It's not a perfect filter, but blocking all UTF-8-encoded
> subjects would yield way too many FPs for us.
>
> Regards,
>
> David.
>
> PS: I haven't looked at SA's Bayes implementation. Can it handle
> words in non-western character sets properly?

Thank you David, Jared and Jari.
 
Adding:
Subject:raw =~/=\?utf-8\?B/i
Subject =~ /[\xe4-\xe9]/
 
caused this junk to get caught. Both work, so I will follow David's advice.
 
I think I will just remove the TextCat plugin, since it does not identify these messages properly.
 
This is a great list. Thanks again to everyone; all help is appreciated.
 
Jenny

Re: Help with blocking Chinese Spam

John Hardin
In reply to this post by Dianne Skoll
On Tue, 13 Mar 2012, David F. Skoll wrote:

> PS: I haven't looked at SA's Bayes implementation.  Can it handle
> words in non-western character sets properly?

It seems to. All of the Chinese-language spam I get hits BAYES_99.

Make sure you train bayes with this garbage!

--
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  [hidden email]    FALaholic #11174     pgpk -a [hidden email]
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Windows and its users got mentioned at home today, after my wife the
  psych major brought up Seligman's theory of "learned helplessness."
                                              -- Dan Birchall in a.s.r
-----------------------------------------------------------------------
  Tomorrow: Albert Einstein's 133rd Birthday

RE: Help with blocking Chinese Spam

Jenny Lee-2
> Date: Tue, 13 Mar 2012 06:42:05 -0700

> From: [hidden email]
> To: [hidden email]
> Subject: Re: Help with blocking Chinese Spam
>
> On Tue, 13 Mar 2012, David F. Skoll wrote:
>
> > PS: I haven't looked at SA's Bayes implementation. Can it handle
> > words in non-western character sets properly?
>
> It seems to. All of the Chinese-language spam I get hits BAYES_99.
>
> Make sure you train bayes with this garbage!
 
I did train Bayes with the Chinese spam I received, but it did not help. That is why I turned to the list. Otherwise my Bayes database catches everything very accurately for me.
 
Jenny

Re: Help with blocking Chinese Spam

Henrik K
In reply to this post by Daniel Lemke
On Tue, Mar 13, 2012 at 06:17:53AM -0700, Daniel Lemke wrote:

>
>
>
> Jenny Lee-2 wrote:
> >
> >
> >> Date: Tue, 13 Mar 2012 05:47:03 -0700
> >> From: [hidden email]
> >> To: [hidden email]
> >> Subject: RE: Help with blocking Chinese Spam
> >>
> >>
> >>
> >> Jenny Lee-2 wrote:
> >> >
> >> > I did turn it on in the .pre. It is also supposed to add a header, but
> >> it
> >> > does not. How can I check if it is working or not?
> >> >
> >> > I have:
> >> >
> >> > ok_locales en
> >> > ok_languages en
> >> >
> >> > Jenny
> >> >
> >>
> >>
> >> Add this to your config file:
> >>
> >> add_header all Language _LANGUAGES_
> >  
> > This adds the header. Thank you.
> >  
> > However, running: spamassassin -D < chinesespam
> >  
> > Does not catch this.
> >  
> > Jenny
> >  
> > Mar 13 17:06:36.294 [27011] dbg: plugin:
> > Mail::SpamAssassin::Plugin::TextCat=HASH(0x1d50bc8) implements
> > 'extract_metadata', priority 0
> > Mar 13 17:06:36.294 [27011] dbg: message: ---- MIME PARSER START ----
> > Mar 13 17:06:36.295 [27011] dbg: message: parsing multipart, got boundary:
> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
> > Mar 13 17:06:36.295 [27011] dbg: message: found part of type
> > multipart/alternative, boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.296 [27011] dbg: message: added part, type:
> > multipart/alternative
> > Mar 13 17:06:36.299 [27011] dbg: message: found part of type
> > application/vndms-excel, boundary:
> > ----=_NextPart_000_004F_0181A2CA.182A5CF0
> > Mar 13 17:06:36.299 [27011] dbg: message: added part, type:
> > application/vndms-excel
> > Mar 13 17:06:36.299 [27011] dbg: message: parsing multipart, got boundary:
> > ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.300 [27011] dbg: message: found part of type text/plain,
> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.300 [27011] dbg: message: added part, type: text/plain
> > Mar 13 17:06:36.301 [27011] dbg: message: found part of type text/html,
> > boundary: ----=_NextPart_001_034A_0181A2CA.182A5CF0
> > Mar 13 17:06:36.301 [27011] dbg: message: added part, type: text/html
> > Mar 13 17:06:36.301 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: parsing normal part
> > Mar 13 17:06:36.302 [27011] dbg: message: ---- MIME PARSER END ----
> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> > Mar 13 17:06:36.303 [27011] dbg: message: decoding base64
> > Mar 13 17:06:36.310 [27011] dbg: textcat: classifying, skipping: yi sco lv
> > is bs sl la ga sa eu et rm cy eo fy gd lt
> > Mar 13 17:06:36.328 [27011] dbg: textcat: can't determine language
> > uniquely enough
> > Mar 13 17:06:36.328 [27011] dbg: textcat: X-Languages: "",
> > X-Languages-Length: 671    
> >
>
>
>
> Looks like textcat is not working properly if the message is encoded. For
> the mail you posted on pastebin, textcat guessed "ja.shift-jis" which then
> triggered UNWANTED_LANGUAGE_BODY.
>
> However, for other chinese spam that got through these days it was either
> not able to guess the language or it even guessed "en" as language.
>
> Is this a general problem with SpamAssassin not really able to decode that
> sort of mails?


At least try 3.3.2, since it has TextCat fixes
(that pastebin shows version 3.3.1).

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6229
