Using UTF-8 characters to avoid spam filter rules.


Mark London
Hi - Some of the words in the spam email below use non-ASCII UTF-8 characters to avoid spam detection.  For example, the letters in the phrase "bitcoin wallet address" are not all the simple ASCII characters that they appear to be.

View the source of my email to understand what I'm talking about.  Is there any rule I can use to detect messages that are mostly plain ASCII, but contain enough UTF-8 lookalike characters that they have obviously been put in to avoid spam rules?   Thanks. - Mark

-------- Forwarded Message --------
Subject: GKJ: [[hidden email]] 26.06.2018 03:39:27 You can easily get off
Date: Tue, 26 Jun 2018 8:39:27 +0800
From: Kash Cedeno [hidden email]
Organization: zccdvgwtlekz
To: [hidden email]


Tiскеt Details: GKJ-686-81085
Email: [hidden email]
Camera ready,Notification: 26.06.2018 03:39:27
Status: Waiting for Reply 76xuWaCy7A0f11wJnXmAkO3WrK8Cy96Du8_Priority: Normal

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



What's up,


If you were more alert while playing with yourself, I wouldn't worry you. I don't think that playing with yourself is very awful, but when all your friends, relatives, сolleagues get video of it- it is unpleasant for u.

I adjusted malisious soft on a web-site for adults (with porn) which you have visited. When the object tap on a play button, device begins recording the screen and all cameras on ur device begins working.

Moreover, soft makes a dedicated desktop supplied with key logger function from your device , so I was able to collect all contacts from your e-mail, messengers and other social networks. I'm writing on this e-mail cuz It's your working address, so u must check it.

In my opinion 410 usd is pretty enough for this little false. I made a split screen vid(records from screen (interesting category ) and camera ohh... its funny AF)

So its your choice, if u want me to delete this сompromising evidence use my bitcоin wаllеt аddrеss-  1BkpfU6f7KJXxuc3Yg75cHC8kCJCT2xow4 
You have one day after opening my message, I put the special tracking pixel in it, so when you will open it I will see.If ya want me to share proofs with ya, reply on this letter and I will send my creation to five contacts that I've got from ur contacts.

P.S. You are able to complain to cops, but I don't think that they can solve ur problem, the inquisition will last for one year- I'm from Ukraine - so I dgf LOL




Re: Using UTF-8 characters to avoid spam filter rules.

Benny Pedersen-2
Mark London wrote on 2018-06-26 06:33:
> Hi - Some of the words in the spam email below, are using UTF-8
> characters, to avoid spam detection.  I.e. the phrase "bitcoin wallet
> address", are not the simple ASCII characters that they appear to be.

sa-learn --spam spam-msg-file

> View the source of my email, to understand what I'm talking about.  Is
> there any rule I can use, to detect messages that are mostly plain
> ASCII characters, but are using enough UTF-8 characters, that
> obviously have been put in to avoid spam rules?   Thanks. - Mark

no rule is needed, train bayes

i could help more if you provided original msg on pastebin

> So its your choice, if u want me to delete this сompromising evidence
> use my bitcоin wаllеt аddrеss-
> 1BkpfU6f7KJXxuc3Yg75cHC8kCJCT2xow4

btc address is not changing length :=)

body BTC_ADDRS /^[a-zA-Z0-9]{length}$/

untested

Re: Using UTF-8 characters to avoid spam filter rules.

RW-15
In reply to this post by Mark London
On Tue, 26 Jun 2018 00:33:11 -0400
Mark London wrote:

> Hi - Some of the words in the spam email below, are using UTF-8
> characters, to avoid spam detection.  I.e. the phrase "bitcoin wallet
> address", are not the simple ASCII characters that they appear to be.
>
> View the source of my email, to understand what I'm talking about. Is
> there any rule I can use, to detect messages that are mostly plain
> ASCII characters, but are using enough UTF-8 characters, that
> obviously have been put in to avoid spam rules?  

You can test for specific obfuscated words like this:

body            FUZZY_BITCOIN       /<B>(?!itcoin)<I><T><C><O><I><N>/i
replace_rules   FUZZY_BITCOIN


For anything more general you'd have to match on lookalike characters
from non-roman codepages embedded in ASCII (or roman) words. Finding
accented characters or general multibyte UTF-8 is not particularly
suspicious.
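
A rough sketch of the more general check described above, written in Python rather than as a SpamAssassin rule. It flags words that mix Latin letters with Cyrillic or Greek ones, and leaves accented Latin text and wholly non-Latin words alone. The helper names (`script_of`, `mixed_script_words`) are made up for illustration, and guessing the script from the Unicode character name is only an approximation.

```python
import re
import unicodedata

WORD = re.compile(r"\w+")

def script_of(ch):
    """Rough script guess based on the Unicode character name (an assumption,
    not a full Unicode script lookup)."""
    if ch.isascii():
        return "LATIN" if ch.isalpha() else None
    name = unicodedata.name(ch, "")
    for script in ("CYRILLIC", "GREEK"):
        if name.startswith(script):
            return script
    return "LATIN" if name.startswith("LATIN") else None

def mixed_script_words(text):
    """Return words that contain Latin letters plus at least one other script."""
    hits = []
    for word in WORD.findall(text):
        scripts = {s for s in map(script_of, word) if s}
        if "LATIN" in scripts and len(scripts) > 1:
            hits.append(word)
    return hits

if __name__ == "__main__":
    # The phrase from the spam, with the Cyrillic letters written as escapes.
    phrase = "use my bitc\u043ein w\u0430ll\u0435t \u0430ddr\u0435ss"
    print(mixed_script_words(phrase))
```

A message-level test could then count such words and only score when there are several, since a single mixed word can occur legitimately.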

Re: Using UTF-8 characters to avoid spam filter rules.

Mark London
In reply to this post by Mark London
On 6/28/2018 1:46 PM, [hidden email] wrote:
> Subject: Re: Using UTF-8 characters to avoid spam filter rules.
> From: RW [hidden email]
> Date: 6/26/2018 12:12 PM
> To: [hidden email]
>
> On Tue, 26 Jun 2018 00:33:11 -0400
> Mark London wrote:
>
>> Hi - Some of the words in the spam email below, are using UTF-8
>> characters, to avoid spam detection.  I.e. the phrase "bitcoin wallet
>> address", are not the simple ASCII characters that they appear to be.
>>
>> View the source of my email, to understand what I'm talking about. Is
>> there any rule I can use, to detect messages that are mostly plain
>> ASCII characters, but are using enough UTF-8 characters, that
>> obviously have been put in to avoid spam rules?
>
> You can test for specific obfuscated words like this:
>
> body            FUZZY_BITCOIN       /<B>(?!itcoin)<I><T><C><O><I><N>/i
> replace_rules   FUZZY_BITCOIN
>
> For anything more general you'd have to match on lookalike characters
> from non-roman codepages embedded in ASCII (or roman) words. Finding
> accented characters or general multibyte UTF-8 is not particularly
> suspicious.

Thanks for the info.   I had never come across this issue before, and was afraid that more spammers would start doing it.

In which case, I would think that if a plain text message contained a lot of "suspicious" multibyte UTF-8 characters embedded in words otherwise made of roman characters, that would make it suspicious enough to flag.   However, this spam message is the only one I've seen like that, so I won't worry about it for now.

- Mark

Re: Using UTF-8 characters to avoid spam filter rules.

Zinski, Steve

I see that a lot in sextortion emails. So far, I’ve seen the word “bitcoin” encoded (obfuscated) the following ways:

bitc%D0%BEin
bit%D1%81oin
bit%D1%81%D0%BEin

And the word “wallet” as:

w%D0%B0ll%D0%B5t
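
To see which characters those percent-encoded bytes stand for, they can be decoded as UTF-8. The following is only a small Python sketch; the helper name `non_ascii_names` is made up for illustration.

```python
import unicodedata
from urllib.parse import unquote

# The encoded forms listed above.
SAMPLES = ["bitc%D0%BEin", "bit%D1%81oin", "bit%D1%81%D0%BEin", "w%D0%B0ll%D0%B5t"]

def non_ascii_names(encoded):
    """Decode a percent-encoded UTF-8 string and name each non-ASCII character."""
    decoded = unquote(encoded)  # unquote() decodes %XX escapes as UTF-8 by default
    return [(ch, unicodedata.name(ch)) for ch in decoded if not ch.isascii()]

if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, "->", non_ascii_names(sample))
```

All four decode to ordinary ASCII letters with one or two Cyrillic lookalikes (o, es, a, ie) swapped in.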

 

These sextortion scammers are clever. So, instead of filtering on the word “bitcoin”, I now filter on a bitcoin regex (see below) and some other words such as “pixel”, “virus”, etc. which are always a part of the sextortion message.

 

body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/

 

Steve

Re: Using UTF-8 characters to avoid spam filter rules.

John Hardin
On Thu, 28 Jun 2018, Zinski, Steve wrote:

> I see that a lot in sextortion emails. So far, I’ve seen the word “bitcoin” encoded (obfuscated) the following ways:
>
> bitc%D0%BEin
> bit%D1%81oin
> bit%D1%81%D0%BEin
>
> And the word “wallet” as:
>
> w%D0%B0ll%D0%B5t
>
> These sextortion scammers are clever. So, instead of filtering on the word “bitcoin”, I now filter on a bitcoin regex (see below) and some other words such as “pixel”, “virus”, etc. which are always a part of the sextortion message.
>
> body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/

Ok, I've added those to my sandbox in case those are common. I wouldn't
know, I generally get lots of 419 fraud and photo retouching spams
instead... :)



--
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  [hidden email]    FALaholic #11174     pgpk -a [hidden email]
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The one political issue that strips all politicians bare is
   individual gun rights.
-----------------------------------------------------------------------
  6 days until the 242nd anniversary of the Declaration of Independence

Re: Using UTF-8 characters to avoid spam filter rules.

Alex Regan
In reply to this post by Zinski, Steve


On Thu, Jun 28, 2018 at 3:59 PM, Zinski, Steve <[hidden email]> wrote:

> I see that a lot in sextortion emails. So far, I’ve seen the word “bitcoin” encoded (obfuscated) the following ways:
>
> bitc%D0%BEin
> bit%D1%81oin
> bit%D1%81%D0%BEin
>
> And the word “wallet” as:
>
> w%D0%B0ll%D0%B5t
>
> These sextortion scammers are clever. So, instead of filtering on the word “bitcoin”, I now filter on a bitcoin regex (see below) and some other words such as “pixel”, “virus”, etc. which are always a part of the sextortion message.
>
> body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/

This rule is creating false positives:

If your email program has trouble displaying this email, view it as a web page [ http://s255356359.t.en25.com/e/es?s=255356359&e=6361&elqTrackId=78D8A052C380BCBFF284D754BEBE9730&elq=1dc278553a2445bb88bcc9b73bf4ef85&elqaid=57&elqat=1 ]

alex

Re: Using UTF-8 characters to avoid spam filter rules.

John Hardin
On Fri, 29 Jun 2018, Alex wrote:

> On Thu, Jun 28, 2018 at 3:59 PM, Zinski, Steve <[hidden email]> wrote:
>
>> These sextortion scammers are clever. So, instead of filtering on the word
>> “bitcoin”, I now filter on a bitcoin regex (see below) and some other words
>> such as “pixel”, “virus”, etc. which are always a part of the sextortion
>> message.
>>
>> body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
>>
>
> This rule is creating false positives:
>
> If your email program has trouble displaying this email, view it as a web
> page   [
> http://s255356359.t.en25.com/e/es?s=255356359&e=6361&elqTrackId=78D8A052C380BCBFF284D754BEBE9730&elq=1dc278553a2445bb88bcc9b73bf4ef85&elqaid=57&elqat=1
> ]

@steve: could you pastebin a couple of sextortion spamples for me pls?
Thanks.


--
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  [hidden email]    FALaholic #11174     pgpk -a [hidden email]
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The tree of freedom must be freshened from time to time
   with the blood of tyrants and tyrannosaurs.
                      -- DW, commenting on the GM6 Lynx .50BMG bullpup
-----------------------------------------------------------------------
  5 days until the 242nd anniversary of the Declaration of Independence

Re: Using UTF-8 characters to avoid spam filter rules.

RW-15
In reply to this post by Alex Regan
On Fri, 29 Jun 2018 10:20:45 -0400
Alex wrote:

> On Thu, Jun 28, 2018 at 3:59 PM, Zinski, Steve <[hidden email]>
> wrote:

> > These sextortion scammers are clever. So, instead of filtering on
> > the word “bitcoin”, I now filter on a bitcoin regex (see below)
> >
> >
> >
> > body      __BITCOIN          /\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
> >  
>
> This rule is creating false positives:


try:

body      __BITCOIN          /\b(?<!=)[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b/
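
The effect of the added lookbehind can be checked outside SpamAssassin. This is only a sketch using Python's `re` module, which handles these two patterns the same way Perl does as far as I know. The sample strings are the address from the original spam (with the surrounding words retyped in plain ASCII) and the tracking URL from Alex's false positive. The helper `hits` is made up for illustration.

```python
import re

# Steve's original pattern, and the variant with a negative lookbehind for "=".
BTC = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")
BTC_NO_EQ = re.compile(r"\b(?<!=)[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

# Address line from the spam at the top of the thread.
SPAM_LINE = "use my bitcoin wallet address-  1BkpfU6f7KJXxuc3Yg75cHC8kCJCT2xow4"

# Tracking URL from the false positive report.
TRACKING_URL = (
    "http://s255356359.t.en25.com/e/es?s=255356359&e=6361"
    "&elqTrackId=78D8A052C380BCBFF284D754BEBE9730"
    "&elq=1dc278553a2445bb88bcc9b73bf4ef85&elqaid=57&elqat=1"
)

def hits(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

if __name__ == "__main__":
    for label, pattern in (("original", BTC), ("with lookbehind", BTC_NO_EQ)):
        print(label, "spam:", hits(pattern, SPAM_LINE))
        print(label, "url: ", hits(pattern, TRACKING_URL))
```

The original pattern picks up the `elq=` parameter value in the URL because it happens to start with 1 and contains only characters from the address alphabet. The lookbehind only protects against tokens that directly follow an `=`, so other long alphanumeric IDs could still hit.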