Question regarding auto-learning

Question regarding auto-learning

J Doe
Hello,

I have a question regarding autolearning and Bayes functionality.

From reading the documentation, it appears that to train the Bayesian filter I require a minimum of 1,000 pieces of ham and 1,000 pieces of spam. I am currently collecting spam on one of my servers via a spam trap address and slowly reaching that number. I was wondering, though, if I can use autolearning (bayes_auto_learn 1) before training the database?

When autolearn fires on messages at the moment, it is correctly detecting ham and spam based on the default ham and spam thresholds:

    bayes_auto_learn_threshold_nonspam 0.1
    bayes_auto_learn_threshold_spam 12.0

Can this be used before training the database, or is it more often used to supplement (on an ongoing basis) a database that has already been trained?

Thanks,

- J


Re: Question regarding auto-learning

Matus UHLAR - fantomas
On 03.07.18 12:17, J Doe wrote:
> From reading the documentation, it appears that to train the Bayesian
> filter I require a minimum of 1,000 pieces of ham and 1,000 pieces of
> spam.

No. You need at least 200 hams and 200 spams learned before Bayes starts
firing, but you can tune that by setting bayes_min_ham_num and
bayes_min_spam_num.

Note that training on too few mails can result in false positives/negatives.
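
As a rough sketch of that tuning, the two settings could go in local.cf
like this. The lowered value of 100 is only an assumption for illustration,
not a recommendation; 200 is the documented default for both.

```
# Number of learned ham and spam messages required before the Bayes
# rules start firing. Default is 200 for each; 100 is illustrative only,
# and a lower value raises the risk of false positives/negatives.
bayes_min_ham_num   100
bayes_min_spam_num  100
```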

> I am currently collecting spam on one of my servers via a spam trap
> address and slowly reaching that number.  I was wondering, though, if I
> can use autolearning (bayes_auto_learn 1) before training the database?

Autolearning does the training for you. Manual training is still faster
and more precise.

> When autolearn fires on messages at the moment, it is correctly detecting
> ham and spam based on the default ham and spam thresholds:
>
>    bayes_auto_learn_threshold_nonspam 0.1
>    bayes_auto_learn_threshold_spam 12.0
>
> Can this be used before training the database, or is it more often used to
> supplement (on an ongoing basis) a database that has already been trained?

Those don't contradict each other:
you can use both manual and automatic learning.
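
A minimal sketch of a local.cf that keeps autolearning on while the
database is still being built, assuming the default thresholds quoted
above are left unchanged:

```
# Enable the Bayes classifier and let SpamAssassin feed it automatically.
use_bayes 1
bayes_auto_learn 1

# Messages scoring at or below 0.1 are autolearned as ham,
# messages scoring at or above 12.0 are autolearned as spam (defaults).
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0
```

Manual training with sa-learn (--spam on the spam trap mail, --ham on known
good mail) can run alongside this, assuming it is run as the same user so
that both feed the same Bayes database.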

--
Matus UHLAR - fantomas, [hidden email] ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Chernobyl was an Windows 95 beta test site.