# Mailserver with ZFS HAST and CARP



## Sylhouette (Aug 22, 2011)

Hello all.

We have a mailserver running on FreeBSD 8.2.
Some time ago we had a power glitch, and the server went down.

You guessed it: real panic!!

Now we want some failover.
I was thinking about ZFS on top of HAST, with CARP to switch between the machines if things drop out.

The mail server is built with the following software:

Postfix
Dovecot
Apache (roundcube webmail)
MySQL Server
PostgreSQL (davical)

and that is about it.

My main question is: is this doable?

And is it possible to make the switch if postfix or dovecot dies?

My concerns are with HAST and the mail storage.

We have about 500 GB mail stored, so it could even be done without ZFS on one 2 TB disk.
If not going with ZFS, can I use gmirror with HAST?
That way I at least have a mirror of my data on the master.
I would like ZFS, because I can use some smaller disks and use raidz{2|3} or whatever.

My main concern is: in case of a server shutdown, are things going to keep working flawlessly?
And what if the master comes back up?

So the master dies, and all new data (mail) is stored on the slave's HAST providers. What happens when the master comes alive again?
Does it promote itself back to master? And if so, how does it cope with the new data on the slave server?

A nice moment to switch to 9.0 also.

Thanks for your time.
regards,
Johan Hendriks


----------



## SirDice (Aug 22, 2011)

Sylhouette said:

> We have a mailserver running on FreeBSD 8.2.
> Some time ago we had a power glitch, and the server went down.
> 
> You guessed it: real panic!!
> ...


Keep in mind that it isn't going to work if the failover gets the same power. I'd invest in a UPS.



> I was thinking about ZFS on top of HAST, with CARP to switch between the machines if things drop out.


I wouldn't use HAST or even CARP. Email isn't really real-time so it shouldn't be a problem if the email gets queued for a few hours.


----------



## wblock@ (Aug 22, 2011)

As it says in the Handbook, HAST is mirroring, just over the network instead of local disks.  Combining local mirroring and HAST seems like overkill.  If I had to pick one for reliability, HAST might be better because the drives will be in different locations and on different power supplies.  (Different rooms or buildings in case of fire/flood, connected by fiber in case of lightning strikes...)

Do the easiest things that will make the most difference first, AKA bang for the buck.  A good monitored UPS for the existing mail server.  Maybe a separate UPS for the network switches and routers leading to it.  Then consider mirroring and other forms of RAID.


----------



## Sylhouette (Aug 22, 2011)

Thanks for the pointers, you both mentioned the UPS, like we did not use that. 

But in this case, the UPS gave way.
Just dropped dead.

Nothing we could do about that.
All was fine, we even replaced the battery 3 months earlier.
Stats and logs prior to the fail were all clean.

We have 2 of these, and the mailserver happened to be on the one that failed.
The relay server was on the other, so there was no mail loss, only the mail service could not be provided anymore.
Another solution is to put in a redundant power supply, so we have ordered that and the machine is now running with one PS to UPS1 and the other to UPS2.
But the server itself could fail also: motherboard, memory, RAID controller, you name it.

I know mail sits in the queue, but mail has become way too important; orders come in and go out and so on.
I remember the days when I came along and told the employees I was going to configure their e-mail.
"What!! No, I do not want that, it distracts me and I do not want that stuff, it gives me only garbage" was their first reaction then.

Now these same fellows are the first to call to tell me there is something wrong. :e

That is the problem: most folks need mail 24 hours a day now.
I have trouble updating the darn thing if it takes more than 30 minutes, even on the weekend.

Maybe we need another solution. This system was a 5.0 machine at first, and has been upgraded to 8.2 now.
We swapped the disks for higher capacity somewhere down the road, and swapped the server itself down the road!
We went from Courier to Dovecot, all from the old 5.0 install.

It is a server with history.

Gr
Johan Hendriks


----------



## AndyUKG (Aug 22, 2011)

Hi,

I have set up a standby mail server using ZFS with ZFS send/receive and MySQL replication for the SMTP/IMAP/POP3 config (Exim and Dovecot). Replication runs once an hour, though I could easily run this every few minutes if I wanted. If the first server goes down I have a live copy of the data and of the MySQL database on the standby. Works for me...
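For readers who want to picture one replication cycle, here is a minimal sketch that only builds the commands for an incremental ZFS send/receive and does not run them. Andy did not post his script, so this is an assumption about how such a job could look, not his actual setup. The dataset name `tank/mail`, the snapshot names, and the host `standby.example.org` are hypothetical, and the MySQL replication part is not covered.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, standby_host):
    """Build the shell commands for one incremental replication cycle.

    Nothing is executed here; the caller decides how to run the commands
    (for example from cron). All arguments are hypothetical examples.
    """
    ds = shlex.quote(dataset)
    old = shlex.quote(f"{dataset}@{prev_snap}")
    new = shlex.quote(f"{dataset}@{new_snap}")
    host = shlex.quote(standby_host)

    # 1. Take a new snapshot on the live server.
    snapshot = f"zfs snapshot {new}"
    # 2. Send only the changes since the previous snapshot and
    #    apply them on the standby over ssh.
    send = f"zfs send -i {old} {new}"
    receive = f"ssh {host} zfs receive -F {ds}"
    return [snapshot, f"{send} | {receive}"]


if __name__ == "__main__":
    for cmd in replication_commands(
        "tank/mail", "hourly-01", "hourly-02", "standby.example.org"
    ):
        print(cmd)
```

With this kind of scheme the standby can be behind by up to one replication interval, so mail that arrived after the last successful send would not be on the standby after a failure.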

cheers Andy.


----------



## SirDice (Aug 22, 2011)

Sylhouette said:

> I know mail sits in the queue, but mail has become way too important; orders come in and go out and so on.


You haven't instructed them properly. There's absolutely NOTHING in the email protocols that will guarantee an email is delivered within 5 minutes. It can actually take up to 5 days before the email will get bounced. It's people's expectations that need changing. Thus, as long as it's delivered within 5 days everything works as expected.
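As a rough illustration of the arithmetic, the sketch below checks whether a message queued on a sending server would still be in that queue when the receiving server comes back. It assumes the sender keeps undeliverable mail for 5 days (Postfix's `maximal_queue_lifetime` defaults to 5 days, but every sending server sets its own limit), and the function name and timestamps are made up for the example.

```python
from datetime import datetime, timedelta

# Assumption: the sending server keeps undeliverable mail for 5 days.
QUEUE_LIFETIME = timedelta(days=5)


def still_queued(queued_at, outage_end, lifetime=QUEUE_LIFETIME):
    """Return True if the message has not expired when the outage ends.

    This ignores the retry schedule: the sender still has to make a
    delivery attempt after outage_end and before the lifetime runs out.
    """
    return outage_end < queued_at + lifetime


if __name__ == "__main__":
    queued = datetime(2011, 8, 22, 9, 0)
    # A 3-hour outage is far inside the 5-day window.
    print(still_queued(queued, queued + timedelta(hours=3)))
    # A 6-day outage is not: the sender would have bounced the mail.
    print(still_queued(queued, queued + timedelta(days=6)))
```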



> I remember the days when I came along and told the employees I was going to configure their e-mail.
> "What!! No, I do not want that, it distracts me and I do not want that stuff, it gives me only garbage" was their first reaction then.
> 
> Now these same fellows are the first to call to tell me there is something wrong. :e
> ...


Again, this boils down to expectations. So what if an email is delivered in 60 minutes instead of the usual 5? Even if your mailserver pushes it out within a minute, the email can still take days before it reaches its final destination.


----------



## wblock@ (Aug 22, 2011)

Sylhouette said:

> Thanks for the pointers, you both mentioned the UPS, like we did not use that.



Well, you didn't say, and often companies do have critical systems without power backup.

Contacting the UPS people and asking why their expensive box failed is worthwhile, might expose a problem or at least be worth a credit.  A regular test of the UPS would be good.  Not a self-test, but something physical like throwing an upstream breaker.

After thinking about it a bit more, mirroring alone would not have prevented the outage.  To protect against that, HAST plus CARP may in fact be the way to go.  Factorial combinations of local mirrors and HAST... well, it'd probably work.  Whether it would be more reliable is hard to tell.

There are probably more HAST users on the mailing lists.


----------



## Sylhouette (Aug 23, 2011)

Thank you all.

I will set up a test environment with 2 servers running HAST, and ZFS on top of that.
It is a nice project to start with.
Now with a second power supply, and a backup server as a standby, we can cope with the worst-case scenario of a server dying.
In case of trouble, it is out of order for some time, and those are the facts.

[OT]
Educating my users is a no go.
They have no idea how things work; they come in and start typing their password.
Hey, it asks for it again. Well, try again.
Hmm, strange. Well, here is my password again, annoying computer.
Strange, still no go.
Ooh wait, there is another username on top.
Later I get called by the other user: I cannot log in, my account is disabled.

And that goes on and on.
Some of them call me at least once a month to say that all their Word documents are lost.
Really, all of them. And then again I can tell them that you cannot see and open Word documents within Microsoft Excel.

It is hopeless. In their opinion it just must work, and if it does not, the computer is to blame.

Well, I could open a topic about this, and could fill up a 10 TB ZFS volume with stories, I guess.

Most of the time I can laugh about it, but sometimes I really cannot cope with it and get really frustrated.

[/OT]
regards,
Johan


----------



## SirDice (Aug 23, 2011)

Sylhouette said:

> [OT]
> Educating my users is a no go.


Well, it sounds to me like they could actually use a little education.



> It is hopeless. In their opinion it just must work, and if it does not, the computer is to blame.


It's usually more like PEBKAC :e


----------



## Sylhouette (Aug 27, 2011)

PEBKAC it is 

Well, let's move on.

The thing I want is the following.

Master server.
This would be running all the services.

Slave server.
This will also run all services, but mainly sits there as a rescue server.

The safest setup would be as follows:

The master server fails, the slave server takes over and becomes master.
Now if the old master comes back up for whatever reason, the safest thing would be to not make it master again, but to wait for human intervention to switch the roles back.

This is to prevent a possible split-brain in HAST.
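To make the intended policy concrete, here is a small model of the role changes. It is only a sketch of the rules described above and does not talk to CARP, HAST, or ifstated in any way. The class and method names and the host names `mail1` and `mail2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool = True
    role: str = "secondary"  # either "primary" or "secondary"


class Cluster:
    """Two-node model: automatic failover, manual failback only."""

    def __init__(self, master, slave):
        master.role = "primary"
        slave.role = "secondary"
        self.nodes = {master.name: master, slave.name: slave}

    def primary(self):
        """Return the live primary node, or None if there is none."""
        for node in self.nodes.values():
            if node.alive and node.role == "primary":
                return node
        return None

    def node_failed(self, name):
        """A node died: demote it and promote a surviving node if needed."""
        failed = self.nodes[name]
        failed.alive = False
        failed.role = "secondary"
        if self.primary() is None:
            for node in self.nodes.values():
                if node.alive:
                    node.role = "primary"
                    break

    def node_returned(self, name):
        """A node came back: it always rejoins as secondary.

        It is never promoted automatically, even if no primary exists,
        so an operator has to decide which copy of the data is current.
        """
        node = self.nodes[name]
        node.alive = True
        node.role = "secondary"

    def manual_switch(self, name):
        """Operator-initiated role change to the named live node."""
        target = self.nodes[name]
        if not target.alive:
            raise ValueError(f"{name} is down and cannot become primary")
        for node in self.nodes.values():
            node.role = "secondary"
        target.role = "primary"


if __name__ == "__main__":
    cluster = Cluster(Node("mail1"), Node("mail2"))
    cluster.node_failed("mail1")
    cluster.node_returned("mail1")
    print(cluster.primary().name)  # mail2 keeps the primary role
```

The key point is `node_returned`: the returning machine never takes the primary role back by itself, which is the behaviour the real CARP and HAST setup would have to reproduce.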

How must I configure my CARP interfaces?
Is ifstated the solution for this?

regards,
Johan


----------

