Saturday, 24 August 2013

Questions About DRBD

Background: We are in need of an HA server in a small office environment and
are looking at DRBD to provide it. We only have about 100GB that needs to
be on the HA server and server load will be extremely low. The data will
probably increase about 10%-25% per year if we archive older office data,
and 50%-75% each year if we don't.
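
To put those growth rates in perspective, here is a rough sketch of the
projected data size. It assumes the growth compounds annually from the
100GB starting point; the five-year horizon and the function name
project_storage are assumptions made for illustration, not figures from
our planning.

```python
def project_storage(start_gb, annual_growth, years):
    """Return the projected size in GB after each year, assuming compound growth."""
    sizes = []
    size = start_gb
    for _ in range(years):
        size *= 1 + annual_growth
        sizes.append(round(size, 1))
    return sizes


START_GB = 100  # current data size, from the post
YEARS = 5       # assumed planning horizon, not from the post

# Growth-rate bounds quoted in the post, with and without archiving.
scenarios = [
    ("archiving, low", 0.10),
    ("archiving, high", 0.25),
    ("no archiving, low", 0.50),
    ("no archiving, high", 0.75),
]

for label, rate in scenarios:
    print(f"{label:>20}: {project_storage(START_GB, rate, YEARS)}")
```

Under these assumptions the data stays around 160GB to 305GB after five
years with archiving, but reaches roughly 760GB to 1.6TB without it, which
is the difference between a single small drive and a multi-drive array.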
The point is that we use a mix of consumer-grade and used enterprise-grade
hardware, which WILL be a problem if we don't plan for it preemptively; and
even pre-built quality servers DO fail, so redundant servers seem like the
way to go.
The Plan: We are thinking it would be good to find (2) of the best
bang-for-our-buck used servers and synchronize them. We simply need
SATA/SAS capable servers and space for as many drives as can be had for
the price. These servers seem like they can be had for $100-$200 (+some
parts and additional drives) if you catch a deal.
This would theoretically mean a server could fail and if we took days to
get to it, as long as we didn't have another coincidental failure, things
would still hum along until our IT department (me) could get to it. We
would use Debian as an OS.
Some Questions
(A) How does DRBD handle drive or controller failure? That is, DRBD is
shown sitting before the storage driver, so what happens when the
controller fails and writes dirty data, or the drive fails but doesn't
crash immediately? Is the data mirrored to the other server or not, and is
there a risk of data corruption across servers in cases like these?
(B) What are the failure points for DRBD? Theoretically, as long as one
server is up and running, there are no issues EVER. But we know that there
are issues, so what are the failure modes when using DRBD, since most of
them should theoretically be software?
If we are going to have two servers for this, would it be reasonable to
run VMs on each with MySQL and Apache for database and web server
replication? (I am assuming so.)
Is DRBD reliable enough? If not, is the unreliability isolated to certain
tasks, or is it more random? Searching turned up people with various
issues, but this IS the internet, with seemingly more bad info than good.
If data is being synchronized over the LAN, does DRBD use double the
bandwidth? That is, should we double up on NICs and do some link
aggregation and trunking? Then maybe put them on separate routers, on
separate circuits and UPSs in separate rooms, and now you really have some
redundancy!
Is this too crazy for an office in terms of server management? Is there a
simpler REALTIME alternative (granted, DRBD seems simple in theory)?
We already have a server. So it seems to me a second USED server with a
dedicated drive for DRBD could easily be had for around $150-$250 with
some smart shopping. Add a second router, more drives, more NICs (used),
and (2) UPSs and we're talking $1,000 +/-. That is relatively cheap! And I
am hoping this would mainly buy us time during a server fault. Drive
failures seem like the easier thing to handle with RAID these days. It's
other hardware failures, like controllers, memory, or power supplies, that
might require downtime to diagnose and fix that are the concern.
Redundant servers for us mean used hardware becomes more viable, with more
uptime and more flexibility for me to fix things when my schedule allows
vs. having to stop everything to repair the server.
Hopefully I didn't miss that these questions have easy searchable answers.
I did a quick search and didn't find what I was looking for.
