MXroute Outage?

armandorg · October 8, 2019, 8:17pm

@Jarland seems like an internal server error; it says to check the logs.

Mason · October 8, 2019, 8:35pm

Over here might be a good place to debug further -

Jarland · October 8, 2019, 8:48pm

Outage in Vegas took out the slab mounted to /home. Stupid me didn’t try loading the good ol’ stallion panel before rebooting it, and that slab is in fstab. So Longhorn is down for the duration of the event over there.
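For anyone wanting to avoid the same trap: a rough sketch, assuming the reboot stalled because the fstab entry for the slab had no `nofail` option (the post doesn't confirm that). It lists non-essential fstab entries missing `nofail`. The sample fstab and the function name are made up for illustration.

```python
def entries_missing_nofail(fstab_text, essential=("/", "/boot")):
    """Return mount points of non-essential fstab entries lacking 'nofail'."""
    flagged = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        mount_point, fs_type = fields[1], fields[2]
        options = fields[3].split(",")
        if mount_point in essential or fs_type == "swap":
            continue  # these should fail loudly if missing
        if "nofail" not in options:
            flagged.append(mount_point)
    return flagged


# Hypothetical fstab contents, not taken from the actual server.
SAMPLE_FSTAB = """\
# <device>  <mount>  <type>  <options>        <dump> <pass>
UUID=aaaa   /        ext4    defaults         0      1
/dev/vdb    /home    ext4    defaults         0      2
/dev/vdc    /backup  ext4    defaults,nofail  0      2
"""

if __name__ == "__main__":
    print(entries_missing_nofail(SAMPLE_FSTAB))
```

On a real box you would pass in the contents of /etc/fstab instead of the sample string.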

Friendly · October 8, 2019, 9:10pm

How long has the outage been? Most data centers have UPS(s) that last anywhere between 5-15 minutes, giving the diesel generator(s) plenty of time to get up to speed, assuming they are regularly tested for reliability.

Those generator(s) are normally hooked up to 24-48 hours of fuel reserves, which keeps the flow going long enough for the grid to be restored in most cases (even where I live, where the power company is garbage, it doesn’t take them 24 hours to restore power, even when we got hit quite hard by Sandy).

Jarland · October 8, 2019, 9:12pm

Looks like about an hour now, but I’m just judging by this thread as the actual VM didn’t go down until I rebooted (not that it mattered, /home was busted since the storage node was down). I think I just learned the value of monitoring the storage separately.
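A minimal sketch of what a separate storage check could look like, assuming the storage is mounted at a known path. It probes the mount from a daemon thread so a hung mount can't block the check itself. The function name and the 5 second timeout are arbitrary choices, not anything actually running here.

```python
import os
import threading


def storage_ok(path, timeout=5.0):
    """Return True if `path` is a mount point that answers within `timeout` seconds."""
    result = {}

    def probe():
        try:
            os.statvfs(path)  # can block indefinitely if the backing storage is gone
            result["ok"] = os.path.ismount(path)
        except OSError:
            result["ok"] = False

    # Daemon thread: a probe stuck on dead storage won't keep the process alive.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return result.get("ok", False)  # no answer in time counts as down


if __name__ == "__main__":
    print("/home healthy:", storage_ok("/home"))
```

Run from cron or hooked into whatever alerting is already in place, this would flag the storage going away even while the VM itself still answers pings.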

Jarland · October 8, 2019, 9:12pm

Fran’s word:

No, it seems fiberhub lost a strip and one of my switches wasn't dual cabled.

Friendly · October 8, 2019, 9:14pm

I would switch if they cannot keep the lights on for even under a day. Most budget data centers will at least keep the power going through all but the most extreme storms that take the grid out of commission for days on end, which would require priority refueling agreements (typically just below hospitals and such, or in line with them).

Jarland · October 8, 2019, 9:14pm

I mean it’s Fran though

Friendly · October 8, 2019, 9:15pm

I see now; I saw the updated reply as I was replying. That makes sense now. I thought the DATA CENTER lost power completely, not that the B feed wasn’t hooked up on that switch.

Neoon · October 9, 2019, 8:31am

Well, let’s put it this way: when I monitored a few BuyVM slabs, I noticed more downtime compared to other VPS providers, so I ditched them.

Even a cheap Russian VPS had more uptime than LA/LUX.

Solaire · October 9, 2019, 8:46am

Has been pretty solid for me actually. Uptime is > 99.9% for both LA and LUX measured over the last 6 months (don’t have any data prior to that).

Neoon · October 9, 2019, 9:03am

No idea how you measured it and how often.
I was actually kinda excited for the anycast DNS, but as the uptime turned out worse, I ditched it.

Solaire · October 9, 2019, 9:06am

HetrixTools / PHP server monitor from NA and EU locations measuring every minute. SmokePing running on 9 boxes from all around the world and UptimeRobot measuring every 5 minutes.
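To put the numbers in perspective, a quick sketch of the arithmetic (helper names are made up, and 180 days is used as a stand-in for 6 months): how much downtime a given uptime figure allows, and how an uptime percentage falls out of a list of check results.

```python
def downtime_budget_minutes(uptime_pct, days):
    """Minutes of downtime allowed over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_pct) / 100.0


def uptime_percent(check_results):
    """Uptime percentage from a list of booleans (True = check succeeded)."""
    if not check_results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(check_results) / len(check_results)


if __name__ == "__main__":
    # 99.9% over roughly 6 months (180 days assumed here)
    print(round(downtime_budget_minutes(99.9, 180), 1), "minutes of allowed downtime")
    # 1 failed check out of 1000
    print(uptime_percent([True] * 999 + [False]), "% uptime")
```

Keep in mind the check interval sets the resolution: a monitor polling every 5 minutes can miss an outage shorter than that entirely, while a 1 minute check catches more of them.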

Friendly · October 9, 2019, 9:29am

Indeed, you can probably get similar results by leasing an Atom/trash Xeon from a data center and chucking in the biggest HDD(s) they’ve got in store. If you’re lucky you can shove in drives that are 12TB or even larger per bay.

Or… just get an auction server or something.

aaronstuder · October 9, 2019, 12:31pm

Can you share your HetrixTools report?

Solaire · October 9, 2019, 12:41pm

Sure. I’d rather not make it public though, so here’s a screencap:

armandorg · October 9, 2019, 1:21pm

Looks good enough.

Also, things with MXroute/Fran were fixed within a couple of hours; no big deal since it was night as well.

We still love @Jarland

Francisco · October 9, 2019, 8:25pm

Sorry.

The power stuff was fixed quickly, but it fried a switch in the process so we had to wait for that to be recabled.

The switch that went out provided connectivity to both Billing and Stallion so when things came back I couldn’t mass boot people since Stallion wasn’t alive.

An infiniband switch got banged up which cut off some nodes. We’ve already asked the DC to fix this issue since they were the ones that didn’t cable it properly when we had the B side installed.

I wouldn’t mind knowing a time when you checked that. We had issues in LU in the spring due to ransom DDoS attacks. We discussed this on LET and over time the guy realized he wouldn’t get a dime out of us.

We had issues during the initial launch of slabs due to the very same power issue we just had yesterday. The Christmas outage was my fault since I should’ve had A+B set up in that cabinet but I didn’t finish it in time. It was there this time so it was fine.

If you ever want to give our slabs another test run, let me know. I’ll give you a chunk of space for a few months so you can feel it out.

Francisco

Neoon · October 9, 2019, 8:32pm

It could be that it was the result of DDoS attacks; that happens sometimes.
Maybe I’ll find some time to buy slabs and check again.

Harambe · October 9, 2019, 8:43pm

Take him up on the free trial, I have no complaints about slabs. Someone who hammers theirs harder said it performs better than DO/Vultr SSD block storage.