Entropia Support News: Server issues

Discussion in 'Entropia News' started by EP-Newsbot, Aug 1, 2014.

  1. July 26,
    Entropia Universe had an unexpected production disturbance. It turned out that the root cause of the disturbance was hardware issues that resulted in failing servers. The failing hardware has now been removed. In the meantime we are running EU with reduced HW resources and that is the reason for the exceptional lag.

    Replacement hardware is scheduled for delivery at the beginning of next week. With the new and more powerful servers installed, by the end of next week at the latest, we should see a considerable improvement in the lag situation.

    We are very sorry for this inconvenience and will get back to you when we have more information.

    Klas Moreau

    CEO MindArk


    Source
     
  2. Cheers and thanks for the updates.

    26thJulyserveroutage.jpg
     
  3. I am now wondering a little about the way the HP servers are set up.

    I am assuming that the c7000 system was selected over its little sister, the c3000, for the scale of resources required to run EU across the various planets, and that a dual-star style topology was used? Or was it?

    The economy c-Class HP BladeSystem is capable of ~5 Tb/s across the mid-plane fabric. Given Klas's comment about reduced resources, my guess is that one (or more, though that is highly unlikely) of the ProLiant server modules died and no failover hardware was configured in standby to take over the role in the event of a failure. (Just assumptions based on what Klas has provided.)

    The mid-plane itself did not fail (an assumption based on the news from Klas); a mid-plane failure would wipe out all blades and modules in the chassis, and the mid-plane fabric usually cannot be removed from the chassis in most of the low-end/economy-class BladeSystem models on the market.

    Nevertheless, the system overall is not what you would consider the cheapest route possible, though it is one of the best-bang-for-the-buck economy systems of the day. Maybe they were trying to lower TCO but went a little too far in not configuring a dual-star topology with redundant modules in standby mode.

    I am not sure whether this system is capable of load balancing between modules, with one blade server module picking up the entire workload if the other blade server module fails. Don't quote me, though; I would have to start reading the whitepapers on the HP BladeSystem c-Class range and the associated ProLiant server blades.
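
    Purely for illustration, here is a rough sketch of the check such an active-active pair would need to pass, namely that either module can absorb the full workload when its partner dies. The utilisation figures are made up for the example and are not anything MindArk has published.

        # Hypothetical sketch: can the surviving modules carry the whole load
        # after the loss of any one module? Figures are invented for illustration.

        def survives_single_failure(loads, capacity=1.0):
            """True if, after losing any one module, the survivors can still
            carry the total workload (each module assumed to have the same capacity)."""
            total_load = sum(loads)
            surviving_capacity = capacity * (len(loads) - 1)
            return total_load <= surviving_capacity

        # Two modules each running at 40% of capacity: the survivor ends up at 80% -> fine.
        print(survives_single_failure([0.4, 0.4]))   # True
        # Two modules each at 60%: the survivor would need 120% of its capacity -> lag or outage.
        print(survives_single_failure([0.6, 0.6]))   # False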

    It is not something that interests me much these days, especially when you have the appropriate design engineers at Benzler, as HP's partner in Sweden, and properly qualified (?) staff at MindArk's EUSO AB working in tandem on the final solution.

    It is obviously not grey-channel kit with Benzler involved, but by the same token, what maintenance contract did MA (EUSO AB) take out over the years to cover failing hardware? HP would certainly not take more than 1-3 days to have replacement hardware available for a swap-out (even in Sweden).
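
    As a back-of-the-envelope illustration of why the contracted response time matters, here is a quick availability estimate. The MTBF and MTTR figures are assumptions made up for the example, not known values for MA's kit.

        # Rough availability estimate: availability = MTBF / (MTBF + MTTR).
        # All figures below are assumed for illustration only.

        def availability(mtbf_hours, mttr_hours):
            return mtbf_hours / (mtbf_hours + mttr_hours)

        mtbf = 3 * 365 * 24   # assume a blade fails, on average, once every three years
        for label, mttr in [("4-hour on-site response", 4),
                            ("next-business-day swap", 24),
                            ("1-3 day parts delivery", 72)]:
            a = availability(mtbf, mttr)
            downtime = (1 - a) * 365 * 24
            print(f"{label:26s} availability ~ {a:.5f}, ~{downtime:.1f} h downtime/year")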

    So, by the sounds of it, MindArk have taken the route of upgrading/replacing the hardware. I wonder which ProLiant server blades they have chosen this time around. Even so, while the new upgraded hardware is on its way from HP/Benzler, replacement hardware for the failed kit should still be made available under the terms of the maintenance contract (if EUSO AB paid for that level of service).

    It all just doesn't gel too well.

    It is likely to be one of the following problems:
    1. Not the best choice of topology in the design, in that it did not allow for standby blade failover. (Poor design: cutting costs or chasing a lower TCO where they should not have.)
    2. An ongoing maintenance contract that was not sufficient to support the mission-critical nature of the implemented topology. (Poor management, again skimping to save a dollar.)


    Anyway, I have attached some info.
    Hopefully, once this is rectified and we see VU/update notes come through from MindArk about improving server stability, performance and other such factors, as we have for more than a year now, some of these points will be taken into consideration. (As of right now, MA are looking rather stupid.)

    DCB/Viper.
    PDF attached.

    The attachments assume the system used is the c7000 series and not the c3000.

    c7000redundancy.jpg

    Dual-star topology assumed to be used ... or is it?
    The other topologies offer increased performance over the dual-star topology.
    topologyusedcounts.jpg

    How are the interconnects set up?
    bladeinterconnectredundancy.jpg

    Redundant configurations?
    The BladeSystem c-Class enclosure minimizes the chances of a failure by providing redundant power supplies, cooling fans, and interconnect modules. For example, customers have the option of using power supplies in an N+N redundant configuration or an N+1 configuration. You can place the interconnect modules side-by-side for redundancy. In addition, you can use redundant Onboard Administrator modules in an active-standby configuration.
    The c-Class architecture provides redundant paths using multiple facility power feeds into the enclosures, blade-to-interconnect bay connectivity, and blade-to-enclosure manager connectivity. Because all c-Class components are hot pluggable, you can quickly reestablish a redundant configuration in the event of a failure.
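
    To make the N+N versus N+1 distinction above concrete, here is a small sketch. The wattage figures are invented for the example; HP's power sizing tools are the place for real numbers.

        # Sketch of the N+1 vs N+N idea for enclosure power supplies.
        # Wattages are invented for illustration, not a real sizing exercise.

        import math

        def supplies_needed(load_watts, psu_watts):
            """Minimum number of supplies (N) needed to carry the load."""
            return math.ceil(load_watts / psu_watts)

        def installed_supplies(n, scheme):
            """Total supplies to install for a given redundancy scheme."""
            if scheme == "N+1":   # survives the loss of any one supply
                return n + 1
            if scheme == "N+N":   # survives the loss of a whole feed/side
                return 2 * n
            return n              # no redundancy

        load, psu = 7200, 2400    # assumed enclosure load and per-supply rating (watts)
        n = supplies_needed(load, psu)
        print("N =", n)
        print("N+1 configuration:", installed_supplies(n, "N+1"), "supplies")
        print("N+N configuration:", installed_supplies(n, "N+N"), "supplies")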
     

  4. Some further info worth noting on the system (still under the assumption that the c7000 was selected):

    Our BladeSystem enclosures accommodate half-height blades, full-height blades, or both. Server blades can use single-, double-, or quad-wide form factors. LAN on Motherboard (LOM) adapters and optional mezzanine cards on the server blades route network signals to the interconnect modules in the rear of the enclosure. The connections between server blades and a network fabric can be redundant.

    The c7000 enclosure has eight interconnect bays that accommodate up to eight single-wide interconnect modules. They can be Virtual Connect modules, Ethernet or Fibre Channel switches, or a combination of single-wide and double-wide interconnect modules such as InfiniBand switches.
    The c3000 enclosure has four interconnect bays. The bays can hold four single-wide or one double-wide and two single-wide interconnect modules.

    The c-Class enclosure also holds one or two Onboard Administrator management modules. A second Onboard Administrator module acts as a redundant controller in an active-standby mode. The Insight Display panel on the front of the enclosure provides an easy way to access the Onboard Administrator locally.
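
    Purely as an illustration of the active-standby idea described above (this is not how HP's Onboard Administrator is actually implemented, just the general pattern, with invented names and timeout):

        # Generic active-standby sketch: promote the standby controller when the
        # active one stops heartbeating. Names and timeout are illustrative only.
        import time

        class Controller:
            def __init__(self, name):
                self.name = name
                self.last_heartbeat = time.monotonic()

            def heartbeat(self):
                self.last_heartbeat = time.monotonic()

        def check_roles(active, standby, timeout=5.0):
            """Swap roles if the active controller has gone silent for too long."""
            if time.monotonic() - active.last_heartbeat > timeout:
                print(f"{active.name} silent for >{timeout}s, promoting {standby.name}")
                return standby, active
            return active, standby

        active, standby = Controller("OA bay 1"), Controller("OA bay 2")
        # Fresh heartbeat, so no swap happens here; roles would only flip after a timeout.
        active, standby = check_roles(active, standby)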

    The c-Class enclosures use flexible power architecture. The c7000 enclosure uses single-phase or three-phase AC or DC power inputs. The c3000 enclosure, by contrast, only uses single-phase (auto-sensing high-line or low-line) power inputs. You can connect its power supplies to low-line (100VAC to 120VAC) wall outlets. In either enclosure, you can configure power redundantly. Power supplies connect to a passive power backplane that distributes shared power to all components.
    High-performance, high-efficiency Active Cool fans provide redundant cooling across the enclosure and ample cooling capacity for future needs. The fans are hot-pluggable and redundant.

    (General Info) The BladeSystem c-Class enclosure supports many blades and interconnect device options:
    • ProLiant server blades using AMD or Intel x86 processors
    • ProLiant workstation blades
    • Integrity server blades
    • StorageWorks storage blades
    • Tape blades
    • PCI-X or PCI Express (PCIe) expansion blades

    (General Info) BladeSystem interconnect modules support a variety of networking standards:
    • Ethernet
    • Fibre Channel
    • Fibre Channel over Ethernet (FCoE)
    • InfiniBand
    • iSCSI
    • Serial Attached SCSI (SAS)

    Star topology
    The device bays and interconnect bays connect in a fan-out, or star, topology centered around the interconnect modules. The exact topology depends on your configuration and the enclosure. For example, if you place two single-wide interconnect modules side-by-side (Figure 6a), the architecture is a dual-star topology. Each blade has redundant connections to the two interconnect modules. If you use a double-wide interconnect module, it is a single star topology, providing more bandwidth to each server blade. Figure 6b shows the redundant configuration using double-wide interconnect modules.
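
    A toy model of what the dual-star layout buys you (blade and module names are placeholders, not the actual EU layout):

        # Toy model of a dual-star topology: every blade is wired to two
        # interconnect modules, so losing either module leaves each blade connected.
        blades = ["blade1", "blade2", "blade3"]
        modules = ["interconnect_A", "interconnect_B"]
        links = {b: set(modules) for b in blades}   # dual-star: each blade to both modules

        def connected_blades(links, failed_module):
            """Blades that still reach at least one interconnect module."""
            return [b for b, mods in links.items() if mods - {failed_module}]

        print(connected_blades(links, "interconnect_A"))
        # -> ['blade1', 'blade2', 'blade3']: all blades survive the loss of one module.
        # In a single-star layout (one module per blade) the same failure would cut
        # those blades off, which is why the dual-star question matters.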

    All points to consider, plus some other useful info.
    Cost cutting does not equal lowering TCO, nor does it deliver sustained performance or QoS levels in the long run in configurations of a mission-critical nature.
     
  5. Nor Alien (Wisker Fish)

    I swear I posted links to these exact pages not that long ago! :sneaky: On PCF though! :cautious:
     
  6. Hehe, I saw that now :)
    I was using the Yahoo news link and another, which were failing; a search brought up that link. It is hard to post a digitally saved copy, and links that work are better :thumbsup: Good to see you're on the ball. :alien:

    The message has been itching at me since Klas posted it, so I had to look further into reading between the lines after Kim said "That is pretty much accurate" in response to Neil's initial point about running on backup hardware.
     
  7. Wistrel (Kick Ass Elf)

    Wow that is SOME detail there...!
     