
Boyan Ivanov has been co-founder and CEO of StorPool since November 2011.
Rising energy prices directly impact cloud builders such as managed service providers, cloud service providers, enterprises and SaaS providers. Because many of these companies operate their own on-premises cloud infrastructure, they have been hit with rising overhead and derailed TCO forecasts.
This trend has forced many companies to review and optimize electricity usage by shutting down parts of their infrastructure and seeking other cost optimization strategies. In this article, we outline how cloud providers can optimize their operating costs from a hardware perspective.
Hardware Planning
Hardware components are often considered "cash consumers" because, in addition to their upfront cost, each component carries operating expenses such as electricity. Doing the same job with less hardware can therefore significantly reduce costs and free up cash for other purposes.
While dual-socket servers (servers that support two CPUs) are common in data centers, if a single-socket server can do the job, the second socket is unnecessary. A second CPU tends only to increase power consumption (by the CPU itself, the interconnect between the two CPUs, etc.) and can quickly exhaust the power budget of a single rack.
However, if the workload requires more CPU cores than a single socket can provide, a two-socket architecture may be more power efficient than two single-socket servers. This is because a dual-socket node still uses a single motherboard chipset and the same NICs, GPUs, coolers, etc.
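As a rough illustration of why shared platform overhead matters, consider the sketch below. It compares one dual-socket node against two single-socket nodes delivering the same core count; every wattage figure is an assumption chosen for illustration, not a measurement:

```python
# Back-of-the-envelope comparison: one dual-socket server vs.
# two single-socket servers with the same total core count.
# All wattage figures below are assumed for illustration only.

CPU_W = 150          # per-CPU package power under load (assumed)
PLATFORM_W = 120     # chipset, NICs, fans, PSU losses per chassis (assumed)
INTERCONNECT_W = 15  # CPU-to-CPU link overhead on a dual-socket board (assumed)

dual_socket = 2 * CPU_W + PLATFORM_W + INTERCONNECT_W
two_singles = 2 * (CPU_W + PLATFORM_W)

print(f"one dual-socket node:    {dual_socket} W")  # 435 W
print(f"two single-socket nodes: {two_singles} W")  # 540 W
```

Under these assumptions the dual-socket node wins only because the platform overhead is paid once. For workloads that fit in a single socket, the second CPU and its interconnect are pure overhead, which is the point made above.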
Unless mandated by disaster recovery RPO and RTO goals, keeping two nodes running is a pure expense with no added value. Because hardware components are doubled (power supplies, coolers, NICs, etc.), two half-loaded physical servers will consume more power than one physical server loaded to 85% to 90%. If the workload and software allow, the second node can be shut down and brought back online when the first node fails, needs maintenance, or when additional resources are needed to handle peak loads. Of course, there is a trade-off between the startup time of an offline node and the power the same node uses to stay idle.
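That idle-versus-off trade-off can be framed as a simple break-even calculation. Again, the figures below are assumptions for the sketch; measure your own fleet before deciding:

```python
# When is it worth powering a standby node off instead of leaving it idle?
# All figures are assumed for illustration only.

IDLE_W = 120        # draw of the node sitting idle (assumed)
BOOT_W = 250        # average draw during boot/resume (assumed)
BOOT_SECONDS = 180  # time to come back online (assumed)

def idle_energy_wh(hours_off: float) -> float:
    """Energy burned by leaving the node idle for the whole window."""
    return IDLE_W * hours_off

def power_cycle_energy_wh() -> float:
    """Energy cost of one boot; powered-off draw treated as ~0 W."""
    return BOOT_W * BOOT_SECONDS / 3600

for hours in (0.5, 1, 8, 48):
    saved = idle_energy_wh(hours) - power_cycle_energy_wh()
    print(f"off for {hours:>4} h -> net saving {saved:7.1f} Wh")
```

With these numbers, even a half-hour shutdown pays back its boot energy; the real constraint is usually whether the resume time fits within the RTO, not the electricity.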
For larger deployments, chassis designed as modular systems offer high density, performance, efficiency and cost-effectiveness. Such systems share some hardware (e.g., power supplies) and save rack space (e.g., eight servers in a 4U chassis).
An integral part of hardware planning is helping customers design their cloud with the right hardware for their specific use case. In the early stages of every project, always ask the customer what components they plan to use for their specific use case (e.g., database, IaaS service, web hosting, etc.), not only to confirm hardware compatibility and check for known issues, but also to help select components. By using less, right-sized hardware to achieve their goals, users ultimately achieve greater energy savings as well.
Hardware Utilization
Acquiring new silicon, compute, networking and storage resources to meet hardware planning goals can be time-consuming due to the current component shortage and significantly longer lead times. During this period, a cloud builder's business suffers or cannot grow for lack of resources.
This market situation is forcing companies to rethink and optimize their hardware utilization in order to meet energy savings goals with existing hardware. "Hardware packing" is a set of methods designed to run the most valuable workloads on the least amount of hardware. There are generally three considerations:
• Get rid of unused hardware components. Why spend effort and money on a RAID controller when the server has only NVMe devices in PCIe slots or only one boot device? Why buy and "feed" a dual-port 100GbE NIC when the motherboard's integrated 2×10GbE NIC is more than adequate for the job? If a 100GbE NIC is unavoidable (e.g., already purchased or integrated into the motherboard), connecting it to a 10 or 25Gbps network will use less power. For a single server, the difference isn't huge, but across a fleet of hundreds of nodes it becomes significant.
• Concentrate workloads on fewer servers. Servers without bottlenecks can take on additional tasks, so workloads in the cloud can be concentrated on a subset of nodes. Unloaded servers use less power when idle, or can even be powered off until they are required to handle peak loads. If resources are sufficient, packing 20 virtual machines on one hypervisor is more efficient than spreading them across two, three or more nodes (see the packing sketch after this list).
• Develop workload schedules and plan the resources they require. During the workday, developers and QA may need thousands of virtual machines or containers, but such a fleet does not need to be online and drawing power on weekends. Nightly builds and tests can run on the same set of nodes that generate daily reports during business hours; there is no value in keeping spare resources online 24/7. The dynamic schedulers of the cloud management platform in use can help here. Depending on the level of automation, these schedulers can stop unnecessary workloads and free up resources based on various criteria. Ideally, they can automatically migrate and pack workloads onto just a few servers, freeing up the rest and hibernating the unloaded machines.
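A minimal sketch of the packing idea from the second and third bullets: first-fit-decreasing bin packing of VM vCPU demands onto identical hosts, with any host left empty becoming a candidate for hibernation. The VM sizes, host capacity and the hibernation step are all hypothetical placeholders, not a real scheduler's API:

```python
# First-fit-decreasing packing of VM vCPU demands onto identical hosts.
# Sizes and capacities are illustrative assumptions.

HOST_CAPACITY = 32  # vCPUs per host (assumed)

def pack(vm_vcpus: list[int], capacity: int = HOST_CAPACITY) -> list[list[int]]:
    """Return a list of hosts, each a list of the VM sizes placed on it."""
    hosts: list[list[int]] = []
    for vm in sorted(vm_vcpus, reverse=True):  # place the biggest VMs first
        for host in hosts:
            if sum(host) + vm <= capacity:     # first host with room wins
                host.append(vm)
                break
        else:
            hosts.append([vm])                 # nothing fits: open a new host
    return hosts

fleet = [16, 8, 8, 4, 4, 4, 2, 2, 1, 1]
placement = pack(fleet)
print(f"{len(placement)} hosts needed for {len(fleet)} VMs")
for i, host in enumerate(placement):
    print(f"host {i}: {host} ({sum(host)}/{HOST_CAPACITY} vCPUs)")
# Any hosts beyond len(placement) can be hibernated until peak load returns.
```

Real cloud management platforms account for RAM, I/O, anti-affinity rules and live-migration cost as well, but the core idea is the same: fill a few hosts well, then power the rest down.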
Conclusion
In the past, building cloud infrastructure usually meant simply buying lots of RAM, CPU cores and storage. Today, it is no longer so easy. As hardware becomes harder to purchase and electricity costs grow, building a more efficient cloud from the start is the more cost-effective path. Working with the right vendor to build a right-sized cloud capable of running demanding applications is an ideal way to control costs driven by hyperinflation or constrained energy supplies.