Amazing things can happen…

…for those who dare!

Last year, while still busy with a full-time job, a family and the pandemic, I embarked on an unknown and uncharted adventure: I decided to co-author a book with one of my favourite customers – and friend – Giuliano.

After 6 hectic months, I am proud to announce my new book “The Road to Azure Cost Governance” will be out on 18th Feb. Writing it has been an awesome experience. Sometimes when I read passages of it, I still think “I didn’t know I had this in me”, which I am told is part of the writing process.

If you are curious, willing to learn something new, or simply looking for some guidance on Azure Cost Governance and Optimization, this book might be for you!

The Road to Azure Cost Governance | Packt (packtpub.com)

I had so much fun writing it…I hope you’ll have fun reading it as well 🙂

Paola

The Carbon Monkey

How the principles of Chaos Engineering, and the use of carbon monkeys to simulate real-life energy events, can help us achieve our sustainable software engineering goals.

Photo by Singkham on Pexels.com

According to the Principles of Chaos Engineering site, Chaos Engineering is the discipline of experimenting on a system in order to build confidence in that system’s capability to withstand turbulent conditions in production. I have followed this discipline over the years, finding it fascinating, especially when applied to large-scale applications and systems. As the site explains:

“Even when all of the individual services in a distributed system are functioning properly, the interactions between those services can cause unpredictable outcomes. Unpredictable outcomes, compounded by rare but disruptive real-world events that affect production environments, make these distributed systems inherently chaotic.

We need to identify weaknesses before they manifest in system-wide, aberrant behaviors. Systemic weaknesses could take the form of improper fallback settings when a service is unavailable; retry storms from improperly tuned timeouts; outages when a downstream dependency receives too much traffic; cascading failures when a single point of failure crashes; etc. We must address the most significant weaknesses proactively, before they affect our customers in production.

We need a way to manage the chaos inherent in these systems, take advantage of increasing flexibility and velocity, and have confidence in our production deployments despite the complexity that they represent. An empirical, systems-based approach addresses the chaos in distributed systems at scale and builds confidence in the ability of those systems to withstand realistic conditions. We learn about the behavior of a distributed system by observing it during a controlled experiment. We call this Chaos Engineering.”

Build a Hypothesis around Steady State Behavior

Let’s start with the first step: the steady state behavior is the condition our application should aspire to. If we translate this principle into a sustainable one, it becomes the most beautiful and efficient state of an application: one where no energy is wasted, and efficiency and performance are at their best.

The most difficult part is measuring and setting this initial state. My colleagues have shared numerous ideas on the Sustainable Software Engineering blog that might help you jumpstart your measurement. However, I feel that at some point this will have to reach a standardized and widely accepted form: a “carbon limit” beyond which an application is considered inefficient and not sustainable.

Vary Real-world Events

This is the principle that shows how close chaos engineering and sustainable software engineering really are. There is no steady, predictable flow of energy coming from a single renewable source. From the big picture of solar, wind or hydro generation down to the moment we plug a device into the outlet, we have limited ways of knowing exactly how the energy powering that device was produced at that exact moment in time. Doing so precisely requires considering seasonality, time of day, peak hours, and the weather conditions that drive renewable power supply. There are simply too many variables around this concept!

Imagine now that your application runs in a virtual datacenter, where you have even less information about its carbon impact. We still need to start somewhere, though, and set a baseline amount of carbon usage for the application. This baseline is useful for measuring increases and decreases, and for driving efficiency.

Back to chaos engineering. Simulating power outages is only the beginning; consider these the starting questions for a sustainable application:

  • What if the renewable power sources are suddenly unavailable, and I therefore see spikes of energy consumption that I could not foresee even in the greenest application?
  • What if at some point my application has become a “carbon monster”, greedy with energy because a query has gone wrong and it is suddenly spending most of its energy just searching for that item in your cart? Or because the network path has changed due to an outage along the route, and its latency spikes?

Trying to replicate real-life energy events directly in an application will make it more resilient to lower energy availability and, overall, more efficient.

Enter the “Carbon Monkey”

The concept is a “carbon monkey”: a process or system that triggers energy inefficiencies at random, tests how your application reacts, and measures the differential performance that can be related to the differential carbon impact.
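To make this concrete, here is a minimal sketch of what a carbon monkey loop could look like. Everything in it is hypothetical: the event list, the injection stubs and the measure_performance() probe are placeholders for your own tooling.

```python
import random
import time

# Hypothetical energy events a carbon monkey could inject at random.
# Real injection mechanics (latency via tc/netem, CPU caps via cgroups,
# a mocked carbon-intensity feed, etc.) would replace these stubs.
ENERGY_EVENTS = {
    "renewable_dropout": lambda: print("simulate: carbon intensity spike"),
    "runaway_query":     lambda: print("simulate: slow query burning CPU"),
    "network_reroute":   lambda: print("simulate: longer path, higher latency"),
}

def measure_performance() -> float:
    """Stub: return a synthetic requests-per-second reading."""
    return random.uniform(90.0, 110.0)

def carbon_monkey(duration_s: int = 3600, mean_wait_s: int = 300) -> None:
    """Randomly fire energy events and record how the app reacts."""
    deadline = time.time() + duration_s
    while time.time() < deadline:
        time.sleep(random.expovariate(1 / mean_wait_s))  # random arrivals
        name = random.choice(list(ENERGY_EVENTS))
        baseline = measure_performance()
        ENERGY_EVENTS[name]()                  # inject the event
        degraded = measure_performance()
        # The performance differential is our proxy for carbon impact.
        print(f"{name}: perf delta = {degraded - baseline:+.2f} req/s")

carbon_monkey(duration_s=60, mean_wait_s=10)
```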

We have given the problem of how to measure an application’s carbon efficiency a lot of thought. But this approach offers a change of perspective. Instead of measuring how much energy an application consumes, we should test adding energy events to see how the application behaves and then drive change to improve its reaction to events that make it less green. 

As a result, we won’t have an exact measurement of carbon impact, only a differential one. With time, this differential can become an absolute number, once other systems allow us to retrieve more precise energy consumption metrics. In the meantime, let the carbon monkey help us reduce impact regardless of metric standardization!

Photo by Alexandr Podvalny on Pexels.com

Call for more “Carbon Monkeys”

I’d like to see developer communities creating one or more “carbon monkeys” that can introduce energy-impacting events into applications, to foster resiliency towards sustainability. 

The first step is defining a set of incorrect assumptions about energy usage that can prevent our application from performing “green”. These would include assumptions such as the highest energy cost/carbon use/region, the shortest/longest queries, the shortest/longest network paths, and the highest compute and memory usage, among other things.

These assumptions should then be exercised by an automated process (our monkey) that makes sure the application patterns are resilient enough to overcome those issues without failing completely. At the end of the run, we could compute a carbon resiliency value that helps set a standard for evaluating an application’s carbon impact differential.
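As a sketch of what that could look like, the snippet below condenses the performance deltas collected during a monkey run into a single 0-100 value. The weighting is entirely illustrative; a real standard would need community agreement.

```python
from statistics import mean

def carbon_resiliency(perf_loss_by_event: dict[str, list[float]]) -> float:
    """Condense per-event performance losses (fractions of baseline,
    collected by the monkey) into one 0-100 resiliency value.
    An app that loses nothing under any event scores 100."""
    avg_loss = mean(mean(losses) for losses in perf_loss_by_event.values())
    return round(100 * (1 - min(max(avg_loss, 0.0), 1.0)), 1)

# Example run: fraction of baseline throughput lost during each event.
score = carbon_resiliency({
    "renewable_dropout": [0.12, 0.08],
    "runaway_query":     [0.40, 0.35],
    "network_reroute":   [0.05, 0.07],
})
print(f"carbon resiliency value: {score}/100")
```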

Originally published in the Microsoft Developer Blogs

A dapper sustainability

When talking about the carbon footprint of an application, we should normally consider two angles: how much energy was used to run it (number of cores, time of execution, hardware efficiency, etc.), and how much producing that energy impacted the environment. The latter is called carbon intensity and depends on the location, time, and type of energy (gas, coal, wind, etc.) used by datacenters. A study from 2016 estimated that around 55% of the consumed energy goes to the computing systems themselves, while the remaining 45% supports the compute (cooling, UPS, etc.). In addition, if 80% of U.S. small datacenters were moved to hyperscale providers, electricity consumption could drop by as much as 25%.
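As a back-of-the-envelope illustration of how those two angles combine (the figures below are invented, not measured):

```python
# Operational footprint, back of the envelope:
#   emissions = IT energy x overhead factor x carbon intensity
it_energy_kwh = 120.0     # illustrative: energy drawn by the servers
overhead = 1 / 0.55       # the ~45% supporting load mentioned above
carbon_intensity = 0.4    # illustrative grid average, kg CO2e per kWh

total_kwh = it_energy_kwh * overhead
emissions_kg = total_kwh * carbon_intensity
print(f"{total_kwh:.0f} kWh drawn -> {emissions_kg:.0f} kg CO2e emitted")
```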

The year 2020 marked the beginning of a common global awareness in the IT world: software and applications have a footprint that must be taken into consideration, and algorithms are being developed to assess such footprints.

I have recently been exploring, with a bunch of fellow Cloud Solution Architects, the innovation of Dapr, a recently created open-source project for a distributed application runtime. According to its main page, “Dapr is a portable, event-driven runtime that makes it easy for any developer to build resilient, stateless and stateful applications that run on the cloud and edge and embraces the diversity of languages and developer frameworks.” The main dream-like features of Dapr are indeed its simplicity of implementation and its ability to work across any programming language, framework, and infrastructure.

In a recently released Dapr book, there is a specific passage on sustainability with Dapr, which prompts developers to start thinking about sustainability when they approach their software architecture. While figuring out how Dapr works, I tried to apply the sustainability angle to it, and found three main aspects on which a green developer might want to focus:

  1. How the application can measure its own carbon impact.

This might not seem difficult, but remember that a distributed solution can span several different infrastructure environments and programming languages. Ideally this should be done via a dedicated microservice that continuously monitors the carbon impact across every source of energy and feeds the other parts of the application with this information. Dapr is very precise in measuring the performance impact of its own infrastructure, and it offers guidance on how to measure the performance of the microservices adopting it. As mentioned in the public documentation, there are ways to retrieve CPU and memory usage that can help carve out the overall carbon impact. Also, recursively, this microservice should monitor its own carbon impact. 😊
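As a rough sketch of such a microservice’s core loop, the snippet below scales a CPU’s rated power by its utilization (via psutil) and multiplies by a carbon intensity figure. Both the TDP and the intensity value are assumptions you would replace with real data sources.

```python
import psutil  # pip install psutil

CPU_TDP_WATTS = 150.0     # assumption: rated power of the host CPU
CARBON_INTENSITY = 400.0  # assumption: grams of CO2e per kWh, from a feed

def carbon_sample(interval_s: float = 5.0) -> float:
    """Estimate grams of CO2e emitted by this host over the interval,
    scaling rated power by CPU utilization (a crude approximation)."""
    cpu_fraction = psutil.cpu_percent(interval=interval_s) / 100.0
    kwh = CPU_TDP_WATTS * cpu_fraction * interval_s / 3600.0 / 1000.0
    return kwh * CARBON_INTENSITY

while True:
    grams = carbon_sample()
    # In a Dapr solution this reading could be published on a pub/sub
    # topic so that every other microservice can subscribe to it.
    print(f"estimated impact: {grams:.4f} g CO2e over the last 5 s")
```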

Photo by Ann H on Pexels.com
  2. How the application can drive/change its carbon impact by steering its IaC (Infrastructure as Code)

Imagine that you have a microservice that can instantly monitor the carbon impact and feed the results to any subscriber of this information. Because the carbon impact is not a fixed amount, and largely depends on the energy conditions of the datacenter where the infrastructure resides, the application can have an automation that triggers the move of all or part of the infrastructure towards less impactful sites or regions. This might not be immediate, or even feasible for some regions (think of data sovereignty and latency, for example), but where applicable the result is a highly optimized, sustainable infrastructure for the distributed application, guaranteed to run on the least impactful (and, for the same reason, probably the cheapest) infrastructure at any given time. As a side effect, this also adds resiliency, as the application can span several different environments.
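A sketch of that automation could look like the following; get_carbon_intensity() and trigger_iac_deployment() are hypothetical hooks into your carbon feed and your IaC pipeline (Terraform, Bicep, ARM, etc.).

```python
import random

CANDIDATE_REGIONS = ["westeurope", "northeurope", "francecentral"]
MOVE_THRESHOLD = 0.15  # only relocate for a >15% intensity improvement

def get_carbon_intensity(region: str) -> float:
    """Stub for a real carbon feed, in kg CO2e per kWh."""
    return random.uniform(0.1, 0.6)

def trigger_iac_deployment(target_region: str) -> None:
    """Stub: kick off the pipeline that redeploys the infrastructure."""
    print(f"redeploying to {target_region}...")

def maybe_relocate(current_region: str) -> str:
    readings = {r: get_carbon_intensity(r) for r in CANDIDATE_REGIONS}
    greenest = min(readings, key=readings.get)
    saving = 1 - readings[greenest] / readings[current_region]
    if greenest != current_region and saving > MOVE_THRESHOLD:
        # Data sovereignty and latency checks belong here.
        trigger_iac_deployment(target_region=greenest)
        return greenest
    return current_region

print(maybe_relocate("westeurope"))
```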

  3. How the application can drive/change its carbon impact depending on user behavior.

This is a Sustainable Software Engineering technique that requires special attention, because it involves the user experience and educating users towards sustainability. The application can provide different levels of features depending on the carbon impact of those features (as measured by the above-mentioned microservice) and offer a diversified user experience according to the impact level of each feature, leaving the informed choice to the end user. Have more time to spare? Why don’t you try this slightly-higher-latency level, which saves xx% of carbon by using a greener infrastructure? Do you really need to load all the high-res pictures? Try the low-quality site for some additional carbon saving. And so on.

This obviously introduces some overhead on the programming side, but the beauty of its execution is that the application will let end users have a say on their impact, and after a while, by monitoring the choices users make, you also get feedback on how they want to interact with your software. With time, you’ll have a clear picture of which combination provides the better trade-off between performance and energy savings. Dapr favors asynchronous architectural patterns, especially the publish/subscribe interaction between microservices. Handling requests with this approach, scaling out and in, can achieve the best compromise between dynamic response to user demand and control over the resources we want to provide for our workload.
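As a sketch of how such an informed choice could be wired up with Dapr’s pub/sub building block: a tiny publisher maps the current carbon intensity to a feature tier and broadcasts it, so subscribing front ends can adapt their user experience. The component name, topic and thresholds are assumptions.

```python
import json
from dapr.clients import DaprClient  # pip install dapr

# Assumed tiers: full features under 100 g/kWh, an eco mode up to 300,
# and an essential mode above that.
TIERS = [(100.0, "full"), (300.0, "eco"), (float("inf"), "essential")]

def publish_ux_tier(carbon_intensity_g_kwh: float) -> None:
    """Pick a feature tier from the current carbon intensity and
    broadcast it on an assumed pub/sub component named "pubsub"."""
    tier = next(t for limit, t in TIERS if carbon_intensity_g_kwh <= limit)
    with DaprClient() as client:
        client.publish_event(
            pubsub_name="pubsub",
            topic_name="ux-tier",
            data=json.dumps({"tier": tier}),
            data_content_type="application/json",
        )

publish_ux_tier(250.0)  # publishes {"tier": "eco"}
```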

Since Dapr is an open-source, community-driven project, this is a call to action for developers to create an explicit branch of it dealing with the green impact of a distributed application: a blueprint to measure, control and optimize the energy efficiency of microservices, which cannot be left out of a modern, innovative software architecture.

Originally published on the Microsoft Tech Blog

Reducing the carbon and financial cost of your cloud applications

Your CIO, surprised by the cloud bill

In my job as a cloud architect working with large enterprises, there has always been a specific “moment of truth” when the customer realizes that cloud costs are something that needs to be monitored daily to avoid unpleasant surprises in the end-of-month bill. Enterprises can take several steps toward understanding cloud costs and avoiding surprises:

  • Establish control and governance.
  • Analyze possible savings.
  • Plan the required changes to reach the identified savings.

The main goal of cloud governance is to bring down the monthly bill, or to keep it level while absorbing the usual growth and new projects. But what if we looked at cost optimization from a different angle? At Microsoft we can calculate saved carbon emissions using tools such as the Microsoft Sustainability Calculator. So, what if our efforts were directed towards “carbon+cost efficiency” rather than simply cost saving?

The pillars of cost control

Depending on the cloud maturity and the model adopted by each customer, a few fundamental actions can be taken that are directly linked to cost (and carbon) savings.

Right-sizing

Right-sizing is about understanding exactly what your applications need, rather than provisioning by guesswork. Wrong-sizing has a much higher cost than the monthly bill: mirroring the exact sizing of your on-prem infrastructure without applying right-sizing can lead to larger monthly bills, reduced capacity for other customers, and unnecessary electricity usage. Right-sizing is not just picking a VM size for a workload; it is planning to change the sizing during the day according to the workload and the carbon efficiency of the region where it runs. For a PaaS environment, it can mean changing a service plan according to the time of day and/or the expected usage. Companies can even set usage expectations by giving their internal customers and end users an informed choice: a green option for applications, where users accept fewer features or even a slower connection, knowing they are polluting less.
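As a minimal sketch of a scheduled resize with the Azure SDK for Python (azure-identity and azure-mgmt-compute); the subscription, resource names and sizes are placeholders, and note that resizing reboots the VM:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.compute.models import HardwareProfile, VirtualMachineUpdate

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

def resize_vm(resource_group: str, vm_name: str, new_size: str) -> None:
    """Resize a VM; run off-peak, since the operation reboots it."""
    poller = client.virtual_machines.begin_update(
        resource_group,
        vm_name,
        VirtualMachineUpdate(hardware_profile=HardwareProfile(vm_size=new_size)),
    )
    poller.result()  # block until the resize completes

resize_vm("my-rg", "my-vm", "Standard_B2s")  # off-peak: drop to a small size
```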

Reservations

Reservations are typically a commercial discount on cloud services in exchange for a one-year or multi-year commitment. However, this saving does not necessarily equate to green usage of the reserved resources: the cloud provider is happy with the commitment and the customer is happy with the discount, but reservations are not necessarily a greener option. To ensure your reservations minimize electricity usage, monitor them frequently and try to run them at maximum utilization. Anything below 99% utilization should be right-sized.

Cleanup

Cleanup is the most difficult part of cost management, especially in large organizations and cloud deployments. When you have thousands of VMs and applications running, it is quite difficult to scour the cloud tools for every inconsistency. For example, if you delete a VM and forget to delete its disks or IP address, those resources will continue to run, impacting your monthly bill as well as your carbon footprint. Cleanup should therefore be the first choice for carbon+cost efficiency. This is typically a task that can be automated and is included in most optimization tools, such as Azure Cost Management.
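A read-only sketch of such an automated check, again with the Azure SDK for Python: it lists managed disks not attached to any VM and public IPs not bound to any network interface, two of the most common leftovers.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
cred = DefaultAzureCredential()
compute = ComputeManagementClient(cred, SUBSCRIPTION_ID)
network = NetworkManagementClient(cred, SUBSCRIPTION_ID)

# Managed disks not attached to any VM have managed_by set to None.
for disk in compute.disks.list():
    if disk.managed_by is None:
        print(f"orphaned disk: {disk.name}")

# Public IPs without an IP configuration are not bound to anything.
for ip in network.public_ip_addresses.list_all():
    if ip.ip_configuration is None:
        print(f"orphaned public IP: {ip.name}")
```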

Scheduling Operations

The last pillar of carbon+cost management is scheduling operations: being able to switch most of your servers/applications/services off and on again. With PaaS, it could be just changing the service tier to a less costly one during off-peak hours. Initially, many customers are hesitant to switch off applications on a schedule. However, in my experience, once customers test scheduling on a small number of applications, they quickly see the cost benefits and are open to scheduling more of them. The first step is to enforce any type of scheduling, even just turning the application off on weekends or at night.

Photo by Pixabay on Pexels.com

Once customers start understanding their cloud operations, the next question they should ask is: “why keep a valuable and costly resource switched on, even for just one hour, if it’s not used?” The carbon+cost efficient scheduler must consider the following (a minimal sketch follows the list):

  • The time it takes to bring the application down and up again. For example, if the whole reboot takes one hour, then the planned timeframes cannot be shorter than two hours.
  • The real usage of the workloads. For example, if there is only one user logged into the portal at night, consider not offering that service or application at night.
  • Scheduling should not be just an on-off option; where possible, it should also include reducing PaaS service plans and resource tiers.
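Here is the simplest possible version of such a scheduler, as a sketch with the Azure SDK for Python: run hourly (for example from an Azure Function), it deallocates VMs tagged for night shutdown and starts them again in the morning. The tag name, hours and tagging convention are all assumptions.

```python
from datetime import datetime, timezone
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
OFF_HOURS = {22, 23, 0, 1, 2, 3, 4, 5}  # assumed quiet hours, UTC
START_HOUR = 6

def rg_of(resource_id: str) -> str:
    return resource_id.split("/")[4]  # resource group segment of the ID

hour = datetime.now(timezone.utc).hour
for vm in client.virtual_machines.list_all():
    if (vm.tags or {}).get("schedule") != "night-off":
        continue
    if hour in OFF_HOURS:
        client.virtual_machines.begin_deallocate(rg_of(vm.id), vm.name)
    elif hour == START_HOUR:
        client.virtual_machines.begin_start(rg_of(vm.id), vm.name)
```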

Carbon+cost management should become a new standard that involves the entire company in being more efficient and greener. Application owners and developers should be involved in keeping infrastructure as close to carbon-neutral as possible, starting with cleanup, scheduling and right-sizing, and then proceeding to include the principles of sustainable software within the application itself. It’s important to look beyond the raw performance and financial costs of your infrastructure and start considering the ethical cost of how much carbon it emits. Visit Azure’s cloud cost optimization page for cost optimization techniques and suggestions, and check this blog periodically to preview our new tools and ideas.

Originally published in the Microsoft Tech Blog

Sustainable cloud native software with serverless architectures

Living in Milan, I have had to deal with extraordinary air pollution levels since December 2019. On some days, controversial graphs compared Milan to cities in China and India that are, at least by common perception, much more densely populated and polluted.

Then came COVID-19, and obviously our concerns moved elsewhere. Like everyone, at least in Lombardy, I was in lockdown from 21 February to 4 May. In the midst of a thousand worries, a little voice in the back of my head kept pointing out that, suddenly, the air was no longer polluted and CO2 levels had dropped significantly. In short, an important change, with impactful results, was indeed possible.

Fast forward to now … do we want to go back to the impossibly polluted air of January 2020? If the answer is no, then something needs to change.

First, let’s see why a change is due and important. The whole scientific community agrees that the world has a pollution problem. Carbon dioxide in our atmosphere has created a layer of gas that traps heat and changes the earth’s climate. Earth’s temperature has risen by more than one degree centigrade since the industrial revolution of the 1700s.

If we don’t stop this global warming process, scientists tell us that the results will be catastrophic:

  • Further increase in temperature
  • Extreme weather conditions, drought, fires (remember the Australian situation at the beginning of the year?)
  • Rising sea levels could make areas where more than two hundred million people live uninhabitable
  • Drought will lead to food shortages, which could impact over 1 billion people.

To summarize, we must drastically reduce CO2 emissions and prevent the temperature from rising above 1.5°C.

Problem: every year the world produces and releases more than 50 billion tons of greenhouse gases into the atmosphere.

CO2 emissions are classified into three categories:

Scope 1 – direct emissions created by our activities.

Scope 2 – indirect emissions that come from the production of electricity or heat, such as traditional energy sources that power and heat our homes or company offices.

Scope 3 – indirect emissions that come from all other daily activities. For a company, these sources are numerous and must include the entire supply chain, the materials used, employee travel, and the entire production cycle.

When we speak of “carbon efficiency”, we should remember that greenhouse gases are not made up only of carbon dioxide, and they do not all have the same impact on the environment. For example, 1 ton of methane has the same heating effect as roughly 80 tons of carbon dioxide, so the convention is to normalize everything to the CO2-equivalent (CO2e) measure.
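In code, that normalization is just a weighted sum over global warming potential (GWP) factors; the factors below are illustrative, with methane’s ~80x taken from the short-horizon figure cited above.

```python
# Normalize mixed greenhouse gas masses (in tons) to tons of CO2e.
GWP = {"co2": 1, "ch4": 80, "n2o": 273}  # illustrative factors

def co2_equivalent(tons_by_gas: dict[str, float]) -> float:
    return sum(tons * GWP[gas] for gas, tons in tons_by_gas.items())

print(co2_equivalent({"co2": 10, "ch4": 1}))  # 10 + 1*80 = 90 tons CO2e
```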

International climate agreements commit to reducing “carbon” pollution and stabilizing the temperature increase at 1.5°C by 2100.

Second problem: the increase in temperature does not depend on the rate at which we emit carbon, but on the total quantity present in the atmosphere. To stop the rise in temperature, we must therefore avoid adding to the existing stock, that is, reach the zero-emission target. Of course, to continue living on earth, this means that for every gram of carbon emitted, we must subtract as much.

The solution to both problems: reduce emissions by 45% by 2030, and reach zero emissions by 2050.

Let’s now talk about what happens with datacenters and, in this specific case, public cloud datacenters.

  • The demand for compute power is growing faster than ever.
  • Some estimates indicate that data center energy consumption will account for no less than a fifth of global electricity by 2025.
  • A server/VM operates on average at 20-25% of its processing capacity, while still drawing energy for the unused capacity.
  • When applications run directly on physical hardware, it is still necessary to keep the servers running and consuming resources regardless of whether an application is running or not.
  • Containers have a higher density and can bring a server/VM up to 60% of use of compute capacity.
  • Ultimately, it is estimated that 75-80% of the world’s server capacity is just sitting idle.

While browsing for solutions, I found very little documentation and few formal statements about sustainable software engineering. Talking to my Microsoft colleague Asim Hussain, I found out that there is a “green software” movement, which started with the principles.green website, where a community of developers and advocates is trying to create guidelines for writing environmentally sustainable code, so that the applications we work with every day are not only efficient and fast, but also economical and environmentally friendly. The eight principles are:

  1. Carbon. The first step is to make the environmental efficiency of an application a general target. It seems trivial, but to date there is not much documentation about it in computer textbooks or websites.
  2. Electricity. Most electricity is produced from fossil fuels and is responsible for 49% of the CO2 emitted into the atmosphere. All software consumes electricity to run, from the app on a smartphone to the machine learning models running in cloud datacenters. Developers generally don’t have to worry about these things: electricity consumption is usually defined as “someone else’s problem”. But a sustainable application must take charge of the electricity it consumes and be designed to consume as little as possible.
  3. Carbon intensity. Carbon intensity measures how many CO2-equivalent emissions are produced per kilowatt-hour of electricity consumed. Electricity is produced from a variety of sources, each with different emissions, in different places and at different times of the day; most of all, when it is produced in excess, we have no way of storing it. We have clean sources such as wind, solar and hydroelectric, while other power plants have different degrees of emissions depending on the material used to produce energy. If we could connect a computer directly to a wind farm, the computer would have a carbon intensity of zero. Instead we connect it to the power outlet, which receives energy from different sources, so we must accept that our carbon intensity is always a number greater than zero.
  4. Embedded or embodied carbon. This is the amount of pollution emitted during the creation and disposal of a device. Applications that run efficiently on older hardware, extending its life, therefore also have an impact on emissions.
  5. Energy proportionality. Maximizing the rate of server utilization must always be a primary objective. In the public cloud this generally also equates to cost optimization. The most efficient approach is to run an application on as few servers as possible, each at the highest utilization rate.
  6. Networking. Reducing the amount of data and the distance it must travel across the network also has an impact on the environment. Optimizing the route of network packets is as important as reducing the use of servers. Networking emissions depend on many variables: the distance crossed, the number of hops between network devices, the efficiency of the devices, and the carbon intensity of the region where and when the data is transmitted.
  7. Demand shifting and demand shaping. Instead of designing the supply based on demand, a green application shapes demand based on the energy supply. Demand shifting means moving workloads to regions, and to times of day, with lower carbon intensity (see the sketch after this list). Demand shaping, on the other hand, means separating workloads so that they are independently scalable, and prioritizing them to support features based on energy consumption. When the energy supply is low, and therefore the carbon intensity is above a specific threshold, the application reduces its features to the essential minimum. Users can also be involved in the choice by being presented with a “green” option offering a minimum set of features.
  8. Monitoring and optimization. Energy efficiency must be measured in all parts of the application to understand how to optimize it. Does it make sense to spend two weeks reducing network communication by a few megabytes when a single database query has ten times the impact on emissions?
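As promised above, here is a minimal demand shifting sketch: a deferrable job runs only when the carbon intensity of its region drops below a threshold, otherwise it waits for the next window. get_carbon_intensity() stands in for a real grid-data or cloud-provider feed.

```python
import random
import time

THRESHOLD_G_PER_KWH = 200.0

def get_carbon_intensity(region: str) -> float:
    """Stub: grams of CO2e per kWh for the region, right now."""
    return random.uniform(100.0, 500.0)

def run_when_green(job, region: str, poll_s: int = 900) -> None:
    """Defer the job until the region's carbon intensity is low enough."""
    while True:
        if get_carbon_intensity(region) <= THRESHOLD_G_PER_KWH:
            job()
            return
        time.sleep(poll_s)  # try again in the next window

run_when_green(lambda: print("batch job executed"), "westeurope")
```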

The principles are generic for any type of application and architecture, but what about serverless?

Serverless applications are natively suited to emissions optimization. Since the same application consumes differently at different times and in different places of execution, demand shifting is a technique that can be easily applied to serverless architectures. Of course, with serverless we have no control over the underlying infrastructure; we must trust that cloud providers want to use their servers at 100% capacity. 😊

Cost optimization is generally also an indication of sustainability, and with serverless we can have a direct impact on execution times, on network data transport, and in general on building applications that are efficient not only in terms of time and cost, but also of emissions.

The use of serverless brings measurable benefits:

  • Serverless allows a more efficient use of the underlying servers, because they are managed in shared mode by the cloud providers and built for efficient use of energy, with optimal datacenter temperature and power.
  • In general, cloud datacenters have strict rules and often ambitious emission targets (for instance, Microsoft recently declared its commitment to become carbon negative by 2030). Making the best use of the most optimized resources of a public cloud provider implicitly means optimizing the emissions of your application.
  • Since serverless only uses on-demand resources, the server density is the highest possible.
  • Serverless workloads are ready for demand-shifting / shaping executions.
  • From a purely theoretical point of view, writing optimized and efficient code is always a good rule of thumb, regardless of the purpose for which you do it 😊

Developers can immediately have an IMPACT on application sustainability:

  • By making a program more accessible to older computers.
  • By writing code that exchanges less data, has a better user experience and is more environmentally friendly.
  • If two or more microservices are highly coupled, by considering co-locating them to reduce network congestion and latency.
  • By considering running resource-intensive microservices in a region with less carbon intensity.
  • By optimizing the database and how data is stored, thus reducing the energy to run the database and reducing idle times, pending completion of queries.
  • In many cases, web applications are designed by default with very low latency expectations: a response to a request should occur immediately or as soon as possible. However, this may limit the sustainability options. By evaluating how the application is actually used, and whether latency limits can be eased in some areas, it may be possible to reduce emissions further.

In conclusion, I am convinced that serverless architectures, where properly used, are the future not only because they are beautiful, practical and inexpensive, but also because they are the developer tools that today have the least impact on emissions. With the help of the community, we can create specific guidelines for serverless and maybe even a “carbon meter” for our serverless applications, which in the future could also become “low-carbon certified”.

COVID-19 was an inspiring moment in terms of what we managed to do at a global level: entire countries stopped, along with flights, traffic and non-essential production. We know that something can be done and that this is the right time to act: if we are rebuilding everything from scratch, it is worth rebuilding in the right direction.