Selection of components for test measurements
Decision: We are measuring software libraries and IDEs. Approach: Define selection criteria for the libraries, choose a methodology for measuring each library or IDE, and decide where to measure (compiler level, operating-system level, etc.).
Selecting the test subjects
Measurement approaches
- For libraries, we have chosen to use the unit tests of each library as the usage scenario. This allows us to record the power usage for each test run. In many cases a test suite covers > 95% of a library's code, so it is safe to assume that running the full test suite is equivalent to 'running all of the library's code' (see the measurement sketch after this list).
- For IDEs, which are desktop software, we are going to use the existing Birkenfeld approach: defining a manual usage scenario (e.g. compile a project, edit some code) and measuring the underlying power consumption. We will attempt to automate this 'click and measure' process (a sketch of such automation follows below).
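As a minimal sketch of the library measurement, assuming a Linux machine with an Intel CPU, the RAPL energy counters exposed through the powercap sysfs interface can be read before and after the test-suite run. The `measure` helper and the `pytest` invocation below are illustrative placeholders, not our final tooling, and the reading covers whole-package energy (including background activity):

```python
# Minimal sketch, assuming Linux with Intel RAPL exposed via powercap.
# Reading energy_uj may require root; the counter wraps at max_energy_range_uj.
import subprocess
import time
from pathlib import Path

RAPL = Path("/sys/class/powercap/intel-rapl:0")  # package-0 energy domain

def read_energy_uj() -> int:
    return int((RAPL / "energy_uj").read_text())

def measure(cmd: list[str]) -> tuple[float, float]:
    """Run a test-suite command and return (energy in joules, wall time in seconds)."""
    max_range = int((RAPL / "max_energy_range_uj").read_text())
    e0, t0 = read_energy_uj(), time.monotonic()
    subprocess.run(cmd, check=True)
    e1, t1 = read_energy_uj(), time.monotonic()
    joules = ((e1 - e0) % max_range) / 1e6  # modulo handles counter wraparound
    return joules, t1 - t0

if __name__ == "__main__":
    # Placeholder: a Python library whose full test suite runs under pytest
    joules, seconds = measure(["pytest", "tests/"])
    print(f"{joules:.1f} J over {seconds:.1f} s (~{joules / seconds:.1f} W avg)")
```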
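For the IDE case, one plausible way to automate the 'click and measure' process is a GUI-automation library such as pyautogui. The hotkeys, text, and sleep times below are hypothetical placeholders for a concrete IDE under test; the scripted scenario would then be passed as the command to the measurement wrapper above:

```python
# Minimal sketch of replaying a manual IDE scenario with pyautogui, so the
# same interaction sequence can be repeated while energy is being sampled.
# Hotkeys and sleep times are placeholders for a concrete IDE under test.
import time
import pyautogui

def ide_scenario() -> None:
    pyautogui.hotkey("ctrl", "n")                             # open a new file
    time.sleep(1)
    pyautogui.write("print('hello world')\n", interval=0.05)  # simulate typing
    pyautogui.hotkey("ctrl", "s")                             # save the file
    time.sleep(1)
    pyautogui.hotkey("ctrl", "shift", "b")                    # trigger a build
    time.sleep(10)                                            # wait for the build

if __name__ == "__main__":
    time.sleep(5)  # leave time to focus the IDE window before the replay starts
    ide_scenario()
```

Fixed sleep times make the runs comparable in length, at the cost of some idle power being included in every measurement.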
Selection criteria & matrix of libraries
Justification
In order to keep the selection of software components reasonably unbiased, we simply used the list of the most popular libraries for each programming language on GitHub. As GitHub hosts the majority of open-source code repositories worldwide, its popularity ranking can be considered representative. This has the added side-effect that the energy-use measurements for these components are relevant to a very large audience of developers who are already using them.
Furthermore, these popular components often have a large community of contributors who actively work on the component's code, so rapid improvements are possible: potential users of our tooling and energy measurements can apply them to reduce the total energy consumption of a component, which in turn could affect thousands of software applications using that component.
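As an illustration, such a popularity ranking can be retrieved through GitHub's public search API by sorting repositories of a language by stars. This sketch is an assumption about how the list could be fetched, not our exact selection script, and unauthenticated requests are heavily rate-limited:

```python
# Minimal sketch: query GitHub's public search API for the most-starred
# repositories of a given language (unauthenticated, rate-limited).
import requests

def top_repos(language: str, n: int = 20) -> list[dict]:
    resp = requests.get(
        "https://api.github.com/search/repositories",
        params={"q": f"language:{language}", "sort": "stars",
                "order": "desc", "per_page": n},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

if __name__ == "__main__":
    for repo in top_repos("python"):
        print(f'{repo["stargazers_count"]:>7}  {repo["full_name"]}')
```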
Technically, we set a few more conditions to ensure that we can perform our tests and that our experiments are easily repeatable (a sketch applying these checks follows the list):
- Automated test coverage is available. The component must have at least functional tests that can be executed; without them our approach would not work. Luckily, the majority of the open-source components analyzed have tests.
- Open-source license. It would be difficult for us to publish results on software components that cannot be improved by a community, and it would furthermore not be sensible to contribute to proprietary component development using public funding. The component must therefore have an open-source license.
- No user interface. The piece of software that we are analyzing should really be a component, not a tool or application in itself. That is why we chose to exclude any open-source tool that includes a user interface (which would also bring us back to the need of defining standardized scenarios for each interface).
- Tests are passing. Optional, but we still made sure that for the majority of the selected components the test suite not only exists but also passes (meaning the code is functional).
- Container-based test suite. Optional; to increase the repeatability of the tests it is easier if the component contains instructions on how to set up the development environment so that the tests can run (e.g. any dependencies or environment configuration). This is a good practice, yet not that common, therefore it is not an exclusion criterion.
- Up-to-date. The component should have an active community, so we make sure there have been contributions to the component over the last 3 months (which turned out to be a non-issue for the popular components).
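A minimal sketch of how the machine-checkable criteria could be applied to one repository record as returned by the search API above; the "has tests" and "containerized" heuristics are our own assumptions, not an official GitHub signal:

```python
# Minimal sketch applying the selection criteria to one repository record
# from the GitHub search API. Test/container detection is heuristic.
from datetime import datetime, timedelta, timezone
import requests

def meets_criteria(repo: dict) -> bool:
    # Open-source license must be declared on the repository
    if not repo.get("license"):
        return False
    # Up-to-date: at least one push within the last ~3 months
    pushed = datetime.fromisoformat(repo["pushed_at"].replace("Z", "+00:00"))
    if datetime.now(timezone.utc) - pushed > timedelta(days=90):
        return False
    # Automated tests: heuristic check for a conventional test directory
    contents = requests.get(repo["url"] + "/contents", timeout=30).json()
    names = {item["name"].lower() for item in contents}
    if not names & {"test", "tests", "spec"}:
        return False
    # A container-based test suite is optional, so record it but do not filter
    repo["containerized"] = "dockerfile" in names
    return True
```

Whether the tests actually pass, and whether the component ships a user interface, still require a manual check; a repository listing alone cannot decide those reliably.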