Monday, November 23, 2020

Different boot and testing schemes

In the news, it was interesting to see the details of the M1-based Apple devices. As a firmware person, I was curious about how the machines boot.

The coverage mentions the omission of Boot Camp https://support.apple.com/boot-camp https://discussions.apple.com/thread/252021896 and of support for 3rd party OS's https://www.zdnet.com/article/top-apple-exec-native-windows-10-on-m1-macs-is-a-choice-microsoft-needs-to-make/.

Does that mean the boot paradigm of the Macs has changed from UEFI http://refit.sourceforge.net/info/boot_process.html? Apple was never fully UEFI compliant and often eschewed more complex capabilities like SMM-protected Authenticated Variables and other rich runtime services, although they did support ACPI.

The answer was formally revealed in the video https://developer.apple.com/videos/play/wwdc2020/10686 at 16:22, which mentions that the boot process is leveraged from the iPhone/iPad, namely iBoot https://en.wikipedia.org/wiki/IBoot.

The native iBoot launch was confirmed by an Apple UEFI engineer https://twitter.com/NikolajSchlej/status/1275951574200709120, along with the use of device tree https://elinux.org/Device_Tree_What_It_Is in lieu of ACPI https://twitter.com/NikolajSchlej/status/1275951827754774528. The device tree predates ACPI and was part of how the original PowerPC Macs booted, with Open Firmware's Forth-based FCode and device tree as the equivalents of UEFI and ACPI for the programmatic and declarative runtime table interfaces, respectively. Open Firmware was circa 1994 https://en.wikipedia.org/wiki/Open_Firmware whereas ACPI was 1996 https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#:~:text=First%20released%20in%20December%201996,Play%20BIOS%20(PnP)%20Specification.&text=ACPI%20defines%20a%20hardware%20abstraction,UEFI)%20and%20the%20operating%20systems. EFI, the precursor to UEFI, is the youngest of all three, with its earliest appearance dating to 1998 https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface.

As Apple showed in slide 14, the T2-based Macs already support the bridgeOS kernel going from iBoot to UEFI. This sort of layering of UEFI implementations on top of non-PI based platform code is pretty common, with UefiPayloadPkg https://github.com/tianocore/edk2/tree/master/UefiPayloadPkg providing an EDK2-based implementation of UEFI for coreboot https://doc.coreboot.org/ and slim bootloader https://slimbootloader.github.io/index.html. We discussed this in figure 6-17 of https://link.springer.com/chapter/10.1007/978-1-4842-0070-4_6 and we are trying to generalize the use of payloads in https://github.com/universalpayload. We compare a T2-based secure boot chain to other implementations in chapter 4 of https://www.amazon.com/Building-Secure-Firmware-Armoring-Foundation/dp/1484261054, too.

I suspect the disk layout is still based upon the GUID Partition Table (GPT), which was an invention of EFI. You can support GPT even without performing a UEFI OS boot, as shown by Chromebooks https://www.chromium.org/chromium-os/chromiumos-design-docs/disk-format, for example. To support a non-UEFI OS you can simply omit the EFI System Partition (ESP) and have your boot loader load a kernel from some arbitrary GUID-defined partition type.
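To make the last point concrete, here is a minimal Python sketch of how a boot loader might pick a kernel partition by its type GUID rather than looking for an ESP. The 128-byte entry layout and the ESP type GUID follow the GPT definition in the UEFI specification; `KERNEL_TYPE_GUID`, `make_entry`, and `find_partition` are hypothetical names invented for illustration, and this is not a claim about what iBoot or any Chromebook firmware actually does.

```python
import struct
import uuid

# Well-known partition type GUID for the EFI System Partition (ESP).
ESP_TYPE_GUID = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")
# Hypothetical type GUID a non-UEFI boot loader might reserve for its kernel.
KERNEL_TYPE_GUID = uuid.UUID("11111111-2222-3333-4444-555555555555")


def parse_gpt_entry(entry):
    """Decode one 128-byte GPT partition entry into a dict."""
    if len(entry) < 128:
        raise ValueError("GPT partition entry must be at least 128 bytes")
    # GUIDs are stored mixed-endian on disk, which bytes_le handles.
    type_guid = uuid.UUID(bytes_le=entry[0:16])
    unique_guid = uuid.UUID(bytes_le=entry[16:32])
    first_lba, last_lba, attributes = struct.unpack_from("<QQQ", entry, 32)
    name = entry[56:128].decode("utf-16-le").rstrip("\x00")
    return {
        "type": type_guid,
        "unique": unique_guid,
        "first_lba": first_lba,
        "last_lba": last_lba,
        "attributes": attributes,
        "name": name,
    }


def find_partition(entries, wanted_type):
    """Return the first entry whose type GUID matches, else None."""
    for raw in entries:
        part = parse_gpt_entry(raw)
        if part["type"] == wanted_type:
            return part
    return None


def make_entry(type_guid, first_lba, last_lba, name):
    """Build a synthetic entry for demonstration (not read from a disk)."""
    raw = type_guid.bytes_le + uuid.UUID(int=1).bytes_le
    raw += struct.pack("<QQQ", first_lba, last_lba, 0)
    raw += name.encode("utf-16-le").ljust(72, b"\x00")
    return raw
```

A loader built this way never needs to understand FAT or the ESP; it only needs to walk the entry array and match one GUID.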

We also show how the concept of a phase-based boot and clean interfaces to payloads enables more radical firmware stacks, such as slide 24 of https://cfp.osfc.io/media/osfc2020/submissions/SLFJTN/resources/OSFC2020_Rust_EFI_Yao_Zimmer_NDK4Dme.pdf

 


Another interesting example of stacking a Rust-based UEFI implementation on an alternate platform initialization layer, which I saw on the open source firmware oreboot Slack channel, can be found in https://github.com/retrage/rust-hypervisor-firmware/tree/coreboot-support.

Speaking of another A-prefixed company that's not Apple, Amazon has been doing some pretty interesting things in firmware. Former Intel colleague Mark Tuttle's work is mentioned in the write-up https://www.freertos.org/2020/02/ensuring-the-memory-safety-of-freertos-part-1.html about using the C Bounded Model Checker (CBMC) https://www.cprover.org/cbmc/ to ensure memory safety of FreeRTOS. This is a nice balance of providing better assurance with an existing type-weak language like C versus the forklift upgrade to something like Rust. We discuss CBMC a bit in chapter 21 of https://www.amazon.com/Building-Secure-Firmware-Armoring-Foundation/dp/1484261054, including an example of usage in listing 21-2.


This work in the rebar-embossed book is nothing like the fully operationalized example of the real-time OS listed above, though. Beyond this specific work, I appreciate how Amazon endeavors to make formal methods more developer friendly, as described in the work by Nathan, Mark, and their colleagues in https://assets.amazon.science/d0/de/cbec0b4547e3ae7ff077f8aa978f/code-level-model-checking-in-thesoftware-development-workflow.pdf. This type of analysis is much more relevant to my day-job than the earlier treatise on FM they wrote regarding TLA https://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-uses-formal-methods/fulltext, although some of Hillel Wayne's https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/ work may help build a bridge for me. These are more examples of the good stuff published by their engineers/'builders', which I also referenced in my last blog posting.

And formal verification of system software is popping up in other places, such as the mention "Formal verification of the code running at EL2" in https://linuxplumbersconf.org/event/7/contributions/780/attachments/514/925/LPC2020_-_Protected_KVM_.pdf. I haven't dug into this too much, but I wonder if it is aligned with Google Project Oak, including the Hafnium hypervisor https://hafnium.googlesource.com/hafnium/+/HEAD/docs/Architecture.md and its Coq based verification https://github.com/project-oak/hafnium-verification/tree/hfo2/coq-verification? Or some other KVM-focused verification? 


Tuesday, October 27, 2020

Silicon valley, innovation, and logos

As we near the end of October I realize that I haven't updated this blog in a bit. Sometimes my entries are inspired by recent events or material I bump into, such as https://spectrum.ieee.org/tech-history/cyberspace/todays-internet-still-relies-on-an-arpanetera-protocol-the-request-for-comments. This format is refreshingly simple when I compare the effort in curating the https://www.rfc-editor.org/rfc/rfc5970.txt versus some of the more painful processes in other venues.

Speaking of IPv6, I remember visiting Facebook in 2011 or 2012 to help with some UEFI IPv6 network boot issues. Since we didn't retrofit IPv6 boot to legacy PXE, for datacenter folks going all IPv6 https://www.datacenterknowledge.com/archives/2010/06/10/facebook-deploys-ipv6 UEFI was the only game in town for deployment. I still recall the rough, open interior of the Facebook campus https://www.theregister.com/2011/02/08/facebook_in_menlo_park/, but the best memory was when I was leaving. My host walked me through a courtyard and I recall passing a building with a windowed corner office where I saw Zuckerberg leaning back in a chair in his T-shirt with a phalanx of suited men standing in front of his desk. It was as if a king of yore was holding court with his many knights and vassals. Fascinating stuff.

People may decry the "Fall of silicon valley" https://www.robrhinehart.com/the-fall-of-silicon-valley/, but from my first trip there in 1997, to that 2012 trip, to the training session http://www.chromium.org/chromium-os/2014-firmware-summit/2014%20Chrome%20OS%20Firmware%20Summit-%20Overview.pdf at the CHM https://computerhistory.org/, to my last trip to Intel HQ last year....



including catching a glimpse of Grove http://vzimmer.blogspot.com/2014/02/anniversary-day-next-next.html 




Speaking of Grove and his collected wisdom



such as "disagree and commit," there has always been a battle to enforce this, including folks who might "disagree and de-commit." Regrettably, for '20 a new one has crept in, namely "agree and ignore." But as always, a culture is only as strong as the folks who continually fight to uphold it.

Regarding values, I now find myself often using the triple of 'business first, team second, and career third' to explain to engineers the priority they should use in their career. Namely, focus on solving business problems first, even if the issues are outside of your organization. Next ensure that you have a healthy team and support your organization. Finally, worry about your career. Mistakes often happen when engineers put their careers as a first priority. Doing so may yield a small local victory but in the end is corrosive to both the business and the team.

In addition, I found the value of 'always leaving' to be of interest from the Google culture https://www.oreilly.com/library/view/software-engineering-at/9781492082781/. It doesn't mean that you should always be switching jobs but instead it argues that employees should always ensure that they provide sufficient documentation and collateral such that they can 'leave' a role for a more important mission and easily have someone take their place. Someone once remarked "I guess that I have job security since my code is so complicated." That's not demonstrating 'always leaving' but instead a more pernicious 'always nesting.' And in fact the reply to that person was that such code will lead to disruption because long term the business cannot stand for that state of affairs.

Beyond Grove's book and recent musings in the Google text, some of the insights from Amazon are quite interesting. These range from the role of PE's in their organization https://deloitte.wsj.com/cio/2020/09/16/at-aws-engineers-drive-architecture-shape-products/ to the Builders' Library https://aws.amazon.com/builders-library/?cards-body.sort-by=item.additionalFields.customSort&cards-body.sort-order=asc. The latter is interesting in that it solves many problems: it gives PE's a scalable way to mentor and, by providing the material publicly, it demonstrates the expertise and competence of their technical leads to customers along with their internal population.

Beyond 'ignoring' and other things, I do like the quote "Innovation takes something that people use and improves upon it." from the above-cited "Fall of Silicon Valley" article. I should reprise my thoughts http://vzimmer.blogspot.com/2013/12/invention-and-innovation.html on this topic.

On the subject of improvements, we are also trying to enhance the workflow across many different firmware technologies, such as slimboot, coreboot, and PI-based implementations like EDKII - the span of which we discuss a bit in chapter 1 of https://www.springer.com/us/book/9781484261057, too. This work includes an effort to allow for interoperability between these environments via a more standardized 'payload.' We have a draft specification at https://universalpayload.github.io/documentation/spec/spec.html and various implementations of payloads, including Linux https://github.com/universalpayload/linuxpayload and EDKII https://github.com/universalpayload/edk2, alongside various bootloaders like slimboot https://github.com/universalpayload/slimbootloader and coreboot https://github.com/universalpayload/coreboot to invoke the payload. The bootloader list will grow to EDKII Min Platform and Oreboot, and the payload list to U-Boot and potentially Skiboot or Hostboot.

In addition to curating works on the universal payload above, the earlier-promised 'FSP SDK' mentioned in slide 20 of https://2018.osfc.io/uploads/talk/paper/1/OSFC_Keynote-005.pdf is being developed in https://github.com/universalpayload/fspsdk.

Regarding additional improvements, we continue to explore Rust for firmware, including the talk https://www.youtube.com/watch?v=dCu0-frSURE and presentation https://uefi.org/sites/default/files/resources/Enabling%20RUST%20for%20UEFI%20Firmware_8.19.2020.pdf. We elaborate on these points in chapter 20 of the upcoming https://www.springer.com/us/book/9781484261057, too. 

In addition to improving code through language based security, the boothole vulnerability led to some interesting exchanges on Twitter https://twitter.com/vincentzimmer/status/1290377140223934465, including a curation of the various defense-in-depth activities underway for EDKII https://github.com/jyao1/SecurityEx/blob/master/Summary.md. This work is something that the tianocore infosec https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Security-Issues can help drive, along with tightening up the CVE allocation process https://cve.mitre.org/news/archives/2020/news.html#September182020_TianoCore_Added_as_CVE_Numbering_Authority_CNA. Writing CVE's appears to be as much art as science.

The NSA also weighed in on UEFI Secure boot usage in https://www.nsa.gov/news-features/press-room/Article/2347822/nsa-releases-cybersecurity-technical-report-on-uefi-secure-boot-customization/, including https://trustedcomputinggroup.org/wp-content/uploads/TCG_EFI_Platform_1_22_Final_-v15.pdf. The latter has an interesting list of contributors who have moved on - Lee and Wooten and Springfield retired, Shiva at HPE, Bill at Amazon, Monty at GE....

I guess that I've been on this project too long. I even saw a reference in the TPM dev miniconference to a figure from https://people.eecs.berkeley.edu/~kubitron/courses/cs194-24-S14/hand-outs/SF09_EFIS001_UEFI_PI_TCG_White_Paper.pdf. This was before Brogan became master of Mu https://github.com/microsoft/mu_tiano_platforms and when I had a lot more hair.

Ironically I even find people who sometimes try to explain to me the intent behind something I created 20 years ago. Recently it was some nuance of the PEI infrastructure. I didn't have the heart to tell them that I invented the item in question and wrote the initial implementation and the specification on the topic, too. Or mention why the Terse Executable (TE) image has the 'VZ' signature. I just smiled and nodded my head. Ah, youth.

As a final thought, the Intel logo change inspired a trip down memory lane http://vzimmer.blogspot.com/2014/01/.

The new logo https://images.anandtech.com/doci/16063/20200902172826_575px.jpg is pretty compelling.

And with that I think I'll close on a happy note.

Happy & safe continued quarantine to those in geo's with restrictions.


  

Thursday, May 28, 2020

Feedback, tech talks, 23 or Anniversary.Next^8

This covers my 8th blog aligned with my work anniversary, a successor to https://vzimmer.blogspot.com/2019/02/tiano-147-and-22-or-anniversarynext7.html.  I've now passed the 23 year milestone.

I try to land this blog posting on the anniversary day. Regrettably I'm a few weeks late this year (OK, a couple of months at this point). Thanks to Lee F. for reminding me about my dilatory blogging practices so that I can bump this out of my drafts folder.

The important thing about the trip is to keep pushing things forward. I hearken back to a quote from Joel Spolsky https://www.joelonsoftware.com/2002/01/06/fire-and-motion/ "Fire and Advance." Keep pushing. Don't duck and cover. This includes engaging with theoretical debates about tech with people, esp. since folks who have the time to 'debate' are not spending enough time 'doing' (e.g., firing, advancing).

One thing about the trip is that there is always criticism. Sometimes the criticism is insightful, such as the FSP-S critique in https://review.coreboot.org/plugins/gitiles/coreboot/+/refs/changes/28/36328/5/Documentation/fsp/fsp-s_discussion.md. At other times it can be a bit tougher, such as the https://media.giphy.com/media/z9AUvhAEiXOqA/giphy.gif that Topher had as the image for https://uefi.info/.

Feedback can include specification size https://twitter.com/bcantrill/status/1219085364737888256. I am sympathetic to this approach. For example, we originally posted the PI specification as five volumes http://www.uefi.org/sites/default/files/resources/PI%201.5.zip as late as PI 1.5 but are moving to a monolithic document https://uefi.org/sites/default/files/resources/PI_Spec_1_7_final_Jan_2019.pdf in PI 1.7. The rationale for the smaller books is that if you were doing something like the Intel FSP, you might just want to learn about generic software objects like HOB's and Firmware Volume headers (e.g., Volume 3).

As always happens when I haven't blogged in a while, I have collected more random observations. Technology seems to run in cycles. I/O processing went from the I/O Controllers (IOC's) on mainframes, to abortive attempts to standardize a couple of decades ago such as I2O https://en.wikipedia.org/wiki/I2O (which could be a blog on its own about how I2O apportioned the work wrong with respect to data movement), to today's Smart NIC's. Safe OS's went from the 432's Ada-based iMAX OS https://en.wikipedia.org/wiki/IMAX_432 to today's Rust-based Tock OS on Open Titan https://github.com/google/tock-on-titan. Or consider 'computer' companies like Honeywell going from 16-bit https://en.wikipedia.org/wiki/Honeywell_316 to quantum https://techcrunch.com/2020/03/03/honeywell-says-it-will-soon-launch-the-worlds-most-powerful-quantum-computer/.

Another interesting observation this blogging cycle... Although my patenting career has mostly gone to sleep, I was interested to see the reformulated web page https://en.wikipedia.org/wiki/List_of_prolific_inventors with company names added. I observed Intel twice - myself and Bellevue Intel neighbor Mike Rothman - although this list is not complete since it's missing Intel's Robert Chau (nearly 500) https://venturebeat.com/2020/01/07/a-bright-future-for-moores-law/. Other notable tech companies mentioned in the list: IBM 51 times, 4 for Broadcom, 2 Qualcomm, 1 AMD and 15 for Intellectual Ventures (just a few miles away from the office @ 405/I-90) here in Bellevue. No MS or FAANG companies? Maybe the latter are too young? At the time of this posting I'm #90, but I suspect the ineluctable march of the forces of IBM (or edits to address omissions like the one I mentioned above) will bump me out of the top 100 before the end of the year.

Speaking of the Bellevue/Seattle area, prior to the recent health quarantines there were some pretty interesting local talks, including one from Fowler in SODO on refactoring https://martinfowler.com/books/refactoring.html, with the reminder that any code in your source tree without tests is 'legacy code.'

Since EDKII just upstreamed a unit test framework https://github.com/tianocore/edk2/tree/master/UnitTestFrameworkPkg w/ 2 tests (Safe Int Lib and reset system), I'd say it's now more on the 'legacy' side of the ledger. It was a bit saddening since the talk was hosted in an office next to my former Seattle Intel office.

Although it did find the Pac NW variant of Stonehenge, it would appear.


Another good talk was in SLU for the Rust meetup,


including a Wasm talk by a Google engineer
https://docs.google.com/presentation/d/1eOaVK3b5Ye13ZF92viWETq7F8wZdBOOY_yT37JNjAPs/edit#slide=id.p with always interesting propaganda posted at their new office.




These days, though, the in-person meetups are mostly replaced by conference calls, including but not limited to Zoom, GoToMeeting, BlueJeans, Skype, Hangouts, and MS Teams.

Let's wrap up with interesting companies in the news. I mentioned Oxide in http://vzimmer.blogspot.com/2019/12/rust-oxide-corrosion.html and was happy to receive some of their swag https://twitter.com/vincentzimmer/status/1233847629928452096. I also had a chance to catch the https://oxide.computer/blog/on-the-metal-3-ron-minnich/ interview with Ron. Ron wrote the introduction to https://www.apress.com/us/book/9781484200711 and it's always great to hear Ron provide history and insights into technology. When we finished this book co-author Marc Jones said 'we should do one on servers.' As always in tech, it takes a little time https://www.opencompute.org/projects/open-system-firmware. Speaking of Oxide, beyond their podcast series Bryan Cantrill has also done a nice talk as part of Stanford's ee380 https://www.youtube.com/watch?v=vvZA9n3e5pc&list=PLoROMvodv4rMWw6rRoeSpkiseTHzWj6vu. Good stuff.



Friday, December 13, 2019

rust, oxide, corrosion,....


Miguel de Icaza (@migueldeicaza)
All of us writing C and C++ are living on borrowed time.

The only safe future is Rust.

Prepare your code to go out of scope.
 
inspired me last night to share some of the EDKII Rust work underway. I filed https://bugzilla.tianocore.org/show_bug.cgi?id=2367 to motivate the posting of
https://github.com/tianocore/edk2-staging/tree/edkii-rust.

Within this package you can find https://github.com/tianocore/edk2-staging/tree/edkii-rust/RustPkg which includes https://github.com/tianocore/edk2-staging/tree/edkii-rust/RustPkg/MdeModulePkg/Universal/CapsulePei 
as an example of using Rust to build an EDKII capability. The idea is not to fork-lift upgrade the entire 1 million LOC EDKII upstream to Rust but instead to migrate critical flows and libraries to a safer language. The capsule example is especially important since the capsule is an attacker controlled data object and the parsing flows are quite complicated https://github.com/tianocore-docs/Docs/raw/master/White_Papers/A_Tour_Beyond_BIOS_Capsule_Update_and_Recovery_in_EDK_II.pdf. The UEFI PI modularity of PEIM's and DXE drivers, along with the language interop of the FFI of Rust to other languages, naturally lend themselves to this evolutionary approach.
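To illustrate why the capsule is a good first target, here is a small sketch of the kind of bounds checking a capsule parser needs before it touches any payload bytes. It is written in Python purely for brevity and is not the Rust code from RustPkg. The field order (CapsuleGuid, HeaderSize, Flags, CapsuleImageSize) follows EFI_CAPSULE_HEADER in the UEFI specification; the function name and the specific set of checks are my assumptions about a reasonable minimum, not a description of the EDKII implementation.

```python
import struct
import uuid

# EFI_CAPSULE_HEADER: CapsuleGuid (16 bytes), HeaderSize, Flags,
# CapsuleImageSize (three little-endian UINT32 fields).
CAPSULE_HEADER = struct.Struct("<16sIII")


def parse_capsule_header(blob):
    """Validate attacker-controlled sizes before slicing the payload."""
    if len(blob) < CAPSULE_HEADER.size:
        raise ValueError("buffer shorter than EFI_CAPSULE_HEADER")
    guid_raw, header_size, flags, image_size = CAPSULE_HEADER.unpack_from(blob, 0)
    if header_size < CAPSULE_HEADER.size:
        raise ValueError("HeaderSize smaller than the fixed header")
    if image_size > len(blob):
        raise ValueError("CapsuleImageSize exceeds the supplied buffer")
    if header_size > image_size:
        raise ValueError("HeaderSize exceeds CapsuleImageSize")
    return {
        "guid": uuid.UUID(bytes_le=guid_raw),
        "flags": flags,
        # Safe to slice: both bounds were checked above.
        "payload": blob[header_size:image_size],
    }
```

In C each of these checks is easy to forget, and a missed one becomes a memory-safety bug; in a memory-safe language an out-of-range slice fails loudly instead of silently reading or writing past the buffer.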

And listening to folks like Alex https://hardwear.io/berlin-2020/training/hunting-uefi-firmware-implants.php reminds me of the value of assurance. Thanks Alex for the discussion on language-based security, including Ada/SPARK. It also reminded me of my conversation with Aucsmith in DC http://vzimmer.blogspot.com/2018/09/.

Also interesting to see other plays on "Rust" in the market, such as 'oxide'
https://oxide.computer/blog/introducing-the-oxide-computer-company/
or my favorite 'corrode' https://github.com/jameysharp/corrode for converting C code to Rust. It'll be interesting to see usage of Rust for Oxide's firmware endeavors.

When I look at the above EDKII Rust work, I ask myself if it's a 'sustaining evolution' or 'disruptive innovation' https://online.campbellsville.edu/business/sustaining-innovation-vs-disruptive-innovation/.
Specifically, do examples of disruptive innovations regarding Rust and firmware include things like https://www.ics.uci.edu/~aburtsev/doc/redleaf-hotos19.pdf with formal verification or https://github.com/oreboot/oreboot with all Rust and no blobs?

Interesting times.

I try not to rewind the history machine, but I invariably find myself assessing my engagement with system software during pivots like this. Specifically, when I started working full time I joined a team chartered with writing firmware for a remote telemetry unit (RTU). It was essentially an embedded computer that would be strapped to a gas pipeline to control valves, measure gas flow, etc. It was IOT before IOT was in vogue.

My first team requested that I write the code for an 8051 microcontroller in assembly. I asked for requirements and then requested purchase of a C compiler. I was able to produce the features in C and evolve quite readily with the ever changing requirements. Over time I bounced back to assembly for some severely resource constrained usages, but C has become my language of choice.

And worse, I was afflicted by large legacy assembly code bases with rich algorithmic capability. For example, there was a PCI resource allocator written in assembly that did bottom-to-top allocation based upon a specific hardware requirement. When the next generation design required top-to-bottom, the schedule became imperiled by having to rework thousands of lines of assembly to unwind this assumption. Although C doesn't necessarily yield 'better code', it proved much easier to refactor than assembly.

So C has made sense for development efficiency, but the last couple of decades have posed other challenges, such as making assurance claims. Modula-3 never made it mainstream for OS's https://cseweb.ucsd.edu/~savage/papers/Sosp95.pdf, nor has C# https://en.wikipedia.org/wiki/Singularity_(operating_system). And I've already mentioned my own less-than-successful earlier investigations into type safety for firmware http://vzimmer.blogspot.com/2016/12/provisioning-porting-and-types.html.

So nearly 30 years onward I'm happy to have another language transition, namely an assembly-to-C moment with the C-to-Rust change.

Thursday, December 5, 2019

10 years on and a small exercise in package dependencies

This post marks the 10th anniversary of my first blog posting https://vzimmer.blogspot.com/2009/12/good-reading-on-firmware-uefi.html. It also represents the 80th post.

Since a blog entry whose sole purpose would be to designate this milestone might not be so, er, 'interesting', let's add a little meat to this posting. Specifically, let's talk about package dependencies in the EDKII project. This is an interesting exercise since a large, package-based project might be daunting to newcomers trying to discern the relationships among components. Correspondingly, long-term participants in the project working on refining the implementation, such as cleaning up packages to be more independent and maintainable (aka paying down technical debt), might find a higher level view interesting.

To help in these various causes, below is a recipe for generating dependency graphs across the packages. The steps include:

0. Clone https://github.com/tianocore/edk2 locally
1. Install graphviz from http://www.graphviz.org/
2. Go to the edk2 root
3. Run the following command in Windows shell:
     findstr /S /R /N /C:"^ *[^ #]*\.dec$" *.inf > all_packages.dot
4. Open the file all_packages.dot with an editor with regular expression support, such as https://notepad-plus-plus.org/, and perform a text replacement like below:
    Replace: ^([^\\]+)\\.+: *(.+\.dec)
    With: "\1/\1.dec" -> "\2"
5. Add the following text around replaced text in the all_packages.dot file:

      strict digraph PackageDependency {
           graph [rankdir=TB];
           node [fontsize=9,shape=box];
           edge [arrowhead=vee,arrowsize=0.5,style=dashed];
      …
      …
      }

6. Run the following command in Windows shell:
      graphviz\bin\dot.exe -Tsvg all_packages.dot -o all_packages.svg
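For those not on Windows, steps 3 through 5 can be approximated with a short Python script. This is a sketch under the same assumptions as the manual recipe: every .inf file lives under a top-level package directory, each dependency appears on its own line ending in .dec, and the owning package's .dec is named after its directory. The function names `collect_edges` and `to_dot` are mine, not part of any EDKII tooling.

```python
import os
import re

# A dependency line: optional indentation, a token ending in .dec,
# and no '#' (so commented-out entries are skipped).
DEC_LINE = re.compile(r"^\s*([^\s#]+\.dec)\s*$")


def collect_edges(root):
    """Return sorted (package .dec, referenced .dec) pairs found in .inf files."""
    edges = set()
    for dirpath, _dirs, files in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files directly in the root have no owning package
        package = rel.split(os.sep)[0]
        for fname in files:
            if not fname.lower().endswith(".inf"):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as inf:
                for line in inf:
                    match = DEC_LINE.match(line)
                    if match:
                        edges.add((f"{package}/{package}.dec", match.group(1)))
    return sorted(edges)


def to_dot(edges):
    """Wrap the edges in the same digraph header used in step 5."""
    lines = [
        "strict digraph PackageDependency {",
        "    graph [rankdir=TB];",
        "    node [fontsize=9,shape=box];",
        "    edge [arrowhead=vee,arrowsize=0.5,style=dashed];",
    ]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

Writing `to_dot(collect_edges("."))` to all_packages.dot from the edk2 root should leave you ready for step 6.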

Below is a sample output of this exercise.



So what?

This is an interesting exercise since it shows the strong dependency most EDKII packages have on the MdePkg at the bottom, which is the package containing generic interfaces and library classes. Not surprisingly, the second most leveraged package is the MdeModulePkg, which is the generic set of implementations based upon interfaces in the MdePkg. It's also useful for discovering circular dependencies, such as between the ArmPkg and the EmbeddedPkg. And finally, it shows opportunities for clean up, such as removing the dependency of the NetworkPkg upon the ShellPkg. The latter is based upon the existence of some network-related shell commands that should really rely upon API definitions of the MdePkg instead of instances in the NetworkPkg.
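Spotting cycles by eye gets harder as the graph grows, so here is a hedged sketch of doing it mechanically. It takes the same (source, destination) edge pairs that appear in the .dot file and runs a depth-first search for a back edge. The function name `find_cycle` is hypothetical, and the code reports only the first cycle it encounters rather than all of them.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    graph = defaultdict(list)
    for src, dst in edges:
        if src != dst:  # a package listing its own .dec is not interesting
            graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GREY:
                # Back edge: the cycle is the path from nxt to here.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Fed the ArmPkg and EmbeddedPkg edges described above, this would return the two packages in a loop; on a clean layering it returns None.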

Saturday, November 30, 2019

Beware Experts, Beware Disruptors

This blog entry hearkens back to a tale of expertise versus disruption. The scene is DuPont, WA in the late 1990's. Prior to Tiano (2000+) there was an effort to write a BIOS from scratch in the erstwhile workstation product group (WPG) at Intel. The code name of the effort was "Kittyhawk" (1998). Like all boot firmware, the initialization was broken up into phases. Instead of the tuple of {SEC, PEI, DXE, BDS} of UEFI PI or {BOOTBLOCK, ROMSTAGE, RAMSTAGE, PAYLOAD} of coreboot, Kittyhawk had {VM0, VM1, VM2, BOOTLOAD} phases.

Although the platform initialization of Kittyhawk was in native protected-mode C code (and cross-compiled to Itanium for the latter boot variant), this was prior to the emergence of IA-32 EFI and its OS's, which was a parallel effort in DuPont. Instead, the 32-bit version of Kittyhawk relied upon a PC/AT BIOS boot interface for MBR-based operating systems. To make this happen an engineer decomposed an existing PC/AT BIOS into its POST (Power-On Self Test) component and its 'runtime', or the 16-bit code that provided the Int-callable BIOS API's. The 32-bit Kittyhawk code did device discovery and initialization, and then passed the device information in a data block to the generic 16-bit BIOS 'runtime.' This 16-bit code was called 'the furball' by management in anticipation of a day when it could be 'coughed up' in favor of a native 32-bit operating system load.

This Kittyhawk effort on IA-32 was definitely deemed disruptive at the time. The DuPont Intel site was more of a rebellious effort, with the larger product teams in Oregon constituting the existing 'experts'. There were cries from Oregon that the disruptors in Washington would never boot Windows, and if they did, they'd never pass WHQL, or the suite of compliance tests for Windows. The furball booted OS's and passed compliance tests. It turned out that having a non-BIOS engineer look at the problem didn't carry the prejudices about what's possible, and the work on a clean interface into a BIOS runtime was used in subsequent Tiano Framework work such as the Compatibility Support Module (CSM) https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/efi-compatibility-support-module-specification-v098.pdf API design.

So this part of the story entailed providing some caution in listening to the experts, but....

You sometimes need to beware disruptors. Specifically, riding high upon the success of building a common C code base to support IA-32 and Itanium, with the furball providing 32-bit OS support and the Intel Boot Initiative (IBI)/EFI 0.92 sample providing 64-bit Itanium OS support, the next challenge was scaling to other aspects of the ecosystem. Part of this scaling entailed support of code provided by other business entities. The EFI work at the time had the EFI images based upon PE/COFF, so the Kittyhawk team decided to decompose the statically linked Kittyhawk image into a set of Plug In Modules (PIM's).

After the Kittyhawk features were decomposed into PIM's, I remember standing in the cubicle of one of the Kittyhawk engineers, along with a BIOS guru visiting from Oregon to help with the 64-bit bring-up, when the topic ranged over to option ROM's. The Kittyhawk engineer said "let's just put our PIM's into option ROM containers." I was a bit surprised, since to me the problem with option ROM's wasn't carving out code to run in the container but rather 'how' to have the option ROM's interoperate with the system BIOS. That latter point is where standards like EFI came into play by having a 'contract' between the Independent Hardware Vendors (IHV's) that built the adapter cards and the OEM's building the system board BIOS.

So the work of the disruptors was laudable in showing that you could have shared, common code to initialize a machine, but to scale to new paradigms like OS and OpROM interfaces you really needed a broader paradigm switch. And as would be shown, it took another decade to solve this interoperability issue.

As such, sometimes you have to be a bit wary of the disruptors, although the disruptive Kittyhawk did provide many learnings beyond the furball/CSM concept, including how up to today EDKII builds its ACPI tables via extracting data tables from C code files. Alas the Kittyhawk effort was shut down as part of the shuttering of the Workstation Group, and efforts to build a native code BIOS solution fell into the hands of the EFI team as part of the emergent (1999+) Tiano effort. At this time there was the EFI sample that eventually grew into the EFI Development Kit (aka the first Tiano implementation), now referred to as EDKI versus today's EDKII. EDK leveraged a lot of the learnings of EFI to inform the Intel Framework specifications, including the phases of SEC, PEI, DXE, and BDS we know today.

This original PEI phase differed somewhat from the PEI you can find in the UEFI PI specification, though. Instead of the concept of C-based PEIM's and PEI core, the original instantiation of the PEI Core Interface Specification (PEI-CIS) was based upon processor registers. There was still the business requirement of separate binary executables, and the solution of GUID's for naming interfaces was used. But instead of a PPI database, the PEIM's exposed exported interfaces through a Produced GUID List (PGL) and imported interfaces via a Consumed GUID List (CGL). During firmware construction and update the CGL and PGL were reconciled and linked together. And in order to have services that interoperate, the concept of a 'call level' was introduced. The call level detailed which registers could be used by services in a PEIM, since there was no memory for a call stack with this early PEI-CIS 0.3.
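
To make the build-time reconciliation concrete, here is a minimal sketch of matching each PEIM's CGL entries against the PGL entries of the other modules. It is only an illustration of the idea, not the PEI-CIS 0.3 tooling: the module names, the GUID strings, the `Peim` class, the `reconcile` helper, and the rule that a GUID may have only one producer are all assumptions made for the example.

```python
# Hypothetical sketch: names, GUID strings, and rules are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Peim:
    name: str
    produced: set = field(default_factory=set)  # Produced GUID List (PGL)
    consumed: set = field(default_factory=set)  # Consumed GUID List (CGL)


def reconcile(peims):
    """Link every consumed GUID to the PEIM that produces it.

    Returns (links, unresolved):
      links      maps (consumer name, guid) -> producer name
      unresolved lists (consumer name, guid) pairs with no producer
    """
    producers = {}
    for peim in peims:
        for guid in peim.produced:
            # Assumption: a GUID has exactly one producer in an image.
            if guid in producers:
                raise ValueError(
                    f"{guid} produced by both {producers[guid]} and {peim.name}"
                )
            producers[guid] = peim.name

    links, unresolved = {}, []
    for peim in peims:
        for guid in sorted(peim.consumed):
            if guid in producers:
                links[(peim.name, guid)] = producers[guid]
            else:
                unresolved.append((peim.name, guid))
    return links, unresolved


image = [
    Peim("CpuInit", produced={"GUID-CPU-IO"}),
    Peim("MemoryInit", produced={"GUID-MEM-INIT"}, consumed={"GUID-CPU-IO"}),
    Peim("BoardInit", consumed={"GUID-MEM-INIT", "GUID-SMBUS"}),
]

links, unresolved = reconcile(image)
print("linked:", links)
print("unresolved:", unresolved)
```

In this toy image the `GUID-SMBUS` import has no producer, which is the kind of mismatch a reconciliation step at firmware construction or update time would be expected to flag.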

At the same time we were defining PEI 0.3, there was another approach to writing pre-DRAM code without memory, namely the Intel® Applied Computing System Firmware Library V1.0 (hereafter referred to as Intel® ACSF Library).
ACSFL, which provided 32-bit callable libraries for boot-loaders, was described in an Intel Update article (same dev update magazine https://github.com/vincentjzimmer/Documents/blob/master/it01043.pdf). This effort came from the embedded products team and entailed delivering a series of libraries built as .a files. This simplified initialization code addressed the lack of a memory call stack by using the MMX registers to emulate a call stack. This differed from the PEI 0.3 model of reserving a set of registers for each call level. The ACSFL concept was smarter, since constraining PEIM's to a given call level really impacted the composition of these modules into different platform builds.

I do enjoy the quotation 'requires only 15KB of flash' when I wander over to https://github.com/IntelFsp/FSP/blob/master/KabylakeFspBinPkg/Fsp.fd with its size of 592k, or 39x size dilation of silicon initialization code over the last 20 years. This is similar to the 29x scaling in the number of transistors between the 7.5 million in a Pentium II https://en.wikipedia.org/wiki/Pentium_II and the 217 million in a Kaby Lake https://www.quora.com/How-many-transistors-are-in-i3-i5-and-i7-processors.
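
As a quick check of those two ratios, here is the arithmetic using only the figures quoted above. The one assumption is that the '592k' file size is read as 592 KB.

```python
# Figures quoted in the text; "592k" is assumed to mean 592 KB.
acsf_flash_kb = 15
fsp_fd_kb = 592
pentium_ii_transistors = 7.5e6
kaby_lake_transistors = 217e6

code_growth = fsp_fd_kb / acsf_flash_kb
transistor_growth = kaby_lake_transistors / pentium_ii_transistors

print(f"code size growth:  {code_growth:.1f}x")
print(f"transistor growth: {transistor_growth:.1f}x")
```

Both values round to the 39x and 29x mentioned above.
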
The same article provides the software layering. One can see some similarity of ACSF Library to the Intel Firmware Support Package (FSP). This is no accident, since the Intel FSP was originally intended to target the same embedded market, although by the time of Intel FSP in 2012 the embedded mantle was being carried by the Internet of Things Group. As another point of irony, the original FSP was built with the Consumer Electronics Firmware Development Kit (CEFDK). The CEFDK was in fact the evolution of the ACSF library, going through various instances, like FDK. The latter were used to enable phones and tablets beyond just embedded.

So ACSF Library provided some learnings, and at the same time LinuxBIOS (the prior name of coreboot) solved this issue of supporting C code without initialized DRAM via the ROMCC tool. ROMCC was a custom compiler that used processor registers to emulate a stack so that you could write things like DRAM initialization in C code.

The initial implementations of PEI-CIS 0.3 https://www.intel.co.kr/content/dam/www/public/us/en/documents/reference-guides/efi-pei-cis-v09.pdf with just registers proved difficult to deploy, and since part of the Tiano effort entailed a requirement to use standard tools, techniques like ROMCC were deemed not tenable. As such, as the PEI CIS went from 0.3 to 0.5, we moved PEI to the concept of temporary memory, using the processor cache as RAM (CAR) as the 'temp RAM.' Regrettably we were asked to not disclose how we implemented temp RAM, and the technique and initialization code were not made open source or published (and a modern instance of this reluctance can be found in the FSP-T abstraction of FSP 2.0). The coreboot community, though, didn't have this constraint, and https://www.coreboot.org/images/6/6c/LBCar.pdf https://www.coreboot.org/data/yhlu/cache_as_ram_lb_09142006.pdf provided details on how to use this technique.

As time progresses, I am amused about the various recollections. During the late 2000's someone told me 'you rewrote PEI many times' when in fact the only substantive change was going from registers in 0.3 to temporary memory in 0.5. Although not necessarily a full re-write, the EFI implementations did have a long history:

IBI Sample -> EFI Sample -> Tiano Release 1..8, Tiano Release 8.1..8.7,  EDK1117, ...

...UDK2015, UDK2016, .......

Also, some fellow travelers mentioned to me their fond recollections of EFI discussions in 1996. I smile because the first discussions of EFI (in the form of IBI) weren't until 1998. But I suspect everyone's memory gets hazy over time, with this blog post having its own degree of fog and haze.

And these efforts also remind me that changes in this space take time. A recent reminder that API support takes time is the discussion of the EFI Random Number protocol in Linux https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.5-EFI-RNG-x86. This is an API that was defined in 2010 by Microsoft for Windows 8 and then introduced in the UEFI standard in 2013. Or consider features like UEFI secure boot, from the 2011 UEFI 2.3.1 specification, showing up in Debian Buster https://wiki.debian.org/SecureBoot just in mid 2019.

The other interesting perspective is https://fs.blog/2019/12/survivorship-bias/, namely you often hear about the success stories, not failures. Only for extreme cases like https://en.wikipedia.org/wiki/Normal_Accidents do you find a rich journal of failures. So in portions of this blog and the EFI1.2 discussion in http://vzimmer.blogspot.com/2013/02/anniversary-daynext-arch-ps-and-some.html I have tried to shed light upon some of the not so successful paths.



Saturday, September 28, 2019

Formal, Erdős, Rings, and SMM

This blog is a mix of a few topics. To begin, I have always been on the lookout for how to scale quality via tools http://vzimmer.blogspot.com/2013/12/better-living-through-tools.html, if possible. I continually hope that there are techniques to help close that crazy semantic gap between documentation and code. To that end I enjoyed a recent talk https://nwcpp.org/author/lloyd-moore.html on Everest which provides a practical realization of generating usable C code https://github.com/denismerigoux/hacl-star/tree/master/snapshots/hacl-c from a specification. I especially enjoyed the portion of the talk where the developer had to integrate feedback from the Firefox team on how to make the code 'look better.' When working with https://github.com/termite2/Termite I saw that one of the primary challenges in UEFI code generation https://www.intel.com/content/dam/www/public/us/en/documents/research/2013-vol17-iss-2-intel-technology-journal.pdf involved production of readable source code.

This does not mean there is no place for formal methods in the firmware space, though. For example, the formalization of EFI FAT32 http://eptcs.web.cse.unsw.edu.au/Published/ACL22018/Proceedings.pdf provides confidence in the design of the structures, although it doesn't necessarily lead to formally validated software objects. And the space of employing computers for maths continues to get more exciting https://www.vice.com/en_us/article/8xwm54/number-theorist-fears-all-published-math-is-wrong-actually.

In general, math can be your friend. And speaking of mathematicians, I was always intrigued by folks who mentioned their Erdős number https://en.wikipedia.org/wiki/List_of_people_by_Erdős_number. Viewing that specific Wikipedia entry I noticed Leslie Lamport in the '3' category. This reminded me of the heady days of DEC research and my former Intel colleague Mark Tuttle who had worked there with Lamport. Not surprisingly, Mark co-authored a paper https://dblp.uni-trier.de/rec/bibtex/journals/fmsd/JoshiLMTTY03 with Leslie, giving him an Erdős number of '4'. And since I co-authored a paper https://dblp.uni-trier.de/rec/bibtex/conf/woot/BazhaniukLRTZ15 with Mark, that gives me an Erdős number of '5'. As Wikipedia only mentions the cohort classes up to 3, I suspect some exponential blow-up in the size of the cohorts beyond that https://www.oakland.edu/enp/trivia/. Nevertheless I still find it to be a pretty cool detail.
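
For the curious, an Erdős number is just the shortest-path distance to Erdős in the co-authorship graph, which a breadth-first search computes directly. The sketch below is a toy: the two `placeholder` nodes stand in for the real co-authors linking Erdős to Lamport, whom I have not looked up, and the `add_paper` and `erdos_numbers` helpers are names invented for this example.

```python
from collections import deque


def add_paper(graph, *authors):
    """Record one paper: every author becomes a neighbor of every other."""
    for author in authors:
        graph.setdefault(author, set()).update(a for a in authors if a != author)


def erdos_numbers(graph, root="Erdős"):
    """Breadth-first search giving each reachable author's distance to root."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        person = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor not in distance:
                distance[coauthor] = distance[person] + 1
                queue.append(coauthor)
    return distance


graph = {}
# Placeholder links: the actual chain from Erdős to Lamport is not given here.
add_paper(graph, "Erdős", "placeholder-1")
add_paper(graph, "placeholder-1", "placeholder-2")
add_paper(graph, "placeholder-2", "Lamport")
# Links described in the text above.
add_paper(graph, "Lamport", "Tuttle")
add_paper(graph, "Tuttle", "Zimmer")

numbers = erdos_numbers(graph)
print(numbers["Lamport"], numbers["Tuttle"], numbers["Zimmer"])
```

Because breadth-first search visits authors in order of distance, the first time an author is reached is already their minimum number, so a shorter chain discovered later in real data would simply show up as a smaller value.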

And on the topic of cool details, it is always exciting to see the evolution of UEFI security in the market, including work done by Apple https://twitter.com/NikolajSchlej/status/1159602635176939520 for driver isolation. The UEFI specification has API's to abstract access to resources, and we even modeled said resources via a Clark Wilson analysis https://cansecwest.com/slides/2015/UEFI%20open%20platforms_Vincent.pptx slides 73+.

The slides commenced with a summary of the isolation rules, and then a mapping of the rules to the important boot flows of host firmware.

The flows begin with the normal boot, or S5,


and continue with the S3 wake from sleep event (eschewed these days in lieu of S0ix)


and culminate with a boot flow for a flash update. This is typically the boot response to an UpdateCapsule invocation wherein an update-across-reset (versus runtime update in SMM or BMC) is employed.


With these rules, the OEM-only extensible compartment should be isolated from the 3rd party pre-OS extensible compartment (e.g., option ROM's) and extensible 3rd party runtime (e.g., OS). This analysis was used to inform work in the standards body and open source on what defenses we should erect. We refreshed some of this type of analysis recently in https://edk2-docs.gitbooks.io/understanding-the-uefi-secure-boot-chain/comparing-clark-wilson-and-uefi-secure-boot.html. Regrettably we added a code signing guard in the mid-2000's (e.g., UEFI Secure Boot) but we didn't provide inter-agent isolation.

As a historical note, we talk about isolation, including rings, in https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Supporting_SMM_Resource_Monitor_using_the_EFI_Developer_Kit_II.pdf for SMM using user mode and paging (page 10) in 2015 and an earlier mention of pushing EFI drivers into ring 3 in the now expired https://patents.google.com/patent/US20030188173 filed back in 2002 (17 years ago, gasp).

Given the 1999 inception of EFI and 2001 for the EFI driver model, the challenge has been application compatibility and delivering these features to market given their later addition. To that end I must give credit to Apple for their work in this space, especially as true innovation is delivering the solution to market in my view http://vzimmer.blogspot.com/2013/12/invention-and-innovation.html .

On other oddities from the past and SMM, I was curious about the first mention of System Management Mode (SMM).  This archaeology was also motivated by testing the claim wherein technology is most fully described in its initial product introduction, with further evolution having successively fewer details given industry practice in the domain. Since the CPU mode was introduced in the 386SL, I found the following http://bitsavers.trailing-edge.com/components/intel/80386/240852-002_386SL_Technical_Overview_1991.pdf in which the feature is first described, although the acronym "SMM" was never used. I especially enjoyed this quote from the datasheet:

"Since system management software always runs in the same mode, OEM firmware only needs to provide a single set of SMI service routines. Since real mode is essentially a subset of each of the other modes, it is generally the one for which software development is most straight-forward. SMI firmware developers therefore need not be concerned with the virtual memory system, page translation tables initialized by other tasks, interprocess protection mechanisms, and so forth. "
(page 58).

The mention of paging is especially ironic given the earlier topic in this blog on isolation and the document https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Supporting_SMM_Resource_Monitor_using_the_EFI_Developer_Kit_II.pdf. Some of the venerable collateral does describe the SMI# pin http://bitsavers.trailing-edge.com/components/intel/80386/240814-005_386SL_Data_Book_Jul92.pdf, though. And a later book https://www.amazon.com/Intels-Architecture-Designing-Applications-McGraw-Hill/dp/0079113362 on the 486SL has source code for an assembly language 'kernel' to handle dispatching of event handlers. In that latter book, it was nice to see a mention of my former colleague and https://www.amazon.com/Beyond-BIOS-Developing-Extensible-Interface/dp/1501514784/ co-author Suresh, too.

Well, so much for September 2019 blogging. I did my usual wandering across topics. I should probably produce more bite-sized blogs, one per topic, but what would be the fun in that?