iOS 14 – What time is it?

iOS 14 upgrades LLVM from the 9.x line to the 10.x line. This jump in versions was not well communicated to users, and it ended up producing a challenging and upsetting bug on the platform that broke code within Occipital's Structure SDK. Apple, a trillion-dollar company, needs to do better. The change made by the LLVM team is easily justified (read on), but the way it was rolled out was not as clear cut.

How did this happen? Well…

Maintaining an SDK in Apple’s ecosystem

Maintaining an SDK can be tiresome, thankless work on the best of days. There are a considerable number of platform issues to worry about, even after ignoring the litany of API issues, sample app snafus, out-of-date paradigms, etc. I could go on.

I have for some time been employed by Occipital, and I work on some components of their Structure SDK. This SDK isn't very typical by 2020 standards: most of the SDK code runs on device, it is made available in Objective-C (not Swift, although interoperability isn't too challenging), and it largely exists to provide on-device SLAM and computer-vision-adjacent code that works with Occipital's Structure Sensors. Needless to say, this isn't akin to Facebook's or Twitter's SDKs, which are largely sets of web APIs.

The challenge with this is that we end up having to deal with a lot of issues at different levels of the stack, with different expertise required to make them all work together in harmony. Some examples of disparate work streams that go into making this work:

While this is a somewhat limited view of what goes into the SDK itself, it should give an idea of all the different ways things can go wrong. This is further complicated by the fact that we produce our own hardware, take part in Apple's MFi program, and are building an ecosystem around that. Naturally, there is a considerable amount of platform work in keeping up to date with the Apple APIs we use and keeping our SDK running on the latest releases that Apple puts out.

Needless to say, when something breaks, it can take a lot of (disparate) expertise to get to the bottom of the issue!

iOS 14, and Apple’s lackadaisical attitude towards developers

Fast-forward to mid-September, and iOS 14 was released. There had been some talk internally of our SDK not working well on the iOS 14 beta. However, the company was focused on other priorities at the time, working to release Calibrator 4.0. We decided that it probably wasn't worth testing the beta extensively or investigating these issues: Apple betas are commonly buggy, and we weren't ready to invest a lot of time in issues that might never make it into the final release of iOS 14.

Quite frankly, many of our engineers aren't excited when Apple announces a new release either. For some time now, Apple has pushed the burden of QA/QC onto developers' laps, or just outright shipped buggy code. Does anyone remember last year, when Catalina was released? How about the iOS 13 release?

Developing on this platform is stressful. New releases from Apple being buggy isn’t some strange 2020 affliction. These aren’t unprecedented times, this is the norm now. Working in this space can be awesome, when you see customers launching products built on top of your work that are changing the world. The platform, though, is constantly shifting ground.

Breaking changes between LLVM 9 ➔ 10

So what broke? It took a couple of weeks to track down, but the crux of the bug was that our frame synchronization (between iOS color frames and Structure Sensor depth frames) broke because of a change to the default std::chrono::steady_clock implementation in LLVM's libc++. This meant that even though nothing changed in our SDK, merely upgrading to iOS 14 would cause apps that functioned well on iOS 13 and earlier to immediately and irrevocably break, because iOS apps link dynamically against the system's libc++.

Rolling this back a bit, what actually happened? Well, it turns out that part of our frame synchronization code relied on std::chrono::steady_clock to timestamp frames upon arrival. These arrival timestamps are a small part of the information the Structure SDK uses to synchronize Structure Sensor events with iOS sensor events (camera, IMU). In iOS 13 and earlier, this clock used the same time base as the timestamps on the CMSampleBufferRef / CVPixelBufferRef objects produced by AVFoundation. On iOS 14, because std::chrono::steady_clock changed with the move from LLVM 9 ➔ 10, the two clocks no longer shared a time base.
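To see why a shared time base matters, here is a deliberately simplified, hypothetical nearest-timestamp matcher. It is not the Structure SDK's algorithm (which uses more than arrival times); the function name `match_color_frame` and the tolerance parameter are assumptions for illustration. Each depth timestamp is paired with the closest color timestamp within a tolerance, which only works if both sets of timestamps come from the same clock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <vector>

// Hypothetical helper (not a Structure SDK API): returns the index of the
// color frame whose timestamp is closest to depth_ns, or std::nullopt if no
// color frame lies within tolerance_ns. All timestamps are nanoseconds and
// are assumed to share a single time base.
std::optional<std::size_t> match_color_frame(
    const std::vector<std::int64_t>& color_ns,
    std::int64_t depth_ns,
    std::int64_t tolerance_ns)
{
    assert(tolerance_ns >= 0);
    std::optional<std::size_t> best;
    std::int64_t best_diff = 0;
    for (std::size_t i = 0; i < color_ns.size(); ++i) {
        const std::int64_t diff = std::llabs(color_ns[i] - depth_ns);
        // Keep the closest candidate that is inside the tolerance window.
        if (diff <= tolerance_ns && (!best || diff < best_diff)) {
            best = i;
            best_diff = diff;
        }
    }
    return best;
}
```

For example, with color frames at 0, 33.33 ms, and 66.67 ms and a 5 ms tolerance, a depth frame stamped at 34 ms matches the second color frame. If the depth timestamp is instead offset by ten minutes (say, time the device spent asleep that only one clock counted), nothing matches at all.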

How do I know this? Well, mostly empirical testing, but we can roughly identify which clock was being used by running a simple test. AVFoundation timestamps come in as if they are acquired by mach_absolute_time(), which uses the underlying kernel clock, CLOCK_UPTIME_RAW.

This is something that you can't really change via the API, so we were kind of stuck with it. However, as most of our frame code was written in C++, and our drivers work across multiple platforms, we weren't exactly stoked to mix these Objective-C APIs into our pure-C++ code.

If we look hard enough, we find that std::chrono::steady_clock, which we used to time the arrival of sensor events from Structure Sensor devices, was based on the same CLOCK_UPTIME_RAW kernel clock prior to LLVM 10. On iOS 14, with the introduction of LLVM 10, it was changed to use CLOCK_MONOTONIC_RAW instead.
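One way to stop depending on whichever kernel clock libc++ picks for steady_clock is to define your own std::chrono-compatible clock that reads CLOCK_UPTIME_RAW explicitly. The sketch below is a hypothetical `uptime_raw_clock`, not Occipital's actual fix. On Apple platforms it uses the Darwin-specific `clock_gettime_nsec_np(CLOCK_UPTIME_RAW)`, which stays in plain C/C++ rather than Objective-C. The non-Apple branch is an assumption added only so the sketch builds elsewhere: it falls back to POSIX `CLOCK_MONOTONIC`, which on Linux also pauses during suspend.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <time.h>

// Hypothetical clock type (not part of libc++ or the Structure SDK) that
// meets the C++ Clock requirements. On Apple platforms it reads
// CLOCK_UPTIME_RAW, the same time base as mach_absolute_time() and
// AVFoundation sample-buffer timestamps.
struct uptime_raw_clock {
    using duration = std::chrono::nanoseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<uptime_raw_clock, duration>;

    // Monotonic, but it pauses while the device sleeps, so this sketch
    // conservatively does not claim to be steady.
    static constexpr bool is_steady = false;

    static time_point now() noexcept {
#if defined(__APPLE__)
        // Darwin-only call, available since macOS 10.12 / iOS 10.
        const std::uint64_t ns = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
        return time_point(duration(static_cast<rep>(ns)));
#else
        // Assumed fallback for non-Apple builds: Linux CLOCK_MONOTONIC also
        // excludes time spent suspended, so it is the closest analogue.
        timespec ts{};
        const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
        assert(rc == 0);
        (void)rc;
        return time_point(duration(std::chrono::seconds(ts.tv_sec)) +
                          duration(ts.tv_nsec));
#endif
    }
};
```

Usage mirrors steady_clock: take `uptime_raw_clock::now()` when a frame arrives and compute intervals with `std::chrono::duration_cast<std::chrono::milliseconds>(b - a)`. The design choice is to name the kernel clock explicitly, so a libc++ update cannot silently change the time base that arrival timestamps share with AVFoundation.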

Initially, I hadn’t discovered the above LLVM threads, but I could tell something was wrong with the timestamps I was getting. To verify that there was a change in behaviour, I used the following code to get timestamps on both iOS 13 and iOS 14:

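```cpp
#include <chrono>
#include <mach/mach_time.h>

// Mach absolute time in nanoseconds. This is CLOCK_UPTIME_RAW, the time
// base AVFoundation sample-buffer timestamps use; it pauses while asleep.
double now_nanoseconds_machclock(void)
{
    // Ratio converting Mach ticks to nanoseconds (not 1:1 on every device).
    mach_timebase_info_data_t timebase;
    const kern_return_t status = mach_timebase_info(&timebase);

    // If the timebase query fails, the function returns 0.0.
    const double machToNanoseconds = (status == KERN_SUCCESS)
        ? static_cast<double>(timebase.numer) / static_cast<double>(timebase.denom)
        : 0.0;

    const double machTime = static_cast<double>(mach_absolute_time());
    return machToNanoseconds * machTime;
}

// std::chrono::steady_clock in nanoseconds. Its underlying kernel clock
// changed between LLVM 9 (iOS 13) and LLVM 10 (iOS 14).
double now_nanoseconds_steadyclock(void)
{
    const auto now = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<double>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
}
```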

If you ran this on iOS 13, the two clocks roughly matched (ignoring small differences, since the two functions aren't called at exactly the same instant). On iOS 14, if you ran it right after rebooting your device, the clocks also matched. However, once the device had been put to sleep at any point, the clocks quickly became very, very different. This was particularly hard to reproduce, because nothing in the std::chrono::steady_clock documentation made it obvious that the device going to sleep would affect our time count!

This is because CLOCK_UPTIME_RAW, unlike CLOCK_MONOTONIC_RAW, does not increment while the device is asleep (for example, once the screen is off and the device has been sent to sleep). This change isn't something we could hold off on, or even avoid, without re-implementing that part of libc++ within our SDK. If you upgraded to iOS 14, this broke your app. For that, I'm sorry.
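The size of the discrepancy can be estimated by reading a clock that keeps counting during sleep alongside one that pauses. The sketch below uses a hypothetical function name, `sleep_gap_nanoseconds`. On Apple platforms it subtracts CLOCK_UPTIME_RAW from CLOCK_MONOTONIC_RAW, which should approximate the total time asleep since boot, and so roughly how far an iOS 14 steady_clock timestamp sits from an AVFoundation timestamp. The Linux branch is an assumed analogue (CLOCK_BOOTTIME counts suspend, CLOCK_MONOTONIC does not) included only so the sketch builds off Apple platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper: approximate nanoseconds spent asleep since boot,
// computed as (clock that counts sleep) - (clock that pauses during sleep).
std::int64_t sleep_gap_nanoseconds() {
#if defined(__APPLE__)
    // Read the pausing clock first so the difference cannot go negative.
    const std::uint64_t paused = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
    const std::uint64_t counting = clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW);
    return static_cast<std::int64_t>(counting - paused);
#elif defined(__linux__)
    // Assumed Linux analogue: CLOCK_MONOTONIC pauses during suspend,
    // CLOCK_BOOTTIME keeps counting.
    auto read_ns = [](clockid_t id) -> std::int64_t {
        timespec ts{};
        const int rc = clock_gettime(id, &ts);
        assert(rc == 0);
        (void)rc;
        return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    };
    const std::int64_t paused = read_ns(CLOCK_MONOTONIC);
    const std::int64_t counting = read_ns(CLOCK_BOOTTIME);
    return counting - paused;
#else
#error "This sketch only covers Apple platforms and Linux."
#endif
}
```

On a device that has not slept since boot, the result should stay near zero; after the device sleeps, it should grow by roughly the time spent asleep, which matches the behaviour described above.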

Consequences

It's hard to argue that the changes to std::chrono::steady_clock are wrong, per se. It makes sense that those adhering to the standard and expecting code to behave the same across platforms want a steady clock that is, well, steady.

However, the end consequences of fixing this bug with an OTA update and with very little insight into the exact changes made are pretty dire. Like I said, this affected the Structure SDK itself, and by extension, every app built with the SDK that supports Structure Sensor devices. This broke hundreds of apps powered by Structure Sensor, and the fix was for every developer to recompile their app with our latest SDK, since there was no switch I could hit that would restore the iOS 13 behaviour. With many of our customers in the medical 3D-scanning space, this caused a lot of anxiety. Some customers had to delay or even cancel appointments with patients because of this bug. We fixed it, and we did so as fast as we could, but there’s a real impact to human life when this kind of thing happens.

This shows some of the dangers of AppStore-like models: you can only have one version of an application live at any time, and you can't choose the environment it runs in. This is exactly why many are flocking to Snap, Flatpak, AppImage, and more on Linux today.

But author, you say, wasn’t this change good and justified? How is this a failure of the AppStore? Doesn’t this mean that apps that were broken subtly by this behaviour are now automatically fixed? Not so fast.

Conclusion – Apple needs to be better

Look, this causes a lot of strain and chaos, even ignoring the struggle of our engineers to get to the bottom of this. We want to be able to provide an SDK that people find reliable, and want to build their applications on. But it is getting harder and harder to do when Apple, a trillion-dollar company, can drop an iOS update that changes a very low-level and core aspect of the platform on a whim.

Mostly, I wrote this because I’m upset at all the stress and chaos that something like this generated. It shouldn’t have to be like this. Apple absolutely has the resources to do better here. They absolutely have the talent to produce solutions to this problem that don’t involve breaking hundreds of apps on an OS update. They are choosing not to.

Again, the trend of Apple shipping buggy release after buggy release is concerning. In this case, the bug was the result of a change that was probably a net positive across the board. I don't fault the LLVM team for making the decision they did. How are they supposed to know about an SDK from a fairly small fish in Apple's ocean? But when you run a platform, take a cut of every purchase on that platform, and force developers to cater to a growing list of rules and restrictions, you need to be open and transparent, and you need to give developers on that platform a way to help it grow rather than expecting them to perform QA/QC cycles for free.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.