Saturday, May 2, 2015

MacOS X 10.10.3 Update is *TOXIC*

As a PSA (public service announcement), I'm reporting here that updating your Yosemite system to 10.10.3 is incredibly toxic if you use WiFi.

I've seen other reports of this, and I've experienced it myself.  What happened is that the update for 10.10.3 seems to have done something tragically bad to the WiFi drivers, such that it completely hammers the network to the point of making it unusable for everyone else on the network.

I have a late 2013 27" iMac, and after I updated, I found that other systems started badly misbehaving.  I blamed my ISP and the router, because I was seeing ping times of tens of seconds!
(No, not milliseconds, seconds!!!  In one case I saw responses over 64 seconds.)  This was on other systems that were not upgraded.  Needless to say, that basically left the network unusable.

(The behavior was cyclical -- I'd get a few tens of seconds where pings to 8.8.8.8 would be in the 20 msec range, and then they would climb very quickly until maxing out at around a minute or so.  They would stay there for a minute or two, then reset or drop back to sane times, but only very briefly.)
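To put numbers on a cyclical pattern like this, one could save the output of `ping 8.8.8.8` to a file and summarize it. The sketch below assumes the usual macOS/Linux reply format (`... time=20.4 ms`); the function names and sample lines are illustrative, not taken from the original measurements.

```python
import re
import statistics

# Matches the "time=<float> ms" field of a typical macOS/Linux ping reply.
RTT_PATTERN = re.compile(r"time=([\d.]+) ms")

def parse_rtts(lines):
    """Return the round-trip times (in ms) found in ping output lines."""
    rtts = []
    for line in lines:
        match = RTT_PATTERN.search(line)
        if match:
            rtts.append(float(match.group(1)))
    return rtts  # timeouts have no time= field and are skipped

def summarize(rtts):
    """Min, median and max RTT; a huge spread hints at queueing delay."""
    return {
        "count": len(rtts),
        "min_ms": min(rtts),
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
    }

if __name__ == "__main__":
    sample = [
        "64 bytes from 8.8.8.8: icmp_seq=0 ttl=57 time=20.4 ms",
        "64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=21.0 ms",
        "Request timeout for icmp_seq 2",
        "64 bytes from 8.8.8.8: icmp_seq=3 ttl=57 time=31512.7 ms",
    ]
    print(summarize(parse_rtts(sample)))
```

On a Mac, `ping 8.8.8.8 | tee ping.log` keeps the raw output on screen and in a file whose lines can be fed to `parse_rtts`.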

This was most severe when using a 5GHz network.  Switching down to 2.4GHz reduced some of the symptoms -- although still over 10 seconds to get traffic through and thoroughly unusable for a wide variety of applications.

There are reports that disabling Bluetooth may alleviate this, and also some people reported some success with clearing certain settings files.  I've not tried either of these yet.  Google around for the answer if you want to.  For now, my iMac 27" is powered off, until I can take the chance to disrupt the network again to try these "fixes".

Apple, I'm seriously seriously disappointed here.  I'm not at all sure how this got past your testing, but you need to fix this.  It's absolutely criminal that applying a recommended update with security-critical fixes in it should turn my computer into a DoS device for my local network.  I'm shocked that several days later I've not seen a fix release from Apple for this critical problem.

Anyway, my advice is, if possible, hold off on the update to 10.10.3.  It's tragically, horribly toxic not just to the upgraded device, but probably to the entire network it sits on.  I'm a little astounded that a bug in the code could hose an entire WiFi network as badly as this does -- I would previously have thought this impossible (and this was part of the reason why it took a while to trace this down to the computer -- I thought the ridiculous ping responses had to be a problem with my upstream provider!)

I'll post an update here if one becomes available.

31 comments:

David Beegle said...

I've experienced this as well. Hopefully it will be remedied soon.

Anthony Ryan said...

I have also been having similar issues (likely the same one) that started shortly after the early 10.10.3 pre-releases were seeded to a machine on my network.

Perhaps this is a problem.

brob said...

I don't see any actual proof here of this.

Dylan Taft said...

Saw this suddenly on DD-WRT on my Netgear 4500v2. Switched to Mikrotik and it's fine now.

Dylan Taft said...

Saw this on DD-WRT on 2.4GHz when I got the update on my Netgear 4500v2. Switched to a Mikrotik and it's OK now.

Thomas Mansencal said...

I noticed my WiFi network was really slow yesterday too (almost 10 minutes to copy 4.7 MB between two MacBooks). I guess this is the reason.

Juan Doe said...

Same problem here. My first Mac, my first crap.

Alexander Staubo said...

Check if Photos is uploading to iCloud. Users have reported this completely saturating their Internet connection. I doubt it's related to WiFi drivers at all, just a lack of upload throttling.

Garrett D'Amore said...
This comment has been removed by the author.
Garrett D'Amore said...

Alexander: So, as far as Photos, I don't think this is related. That's another problem (sucking all bandwidth) -- there is no conceivable reason why that would cause ICMP messages (ping) to take over a minute to be received. (It *could* explain *drops*, but not deferrals.)

Dylan: The switch to a different router makes sense. I think that this is a physical layer problem, and that it's a bad interaction between the driver/chipset/firmware in OS X 10.10.3 and the firmware on my router. TBH, I think the router is just as complicit, but I'm not an expert in WiFi radio specifics, so this is just gut instinct on my part. (It *should* be a shared medium; my gut instinct is that perhaps the new firmware isn't properly spreading its load across the spectrum, or is somehow saturating it.) Somehow, though, this is triggering bad behavior on the router. I don't fully understand it.

brob: The proof is simple. Turn off the Mac, or turn off the WiFi on the Mac, and all other stations on the network immediately begin working as normal. Turn the Mac on (or enable its WiFi), and boom, all dead. Originally I misdiagnosed this as a router fault; it took replacing the router (with a different brand!) and then doing some further troubleshooting to determine it wasn't the router. Then I discovered it was the Mac. (Part of the reason for the misdiagnosis was that turning off the WiFi on the router immediately made life better for *wired* stations. I thought this meant the WiFi on the router was broken; only later did I understand my misdiagnosis.) Frankly, if you want further proof, you can try this out yourself. I think for real, deeper proof you need a spectrum analyzer, which I lack (and which I lack the skill to use).

Jean said...

I experienced the same problems after updating.
In my case it was caused by AWDL and AirDrop.
I came across an incredibly detailed article on Medium by Mario Ciabarra about the issue, along with a helper app called WiFriedX to mitigate it.

See https://medium.com/@mariociabarra/wifriedx-in-depth-look-at-yosemite-wifi-and-awdl-airdrop-41a93eb22e48

Unknown said...

Open a terminal.
sudo ifconfig awdl0 down

You're welcome.
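To check whether that workaround is still in effect (changes made with `ifconfig` do not persist, for example across a reboot), one could look for the `UP` flag on `awdl0` in `ifconfig` output. This is a sketch assuming the usual BSD-style `name: flags=...<...>` header line; the helper names and sample lines are illustrative.

```python
import re

# Header line of each interface block, e.g. "en0: flags=8863<UP,...> mtu 1500".
HEADER = re.compile(r"^(\w+): flags=[0-9a-fA-F]+<([^>]*)>")

def interface_flags(ifconfig_output):
    """Map each interface name to the set of flags ifconfig reports for it."""
    flags_by_name = {}
    for line in ifconfig_output.splitlines():
        match = HEADER.match(line)
        if match:
            name, flags = match.groups()
            flags_by_name[name] = set(flags.split(",")) if flags else set()
    return flags_by_name

def is_up(ifconfig_output, name="awdl0"):
    """True if `name` appears in the output with the UP flag set."""
    return "UP" in interface_flags(ifconfig_output).get(name, set())

if __name__ == "__main__":
    down = "awdl0: flags=8902<BROADCAST,PROMISC,SIMPLEX,MULTICAST> mtu 1484"
    up = "awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484"
    print(is_up(down), is_up(up))
```

On the Mac itself, the text could come from `subprocess.run(["ifconfig", "awdl0"], capture_output=True, text=True).stdout`.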

dr2chase said...

There *is* a conceivable reason for delayed ping -- BufferBloat. Not saying that's what's going on here, but it at least makes it onto my radar. If there's some serious network traffic, maybe through multiple connections, that might do it. It depends on the bandwidth of your local connection and on the congestion control in your router. And yes, it affects ping time; that is how you diagnose it.

I've been working on installing CeroWRT but have not had the bandwidth to fully configure and test it.
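The bufferbloat mechanism can be illustrated with a toy model: one FIFO queue at the bottleneck, drained at a fixed link rate and fed faster than it drains. A ping that joins the queue waits behind everything already in it. The rates and buffer size below are made-up parameters for illustration, not measurements from this network.

```python
def queueing_delays(arrival_pps, drain_pps, seconds, buffer_packets):
    """Simulate a tail-drop FIFO one second at a time.

    Returns, for each second, the delay (in seconds) that a probe packet
    such as a ping would see if it arrived at the end of that second.
    """
    queue = 0.0
    delays = []
    for _ in range(seconds):
        # Net change over one second, clamped between empty and full;
        # anything beyond buffer_packets is tail-dropped.
        queue = min(max(queue + arrival_pps - drain_pps, 0.0), buffer_packets)
        # A probe arriving now waits for everything ahead of it to drain.
        delays.append(queue / drain_pps)
    return delays

if __name__ == "__main__":
    # Hypothetical: 500 pkt/s offered to a 400 pkt/s uplink, 1000-packet buffer.
    for second, delay in enumerate(queueing_delays(500, 400, 12, 1000), start=1):
        print(f"t={second:2d}s  ping would wait ~{delay:.2f}s")
```

Once the buffer is full, the sustained overload still causes drops; the large buffer does not prevent loss, it only adds seconds of waiting in front of every packet.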

Jean-Christophe said...

Alexander is probably right. Something is just eating up your bandwidth, and you probably have a bufferbloated network device (e.g. the router) which makes your latency (which ping could detect) really bad. Search the web for bufferbloat for more info. I don't think it is an OS X problem, but more likely crappy router/switch firmware.
So open up OS X Activity Monitor and see which process is consuming your bandwidth. It could be Photos, iCloud, a backup tool, another cloud solution, etc.

Note that some routers can use QoS and, when overloaded with data, will put all ICMP traffic (such as ping) at low priority. (There could be many devices on the path, including at your ISP or any gateway between you and the ping target, which behave the same way.) So ping can give you an estimate of the RTT (2x latency), but it can also report completely irrelevant values if ICMP is aggressively buffered. It is possible that ping reports an RTT of several seconds while an HTTP connection still sees an RTT of 20-60 ms, depending on the QoS applied by all the network devices between your browser and the web site.
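A strict-priority scheduler makes that QoS effect concrete: if ICMP sits in the lowest class, a ping is only sent when every higher-priority queue is empty. The two-class setup and packet names below are hypothetical, chosen only to illustrate the point.

```python
from collections import deque

def strict_priority_order(queues):
    """Drain queues in strict priority order (index 0 = highest).

    Each transmission takes one packet from the highest-priority
    non-empty queue; returns the resulting transmission order.
    """
    queues = [deque(q) for q in queues]
    sent = []
    while any(queues):
        for q in queues:
            if q:
                sent.append(q.popleft())
                break  # restart from the highest-priority queue
    return sent

if __name__ == "__main__":
    tcp = ["tcp1", "tcp2", "tcp3", "tcp4"]
    icmp = ["ping"]
    print(strict_priority_order([tcp, icmp]))
    # ['tcp1', 'tcp2', 'tcp3', 'tcp4', 'ping']
```

Under sustained load new TCP packets keep arriving in the top queue, so the ping can wait far longer than the TCP traffic itself does, which is why its RTT may say little about what a browser experiences.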

Note 2: deactivating Bluetooth could make your WiFi in the 2.4GHz band better, but not in the 5GHz band, unless you can point me to a Bluetooth specification/implementation that uses the 5GHz band!

ktappe said...

What we mean by "proof" is for you to provide us some tcpdumps and/or traceroutes so we can see actual numbers.

Robby CHS said...

Photos uploading to iCloud is indeed very related. Please check whether you enabled it.

For the reason why it's related, please see this article: http://lartc.org/howto/lartc.cookbook.ultimate-tc.html

If you enabled iCloud Photo Library, try to limit the upload bandwidth of your Mac (or iPhone or iPad) during the upload by changing your router settings. An example of how to do it: http://www.tp-link.com/en/faq-557.html
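Router upload limits like this are commonly built on some form of token bucket. The class below is a generic sketch of that idea, not TP-Link's implementation; the rate and burst values in the example are hypothetical.

```python
class TokenBucket:
    """Allow at most `rate` bytes/s on average, with bursts up to `burst` bytes."""

    def __init__(self, rate, burst):
        self.rate = rate          # refill rate, bytes per second
        self.burst = burst        # bucket capacity, bytes
        self.tokens = burst       # start full
        self.last = 0.0           # time of the previous call, seconds

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        # Refill for the elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # over the limit: the caller should queue or drop it

if __name__ == "__main__":
    bucket = TokenBucket(rate=125_000, burst=3000)  # ~1 Mbit/s, two 1500-byte packets
    print([bucket.allow(1500, 0.0) for _ in range(3)], bucket.allow(1500, 0.02))
```

A packet refused by `allow()` would normally wait in a short queue until enough tokens have accumulated, which is what keeps one device's upload from monopolizing the link.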

Good luck!

Will Cate said...

We have four Macs in our house, all on 10.10.3. We are not seeing this problem at all, *however* nobody uses iCloud.

marxworld said...

Out of interest, do you get the same behaviour if you go wired?

LES said...

For me it was the Photos app. There were multiple background processes involved. So, I put the bandwidth limiter (from the dev tools) on that Mac and the problem 'went away'. Once Photos finally finished syncing (several days), I was able to remove the limiter and everything was fine.

Rich Brown said...

Folks, I'm sorry to say that the article on lartc.org talks about Wondershaper. It was magnificent in its day - 2002.

But Wondershaper is missing out on the last 15 years of network research. It doesn't handle IPv6. And it is severely out of date with modern network traffic control. That's why Dave Täht wrote the article: Wondershaper Must Die http://www.bufferbloat.net/projects/cerowrt/wiki/Wondershaper_Must_Die

In direct response to the OP:

1) If turning off the Airport on the 10.10.3 computer makes everything work fine, then there's a bug in the wireless. It's not your duty to debug further.

2) You just got an OS upgrade, so you deserve to get support from Apple. Call the AppleCare folks 800-275-2273.

3) Here's how to find out if your router is bufferbloated: go to the DSLReports Speed Test at http://dslreports.com/speedtest It tests latency *during* the download and upload. (Other speed test sites only test a couple pings before starting the test, so that's no test at all...) If the latency/lag figures get high during the download, then your router is bufferbloated.

4) For more info about bufferbloat, read http://richb-hanover.com/bufferbloat-and-the-ski-shop/ It contains recommendations for making your router less bloated.
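Both Rich's test and Garrett's on/off check come down to comparing latency in two conditions: idle versus loaded, or Mac off versus Mac on. A small helper for comparing two RTT samples (for instance, parsed from saved `ping` logs) might look like this; the sample values are invented for illustration, and this does not reproduce DSLReports' own grading.

```python
import statistics

def latency_increase(baseline_rtts, test_rtts):
    """Compare two RTT samples (ms): median and worst-case increase."""
    return {
        "median_increase_ms": statistics.median(test_rtts) - statistics.median(baseline_rtts),
        "max_increase_ms": max(test_rtts) - max(baseline_rtts),
    }

if __name__ == "__main__":
    idle = [20, 21, 22]          # hypothetical: Mac off / link idle
    loaded = [400, 900, 1500]    # hypothetical: Mac on / link loaded
    print(latency_increase(idle, loaded))
```

An increase of a few milliseconds is noise; one of hundreds or thousands of milliseconds, like the figures reported in this thread, points at a queue building up somewhere on the path.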

Robby CHS said...

Slightly off-topic: this problem doesn't only happen with iCloud Photo Library. It also happens when you upload video via Photos (Google+) on Android (tried on a Nexus 5).

Thanks Rich Brown for very valuable information!

Ben Oliver said...

Having identical issue. Ping goes up to 1500ms+ when a MacBook Pro with 10.10.3 is connected.

Garrett D'Amore said...

Well, it does appear that turning off both awdl0 *and* iCloud upload of iPhotos indeed "solved" the problem.

I guess my router firmware does indeed suffer badly from bufferbloat or mishandling of QoS. (Rescheduling packets for *seconds* is tragically stupid IMO -- better to have a fair-share scheme than to blindly starve one stream in favor of another.)

What's more, current-gen home routers seem to have the same problem: I tried the latest ASUS AC2400 router, and my home AC1900 router (a Nighthawk) suffers from it too.

Still, I also blame Apple -- their approach of flooding the network also seems incredibly poor.

SB Dexter said...

The experience of bufferbloat is caused by a few factors.

First, network equipment like cable modems and routers can end up buffering a lot of packets (enough that it may take the queue seconds to drain).

Second, when a buffer or a QoS queue fills, it ends up dropping the most recently arrived packets, the ones at the tail of the queue.

TCP/IP has congestion management built in. It sends packets and waits for those packets to be acknowledged by the far end. Ordinary network latency is such that it can take a while (1/10 to 1/100 of a second) before the acknowledgements come back, so there is a window during which it keeps sending packets while waiting for acknowledgements. Typically, this window starts out relatively small and then increases until packet loss occurs, at which point it backs off dramatically and inches back up again.

The problem with dropping the most recently received packets is that it takes a long time for the sender to realize what's happened, and it takes a long time to adjust.
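The window behavior described above (grow until loss, back off sharply, creep back up) is essentially additive-increase/multiplicative-decrease (AIMD). The toy loop below models only that sawtooth; real TCP stacks add slow start, fast recovery and other refinements, and `loss_threshold` is an invented stand-in for the point where the bottleneck buffer overflows.

```python
def aimd_trace(rounds, loss_threshold, start=1.0, increase=1.0, decrease=0.5):
    """Return the congestion window (in packets) after each round trip.

    The window grows by `increase` per round trip and is multiplied by
    `decrease` whenever it exceeds `loss_threshold` (the hypothetical
    point at which the bottleneck buffer overflows and drops packets).
    """
    cwnd = start
    trace = []
    for _ in range(rounds):
        if cwnd > loss_threshold:
            cwnd *= decrease   # loss detected: back off multiplicatively
        else:
            cwnd += increase   # no loss: probe for more bandwidth
        trace.append(cwnd)
    return trace

if __name__ == "__main__":
    print(aimd_trace(12, loss_threshold=8))
```

With a deep buffer, the effective `loss_threshold` is much larger, so the window, and the standing queue behind it, grow much further before any loss signal reaches the sender.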

So, given these problems, perhaps Apple should be more conservative in picking an upload rate. On the other hand, poorly behaving network equipment, while not uncommon, is still broken, and how much should Apple do to accommodate that? I don't have a good answer. Ultimately, Apple gets the blame, but if they worked around this, there would be less pressure on router makers to finally fix this.

Now, all that said, I have already implemented fixes for bufferbloat on my network, and I'm not using iCloud Photo Library yet, and yet I seem to be having more WiFi-related problems since I started using 10.10.3. Or it seems that way. It's hard to track down.

Garrett D'Amore said...
This comment has been removed by the author.
Rich Brown said...

Regrettably, bufferbloat needs to be solved in many places at once. First your home router, but then also in *every* TCP/IP stack of *every* device.

It's regrettable that Apple hasn't leapt out front and implemented something like SQM/fq_codel in OSX (and in iOS). They could lord it over Microsoft, who also hasn't implemented any SQM...

Garrett D'Amore said...

Realized I made a bits/bytes error in my calculations, hence why I deleted my post.

So buffering in NICs and OSes really needs to be a lot smaller.

For example, a 1 Mbps link can only deliver about 83 full-MTU frames per *second*. That means that at this really slow link speed, you want almost *no* buffering -- at most a frame or two. You can scale up as the link speeds increase.

So I have a 50 Mbps down, 5 Mbps up link. At 5 Mbps, if I queue up 1024 frames (a common count) and they are full MTU, we are looking at roughly 2.5 seconds of bufferbloat.

Conversely, at 1 Gbps, I can send about 83,000 such frames in just one second. This is why you see larger buffers: high link speeds suggest that you need deeper buffers to avoid starving the link between interrupts and context switches.
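These figures can be checked with a few lines of arithmetic, assuming 1500-byte (full-MTU) frames and ignoring framing overhead:

```python
MTU_BITS = 1500 * 8  # one full-MTU frame, in bits

def frames_per_second(link_bps):
    """How many full-MTU frames a link can serialize per second."""
    return link_bps / MTU_BITS

def queue_delay_seconds(frames, link_bps):
    """Time to drain `frames` full-MTU frames at `link_bps`."""
    return frames * MTU_BITS / link_bps

if __name__ == "__main__":
    print(f"1 Mbps: {frames_per_second(1e6):.0f} frames/s")
    print(f"1 Gbps: {frames_per_second(1e9):.0f} frames/s")
    print(f"1024 frames at 5 Mbps: {queue_delay_seconds(1024, 5e6):.2f} s")
```

The 1024-frame queue on a 5 Mbps uplink works out to about 2.46 seconds of delay for the last packet in line.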

I've not looked extensively at CoDel, but it seems that the way to do this would be to figure out your effective link bandwidth (an asymmetric link doesn't help here if you have to choose just one), and then try to configure something like 1 msec of buffer at each point in the stack. You may have to go a little higher for slower links.
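A static sizing rule along these lines (pick a target delay such as 1 ms and derive the buffer depth from the link rate) could look like the sketch below. The 1 ms target and the two-frame floor for slow links follow the rough suggestions in this comment; they are not CoDel parameters, since CoDel instead controls delay dynamically by measuring how long packets actually sit in the queue.

```python
import math

MTU_BYTES = 1500

def buffer_frames(link_bps, target_delay_s=0.001, min_frames=2):
    """Buffer depth (in full-MTU frames) that drains within `target_delay_s`.

    Slow links get a small floor (`min_frames`) so the link is not starved
    between interrupts; both defaults are illustrative assumptions.
    """
    frames = math.ceil(link_bps * target_delay_s / (MTU_BYTES * 8))
    return max(frames, min_frames)

if __name__ == "__main__":
    for rate in (1e6, 5e6, 50e6, 1e9):
        print(f"{rate / 1e6:>6.0f} Mbps -> {buffer_frames(rate)} frames")
```

The obvious weakness, as noted above, is that a fixed depth depends on knowing the effective link rate, which on WiFi can change from moment to moment.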

Now I really wish I had a fast high end symmetric link (FiOS? Too bad I can't get it!)

Rich Brown said...

Yes, buffers can be *much smaller* except in certain high-performance cases. (That's the problem. People are trying to set land-speed records with 10Gbps connections, and they really *do* need big buffers.)

But, us poor slobs with slow uplinks wind up with a ton of data buffered if we don't do anything. That's where fq_codel comes in.

Read my Bufferbloat and the Ski Shop essay (http://richb-hanover.com/bufferbloat-and-the-ski-shop/) to get a quick overview of fq_codel's machinery.

It's really clever - give it your link speeds (up and down) and it'll manage the bottleneck so that all the connections share the bandwidth fairly. (And it's a piece of cake to install if you have an OpenWrt-capable router...)
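The fair-sharing part of fq_codel comes from flow queueing: each flow gets its own queue and the scheduler serves them in turn, so one bulk upload cannot push a ping to the back of a multi-second line. The sketch below shows only a plain deficit round robin (DRR) scheduler over per-flow queues; it omits the CoDel dropping logic entirely, and the flow names and quantum are hypothetical.

```python
from collections import deque

def drr_schedule(flows, quantum=1500):
    """Deficit round robin over per-flow queues of packet sizes (bytes).

    `flows` maps a flow name to a list of packet sizes. Each round, every
    backlogged flow earns `quantum` bytes of credit and sends packets while
    its credit covers the packet at the head of its queue.
    Returns the (flow, size) transmission order.
    """
    queues = {name: deque(sizes) for name, sizes in flows.items()}
    deficit = {name: 0 for name in queues}
    order = []
    while any(queues.values()):
        for name, q in queues.items():
            if not q:
                deficit[name] = 0  # idle flows do not bank credit
                continue
            deficit[name] += quantum
            while q and q[0] <= deficit[name]:
                size = q.popleft()
                deficit[name] -= size
                order.append((name, size))
            if not q:
                deficit[name] = 0
    return order

if __name__ == "__main__":
    print(drr_schedule({"upload": [1500] * 4, "ping": [84]}))
```

Here the ping goes out second instead of waiting behind the whole upload, as it would in a single shared FIFO if it arrived after the upload's packets.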

Zorak said...

Datapoint: Early 2k3 MBP Retina

Upgraded to 10.10.3. I started seeing awful ping times. (Out of 10 pings, 5 < 70ms, 4 around 400ms, 1 timeout.)

Then I signed out of iCloud.

10/10 pings < 70ms.

10/10, would sign out of iCloud again.

(I'm an Android user and don't even use iCloud.)

Unknown said...

I see the Photos problem not only in Yosemite, but also when my iPhone is storing new images in the Photos library. It's a shame that Apple's own Time Machine hardware suffers from bufferbloat- you'd think that Apple would test their products in a lab that replicates typical home network connectivity.

hoberion said...

Got the same issues on my MacBook and the nighthawk router