So everything’s been hunky-dory with my Hybrid Provider; I’ll get back to that...
For whatever reason, our production SharePoint environment is hosted on a VM, not a physical machine (although new servers have been ordered). We have scheduled VM backups, SharePoint backups, and SQL backups. Somewhere in the midst of all these backups, a restore was performed, and wiped out our IIS settings, reset the web.config in our SharePoint web app, and magically uninstalled the SmartPart. Grrr.
At least the content database was intact! So I reinstalled the SmartPart, the HybridProvider, and got IIS back up. No sweat. I logged in with both a SQL account and an AD account. No sweat. It was a little slower than I remembered it being, but even still, it worked, and I was happy. I recreated my admin page, added a few SmartParts, and loaded up my admin controls that programmatically use the HybridProvider to access our users and automate SharePoint tasks.
An unknown error has occurred.
Crap. Those five little words are the bane of any SharePoint developer’s life. I guessed that it was a timeout, since it took almost a minute before the error page came tearing its way onto my screen. So I started removing webparts, and when all of the controls that used the HybridProvider were gone, things started working. The diagnosis was clear: the HybridProvider was timing out.
One interesting thing to note before we discuss the fix: the logging in and out mechanisms worked fine. However it is that SharePoint and the standard ASP.NET 2.0 Login and CreateUser controls interact with the HybridProvider, that path was still managing to work with no problems. It was only through code (of course) that things were barfing.
So I started digging. I created a quick Windows Forms app with a button that initialized the HybridProvider to run a query to get all users, and deployed it to the SharePoint server. It worked immediately. I did the same with a web app, and sure enough, after about a minute, it worked. As usual, when it comes to SharePoint and things just cease working, rest assured that it’s either security, configuration, or an act of God.
Okay, so IIS was being a butt. My first hunch was a security problem, but it did work. If there was a security problem, shouldn’t it die immediately and give me a better error? Nope. It must just so have happened that the SharePoint timeout was set shorter than the IIS timeout. What about configuration? Well, it used to work! I double- and triple-checked my web.config, and everything was in order! Membership is rather finicky, so usually if something’s not wired up correctly, it just dies.
So why was this thing taking so long? What security or configuration changes would cause it to slow down, but still be able to finish the race? I dug deeper, and deployed a separate AspNetSqlMembershipProvider and AspNetActiveDirectoryMembershipProvider to see what was going on behind my HybridProvider’s scenes.
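The isolation test described above boils down to "call each provider on its own and time it." Here is a rough, language-neutral sketch of that idea in Python rather than the .NET code I actually ran. The two `fake_*` functions are made-up stand-ins for a "get all users" call against each provider, and the sleep duration and threshold are arbitrary values chosen for illustration only.

```python
import time
from typing import Callable, Dict


def time_call(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds that fn() takes to complete."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def find_slow_providers(providers: Dict[str, Callable[[], object]],
                        threshold_seconds: float) -> Dict[str, float]:
    """Time each provider call in isolation; return those over the threshold."""
    timings = {name: time_call(call) for name, call in providers.items()}
    return {name: secs for name, secs in timings.items()
            if secs > threshold_seconds}


# Hypothetical stand-ins for a "get all users" query against two providers.
def fake_fast_provider():
    return ["some_user"]


def fake_slow_provider():
    time.sleep(0.2)  # arbitrary delay standing in for a slow provider
    return ["some_other_user"]


if __name__ == "__main__":
    slow = find_slow_providers(
        {"fast": fake_fast_provider, "slow": fake_slow_provider},
        threshold_seconds=0.1,
    )
    for name, secs in slow.items():
        print(f"{name} provider took {secs:.2f}s")
```

The point of timing each provider separately is that a wrapper like the HybridProvider hides which of its underlying pieces is actually dragging its feet.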
Again, on the Windows app, both worked fine. Now on the web side is when I finally found something that I could work with. The SQL provider worked fine. The AD provider took almost a minute! So that’s the culprit: AD! I completely removed SharePoint and my custom code from the equation, and uncovered that something was amiss with Active Directory. As it turns out, I was wrong about my initial assumption: it wasn’t security or configuration; it was both! Recall: the Windows app worked fine. Why? Because I was logged into the box with a domain account, which is something that AD needs, since it tries to connect first via SSL. The SQL provider is using SQL auth, so it doesn’t care who you’re logged in as. Now what’s going on in IIS?
I was thrown off the scent of the solution because my code provides separate credentials in the AD connection string, and I assumed that the AD provider, at some point within the Initialize() method, impersonated this account to talk to the forest. However, I realized that it is actually the account running the app pool of the web site that dictates who is technically authenticating to AD. So go ahead and pass God’s credentials in the connection string; if your app pool is running under a non-domain account, such as Network Service, well, sorry God – access denied!
And sure enough, one symptom of the stroke our web server suffered was that the identity of our app pool had reverted from our AD domain service account back to Network Service. So I switched it back, and everything began working immediately. So, indeed, security AND configuration, as always, ended up being the elements to blame.
Now once again, I am always happy when I solve a problem, but am rarely content with it. In this case, I was curious about what exactly made AD tick, and decided to go screw around on our production server. Haha.
DISCLAIMER: Neither myself, Catalyst, its parent, subsidiary, or third-cousin companies, my mother, her friend Barb, nor the homeless guy who lives outside my condo WOULD EVER endorse “screwing around” or even logging in or looking sideways at a production server. I did this only in the name of…um…science…
So first, I changed the app pool to run as myself. A bit narcissistic, I know, but I like the idea of being the process behind all my company’s threads. Well, even after an explicit IIS reset (don’t tell the network guy…he’ll have a heart attack) I continued to get that “Service Unavailable” message on our site.
I quickly switched back to our AD service account to avoid having a heart attack of my own. That was my punishment for the narcissism. But why “Service Unavailable” and not a deeper error message? It was as though IIS stopped listening. Well, my assumption would be that my account lacks some hosting permission intrinsic to my friend, Network Service. Feel free to explore this one on your own; in the meantime, I’ll be content knowing not to impersonate myself on a production SharePoint web server.
Next: impersonate the domain admin account. I know…the thought is chilling… My hunch: this account will certainly have all the permissions on the Windows Server 2003 / IIS side of things, but (by design) can do nothing in SharePoint. So I entered this account and did another shameless IIS reset. (Be careful here. When you right-click an app pool in IIS, go to “Properties” and click the “Identity” tab, the UI lets you type in whatever you want for the password, only enforcing that your confirmation password matches. If you manage to type the wrong password exactly the same twice, IIS will act as though everything is peachy, and save with the wrong password! Nothing will work! So be careful; this’ll only knock you a step further back from the solution.)
Anyways, my hunch was right. The HybridProvider works, and you are authenticated, but authorization fails, and you are redirected to the SharePoint access denied screen. Does this make sense? Sure. You are authenticated because AD happily welcomes the domain admin, and SQL doesn’t care. But you are not authorized because the process running the IIS worker threads does not have access to the pages (either through SharePoint or NTFS security)! Unfortunately, my further hunch was wrong…
I figured that if I made the domain admin the site collection administrator in SharePoint Central Admin for my web app (like the AD service account was), I could impersonate it as well on the app pool, and ultimately make this a super-account that could do frickin’ anything! Well, fortunately, it didn’t work, so I can’t get in trouble for these shenanigans. I even tried making it the primary site collection admin, putting my AD account down to the secondary site collection admin.
But it still didn’t work. As a last-ditch effort, I made the domain admin a local admin on the web server, since that was the only place the AD service account seemed to have something the admin didn’t. Nothing. Access Denied. So it seems as though the HybridProvider will only work (that is, authenticate on its own and pass things off to SharePoint for proper authorization) if the SharePoint primary site collection administrator == the app pool’s identity != the domain admin != Chris Domino. I’m sorry but I don’t have a better answer…
So I logged off the server before I got myself into any trouble. The last problem to go after relates back to my test web app: why did the AD membership provider take so long to authenticate as Network Service? I asked Google, and it directed me to the MSDN site for the AD provider. Here’s a snippet:
"The ActiveDirectoryMembershipProvider class will attempt to connect to Active Directory using SSL. If SSL fails, a second attempt to connect to Active Directory using sign-and-seal will be made. If both attempts fail, the ActiveDirectoryMembershipProvider instance will throw a ProviderException exception."
This is from MSDN.
So what the beef is sign-and-seal? Going back to Google, it was slim pickin’s, but another MSDN page gave me this little definition:
"The connection to the Active Directory server is secured by digitally signing and encrypting each packet sent to the server."
Now, I can see how something like this has a lot of overhead, and can affect performance, but to take a full minute to authenticate a user? And then to STILL work? The only guess I can hazard is that Network Service, not being a domain account, combined with some SSL voodoo, was not able to authenticate against AD. Then the AD provider tried this sign-and-seal nonsense, and it worked. Did it take a minute to fail, and then sign-and-seal worked immediately? Or did SSL fail, then this packet-level encryption took its sweet time to get through?
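One way to tell those two scenarios apart would be to time each connection attempt separately. The sketch below is a toy Python model of the "try SSL, then fall back to sign-and-seal, then throw" sequence from the MSDN snippet, with a per-attempt timing log. It is emphatically not the real provider's implementation: `ProviderError`, `connect_with_fallback`, and both `fake_*` functions are names I made up, and the simulated delay is just one of the two possibilities, picked arbitrarily.

```python
import time
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Made-up stand-in for the ProviderException the MSDN text mentions."""


def connect_with_fallback(
    attempts: List[Tuple[str, Callable[[], None]]],
) -> Tuple[str, List[Tuple[str, float, bool]]]:
    """Try each connection method in order.

    Returns the name of the first method that succeeds, plus a log of
    (method name, seconds spent, succeeded) for every attempt made.
    Raises ProviderError if every attempt fails.
    """
    log = []
    for name, connect in attempts:
        start = time.perf_counter()
        try:
            connect()
        except ConnectionError:
            log.append((name, time.perf_counter() - start, False))
            continue
        log.append((name, time.perf_counter() - start, True))
        return name, log
    raise ProviderError("all connection attempts failed")


# Hypothetical behavior: SSL hangs for a while and then fails...
def fake_ssl_connect():
    time.sleep(0.2)  # arbitrary delay; the real one is unknown
    raise ConnectionError("SSL connection failed")


# ...and sign-and-seal then succeeds right away.
def fake_sign_and_seal_connect():
    return None


if __name__ == "__main__":
    winner, log = connect_with_fallback([
        ("ssl", fake_ssl_connect),
        ("sign-and-seal", fake_sign_and_seal_connect),
    ])
    for name, secs, ok in log:
        print(f"{name}: {'ok' if ok else 'failed'} after {secs:.2f}s")
    print(f"connected via {winner}")
```

With a log like that, a slow failure followed by a fast success would look very different from a fast failure followed by a slow success. I never got that level of detail out of the real provider, so which one actually happened remains a guess.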
Ugh. I don’t know. Yet again, we have another SharePoint experience leaving us with more questions than answers. I need a beer.