1/2/2014

Thoughts On Programming Managed Metadata With SharePoint 2013’s CSOM Taxonomy API

Introduction

Well, it's official: programmatically manipulating managed metadata still sucks in SharePoint 2013. I've complained about it here and many other places. However, to be fair, I feel like it sucks less than it did in the 2010 days. It seems more stable now. For example, the Term Store Management Tool is suffering from its "The Managed Metadata Service or Connection is currently not available" error less frequently. Also, the new term-driven navigation stuff is pretty cool.

But in the end, when I find myself in the weeds of, for example, using CSOM to automate the provisioning of taxonomy fields or the querying of taxonomy values from ListItems, the managed metadata architecture still feels hacky to me. To deal with these headaches, I'm going to cover a few of the charming peccadilloes you might encounter when using CSOM to programmatically get and set managed metadata field values.

In my opinion, the biggest hack of taxonomy in SharePoint is its dependency on hidden lists and fields and event receivers to wire everything together. Not only is this approach akin to using SQL triggers to implement application logic in a database, but it also causes acne on the otherwise beautiful face of my information architecture. Hidden "CatchAll" columns? Hidden note fields with generated internal names? Ugh. In order to see why the code needed to properly set a TaxonomyField in CSOM is interesting enough to blog about, let's start with some background.

Some Background

I always make a gold star effort to use the proper field value classes in SharePoint development, client and server APIs alike, when setting values on ListItems or SPListItems. The best example of this is the trusty little FieldUrlValue class that allows me to never care how the string representation of a url and its description should be formatted on a "Hyperlink or Picture" column.

If you dig into FieldUrlValue a bit, you'll see it has a nice, friendly, parameterless public constructor, and members for the two pieces of information that comprise it. You can new up an instance of it, populate the properties, and set it to a ListItem's field value (See Line #'s 1 - 5 below). And when you want to get an instance of one, just cast the value of the field back to it (Line #7).

Code Listing 1

item["url field"] = new FieldUrlValue()
{
    Description = "description",
    Url = "http://chrisdomino.com/blog"
};
...
string url = ((FieldUrlValue)item["url field"]).Url;

Lovely and clean. Should we expect the same from taxonomy fields? Of course. Are we that lucky? Of course not. The two field classes we'll be dealing with, TaxonomyFieldValue and TaxonomyFieldValueCollection, promisingly start out by having public constructors. But as we investigate further, we'll see that they are like our body's appendix: useless and prone to rupture.

Setting Taxonomy Values: The Problem

Single-Selection Fields

Considering TaxonomyFieldValue first, we can use it in the same manner as FieldUrlValue: new up an instance and set the Label (the text of the corresponding term), the TermGuid (which is of course not really a guid, but obnoxiously a string representation of the term's unique id instead), and the WssId (an integer lookup id that points at the term's row in the site collection's hidden taxonomy list, not at the ListItem being edited). This final parameter seems particularly hacky to me; why should my code have to know about a row in a hidden list? When in doubt, use -1. It just works. Lame.

Anyway, once one of these is populated, we'd set it as the value of the corresponding ListItem's field, right? Then call Update? And ExecuteQuery? Finally, we'd refresh the page, and see our sparkly new managed metadata value proudly displayed? Unfortunately, all sarcasm aside, this is not in the cards for using CSOM to set a TaxonomyFieldValue.

Multi-Selection Fields

Before seeing why, let's take a look at what's going on with the IEnumerable version of our field value for multi-selection managed metadata columns. The constructor for TaxonomyFieldValueCollection in CSOM is adventurous. It takes in a ClientRuntimeContext, a string for the field value (which I'll complain about next), and finally something called a "creating field."

The first parameter is trivial, so let's consider the second. Like I said, assembling a string representation of a field value in SharePoint is bad news: it opens up the door to potential typos, hard coding, and incompatibilities across environments. Unless you're programmatically backed into a corner (as you'll see we will be) it's always better to use the API.

But indulging in TaxonomyFieldValueCollection's constructor, we need to cobble these characters together to form our second parameter. What does this object look like when wearing a nice string-y dress? You might assume that such a gown would be woven together by the concatenation of the string representations of each constituent element in the collection. But to see that that's not how this fabric is made, let's first look at the way a single TaxonomyFieldValue string is assembled:

<WssId>;#<Label>|<TermGuid>

Each angle-bracketed component is the corresponding property of a TaxonomyFieldValue. What's nice is that SharePoint will throw an exception (when ExecuteQuery is called) if the value is malformed. Although it's great to have this protection, be careful: exception handling should never be part of your logic flow. The resulting error message is actually pretty straightforward:

The given value for a taxonomy field was not formatted in the required <int>;#<label>|<guid> format.

The API is even smart enough to make sure that the guid in question corresponds to a term in the termset bound to the column. But once all of these checks pass and we have a proper string, recall that we're still expressing a single taxonomy value here. Therefore, let's see what multi-selection string values are wearing:

<Label>|<TermGuid>;<Label>|<TermGuid>

Interestingly, we don't care about that WssId all of a sudden. Also, notice how the single-selection value resembles a lookup field value (which is not at all surprising, since TaxonomyField inherits from LookupField). By comparison, the multi-selection field resembles a plain old frock that's been hastily sewn together with mismatched thread.
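To make the two conventions concrete, here's a minimal sketch in plain C# that needs no SharePoint assemblies at all. The TaxonomyStrings class and its method names are hypothetical (nothing like them ships with CSOM); only the two formats come from this post: a WssId of -1, then ";#", then label|guid for a single value, and bare label|guid pairs joined by semicolons for many.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of CSOM.
public static class TaxonomyStrings
{
    // Single-selection convention: <WssId>;#<Label>|<TermGuid>, with WssId fixed at -1.
    public static string ToSingleValue(string label, Guid termId)
    {
        return string.Format("-1;#{0}|{1}", label, termId);
    }

    // Multi-selection convention: <Label>|<TermGuid> pairs joined by semicolons (no WssId).
    public static string ToMultiValue(IEnumerable<KeyValuePair<string, Guid>> terms)
    {
        return string.Join(";", terms.Select(t => string.Format("{0}|{1}", t.Key, t.Value)));
    }
}
```

Keeping the format strings in one place like this at least confines the hard coding to a single class.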

The kicker here is that if we intuitively use the multi-selection string representation of a taxonomy value for the second parameter of its constructor, a ServerException is thrown: "Value cannot be null." So how the hell do we build a collection of values? Maybe initialize the collection with a single value for the constructor's second parameter, (which one do you choose?) call Clear, and then finally squeeze off a bunch of Adds to fill it with terms?

Maybe in a dream world where these methods actually exist and I wouldn't have to be writing this post. What works (and by "works" I mean "doesn't throw an exception") is passing in an empty string for this parameter, and then calling PopulateFromLabelGuidPairs on the TaxonomyFieldValueCollection object, passing in a multi-selection string formatted as above.

And we're not even done with this constructor! What about that "creating field" for the third value? From MSDN, this is "the Field the value is bound to." Although this makes more sense than "creating field" it's still not very helpful. "Bound to?" You can't get Field objects from a ListItem in CSOM, so do we grab it from the parent list's fields? The content type's FieldLinkCollection? The web's FieldCollection?

What works (again I'm using the same "doesn't bomb" definition for "works") is getting the field from the list's collection; any other column from any other collection throws the same "Value cannot be null" exception we saw before. I'm not going to go into any more detail here, because even after Frankenstein-ing this object together and setting it as a column value, we end up in the same place as we did with its single-selection cousin: no changes to the ListItem are persisted.

So as you can see, these field value types have cryptic properties, confusing constructors, and, at the end of the day, simply don't do anything. Unfortunately, the only path through these trying taxonomy woods is made dark by the shadows of string manipulation and hard coding. Although there are a lot of articles out there on how to code around these managed metadata malignancies, my solution is the most dynamic. If an API backs me into the aforementioned programmatic corner, I'll happily hack my way through the brush with strings.

Setting Taxonomy Values: The Solution

The secret is the aforementioned taxonomy hack I discussed at the beginning of this post: dealing with the hidden information architecture that managed metadata in SharePoint depends on. In order to persist one of these fields in CSOM, we need to set not only the column's value, but the corresponding hidden note column as well. Without this second value, the fields will not update. The fact that the TaxonomyFieldValue class doesn't do this for us automatically is a travesty.

My approach is to create an extension method off of ClientContext that takes in a ListItem and the field's internal name, and uses a model (which is just a class that holds the term data) to set the values. The only drawback is that it requires us to do some string puppeteering. But like I said, we have no choice since conventional CSOM simply does not get us there. Let's take a look at the code, which you'll see isn't even all that complicated.

First things first: the model. Named TaxonomyModel, this class has properties for the core components of a term: the name/value pair corresponding to its label and guid. Since the WssId can always be safely set to -1, there's no need to carry a value for it along for the ride within this DTO. Finally, to model the hierarchical nature of terms in a termset, each TaxonomyModel has a collection of itself. Here's what the class looks like:

Code Listing 2

public class TaxonomyModel
{
    #region Members
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<TaxonomyModel> Children { get; set; }
    #endregion
    #region Public Methods
    public override string ToString()
    {
        //return the single-selection string; Id is a Guid struct, so it can never be null
        return string.Format("-1;#{0}|{1}", this.Name ?? string.Empty, this.Id);
    }
    #endregion
}

Note that for the purposes of this post, the Children property is not used, although some simple recursion is all it would take to set the field's value to a child or grandchild term. But on to the good stuff: the code that programmatically sets taxonomy field values. The coolest aspect of this extension method is that it works for both single- and multi-selection columns. Other field types that have similar distinctions around selection options, such as lookups and choices, require different handling for each. But for my taxonomy setting logic, both are handled. Let's take a look:
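Since I mentioned recursion, here's roughly what I have in mind: a minimal sketch, not something the rest of this post depends on. FindByName and the TaxonomyModelSearch class are hypothetical names, the case-insensitive match is my assumption, and the trimmed TaxonomyModel is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Trimmed copy of the TaxonomyModel from Code Listing 2, repeated so this sketch stands alone.
public class TaxonomyModel
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public List<TaxonomyModel> Children { get; set; }
}

// Hypothetical helper; the class and method names are assumptions.
public static class TaxonomyModelSearch
{
    // Depth-first search for the first term whose Name matches; returns null if none does.
    public static TaxonomyModel FindByName(this TaxonomyModel root, string name)
    {
        if (root == null)
            return null;
        if (string.Equals(root.Name, name, StringComparison.OrdinalIgnoreCase))
            return root;
        // Treat a missing Children list the same as an empty one.
        foreach (TaxonomyModel child in root.Children ?? new List<TaxonomyModel>())
        {
            TaxonomyModel match = child.FindByName(name);
            if (match != null)
                return match;
        }
        return null;
    }
}
```

The model that comes back can then be handed to whatever code sets the field value, the same way a top-level term would be.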

Code Listing 3

public static void SetTaxonomyFieldValue(this ClientContext context, ListItem item, string internalName, params TaxonomyModel[] values)
{
    //set taxonomy field
    item[internalName] = values.Select(t => string.Format("-1;#{0}|{1}", t.Name, t.Id)).ToArray();
    //set hidden taxonomy field
    item[context.GetTaxonomyHiddenFieldName(internalName)] = string.Join(";", values.Select(t => string.Format("{0}|{1}", t.Name, t.Id)));
}

Although this method appears to be on the petite side, it actually packs quite a 1-2 punch. As previously stated, SetTaxonomyFieldValue extends ClientContext, taking in a ListItem, the internal name of the taxonomy field, and a params array of TaxonomyModels. The latter allows us to more fluidly support multi-selection columns.

The first of the 1-2 punch is on Line #4, where we use a little lambda to set the specified field's value to an array of strings. Each one is formatted to match the single-selection convention for a TaxonomyFieldValue. If there's just one term, the value is set to an array with a single element; if there are many, then the array contains a string for each. This is how the same handling can apply to both single- and multi-selection columns.

Line #6 contains the follow-up punch, right in the managed metadata service application's face. First, on the left hand side of the assignment, you'll see we call a new method, GetTaxonomyHiddenFieldName, which we'll look at next. What's more interesting is the value that we're setting for it on the right: it's a single string that's formatted as a multi-selection TaxonomyFieldValueCollection. How bizarre: the single-selection convention is used as an array of strings, whereas the multi-selection variety is joined into a solitary string!

At least taxonomy is consistent in its weirdness. But instead of listening to me complain more, let's move on and take a look at GetTaxonomyHiddenFieldName. The idea is that we get the target field by its internal name from the root Web of the ClientContext, cast it to a TaxonomyField, pull the note field from it, and grab its internal name. This is the name of the secret hidden taxonomy field for our target column.

<Rant>

The name of this column is an all-lowercase guid with no dashes...that doesn't match any of the unique id values for either field. Real classy, SharePoint.

</Rant>

Code Listing 4

public static string GetTaxonomyHiddenFieldName(this ClientContext context, string internalName)
{
    //initialization
    if (!context.Web.IsObjectPropertyInstantiated("Fields"))
    {
        //load fields
        context.Load(context.Web.Fields,
            ff => ff.Include(
                f => f.Id,
                f => f.InternalName));
        context.ExecuteQuery();
    }
    //get target field
    Field field = context.Web.Fields.ToList().SingleOrDefault(f => f.InternalName.Equals(internalName));
    if (field == null)
        return string.Empty;
    //get taxonomy field
    TaxonomyField taxField = context.CastTo<TaxonomyField>(field);
    context.Load(taxField, f => f.TextField);
    context.ExecuteQuery();
    //get note field
    Field noteField = context.Web.Fields.ToList().SingleOrDefault(f => f.Id.Equals(taxField.TextField));
    if (noteField == null)
        return string.Empty;
    //return
    return noteField.InternalName;
}

Starting on Line #4, we see proper CSOM style development: checking if a property has been initialized, and if not, loading it and only its members that we need (Line #'s 7-11). Then, as stated above, we get the field, (Line #14) the taxonomy field, (Line #18) the note field, (Line #22) and finally the latter's internal name (Line #26). This is what I meant when I said my method is more dynamic than the other ones out there. Yes, we need to assemble strings for the actual values, but everything else is inferred and farm-safe.

Also, recall from my previous post that if we're setting multiple values on a ListItem, we can't call ExecuteQuery intermittently before we're done; some columns won't be persisted. If this is the case, clone your context and use that to perform any ancillary ExecuteQuery calls. This will allow us to get our hidden note field's name without disrupting the in-progress ListItem updates.

And that's how we need to be setting taxonomy field values. By populating both the field itself and its corresponding note field, calling Update and ExecuteQuery on the ListItem will persist our selected term or terms. As we saw, this was only half the battle; we also had to disregard the proper value types and instead sew together string representations for each single-selection and multi-selection of terms to store in the ListItem.

Another Problem

But there's one last thing to mention before we discuss the logic needed to extract a TaxonomyFieldValue from a ListItem. I noticed a peculiar behavior around copying a value from a ListItem in one farm to another. Although this is not a common use case, the error I'm about to describe manifests demons lurking in the depths of these managed metadata values.

Just like with Entity Framework, if a ClientObject is birthed by one context, it cannot be used in another. We can of course peel off property values and assign them to other ListItems; it's the objects themselves that are non-transferable. For example, if you grab a content type from context A, you can't add it to a List begotten from context B. Such usage throws the following InvalidOperationException upon calling ExecuteQuery:

The object is used in the context different from the one associated with the object.

We'll even see the above error when both contexts are pointing to the same site collection. The reason I mention this is because getting a TaxonomyFieldValue from a ListItem in one context and setting it as the value to another from a different context will throw this exception as well. However, other ClientValueObjects work fine! I've tested with choice, lookup, (single-selection and multi-selection for both) HTML, image, and user columns, not to mention all the "standard" ones.

Since none of these other value types exhibit the same behavior, it seems to me that taxonomy value objects are much more complicated than their ClientValueObject brethren. FieldUrlValue, for example, can simply be thought of as a model; it is not bound to its context; it's just a DTO. The fact that TaxonomyFieldValues cling to their makers implies that internally, they must be doing a lot more than just modeling managed metadata.

Getting Taxonomy Values

Getting a value from a taxonomy field is much easier, since the API has already done the work of populating all the ListItem's columns that we requested via our CamlQuery with ClientValueObjects. Once we have a fully-loaded ListItem, all we need to do is grab the indexed field value from the object via its internal name: SharePoint development 101...

...Sometimes. The one very weird thing I've noticed when querying taxonomy field values is that sometimes, instead of nice value objects, we are instead given a cryptic dictionary whose keys look like COM and whose values most closely resemble classes from SharePoint's client JavaScript Object Model (JSOM). There are always two entries: one for the type of value, and one for the value itself. The following screen shots from a Visual Studio unit test debugging session depict this phenomenon.

Raw Taxonomy Value

Raw Taxonomy Value Expanded

As you can see, the "_ObjectType_" is "SP.Taxonomy.TaxonomyFieldValueCollection" which is the JSOM representation of our new CSOM multi-selection managed metadata friend. The value key, "_Child_Items_" is an array of objects, where each element is another dictionary whose name/value pairs describe the type of object and its properties. In this case, each one of these is a JSOM object of type SP.Taxonomy.TaxonomyFieldValue.
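If you don't have a debugger handy, the shape described above can be mimicked with nested dictionaries. This is only a sketch: RawTaxonomySamples is a hypothetical test-data builder, and I'm assuming the "Label" and "TermGuid" keys (the ones my parsing code later in this post looks for) are the only child entries that matter; the real payload may well carry more.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical test-data builder; it only imitates the shape seen in the debugger.
public static class RawTaxonomySamples
{
    // One raw value: the JSOM type name plus the term's label and guid (as a string).
    public static Dictionary<string, object> Value(string label, Guid termId)
    {
        return new Dictionary<string, object>
        {
            { "_ObjectType_", "SP.Taxonomy.TaxonomyFieldValue" },
            { "Label", label },
            { "TermGuid", termId.ToString() }
        };
    }

    // A raw collection: the JSOM collection type name plus an object[] of raw values.
    public static Dictionary<string, object> Collection(params Dictionary<string, object>[] values)
    {
        // Copy into a genuine object[] so the container matches the "array of objects" description.
        object[] children = new object[values.Length];
        Array.Copy(values, children, values.Length);
        return new Dictionary<string, object>
        {
            { "_ObjectType_", "SP.Taxonomy.TaxonomyFieldValueCollection" },
            { "_Child_Items_", children }
        };
    }
}
```

Feeding dictionaries like these to the parsing code is a cheap way to unit test the fallback branch without waiting for SharePoint to misbehave.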

During this debugging session, I experimented with all kinds of different configurations for my CamlQuery: including all fields, including only the taxonomy field, including the taxonomy field and its hidden note field, and even including no fields. The most scientific result from this research is that sometimes calling ExecuteQuery twice (by dragging the yellow-arrowed current line of execution back up to the CamlQuery statement and re-running the code to the place shown in the images above) will give me a TaxonomyFieldValueCollection instead of this JSOM barf.

Taxonomy Value

Eventually, after re-executing these few lines over and over, once I finally get a legitimate CSOM taxonomy value, I will keep getting a taxonomy value for the rest of the session. It seems as though something gets "triggered" and the API will start returning the expected result types. I haven't seen anything on the Internet that describes this issue, possibly because this is such a hard issue to describe.

What vexes me is that I've been doing CSOM for almost a year now, and have never seen this behavior. All my SharePoint 2013 projects have dealt with managed metadata, whether it was taxonomy or term-driven navigation. So perhaps it has something to do with the context of my debugging session? Do unit tests adversely affect CSOM somehow?

Or perhaps my local development environment is screwy? It was provisioned by someone else's PowerShell script that uses content type hubs, host headed sub-site collections, and taxonomy stored in the managed metadata service application instead of the root site collection (a 2013 CSOM first for me; to more easily support term-driven navigation and one-stop administration, I've been using the root site collection to store my term sets).

Either way, sometimes I get taxonomy values, and sometimes I get crap. Although I hate allowing non-deterministic logic to keep breathing, I am at least assured that all the possibilities are known and can therefore be safely contained. I implemented this container as another extension method hanging off of ClientContext that handles getting taxonomy values regardless of which format they might be retrieved in. This little ditty is called "GetTaxonomyFieldValue" and does all the work needed to parse raw column data into arrays of our TaxonomyModel. Let's take a look:

Code Listing 5

public static TaxonomyModel[] GetTaxonomyFieldValue(this ClientContext context, ListItem item, string internalName)
{
    //initialization
    if (item[internalName] is TaxonomyFieldValueCollection)
    {
        //field is a taxonomy value collection
        return ((TaxonomyFieldValueCollection)item[internalName]).ToList().Select(t => t.ConvertToTaxonomyModel()).ToArray();
    }
    else if (item[internalName] is TaxonomyFieldValue)
    {
        //field is a taxonomy value
        return new TaxonomyModel[] { ((TaxonomyFieldValue)item[internalName]).ConvertToTaxonomyModel() };
    }
    else if (item[internalName] is Dictionary<string, object>)
    {
        //check for a "raw" taxonomy value collection
        Dictionary<string, object> rawTaxonomyValue = item[internalName] as Dictionary<string, object>;
        if (rawTaxonomyValue.ContainsKey("_ObjectType_") && rawTaxonomyValue.ContainsKey("_Child_Items_") && rawTaxonomyValue["_ObjectType_"].Equals("SP.Taxonomy.TaxonomyFieldValueCollection"))
        {
            //get child values
            List<TaxonomyModel> model = new List<TaxonomyModel>();
            foreach (object taxValue in rawTaxonomyValue["_Child_Items_"] as object[])
                model.Add(Utilities.ConvertFromRawTaxonomyValue(taxValue));
            //return
            return model.ToArray();
        }
        else
        {
            //check for a "raw" taxonomy value
            TaxonomyModel model = Utilities.ConvertFromRawTaxonomyValue(item[internalName]);
            if (model != null)
                return new TaxonomyModel[] { model };
            else
                return null;
        }
    }
    else
    {
        //not taxonomy
        return null;
    }
}

There are three main checks in this method: is the value of the passed-in column of the passed-in ListItem a TaxonomyFieldValueCollection (Line #4)? Is it a TaxonomyFieldValue (Line #9)? Or is it one of these crazy dictionaries (Line #14)? If none of these are true, Line #40 assumes this field is not taxonomy at all, and returns null.

Before we check out the code that parses this dictionary, note the helper method on Line #'s 7 and 12 that converts a TaxonomyFieldValue to a TaxonomyModel. Intuitively named, "ConvertToTaxonomyModel" extends only TaxonomyFieldValue, since TaxonomyFieldValueCollection alone doesn't store any actual taxonomy data.

Code Listing 6

public static TaxonomyModel ConvertToTaxonomyModel(this TaxonomyFieldValue value)
{
    //return
    return new TaxonomyModel()
    {
        //assemble object
        Name = value.Label,
        Id = new Guid(value.TermGuid.ToString())
    };
}

Back to GetTaxonomyFieldValue: notice that we always return an array of TaxonomyModel. This is again to support both single- and multi-selection fields with a single method. After calling ConvertToTaxonomyModel, we either transpose all models to an array via a lambda, or use a lone model to seed a single element enumerable on Line #'s 7 and 12 respectively.

Starting on Line #17, we start to dissect the dictionary if the API wasn't feeling up to giving us proper values. Line #18 uses more hard coding to ensure that this raw data indeed resembles known taxonomy JSOM objects. If so, we loop our way through them and call another helper, "Utilities.ConvertFromRawTaxonomyValue," on each taxonomy value. Otherwise, since each one of these child objects is a JSOM dictionary with the same keys as the root, we can use this same method to check if the whole damn thing is a TaxonomyFieldValue disguised as JSOM. This is what Line #30 and the rest of that else statement does.

For completeness' sake, let's take a look at ConvertFromRawTaxonomyValue. First of all, I preface calls to it with "Utilities" (which is my go-to name for classes that house extension methods) because I don't like writing extension methods against object unless they are ubiquitous to every .NET type. Not only does it seem hacky to extend the god base class, but it also muddies up IntelliSense popups.

Code Listing 7

public static TaxonomyModel ConvertFromRawTaxonomyValue(object rawTaxonomyValue)
{
    //check for raw object
    if (rawTaxonomyValue is Dictionary<string, object>)
    {
        //make sure it's a client taxonomy field value
        Dictionary<string, object> clientTaxonomyObject = rawTaxonomyValue as Dictionary<string, object>;
        if (clientTaxonomyObject.ContainsKey("_ObjectType_") && clientTaxonomyObject.ContainsKey("TermGuid") && clientTaxonomyObject.ContainsKey("Label") && clientTaxonomyObject["_ObjectType_"].Equals("SP.Taxonomy.TaxonomyFieldValue"))
        {
            //get fields
            return new TaxonomyModel()
            {
                //assemble object
                Name = clientTaxonomyObject["Label"].ToString(),
                Id = new Guid(clientTaxonomyObject["TermGuid"].ToString())
            };
        }
    }
    //not taxonomy
    return null;
}

As you can see, this is doing a lot of the same work as GetTaxonomyFieldValue; the keys and values varied just enough to warrant a separate method. Line #8 makes sure we have the correct data in our dictionary, and Line #11 pulls this information out to assemble a TaxonomyModel. If the method doesn't like what it sees, it will again return null to inform the caller that this is not a taxonomy field.

Speaking of null, one more interesting peccadillo about getting taxonomy values is what is returned when there's nothing to return. None of these extension methods will result in anything if they can't extract meaningful managed metadata from the column. How do we distinguish this situation from a legit taxonomy field that just so happens to be empty?

What I've seen is that single-selection values will return null if there's nothing set, which allows us to basically not care if the field is null or not taxonomy; the handling for null will be the same in either case. On the other hand, multi-selection columns will actually give us empty TaxonomyFieldValueCollections. Keep this in mind when you use GetTaxonomyFieldValue.
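If you'd rather not sprinkle null checks around every call, one option is to fold both "nothing here" cases into an empty array. This is just a sketch; TaxonomyResults and OrEmpty are hypothetical names, and the helper is generic so it compiles without any of the other types in this post.

```csharp
using System;

// Hypothetical helper; the names are assumptions and it is not part of CSOM.
public static class TaxonomyResults
{
    // Returns the array unchanged, or an empty array when the input is null.
    public static T[] OrEmpty<T>(this T[] values)
    {
        return values ?? new T[0];
    }
}
```

With that in place, a foreach over the normalized result simply does nothing for an unset single-selection field, an empty multi-selection field, or a column that isn't taxonomy at all.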

Conclusion

That does it for SharePoint 2013's CSOM Taxonomy. I took you through setting and getting values to and from managed metadata columns, and pointed out all the caveats encountered along the way. Reading back over this, perhaps I'm being a bit too hard on this API. The workarounds all ended up being pretty easy and work just fine.

Getting the job done is the most important short-term aspect of programming. CSOM makes this job possible when dealing with taxonomy from remote site collections, farms, or computers. My goal for this post is to point out some of the gotchas, whose avoidance makes this job easier. But by knowing the limitations of your platform and reducing hacks, we can achieve the most important long-term aspect of programming: getting the job done right!
