Adding multiple entities to table storage in one operation

Question

Friday, January 27, 2012 5:00 AM

In the code below I am inserting two entities into the Accounts table (both entities have the same partition key) and one entity into the PublicUsers table (which has a different partition key).

In the development environment I was able to add all 3 entities in the same operation, as outlined below, without a problem.  This would appear to be an entity group transaction, but as far as I understand the third entity shouldn't be included in the transaction since it has a different partition key.  Grouping these 3 inserts together appears efficient, but I'm not sure it's the best way to do things.

So my question is: how should I handle a situation where the first two AddObject calls succeed but the third fails?  Or would any failures only surface during the SaveChangesWithRetries call?  I understand that Azure tables don't have transaction support like one would expect from a relational database, so what are the best practices for rolling back an operation involving different partitions?

Thanks in advance for the help.

CloudStorageAccount storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
TableServiceContext serviceContext = tableClient.GetDataServiceContext();

// Two entities for the Accounts table (same partition key) and one for
// the PublicUsers table (different partition key), saved in a single call.
serviceContext.AddObject("Accounts", account);
serviceContext.AddObject("Accounts", accountUser);
serviceContext.AddObject("PublicUsers", pubUser);
serviceContext.SaveChangesWithRetries();

All replies (6)

Tuesday, January 31, 2012 6:00 PM ✅ Answered

Hi - sorry I missed this over the weekend.

Yes, that sounds like a good approach.  The one thing I want to be explicit about is that queries using AccountId as the partition key will be efficient so long as they are executed against the same table.  In the above example, a query for all entities in the "Accounts" table where PartitionKey = "ABC" is an efficient way of getting all Account and AccountUser entities from that table.  If you had another table, say "Purchases", and used AccountId as the partition key there as well, then it would take a second query to get all entities from the "Purchases" table where PartitionKey = "ABC".  That query would also be efficient, but it is a second query.  On the other hand, if you stored Purchase entities in the Accounts table, then you could execute one query against the Accounts table and get the Account, AccountUser, and Purchase entities all from that single query, and it would still be efficient, since you're filtering on the partition key.
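To make that concrete, here is a minimal sketch of that single query, using the v1 storage client from the code above (the AccountsEntity type, the "ABC" key, and the RowKey naming convention are assumptions for illustration, not from the thread):

// One partition-key query returns every kind of entity stored for the
// account -- Account, AccountUser, and Purchase rows alike.
// AccountsEntity is a hypothetical type exposing the union of the
// properties used by the entity kinds stored in this table.
var allForAccount = serviceContext
    .CreateQuery<AccountsEntity>("Accounts")
    .Where(e => e.PartitionKey == "ABC");

foreach (AccountsEntity entity in allForAccount)
{
    // Distinguish kinds by a discriminator property or a RowKey convention,
    // e.g. "ACCOUNT", "USER_<email>", "PURCHASE_<id>".
}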

Make sense?

-Jeff


Friday, January 27, 2012 7:28 AM

You have to use two requests. The first request inserts the first two entities, and the second inserts the last entity. You have to make sure that during each batch request, all entities have the same partition key.


Friday, January 27, 2012 1:50 PM

Thanks for the quick reply.  So would it look something like this, where operations on different partitions use their own SaveChangesWithRetries call, or would I have to build a separate TableServiceContext for each operation?

CloudStorageAccount storageAccount = CloudStorageAccount.DevelopmentStorageAccount;
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
TableServiceContext serviceContext = tableClient.GetDataServiceContext();

var account = new Account();
account.PartitionKey = "ABC";
account.RowKey = "ACCOUNT";            // every entity also needs a RowKey; values here are placeholders

var accountUser = new AccountUser();
accountUser.PartitionKey = "ABC";      // same partition as account
accountUser.RowKey = "USER_1";

// First save: both entities live in partition "ABC" of the Accounts table.
serviceContext.AddObject("Accounts", account);
serviceContext.AddObject("Accounts", accountUser);

serviceContext.SaveChangesWithRetries();

var pubUser = new PublicUser();
pubUser.PartitionKey = "XYZ";          // different partition, so a separate save
pubUser.RowKey = "USER_1";

serviceContext.AddObject("PublicUsers", pubUser);

serviceContext.SaveChangesWithRetries();


Friday, January 27, 2012 7:27 PM

Hi - thanks for the question!

In order to save all changes as a batch operation (entity group transaction), the entities must all be in the same partition of the same table.  So even if they had the same partition key but were in different tables, you still could not use entity group transactions.  So your code is correct: two operations are needed.  However, for the first operation, if you want it to be a batch operation, you should pass in SaveChangesOptions.Batch; otherwise each entity will actually be saved as its own transaction anyway.  So the code looks like this:

serviceContext.SaveChangesWithRetries(SaveChangesOptions.Batch);

Doing that will allow the operations against the "Accounts" table to occur in a single request and guarantee that they will either both succeed or both fail.  Note that you can't guarantee the second SaveChangesWithRetries will succeed just because the first one did (network errors, etc.), so your application will need to handle the case where only the first call succeeded.
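As a rough sketch of handling that case, continuing from the code above (the compensation strategy in the catch block is application-specific, not something the library does for you):

// The Accounts batch above has already been saved with
// SaveChangesOptions.Batch. The second save can still fail independently:
try
{
    serviceContext.AddObject("PublicUsers", pubUser);
    serviceContext.SaveChangesWithRetries();
}
catch (StorageClientException)
{
    // The Accounts batch is already committed, so there is no automatic
    // rollback. Application-specific options: retry the PublicUsers insert,
    // queue it for later, or compensate by deleting the two Accounts entities.
    throw;
}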

Another option: if both entity types (pubUser and accountUser) will share the same partition, then one thing table users do in order to take advantage of batch operations is store both entity types in the same table.  Azure Table entities in one table can have different schemas.  For example, suppose someone wants to store "customers" and "purchases".  If they store these two data types in the same table and use the same partition key for a customer and that customer's purchases, then it becomes possible to perform batch operations that include both types of entities, and even to use a single request to get the customer and their purchase data.  If they are in two tables, you have to make two operations: one for the customer, one for their purchases.  In this way, you can add a little bit of relational behavior to your application and ensure that operations across different entity types are transactional in nature.
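For illustration, a minimal sketch of that pattern, assuming a hypothetical Purchase type stored alongside AccountUser in the Accounts table (the type shapes, key values, and RowKey scheme are invented for the example):

// Same table, same PartitionKey: eligible for one entity group transaction.
var accountUser = new AccountUser { PartitionKey = "ABC", RowKey = "USER_jane" };
var purchase    = new Purchase    { PartitionKey = "ABC", RowKey = "PURCHASE_0001" };

serviceContext.AddObject("Accounts", accountUser);
serviceContext.AddObject("Accounts", purchase);

// All-or-nothing: either both entities commit or neither does.
serviceContext.SaveChangesWithRetries(SaveChangesOptions.Batch);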

Hope that helps, let me know if you have additional questions!

-Jeff


Friday, January 27, 2012 9:19 PM

Thanks for the insight. 

Let me ask you this to see if I'm on the right track.  My goal is to have an account entity and then be able to have multiple users (PublicUser) assigned to that account. Each account entity will have a unique partition key and will live in the Accounts table. 

However, when a user logs on they won't know which partition the account is in, so I won't be able to take advantage of query optimization by using the account's partition key in the query.  So my thinking was to create a PublicUsers table where each PublicUser entity has a unique partition key (the e-mail address in this case) and a property (AccountId) that references the partition key of the Account entity in the Accounts table.

The AccountUser entity is simply a de-normalization, if you will, so that a user can see who else is linked to the account by querying within the same partition as the main Account entity, without having to query the PublicUsers table.

So a user would log in against PublicUsers using their e-mail address, which is also the partition key, so that query should be fast.  Once they are authenticated, they can use the AccountId stored in the PublicUser entity to gain access to the Account entity in the Accounts table (AccountId = partition key of the Account entity).  Once they have the AccountId, all other operations will take place within that partition and presumably be very efficient.
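A sketch of that lookup flow might look like the following (only PartitionKey and AccountId come from this discussion; the entity shapes and e-mail value are assumptions):

// Step 1: point lookup on PublicUsers -- the e-mail address is the
// PartitionKey, so this is an efficient single-partition query.
PublicUser user = serviceContext
    .CreateQuery<PublicUser>("PublicUsers")
    .Where(u => u.PartitionKey == "user@example.com")
    .FirstOrDefault();

// Step 2: after authentication, use the stored AccountId (which is the
// Accounts partition key) so subsequent queries stay within one partition.
var accountData = serviceContext
    .CreateQuery<Account>("Accounts")
    .Where(a => a.PartitionKey == user.AccountId);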

Does that make any kind of sense?

Thanks again for your replies.


Thursday, February 2, 2012 7:09 PM

Jeff and Stoolio, I've gone ahead and split the thread for the new question. The new thread can be found here.