Tuesday, 18 November 2008

Expand a virtual PC VHD file and extend the partition

Ran into a few problems today. For one client, who uses a VPN that doesn't work at all well with Vista, I have an XP development virtual PC. I needed to install Visual Studio SP1 on it (yet again), but my boot drive only had 4GB of free space - and the temporary installer needs more than that.

After finding the excellent VHD Resizer I was able to grow my virtual disk by an additional 10GB. However, it doesn't extend the partition into this new space, and diskpart won't let you issue an extend command against the boot volume.

The solution? Boot off another VHD with the resized disk attached as a second drive, extend the partition from that environment, then shut down and set the VM back to booting from the newly extended drive.
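
From the temporary boot environment, the diskpart steps look roughly like this (a sketch only - check the output of list volume and pick the volume that lives on the resized VHD, as the numbers will differ on your setup):

diskpart
list volume
select volume 2
extend
exit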

Full answer here...

http://www.enusbaum.com/blog/2008/02/07/expand-a-virtual-pc-vhd-file-and-extend-the-partition/

Sunday, 16 November 2008

Using attached properties to compose new behaviour

I was inspired by a posting on stackoverflow.com - "How do I drag an image around a canvas in WPF?" - where the first response in most people's heads is to work with mouse up/down/move events, track offset positions and move the UI element around in response - pretty reasonable, right?

Seeing as WPF's principles favour composition over inheritance and the like, what if instead we used the power of attached properties to attach new behaviour to a UIElement? Here's how to do it. First, the code for the attached dependency property:

public class DraggableExtender : DependencyObject
{
    // This is the dependency property we're exposing - we'll 
    // access this as DraggableExtender.CanDrag="true"/"false"
    public static readonly DependencyProperty CanDragProperty =
        DependencyProperty.RegisterAttached("CanDrag",
        typeof(bool),
        typeof(DraggableExtender),
        new UIPropertyMetadata(false, OnChangeCanDragProperty));

    // The expected static setter
    public static void SetCanDrag(UIElement element, bool o)
    {
        element.SetValue(CanDragProperty, o);
    }

    // the expected static getter
    public static bool GetCanDrag(UIElement element)
    {
        return (bool) element.GetValue(CanDragProperty);
    }

    // This is triggered when the CanDrag property is set. We'll
    // simply check the element is a UI element and that it is
    // within a canvas. If it is, we'll hook into the mouse events
    private static void OnChangeCanDragProperty(DependencyObject d, 
        DependencyPropertyChangedEventArgs e)
    {
        UIElement element = d as UIElement;
        if (element == null) return;

        if ((bool)e.NewValue != (bool)e.OldValue)
        {
            if ((bool)e.NewValue)
            {
                element.PreviewMouseDown += element_PreviewMouseDown;
                element.PreviewMouseUp += element_PreviewMouseUp;
                element.PreviewMouseMove += element_PreviewMouseMove;
            }
            else
            {
                element.PreviewMouseDown -= element_PreviewMouseDown;
                element.PreviewMouseUp -= element_PreviewMouseUp;
                element.PreviewMouseMove -= element_PreviewMouseMove;
            }
        }
    }

    // Determine if we're presently dragging
    private static bool _isDragging = false;
    // The offset from the top, left of the item being dragged 
    // versus the original mouse down
    private static Point _offset;

    // This is triggered when the mouse button is pressed 
    // on the element being hooked
    static void element_PreviewMouseDown(object sender, MouseButtonEventArgs e)
    {
        // Ensure it's a framework element as we'll need to 
        // get access to the visual tree
        FrameworkElement element = sender as FrameworkElement;
        if (element == null) return;

        // start dragging and get the offset of the mouse 
        // relative to the element
        _isDragging = true;
        _offset = e.GetPosition(element);
    }

    // This is triggered when the mouse is moved over the element
    private static void element_PreviewMouseMove(object sender, 
        MouseEventArgs e)
    {
        // If we're not dragging, don't bother
        if (!_isDragging) return;

        FrameworkElement element = sender as FrameworkElement;
        if (element == null) return;

        Canvas canvas = element.Parent as Canvas;
        if( canvas == null ) return;
        
        // Get the position of the mouse relative to the canvas
        Point mousePoint = e.GetPosition(canvas);

        // Offset the mouse position by the original offset position
        mousePoint.Offset(-_offset.X, -_offset.Y);

        // Move the element on the canvas
        element.SetValue(Canvas.LeftProperty, mousePoint.X);
        element.SetValue(Canvas.TopProperty, mousePoint.Y);
    }

    // this is triggered when the mouse is released
    private static void element_PreviewMouseUp(object sender, 
        MouseButtonEventArgs e)
    {
        _isDragging = false;
    }
}

As you can see, we hook into the events exposed from the target element whenever we detect the property being changed. This allows us to inject any logic we like!

To use the behaviour, we include the namespace in XAML:

<Window x:Class="WPFFunWithDragging.Window1"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:local="clr-namespace:WPFFunWithDragging">

And then just attach the behaviour to the elements we want to be able to drag, like so:

<Canvas>
    <Image Source="Garden.jpg"
           Width="50"
           Canvas.Left="10" Canvas.Top="10"
           local:DraggableExtender.CanDrag="true"/>
</Canvas>

Cool huh? Sample code attached....

The Enterprise Stack

What makes good software? Separation of concerns has got to be up there in a big way - ensuring your application has relevant tiers, each dealing with a particular set of concerns. But what does an enterprise software stack look like? I've been asked this question a number of times, so I thought I would put up one of the ways in which I write software for SOA environments.

To avoid being shot down in flames, let me be clear: I'm not advocating any particular dogmatic approach here, just presenting one way that's worked for me in several situations in service-oriented environments.

So, here's the simplified picture of the stack:

[Image: the simplified enterprise stack diagram]

Database
At the very top of the stack we have the database or persistence engine - the place where our application is going to store its data. This doesn't have to be a SQL database; it is more the concept of a place to store information. To this end the box could be satisfied by XML, an object database, text files, other external services and so on - indeed there may even be multiple boxes.

The Data Abstraction Layer
This layer is responsible for hooking repositories up to persistence. It should expose an engine flexible enough to work with a variety of physical stores in a natural manner. Generally you won't write your own data abstraction layer; you will instead re-use one of the many technologies already available, such as NHibernate, LINQ to Entities, LINQ to SQL and so on.

Repositories
The repositories are responsible for fulfilling requests to obtain and modify data. This adds a further level of abstraction that describes the purpose of the code rather than the implementation - i.e. a service will ask a repository to SelectAllCustomers rather than directly executing some LINQ query.

Repositories deal in one thing and one thing only - domain entities. Their inputs and outputs are usually one or more entity objects from the domain (see below). For example, suppose you were writing a pet shop application; you might have a repository for dealing with customers as follows:

[Image: customer repository interface diagram]

As you can see, we have methods for retrieving customers and for updating them. This makes working with customers extremely clear and self-describing. The first method, SelectAll(), simply returns all of the customers in the system (as Customer domain objects). SelectQuery allows a description of how to get the data, sort it and present it - e.g. using the CustomerQuery, one might specify the sort order and direction, filters on fields, and which rows to return for pagination.
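
To make that concrete, the customer repository from the diagram might look something like this in code (a sketch only - the CustomerQuery type and the exact method names are assumptions based on the description above):

public interface ICustomerRepository
{
    // Returns every customer in the system as domain objects
    IList<Customer> SelectAll();

    // Returns customers matching the query - sort order, filters, paging etc.
    IList<Customer> SelectQuery(CustomerQuery query);

    // Returns a single customer by its identifier
    Customer SelectById(int customerId);

    // Persists changes to an existing customer
    void Update(Customer customer);
}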

Each type of conceptual data would have its own repository in the stack.

(One alternative to using repositories is the active record pattern where entities expose methods and functions similar to those exposed from repositories)

Domain Entities
The domain model (in this instance) is a representation of the various entities that make up the problem you are describing along with their relationships and any operational logic (business rules).

The following is an example of a simple domain model, working again with the fictitious pet shop example.

[Image: pet shop domain model class diagram]

In the diagram above we can see we have a customer entity, which has attributes describing the customer. It also contains a collection of Order entities representing the orders that this customer has made. Each order must have a customer, but a customer can have 0 or more orders.

Where an order exists, this will have attributes of its own to describe the order, along with a collection of order lines (0 or more). Each order must belong to one and only one customer.

Moving down the graph, the order line will know which order it belongs to and also reference a product that the line of the order corresponds to. Each order line can only be within one order and it must also reference a product.

As you can see, the domain model is just the object graph of the entities it represents. There may of course be more meat on the bones of your real life domain model, including operations on entities to implement business rules and such like.
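
Stripped of any behaviour, a skeleton of that object graph might look roughly like this (property names are illustrative - a real model would also carry business rules):

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public IList<Order> Orders { get; set; }      // 0 or more orders
}

public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
    public Customer Customer { get; set; }        // exactly one customer
    public IList<OrderLine> Lines { get; set; }   // 0 or more order lines
}

public class OrderLine
{
    public Order Order { get; set; }              // the order this line belongs to
    public Product Product { get; set; }          // the product the line refers to
    public int Quantity { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}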

So, how does the repository return all of this information - where we simply invoke SelectById(10) to get the customer entity with an ID of 10? Well, the answer is actually in the DAL and repository layer.

NHibernate and other OR/M technologies allow for something called lazy loading - the initial query loads the Customer object but wraps its collection properties in proxies, so that when they are accessed the OR/M automatically goes back to the database to fetch the entities required.

A second alternative is to describe how deep you want the graph to load in the repository implementation (or even make this a factor of the query parameter to allow control further down the stack). Again, most OR/Ms allow you to specify what you want to load from the graph - such as loading the orders and order lines for a customer at the same time as the customer itself. This usually offers some performance gain too, as only one query is executed. (In fact it is your only option if you are using LINQ to Entities as of V1, which doesn't support lazy loading.)
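
One illustrative way to surface that choice to callers (no particular OR/M assumed) is to let the repository take a load-depth hint:

// Purely illustrative - lets the caller say how much of the graph to hydrate
public enum CustomerLoadDepth
{
    CustomerOnly,
    WithOrders,
    WithOrdersAndLines
}

// e.g. an additional overload on the repository:
// Customer SelectById(int customerId, CustomerLoadDepth depth);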

Regardless, your DAL or your repositories should be able to hydrate an object graph based on a request from further down the stack, and should also ensure that when you get an object from the database you only ever get one instance of it! (iBATIS, for example, doesn't do this by default and you must implement your own identity map pattern for your repositories to use.)

Service implementations
These are the actual end-points of your service - the ASMX you call, or in the case of WCF (my preference), the endpoints you've defined and implemented through service contracts.

The service is responsible for taking an inbound request object, working out what to do next, then invoking the appropriate repository methods to get or affect data, before assembling a response back to the caller.

In other words, the service is invoked using a data contract in the form of a data transfer object (see below). If necessary it then hydrates a domain object graph ready for use by the repositories before invoking them and getting domain objects back. When it does, it uses assemblers (see below) to convert the full data representation from the domain into the structure of data the client application is actually interested in (DTOs).

Data Transfer Objects - DTOs
The purpose of DTOs is to represent the data needed to complete an activity in its most minimal form for transmission over the wire. For example, if a client application is interested only in a customer's name and ID, and not the other 20 fields, your DTO would represent a customer as just name and ID. They are lightweight representations.
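
As a sketch, such a DTO might be nothing more than this (the data contract attributes are an assumption based on my preference for WCF, mentioned above):

// A minimal customer DTO - only the fields the client actually needs
[DataContract]
public class CustomerSummaryDto
{
    [DataMember]
    public int Id { get; set; }

    [DataMember]
    public string Name { get; set; }
}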

Communication between client applications and the service tier is done only through the use of DTO objects.

Interestingly, DTO-like objects may also be present in your domain (though not called DTOs). You may have several different representations of a customer, for instance, in order to optimise how much information flows across the network between the domain model and the database. The trade-off is how simple you want to keep your domain versus how much control you want over database performance.

Assemblers
The assemblers are used by the service implementations to map between the DTO and domain representations. For example, if you have a DTO contract for updating a customer, an assembler would take this DTO and map it onto a valid customer domain object, which would then be passed to the repositories for persistence.
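
Continuing with the hypothetical DTO above, an assembler is usually nothing more exotic than this:

// Maps between the domain Customer and its DTO representation
public class CustomerAssembler
{
    public CustomerSummaryDto ToDto(Customer customer)
    {
        return new CustomerSummaryDto { Id = customer.Id, Name = customer.Name };
    }

    public void ApplyToDomain(CustomerSummaryDto dto, Customer customer)
    {
        // Copy across only what the contract carries
        customer.Name = dto.Name;
    }
}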

Service Proxies
Hopefully this requires very little explanation - the service proxy is the client-side representation of the service that maps onto the communications channel and invokes the actual code on the service tier. Your client application works exclusively with the service proxies and the DTOs they expect and return.

Conclusion
This is just one way to build scalable N-tiered enterprise applications. Several of the concepts presented here are interchangeable with other methods - such as active record instead of domain + repository. Speaking from personal experience I have seen the above work very well on large scale implementations.

It is also worth mentioning some supporting concepts. To truly realise the benefit of the above implementation, one would need to use interface-driven development to allow any piece of the stack to be mocked and unit tested effectively. In addition, dependency injection can make your life easier as the scale of the system grows, automatically resolving dependencies between objects across the tiers.

In the future I will post a simple example application with source code that uses all of the above techniques to demonstrate the implementation specifics. I hope this was worth writing and someone finds it useful.

Friday, 14 November 2008

Microsoft Tech-Ed is over, I'm going back to work for a REST

It's a shame that it's over for another year; it's been a fantastic, if somewhat tiring, week that I've thoroughly enjoyed. They cram in as much information as they can over the 5 days of the conference and provide you with everything you need to stay comfortable for the duration.

On the social front, things were also good, the country drinks event was great on Wednesday night, and Barcelona overall was a cool place to hang out and chat about the day.

The inhabitants of Barcelona must think there's been a geek invasion, as every third table in the restaurants was debating Azure, L2E, federation in the cloud or whatever - and if that didn't convince them, then watching the games at the country drinks evening would have.

One of the games during the night out was a form of bowling using a tennis ball. After the person had bowled, the ball was thrown back to the bowler. You could tell we were at a geek conference though as not one person managed to catch the ball!!!

If anyone is thinking about Tech-Ed 2009 (in Berlin next year), I wouldn't hesitate to recommend it.

Day 5 - 13:30 : An introduction to Oslo (mostly M)

This session, run by Jon Flanders, focused on the new M modelling language - specifically MSchema and MGrammar. To be honest, the entire session concentrated on using MSchema to generate T-SQL to create a database, which I didn't feel was a great example and certainly wasn't real world.

We already have a perfectly good textual DSL for building databases - it's called T-SQL, and just like the DSL, it can be split across multiple files, can generate data, and can be stored in source control for versioning.

I really like Jon Flanders - he's a great guy, and he did a much better job of presenting M than I ever could - but I honestly didn't enjoy this session one bit.

Day 5 - 10:45 : Data access smackdown! Making sense of Microsoft's new data access strategy (DAT02-IS)

Stephen Forte, Chief Strategy Officer from Telerik, started this interactive session reviewing the history of Microsoft's data access offerings and then discussing the latest multitude of choices. As this was an interactive session, there were lots of questions, comments and debates going on throughout.

A quick history - Microsoft gave us ODBC, then DAO sat on top of this with JET, RDO wrapped DAO and ODBC, and then there was ODBCDirect. I can remember working with all of these technologies, so I must be getting on a bit now! (Not really - I'm 35; technology just changes quickly.) Anyway, from this they gave us ADO and, with the release of .NET, ADO.NET.

Today we have a bunch of options available, and moving into the future we're going to have even more choice. Sitting atop ADO.NET we have the conceptual-model LINQ technologies (LINQ to SQL, XML, Entities, REST) along with cloud services (Azure) and SSDS (now SDS). That's without even thinking about third-party solutions like NHibernate, SubSonic etc. - although these were also discussed.

The debate over which technology should be used in which context was ongoing, and as ever there is "No Silver Bullet" - the stock answer to such a question is, and should be, "it depends".

The data access strategy you choose should be the one that gives the highest return for the least complexity - not just the one that seems the most technically pure, as that is purely subjective - e.g. "objects first" people are going to look for an OR/M, "data first" guys are looking for T-SQL, and so on.

If this means you want an OR/M and are using TDD, or you need facilities like lazy loading (and you don't always!), then avoid the Entity Framework, which doesn't cope well with either; in that scenario, perhaps you'd stick with NHibernate. On the other hand, if you want highly optimised database queries and full control over them for whatever reason, use plain old ADO.NET. And if you're writing a system that doesn't need lazy loading and you're not using TDD, then perhaps the Entity Framework is a great way to get started with entities very quickly and with very little code.

I get frustrated by dogma and elitists saying there's only one way to do things - the way they do it! We all have our preferred way of working, and I'm the first to promote various strategies and techniques as being good options, and sure, I'm occasionally guilty of being dogmatic, but dogma hurts objectivity - we should always consider the project in question and what is best for it rather than fly flags and banners.

Stephen was questioned about, and acknowledged, the vote of no confidence and its validity - and he made a very good point: this is V1 of the framework. The guys involved in EF are aware of its limitations, are working to resolve them, and are at least heading in the right direction. Personally, I feel I came out of this session a little less dogmatic and a little more objective.

Day 5 - 15:15 : The TechEd Daily Scrum

Presented by Stephen Forte again (the second time inside a single day!), this final session at Tech-Ed was a review of what SCRUM is and how it's working for people, before turning over to a mass debate.

Again, we find religious arguments coming to a head in this scenario between the agile purists and the more pragmatic fixed price bid guys.

Personally, I'm somewhere in between. Given the opportunity to do a time and materials project using SCRUM, I would prefer to, but often we must be realistic and offer fixed price contracts to clients - even though doing so is difficult, because up-front design is always wrong, which is why SCRUM rejects it outright.

(Saying this though, even in fixed price bid contracts, one can still apply and benefit from the principles of SCRUM (backlogs, sprints, daily scrums, business owner involvement etc). Doing so will just expose problems earlier, which is a positive thing.)

On the subject of big design up front - the minute a specification is published it starts to go out of date and becomes incorrect. Never have I (nor had anyone else in the room) seen a specification written before a project where you could go back, compare the delivery with the original specification, and find an exact match.

<Ranting>

These differences, errors and omissions add time to the project, but in proportion to the total project time, not just the slippage caused by the initial mistake. For example, if I have a 4 month project and during month 1 I find slippage of 1 week, does my project length get extended by 1 week only?

No - it gets extended as a factor of the time so far consumed and remaining.

My project was 4 months, and in the first month I found an omission that cost me 25% of the time I'd spent so far. It's a reasonable assumption that I'll find a similar proportion of omissions in the remaining development, so my overall increase isn't 1 week, it's 1 month - and this is why many development projects are seen as failures that have gone drastically over time and over budget.

This principle, which has been proven many times, was written up by Fred Brooks in an essay back in 1975 and is as valid today as it was then. I recommend reading The Mythical Man-Month for more information on this and many other principles - including the famous Brooks's Law: adding more manpower to a late project makes it later.

</Ranting>

Anyway, back to the session - it was highly enjoyable and despite the presence of significant dogma - I think everyone was able to learn something new from each other and from the presenter.

Day 5 - 09:00 : Identity and cloud services (ARC302)

Vittorio Bertocci presented this session on how to work with identity across services and environments using the cloud.

Traditionally in the enterprise we have a set of users and a set of resources that they can access. Each time a user accesses a resource, they are validated against this source and granted or denied access. This scenario becomes more complex when you wish to allow other environments to access your resources, or your users to access resources in other environments.

The solution is to outsource aspects of identity management to the cloud, allowing relationships and credentials to be managed across technologies, services and environments. Where two systems, organisations or environments have no trust between them, we can use a claims transformer or resource security token service (R-STS) in the cloud that is trusted by both.

This provides a natural point of trust brokering with customers and partners along with a natural point of authorisation evaluation and enforcement.

Azure provides this type of service through the .NET Services access control service, in which every solution gets a dedicated R-STS instance. The application has its own policy, which remains the same, whilst rules are created in the R-STS for how to transform your various customers' or partners' tokens (or Windows Live credentials etc.) into your own.

Thursday, 13 November 2008

Day 4 - 17:30 : Building and consuming REST based data services for the web. (WUX313)

In this session, Mike Flasko demonstrated the new ADO.NET data services framework that enables developers to create services that expose data over a REST interface using industry standard formats and semantics such as JSON and AtomPub.

The reason I attended this session was to find out what this was all about because on the face of it, the idea of a data services framework worries me for a variety of reasons, including;

  • Why would I ever expose my entire data model over a REST interface?
  • Surely doing so breaks separation of concerns? My clients should be invoking a service or REST interface that returns targeted and focused results based on a strict request/response model.
  • Isn't the data services framework tied to LINQ to Entities?
  • Opening up an interface direct to my data sounds extremely dangerous and opens it to all manner of abuses!
  • What if I have business logic executed when I manipulate data within my service - using data services this can't be expressed!

Taking each in turn;

Why would I ever expose my entire data model over REST?
Quite simply, data is what drives much of web 2.0 - it drives mash-ups and makes data driven AJAX, flash and Silverlight applications possible.

Normally we may expose this data through a bunch of services (possibly REST) with tightly defined semantics such as GetCustomers, GetCustomer(1) and so on. The data services team suggest however that as the application complexity and size increases, managing these interfaces can become cumbersome and tedious and may be serviced better by interfacing directly to the data via a specific data service.

These data services are exposed as REST, allowing access to your data model by navigable URL. For example, let's say you have a data model of people with contact telephone numbers; you could access this data with the following HTTP conventions:

To get a list of people you would execute a standard GET verb request against the resource URL which might be something like: http://yourdomain.com/yourservice.svc/data/people

To filter people, you would pass parameters on the URL, perhaps something like: http://yourdomain.com/yourservice.svc/data/people?$filter=startswith(Name,'A')

Whilst to get a specific person with an ID of 12, you might look at http://yourdomain.com/yourservice.svc/data/people(12)

From here you can get a list of that person's contact details: http://yourdomain.com/yourservice.svc/data/people(12)/contacts

To make changes to the data, you simply change the HTTP verb: to insert a record you POST it, to update you PUT it, and to remove it you DELETE it (the standard REST approach). The results come back in the format requested in the header (i.e. you can set the accept header to ask for Atom or JSON etc.), and the outcome of any given action has a status response which again maps to the standard HTTP status codes - e.g. people(12) where no person exists with ID 12 would generate a 404 Not Found.

When data is returned it also has relative navigable references to other parts of the data model. Again this is a common REST feature which describes not only the data that is being returned, but also how to navigate it by invoking the service with more specific information.

On the client side, you are able to query your data service using a LINQ query, offering a powerful and intelligent way to request the data you want from the web service - but this is also one of my concerns.
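
By way of illustration, a client-side query might look something like this (a sketch only - the Person type, its properties and the service URL are made up; the DataServiceContext class comes from the System.Data.Services.Client library):

// Point a context at the data service
DataServiceContext context =
    new DataServiceContext(new Uri("http://yourdomain.com/yourservice.svc/data"));

// This LINQ query is translated into a REST request against the people resource
var londoners = from p in context.CreateQuery<Person>("people")
                where p.City == "London"
                select p;

foreach (Person person in londoners)
    Console.WriteLine(person.Name);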

Isn't this breaking the principle of separation of concerns?
I believe it is because your calling code is tightly coupled to the semantics of your entity model, rather than being coupled only to the DTO that it is expecting to be returned. This might be a problem in some situations, but in others it may be perfectly fine, so there isn't a hard and fast rule here.

Isn't the data services framework tied to LINQ to Entities?
My next concern was the reliance of the data services framework on LINQ to entities, of which I'm not a big fan. I'm led to believe however that this isn't true. Apparently, your data service is able to expose any object model in any way you see fit, but in a limited 75 minute session, this wasn't covered in any detail. I'm assuming though that this will at least require that the context of your data (eg, an NHibernate session) implement the IQueryable interface to allow LINQ queries to run against it. Again, I'll reserve judgement on this until I've seen an implementation of a data service that isn't exposing a LINQ to entities model.

Opening up an interface direct to my data sounds extremely dangerous and opens it to all manner of abuses!
In terms of security, however, my concerns were unfounded. One of the things I really liked about data services was the ability to lock down your data with any rule you can code. You use standard authentication and authorisation mechanisms (this is just HTTP after all) to govern, with fine granularity, access permissions at entity level, row level and field level - this will please the SPROC aficionados who often argue they don't use OR/M technologies because they lose row-level security on their data.

Business rules
Finally, the business rules and logic issue is still perfectly valid. If you need to carry out rules processing during the invocation of operations on your model, then a conventional approach would be more suitable, as you don't get this opportunity under data services: when a request arrives to manipulate data, it is surfaced straight into the data services framework and executed against your data.

Overall I thought the technology was interesting and could see uses for it in numerous areas. In enterprise applications with distinct layers and business rules however I'm not sure it's that applicable, and I'd also like to see it surface a non-L2E model.

Day 4 - 15:45 : An overview of the Azure services platform (ARC201)

David Chappell is perhaps one of my favourite speakers from the conference. He tackled an objective look at Microsoft's new Azure platform in this session and gave what I thought was the clearest overview of the vision.

Azure is Microsoft's new distributed platform that they define as "Windows in the cloud" and "Infrastructure in the cloud". It's like having a huge data centre available on-tap, without having to invest heavily in capital expenditure, you simply rent the size of cloud you want, and scale up/down as needed.

It offers not only the ability to deploy your applications to a scalable platform, but also the ability to consume standard Azure service offerings, including (amongst several others):

  • .NET Services
  • SQL Data services
  • Live services

The fabric of Azure essentially provides a layer on top of thousands of virtual machines, all running 64-bit Windows Server 2008. Each virtual machine has a 1:1 relationship with a physical processor core. The fabric layer provides an abstraction on top of this virtualised hardware, so we can't access the virtual machines directly (e.g. RDP to them), but we can build .NET applications in certain roles and deploy them out to the fabric.

This is an extremely beneficial deployment model for whoever is building the next Facebook, MySpace or Amazon out there - being able to build for the cloud and deploy incrementally to a scalable off-premises infrastructure allows new killer applications to fail fast or scale fast without huge capital outlay.

This of course is not actually new - Amazon's Elastic Compute Cloud and storage solutions, Google's App Engine, Gears etc. all offer the same sorts of cloud infrastructure and utility computing. In fact, for existing apps that can't be ported to use Azure functionality like storage, Amazon's EC2 platform provides a good alternative, as it allows you to supply your own virtual machine images into the infrastructure, whereas Azure abstracts this away.

The roles initially available are the web role and worker role. These correlate directly to an IIS hosted web application and standard .NET executable applications respectively, but deployed and hosted through the Azure fabric and the core services that this provides.

These core services include the storage service which allows us to store;

  • Blobs
    • Storing a simple hierarchy of any binary representation of data directly into the Azure storage services.
  • Tables
    • Storing structured data into hierarchical tables (not RDBMS tables).
  • Queues
    • Storing data into queues that allow communication between web and worker role instances.

Data in storage is exposed to applications via a RESTful interface with a query language based on the LINQ C# syntax. It can be accessed by applications hosted in Azure or by other on-premises or cloud-based applications.

In addition to the core Azure components of the web / worker roles and storage, it also offers a suite of pre-built ancillary services that can be consumed as mentioned above - presumably we'll pay extra for these, but the only comment Microsoft would make on pricing during the conference was that they will be competitive.

Azure .NET services
This provides;

  • Access control services
  • Service bus
  • Workflow

Access control services
Different organisations identify users with tokens that contain different claims, and each organisation might use different semantics - so applications, especially those working across organisations, can be presented with a confusing mess.

Azure's solution to this problem is to have an access control service that implements a security token service (STS) in the cloud. It accepts an incoming token (of various types if necessary) and issues out another one that may or may not differ from the incoming one. This allows for scenarios of single sign on across enterprises and organisations etc, from multiple sign on sources.

Service Bus
Exposing internal applications out onto the internet isn't easy. Network address translation and firewalls get in the way of efficiently and securely exposing your on-premises application to other users.

In Azure, the service bus provides a cloud-based intermediary between clients and internal applications. Organisation X may expose application A by connecting it to the service bus - it initiates the connection, over standard ports, and keeps the connection open, thereby ensuring NAT doesn't interfere. Meanwhile, Organisation Y opens their connection to the service bus and connects to these intermediary end-points that map back to the actual organisation's end points.

Of course all of this is controlled and secured with integration to the access control service.

Workflow
Where should workflow logic that co-ordinates cross organisation composite applications be run?

Azure helps to make this clear by having the workflow service run WF based workflows in the cloud. There are some limits on what activities can be used (eg: no code activities), but having the workflow run in the cloud overcomes the problem it is meant to address.

SQL Data Services
Formerly known as SSDS and now just SDS, this is built on SQL Server and provides a mechanism for working with a fixed hierarchy of data, tied to a data sync service based on the Sync Framework. Whilst this sounds much like the storage service, the ultimate aim of SDS is to provide more database-relevant facilities like reporting, analysis, ETL and more.

It's important to distinguish that SDS is built on SQL Server but is not SQL Server - it's a hierarchical storage solution. Each data centre has authorities, each authority has data containers, containers consist of entities, and entities have properties. Each property has a name, a type and a value.

This may seem a little limiting, but one key advantage is never having to worry about managing SQL servers or SDS itself. You just see data, not the database - all scalability and management etc is taken care of, on behalf of your application by the Azure platform.

SDS data is accessed via SOAP and REST (including ADO.NET Data services) and, just like with the storage service, it provides a query language based on the LINQ syntax with queries returning entities. Unlike storage services however, SDS supports database type functionality with order by and join operations.

Live services
Live services and the live framework offers something called a live operating environment (LOE) that allows for;

  • Accessing live services data, and personal data
    • Contacts
    • Hotmail
    • Calendar
    • Search
    • Maps
  • Creating a mesh of devices
    • My vista desktop
    • My MAC OSX desktop
    • My windows mobile device
    • All running the live operating environment, with data synchronised across all devices and into the cloud.
  • Creating mesh enabled web applications that can run through your browser, on your desktop or on any device within your live mesh.

Applications that are built to run within the mesh, on all of the LOE devices, create and consume both cloud and local data by using the live framework to access live services data via its exposed REST interface (data is presented in the AtomPub format).

Applications are rich internet applications (RIAs) built using silverlight, javascript, flash etc and run from the desktop or from the cloud on the devices within the mesh. A user can add devices into their mesh and can then select to install mesh applications (from a catalog) to the LOE in the cloud that are then also presented via the LOE on the mesh devices.

Conclusion
Microsoft are investing heavily in Azure, and there's no doubt that cloud computing - and more specifically utility computing - will be a significant factor in the future. Overall, the Azure platform offers some exciting opportunities to exploit and monetise ideas with much lower commitment and risk.

Day 4 - 13:30 : Building WCF services with WF in .NET 4.0 (SOA302)

Jon Flanders from Pluralsight took this session looking at how you can expose WCF services from Windows Workflow on the .NET 4.0 stack, continuing on from yesterday's first-look session.

A cool feature being introduced is the ability to create XAML-only workflows in the form of .XAMLX files. These, when hosted in IIS or another host, allow an entire WCF endpoint, along with all of the WF logic, to be defined in a single XAML file. This file can be hand-cranked or you can use the new WPF-based WF designer surface.

Guidance suggests benefits can be achieved using WF to define WCF services if the service;

  • Calls a database, calls another service, or uses the file system.
  • Coordinates parallel work
  • Enforces ordering between messages
  • Coordinates messages with application state
  • Is long running
  • Requires rich tracking information

Wednesday, 12 November 2008

Day 3 - 17:30 : Develop with the visual studio 2008 extensions for sharepoint

I went to this presentation to learn about the new visual studio extensions for sharepoint, unfortunately the presentation was an exact repeat of a PDC video I had already watched.

The sharepoint extensions for visual studio 2008 make building sharepoint applications much easier than before and allow simple packaging of sharepoint solutions for deployment.

For more information, (and in fact to watch the exact same presentation as I've just seen again) see the following PDC video: http://channel9.msdn.com/pdc2008/BB13/

Day 3 - 13:30 : An in-depth look at the ADO.NET Entity framework (DAT307)

This presentation, with Elisa Flasko, covered an introduction to the Entity Framework and how it fits with common application patterns - client server, web applications, distributed N-tier apps and so on.

The ADO.NET Entity Framework is the next layer up in the ADO.NET technology stack. It allows us to describe data using a conceptual model, mapped to the database with a declarative mapping, and to query it using LINQ to Entities.

The demonstration started with replacing the data access layer of a client server application by importing an existing database structure. This pulled in the table structures and stored procedures of an existing database to an entity model, automatically resolving 1:n, 1:1 and n:n relationships to properties and collections etc.

The result of this was a graphical view of the entity model, and this is where my first concern with the Entity Framework lies: visual tools simply do not scale in large applications. Imagine having hundreds or thousands of entities - once the number of entities grows to any significant size, the graphical model becomes cumbersome and doesn't cut it.

That aside, the demonstration then moved to consuming data. Rather than programming against the physical data model, you write code against the conceptual entity model. If you're at all familiar with OR/M solutions such as NHibernate or SubSonic, none of these concepts will be alien to you.

One nice facility was the seamless integration between EF and LINQ, allowing elegant, strongly typed and intellisense enabled queries to be defined. (This isn't unique to EF however, there is LINQ for NHibernate for example)

As you would hope, when queries execute, EF has a built-in identity map to ensure that only one instance of the same entity exists within the same context at any one time - an incredibly important feature to ensure you're always working with the same, correct object.

The presentation moved on to using EF in a web application and using object data sources within ASP.NET, with two way binding etc. As far as EF goes, the usage wasn't particularly different from using it in the client server environment.

Moving to N-tier applications, as you might expect, the service layers utilise EF but then expose data transfer objects (DTOs), although the speaker did mention that passing entities directly across the wire is possible. Personally, I would always go for the DTO pattern, as best practice is to send out only the data your clients need to consume.

Next, we looked at using ADO.NET Data Services (project Astoria) to expose an entity model directly over HTTP. This was reasonably cool, allowing the client to query against the entity model using LINQ and have this executed remotely via your data service. Whilst this is a cool feature, it moves the responsibility for querying to the wrong tier - the client should ask the service for data, and the service should get it and return it appropriately; having the client issue any query it likes via the service breaks this separation, but it is a cool feature nonetheless.

Whilst the EF is a very positive step forward in the OR/M space for Microsoft, the fact is that there are more mature, open and proven technologies already in existence (such as NHibernate), and a number of people in the field have signed a vote of no confidence in the framework - all of which must be considered before proceeding with an EF implementation.

Tech-Ed - Free TShirt of the week.

Just got a free t-shirt from the guys that produce dotfuscator. On the front the slogan is;

"I can see you, I'm instrumenting"

and on the rear;

"You can't see me, I'm obfuscating"

Definite geek t-shirt, but I find this hilarious.

Day 3 - 10:45 : Windows Workflow Foundation 4.0: A first look (SOA207)

Presented by Aaron Skonnard, co-founder of Pluralsight, this session looked at the upcoming release of Windows Workflow Foundation 4.0.

Firstly, my definition of workflow is slightly different to how MS define it. I think of workflow as the definition of a process - usually some form of state machine involving activities made up of system and human interaction. MS define workflow the same in most ways, but additionally consider it to be a new way of writing software overall.

In this guise, WF helps to move traditional sequential processing (get input, store state, control flow logic) into a loosely coupled, distributed (SOA) world. However, the current implementation of WF has challenges for adoption, including:

  • Limited support for XAML only workflows.
  • Versioning is problematic
  • Limited base activity library
  • Writing custom activities and managing data flow is not easy enough today
  • Limited WCF integration and activity support
  • No generic server host environment

WF 4.0 aims to address these issues in .NET 4.0 by introducing:

  • XAML only workflows are the new default, with a unified model between WF, WCF and WPF. (Set breakpoints in XAML, designer writes XAML etc)
  • Extended base activity library
    • Flow Control
      • Flowchart
      • ForEach
      • DoWhile
      • Break
    • WCF
      • SendMessage
      • ReceiveMessage
      • ClientOperation
      • ServiceOperation
      • CorrelationScope
      • InitializeCorrelation
    • Others
      • Assign
      • MethodInvoke
      • Persist
      • Interop
      • PowerShellCommand
    • + MS are planning to ship additional activities via codeplex.
  • Simplified WF programming model
  • Support for arguments, variables, expressions
  • Major improvements to WCF integration
  • Runtime and designer improvements (designer is WPF based)
  • Hosting and management via "Dublin"

Day 3 - 09:00 : Writing custom LINQ providers - LINQ to Anything (TLA317)

This session was presented by Bart J.F. De Smet, software development engineer from Microsoft, concentrating on the theory for writing your own data providers for LINQ.

He ran through the open source implementation of LINQ to AD, which allows LINQ to query Active Directory using the LDAP query language. The key principles revolved around implementing IQueryable and IQueryProvider, how expression trees are parsed, and how lambda expressions are used in the context of Func<> and Expression<Func<>>.

Lambdas assigned to Func<> compile to IL code, whilst lambdas assigned to Expression<Func<>> are treated as data, allowing them to be parsed as expression trees (together with other LINQ query information) for transformation into another, external language.
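
The distinction is easiest to see side by side (a trivial example):

// Compiled straight to IL - an invokable delegate
Func<int, bool> asDelegate = x => x > 5;

// Compiled into a data structure describing the code, which a custom
// provider can walk and translate into another language (LDAP, SQL etc.)
Expression<Func<int, bool>> asTree = x => x > 5;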

The session was supposed to be a 300 level session but, as the presenter pointed out, it was actually a 400+ session, covering very technical, in-depth topics at lightning speed. I leave further investigation as an exercise for the reader (and for me!), looking at the LINQ to AD or LINQ to sharepoint code for good examples of implementing custom providers.

Tuesday, 11 November 2008

Day 2 - 15:15 : How LINQ works: A deep dive into the C# implementation of LINQ.

Luke Hoban presented this 400 level session. Luke is the program manager for visual studio languages.

Having not really used LINQ in anger, I was interested in following this session to understand a little more about LINQ and how it works. I got lost at about the point he started talking about expression trees for D-LINQ - which takes the query and, rather than processing it, builds an SQL query instead - but I managed to get the basic principles down. If any of the following is wrong, then that's purely my fault, not the presenter's.

LINQ to Objects - translation from query to code
Despite appearances, the following query expression isn't represented directly in IL when it is compiled. Instead it is first translated into invocations of methods - and in fact you could have written the same query in the form it is translated into:

var query = from c in GetSomeList()
    where c.City == "London"
    select c;

The translated form:

bool IsCustomerFromLondon(SomeType obj)
{
    if( obj.City == "London" ) return true;
    return false;
}
...
IEnumerable<SomeType> query = GetSomeList().Where(IsCustomerFromLondon);

In fact, if we were doing it this way, we could make life easier for ourselves by using a Lambda expression:

IEnumerable<SomeType> query = GetSomeList().Where( c => c.City == "London" );

The C# compiler does exactly this - it transforms the query into invocations of the methods above, and that is what is turned into IL and built into your assemblies. The main difference is that the compiler optimises the delegate construction (which is expensive) by caching it, so when it's used again it is simply re-used from the cache. In the case of the lambda expression, the code is generated as a method that resembles our IsCustomerFromLondon method, but given a munged name and marked with the [CompilerGenerated] attribute.

The GetSomeList() method above returns an IEnumerable<SomeType>. LINQ provides an extension method on IEnumerable<T> - the Where method - which takes a Func<T, bool> delegate for the comparison. It is implemented roughly as follows:

public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> filter)
{
    foreach (T item in source)
        if (filter(item))
            yield return item;
}
 
Deferred Execution
Be aware though that this Where method isn't invoked immediately - rather, it is invoked on demand. Using our original query, suppose we then enumerate through the results:
 
foreach( SomeType item in query )
{
    // do something with item
}

The Where method is invoked at this point; as soon as it finds a match it yields back to the caller, and when asked for the next item it picks up where it left off. This is all done under the covers, but you can see it in operation by implementing your own Where extension method on IEnumerable<T> and stepping through it in the debugger.

Expression trees
Luke then went on to describe expression trees, where LINQ operates on the IQueryable interface instead of IEnumerable. The result is that the Where extension method and friends build expression trees containing a description of the code that invoked them, rather than references to delegates, and a data provider can then use this to build SQL queries or the like. To be honest, I got lost at this point of the presentation as I've had only limited exposure to LINQ and have never looked at the inner workings before. I'm hoping to find out more over the coming days of Tech-Ed.

Day 2 - 13:30 : Sense and Testability session (ARC307)

Roy Osherove from Typemock presented this informative session about testability and design and how they relate to each other - specifically how we might design software to make it inherently testable. To be honest there was nothing particularly new about the concepts presented here, but it was nice to have these principles confirmed and reinforced.

What makes a unit testable system?
A unit-testable system is a system where for each piece of coded logic in the system, a unit test can be written easily enough to verify that it works as expected whilst keeping the PC/COF rules, which are;

Partial runs are always possible - you can run all or 1 test, they are not dependent upon each other.

Configuration is not needed - or at least, isolate tests that need configuration from tests that don't so that it is clear.

Consistent pass/fail result. Ensure your test can be trusted - it produces the same result time and again until the code is changed.

Order does not matter - running the tests in different orders won't change the results.

Fast - unit tests should execute quickly so that they remain useful.

A problem posed
Given the following method, how could you write a unit test to check both a positive and negative outcome?

public bool IsRetired(int age)
{
    RulesEngine engine = new RulesEngine();
    if( age >= engine.RetirementAge )
        return true;

    return false;
}

Quite simply you can't. The tight coupling between the IsRetired method and the rules engine prevents us from testing a positive and negative outcome without having knowledge of the values that will be returned from the rules engine itself.

Interface based design decision
So, we change our design to support testability - using interface based design we can de-couple and replace the testable parts of the code as follows;

public interface IRulesEngine
{
    int RetirementAge { get; }
}

public class MyClass
{
   private IRulesEngine _engine;

   public IRulesEngine Engine
   {
        get
        {
            if( _engine == null )
                _engine = new RulesEngine();
            return _engine;
        }
        set
        {
            _engine = value;
        }
   }   
 
   public bool IsRetired(int age)
   {
       if( age >= Engine.RetirementAge )
           return true;

       return false;
   }
}

We can now have a test specific version of IRulesEngine and have this present predictable results to our class and therefore test both positive and negative responses. By having dependencies on interfaces rather than concrete classes, it becomes easy to de-couple the parts and replace functionality with test specific instances.
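
For example, a test-specific stub might look like this (the test method below is just a sketch - assertion syntax will depend on your test framework):

// A stub rules engine that always returns a known retirement age
public class StubRulesEngine : IRulesEngine
{
    public int RetirementAge
    {
        get { return 65; }
    }
}

// ...and a test exercising both outcomes
public void IsRetired_ReturnsExpectedResults()
{
    MyClass subject = new MyClass();
    subject.Engine = new StubRulesEngine();

    bool retired = subject.IsRetired(70);     // expect true
    bool notRetired = subject.IsRetired(30);  // expect false
}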

This in itself leads to problems however. The over-simplistic example above is itself quite cumbersome, so imagine how things would be if the class depended on 5 interfaces and each of those depended on 5 interfaces and so on. Resolving these dependencies could become prohibitive very quickly!

For this reason we use dependency injection.

Having an application centric register of concrete object types referenced against interface types, we can easily implement a locator or factory class that, given an interface being requested, can quickly create an instance of the concrete class. In turn, it would examine the dependencies of the concrete class and resolve these from the registry allowing for complex dependencies to be resolved just by asking the locator/factory for an instance of an interface.

This of course is classic dependency injection. He recommended constructor injection - where dependencies are passed as interface references on the constructor of an object - for non-optional dependencies, and property setter injection where optional dependencies exist, e.g. I may or may not have a logger depending on my scenario.
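
A minimal sketch of the two styles (the IOrderRepository and ILogger interfaces here are made-up examples):

public class OrderService
{
    private readonly IOrderRepository _repository;

    // Constructor injection - the service cannot function without a repository
    public OrderService(IOrderRepository repository)
    {
        _repository = repository;
    }

    // Property (setter) injection - logging is optional
    public ILogger Logger { get; set; }
}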

Inversion of control (IoC) containers such as spring.net, castle windsor, structure map and the Microsoft unity application block provide these dependency injection facilities and more in an easy to use form out of the box, along with support for defining objects as having different life cycles (singleton for example).

To summarise, design decisions made to support testing so far in the presentation included the use of interfaces to be able to replace objects and the use of a locator/container to help resolve dependencies automatically.

Next, he spoke about testing static methods, properties and other things that haven't been implemented specifically to be testable. Eg, how do you test the paths in a method that uses DateTime.Now?

public bool IsRetired(DateTime dateOfBirth)
{
    // retired once the person is 60 or older
    if( DateTime.Now >= dateOfBirth.AddYears(60) )
        return true;

    return false;
}

You could solve it with an interface, say IClock, and have everywhere that needs to know the time use the IoC container to get an instance of the clock. This would clutter your code in many places though, with lots of constructor parameters on lots of classes that need the IClock.

Instead, it would be easier to design the system to wrap the DateTime type with a new static class that returns either the real DateTime.Now or, if it has been set, the value of a nullable DateTime field within it.

// note: not thread safe....
public static class SystemClock
{
    private static DateTime? _forcedDateTime;

    public static void ForceDateTime( DateTime newDateTime )
    {
        _forcedDateTime = newDateTime;
    }

    public static DateTime Now
    {
        get
        {
            if( _forcedDateTime.HasValue ) return _forcedDateTime.Value;
            return DateTime.Now;
        }
    }
}

This approach just means you now have to enforce the policy to ensure developers always use SystemClock.Now instead of DateTime.Now.

This abstraction allows us to control fake results for the purposes of testing, but sacrifices encapsulation - normally you wouldn't have the ability to force the date/time in a design not specifically intended to provide testability.
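
Assuming IsRetired is rewritten to call SystemClock.Now instead of DateTime.Now, a test can then pin the clock to a known date (illustrative only):

// ...inside a test - assertion syntax depends on your test framework
public void IsRetired_UsesTheForcedClock()
{
    // Freeze "now" so both outcomes can be tested deterministically
    SystemClock.ForceDateTime(new DateTime(2008, 11, 11));

    bool retired = IsRetired(new DateTime(1940, 1, 1));     // expect true
    bool notRetired = IsRetired(new DateTime(1990, 1, 1));  // expect false
}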

General design for testability principles
The following were some general principles that Roy advocated;

Avoid big design up front. This doesn't mean avoid all design up front - but don't be too prescriptive. Design the purpose of a component and list the tests that must be satisfied.

Use interface based designs.

Avoid singletons, let a container specify transient/singleton life cycles of objects it creates. Where singletons are needed, use a static wrapper that creates a singleton instance of another class, but allow the wrapped class to still be constructed.

Use IOC containers to resolve dependencies and specify life cycle.

Avoid GOD methods - huge do-it-all methods. These hinder maintenance and are impossible to test. Avoid them by design - keep to the single responsibility principle, with calls to small methods.

Have methods virtual by default.

Don't use X = new X(); instead use Factory.MakeX() or Container.Resolve<IX>().

Ensure a single responsibility for classes and methods.

Overall, follow S.O.L.I.D. principles (For more information, see http://butunclebob.com/Articles.UncleBob.PrinciplesOfOod)

SRP - single responsibility principle

OCP - open closed principle - ability to extend without modifying

LSP - Liskov substitution principle - derived classes must be substitutable for their base classes

ISP - The interface segregation principle - make fine grained interfaces that are client specific

DIP - Dependency inversion principle - depend on abstractions not on concretions.
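To make the "don't new it up" point above concrete, here's a minimal sketch of that kind of seam - IEmailSender, SmtpEmailSender and EmailSenderFactory are illustrative names of my own, and Resolve<IX> simply follows the common container pattern rather than any particular product:

using System;

public interface IEmailSender { void Send(string to, string body); }

public class SmtpEmailSender : IEmailSender
{
    public void Send(string to, string body) { /* real SMTP send would go here */ }
}

public static class EmailSenderFactory
{
    // Tests can swap the implementation here without touching calling code
    public static Func<IEmailSender> Create = () => new SmtpEmailSender();
}

public class ReportScheduler
{
    public void SendDailyReport()
    {
        // Instead of: var sender = new SmtpEmailSender();
        IEmailSender sender = EmailSenderFactory.Create();
        sender.Send("ops@example.com", "Daily report");
    }
}

A container-based version would simply replace the factory call with something like container.Resolve<IEmailSender>() and let the container's registrations decide which implementation comes back.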

To close out, Roy finished with a catchy song.....about bad design. Obviously not that catchy though as I can't remember any of it.

Day 2 - 10:45 : How IT will change in next 10 years and why you should care (ARC205).

Miha Kralj is a senior architect with Microsoft whose job it is to think about and strategise for the future. According to him, that future is largely going to revolve around moving to cloud computing (unsurprising given the investment in Azure) and the transition to utility computing.

Moving to utility computing is the same as moving to buying a car as a service - instead of buying a new car based on model, colour, style and such like, you'd choose the car as a service based on value for money and comfort. If you were buying a car as a service you would of course be taking a taxi.

Miha suggests that utility computing will be provided by mega-data-centre providers who provide all the hardware, the power, the operational support and your enterprise will consume the services it needs and scale up/down as required.

There is evidence to suggest a move towards this at present - out of all sales of servers worldwide, 3 customers represent 50% of the business. They of course are Microsoft, Google and Yahoo.

Microsoft themselves no longer buy servers in the conventional sense - instead they purchase locked, sealed container "PODs" like the one pictured below, each consisting of a sealed environment with 200 physical servers. These remain locked unless availability of units falls below 95% (in other words, when more than 10 of the 200 servers have fried), at which point the container provider sends out an engineer.

[Image: one of the sealed container "PODs" housing 200 servers]

The consumption of servers by organisations such as Microsoft has a knock-on effect on their carbon output. Microsoft have so many servers in their data centres (Chicago alone has capacity for 220 of the above containers) that worldwide they have the same carbon footprint as the entire aviation industry. That's 5% of the planet's entire power output and emissions attributed to one company!

In fact if a POD provider is able to bring lower power variants to market, Microsoft are happy to replace entire containers to reduce operational costs - the significant cost of the data centre today isn't in capital outlay, it's in the operational cost - the power to run them and the power/water to cool them.

As developers, we can't write code to be greener - using a while loop isn't going to produce less carbon than a for loop - so no matter what we do as creators of software, that software will continue to rely on carbon-hungry servers. The shift towards utility computing and the consolidation of hardware can help alleviate some of these issues by requiring less hardware to do more, and by building these mega-data-centres in strategic locations (Iceland was suggested), power consumption can be drastically reduced.

Before widespread adoption of utility computing can progress, however, several hurdles need to be overcome to build trust in the industry.

Firstly, solid, robust SLAs need to be put in place to promote confidence in off-premises infrastructure, and secondly, some form of regulation needs to come into play for the IT industry at large. When we plug our laptops into the power socket or drink from the tap, we only do so because of the trust established by regulation of those utilities, which ensures we won't be fried or poisoned. IT needs the same regulation to promote trust.

Whatever happens in the future, we can't just look back at past success and hope to repeat it. Just because it worked before doesn't mean it will in a future that could be governed by a new set of rules.

Day 2 - 09:00 : A first look at Oslo, Dublin and WF 4.0

David Chappell presented this session and, after yesterday's two post-keynote and frankly less engaging presentations, explained the upcoming new features of Oslo, Dublin and Workflow Foundation 4.0 very clearly.

Applications can be distilled down to 3 major abstractions - they consist of workflow, services and models. The three new features supporting these abstractions are;

Dublin - extensions to windows server for hosting services.

Oslo - modelling technology to describe applications and other models.

WF 4.0 - technology for coordinating work within software.

What is workflow?
In simple terms, workflow is a bunch of activities executed as a process.

For what do we need workflow?
Scalable applications that must pause and resume (long-running applications) and applications that must co-ordinate parallel work can all benefit from using WF, without having to build the pause/restart/persist/un-persist mechanisms from scratch.

What's new in WF 4.0?
WF 4 brings improvements to performance, new visual designers and more activities. It also introduces a new workflow type beyond the sequential and state machine workflows, called flowchart. This new type is more powerful than the basic sequential workflow whilst being easier to work with than the even more powerful state machine - it provides the missing middle ground.
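To ground the terminology, here's a rough sketch of composing activities into a process and running it in-process. The class names here (WorkflowInvoker, Sequence, WriteLine, Delay) are my assumptions about the WF 4.0 programming model rather than anything lifted from the session:

using System;
using System.Activities;
using System.Activities.Statements;

class Program
{
    static void Main()
    {
        // A workflow is composed declaratively from activities...
        Activity workflow = new Sequence
        {
            Activities =
            {
                new WriteLine { Text = "Starting work..." },
                new Delay { Duration = TimeSpan.FromSeconds(1) },
                new WriteLine { Text = "Work complete." }
            }
        };

        // ...and then handed to a runtime to execute as a process
        WorkflowInvoker.Invoke(workflow);
    }
}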

Earlier versions of WF have been difficult to work with as the facilities for hosting workflow aren't provided out of the box. Microsoft are addressing this issue for WF 4.0 through Dublin.

What does Dublin provide?

  • A scalable host for WCF and WF services and applications
  • Built in persistence for service state
  • Management tools
  • Auto start and heartbeat services - failed services restart etc.
  • Message forwarding - content based routing
  • Tracking and logging what's going on within your services
  • Dublin will be "Free" (included in windows, but initially offered as a separate download)

What about Oslo?
Oslo completes the picture by offering the ability to model against different schemas. But what is a model?

A model is an abstract description of something - for example, a city map is an abstract description of the city. Models omit detail - the city map doesn't include terrain or information about all of the buildings, what colour they are and so on - it's an abstraction of the city for a given purpose - in this case navigation.

Many different things can be described with models - WF workflows, services, applications and business processes for example. Models can be purely descriptive and informative, or may even be executable - for example, a WF workflow. Models can also be linked to other models.

Oslo itself is a general purpose modelling platform with 3 key components;

  • A repository - storage for both schema (kinds of models) and instances of schemas (individual models)
  • "M" - a modelling language used to define schemas. This divides into two other components MSchema and MGrammar discussed later.
  • "Quadrant" - a visual modelling tool used against

The repository is where all the information about schemas and models are stored. Initially it's envisaged that Oslo will ship with schemas for processes, applications, workflow, activities, services and environments, but using "M", you will be able to define any schema you can imagine.

It's worth noting that the repository itself is just a simple SQL database - you can query it directly to interrogate models or schemas, and you don't even have to use "M" to define your schemas. If you can understand the database structure for a schema, you can create your own tools.

"M", the modelling language is used to define schemas. It splits into two areas: "MSchema", which is a C# type language used to define the structure of models and "MGrammar" which is used to define textual domain specific languages (DSLs).

MSchema defines the structure of the model and the relationships between the structural elements, and is ultimately used to generate T-SQL that defines storage for model instances.

MGrammar is used to define the syntax for new DSLs and provides tools for creating parsers of these DSLs. Examples of DSLs include SQL, regular expressions and also MSchema - Interestingly, and perhaps obviously, MSchema is a DSL created with MGrammar.

The Quadrant toolset is a graphical tool that consumes schemas to provide a modelling surface with appropriate views. Schemas can be defined to have different viewers and to control what tools appear in the designer for manipulating these design surfaces.

Monday, 10 November 2008

Day 1 - 17:45 : When you have too much data, good enough is good enough (ARC303)

Presented by Pat Helland, this session was a high-level talk that resided largely in the theoretical space and offered no real answers; rather, problems were posed and left as exercises for the attendee to think about.

The session aimed to challenge how we think about data and how rigid and prescriptive we are about our interfaces to the data and the usages of that data.

This culminated in a look at how organisations often find themselves compromising data quality as that data grows larger. Amazon was used as a case study: their merchant API contracts aren't overly prescriptive about the data they expect, in an effort to encourage merchant adoption over data quality. Instead, they have processes which attempt to reconcile data together, but ultimately they sacrifice data quality for the sake of simplicity for the merchants.

For example, take shoes - they have no unique code; there's no ISBN or similar unique identification system. Yet if you were able to buy shoes on Amazon, pair of shoes X from manufacturer Y sold by merchant A would appear as the same product on the site sold by merchant B, with the same unique Amazon product code. Merchant A may have sent distinctly less data than merchant B, yet the Amazon service is able to apply logic to work out that the products are the same thing and present them as such.

Merchants A and B might both send the colour and manufacturer name, for instance, whilst merchant B might provide a host of additional information on top of this. The idea is that colour and manufacturer name form the prescriptive part of the contract, whilst the additional information is completely optional and not strictly defined, and would be used to flesh out the product data (for both merchants) if it was available from either. He even went so far as to suggest that contracts offer key/value pairs to allow any data to be passed optionally and used. (This gives me chills of the bad variety, I've got to say.)
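To illustrate the sort of loose contract being described, here's a minimal sketch - ProductListing and its members are illustrative names of my own, not Amazon's actual API:

using System.Collections.Generic;

public class ProductListing
{
    // The small prescriptive part of the contract
    public string Manufacturer { get; set; }
    public string Colour { get; set; }

    // Anything else the merchant cares to send, loosely typed
    public Dictionary<string, string> AdditionalData { get; set; }

    public ProductListing()
    {
        AdditionalData = new Dictionary<string, string>();
    }
}

The catalogue side then has to decide how much of the optional data it trusts and can reconcile, which is exactly where the quality trade-off comes from.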

Of course this is where the quality issues appear - it's not always 100% possible to match the two together so sometimes the same product will be brought into the catalogue as separate items. For Amazon, this is deemed acceptable rather than forcing a regimented API that merchants must adhere to.

The ideas presented revolved around how classic RDBMS systems offer crisp answers over relatively small amounts of data, but new systems have huge amounts of data, high rates of change and large volumes of queries. As systems grow, data quality and its meaning become more fuzzy - any schema, if one is even present, may vary across the data, the data itself may be stale, and we must be able to work with it within given tolerances of staleness.

For example, suppose we have an ordering API that allows our customers to place orders for products with us, and that API also exposes a list of prices that updates at midnight each evening. If someone submits an order at 11:59pm that is processed at 12:01am, is your system going to reject it because the pricing is stale? No - it should allow either the stale or the current pricing to be used for a period of time before enforcing such rules.

This was discussed further as the concept of inside data and outside data. Inside data is the data in the transactional systems we're all used to - you start a process, freeze the database in time using a transaction and commit when you're done. This is the historic model of databases, but today we have services to contend with that sit outside of the transaction, and so aren't within the same space/time as the database transaction. We'll have to deal with this in our systems in future.

To cut the remainder of the story short, he theorised that for many businesses, just like Amazon, they are happy with "good enough" if it gives benefit elsewhere.

Day 1 - 16:00 : The future of composite applications and SOA - everything they told you and why it isn't true (SOA202)

The presentation opened with a video about the history of models. You can view this light hearted video here: www.modelsremixed.com

Presented by Mark Berman and Steven Martin, this session covered (in quite simple terms) the future of SOA and some common misconceptions about SOA in general. To begin with, they addressed the following;

SOA is a product?
Service oriented architecture is not a what, it's a how. You don't go and buy Microsoft SOA or IBM SOA, you build applications using the principles of SOA. It's another way to build applications, or even another way to re-purpose them. In fact, most recent high-ROI implementations of SOA haven't been greenfield developments, but instead have been wrappers over existing software, exposed as services.

SOA aligns business to IT
No, this is what people do. No technology can align business to IT, only people can achieve this.

SOA governance fixes everything
This is wrong, SOA governance alone is not enough - governance needs to span all of IT, not just SOA.

SOA stops at the firewall
Notice in many organisations today how the administrator of the firewall holds immense power! To truly benefit from SOA we need to be thinking of SOA as starting at the firewall.

The presenters followed this up with a claim that, essentially, Microsoft pioneered SOA - the reasoning being that Microsoft, along with others, pioneered web services, a fundamental building block of SOA itself.

One of the issues Microsoft are looking to address is the perception that only blue chip / Fortune 500 / FTSE companies can afford to implement SOA successfully. They wish to make it accessible to smaller organisations too.

Dublin is the codename for a set of extensions to the Windows server platform (IIS7 specifically) - the next evolution of the WAS/IIS platform - that will be used to host, run and manage Windows Communication Foundation and Windows Workflow Foundation applications.

Due for release after Visual Studio 2010, Dublin will provide facilities to manage, throttle and inspect individual services within a deployed application. Combined with Oslo, the graphical modelling toolset and DSL environment, building, deploying and managing services is purported to become easier than it is at present.

Whilst this session was titled the future of composite applications and SOA, there wasn't enough depth to take away anything about the future of SOA beyond the rough idea of what Dublin will be.

Day 1 : The Keynote

After introductory comments and reinforcement of the Azure message, Jason Zander, general manager of the Visual Studio team in Microsoft's developer division, took to the stage to present the future of Visual Studio in the guise of Visual Studio 2010. It got off to a flying start as VS crashed within a few minutes of the demo - but let's face it, this IS pre-alpha software, so it's easily forgiven.

Amid many new enhancements in vs.net 2010, there were key improvements discussed in the following areas;

  • Understanding the code
  • Building web applications
  • Creating office business applications
  • Using the power of C++

There was quite a lot of depth to some of these changes, but to summarise some of what I felt were the more important ones;

A new "architecture explorer", is being introduced to help you visualise the structure and dependencies within your solution. It visually displays the relationships between assemblies within a project and provides drill down into the assemblies to see relationships between individual classes and namespaces and the connections between them are weighted according to the level of dependency.

In addition to visualising the dependencies between areas of your solutions, individual portions of code can be extrapolated into UML 2.1.1 sequence diagrams - a useful addition, and I'm hoping it hints at Visual Studio 2010 ultimately supporting UML modelling out of the box.

Testing was a major focus for the next release, with 2010 introducing the testing activity centre, code-named Camano. This is a test environment for managing and running manual scripted tests - a major boon for the end-to-end experience - and it integrates with new debugging facilities in Visual Studio and TFS. Some of the highlights of this new facility;

  • Testers are presented with scripted steps to follow - test steps are marked as passing or failing.
  • During testing, the system is able to take video of what the tester is doing as part of their session and also record replay information about the state of the application that can be replayed in the debugger within visual studio.
  • When failures occur, a bug can be entered directly into TFS, attaching any video or replay session.
  • In TFS, viewing any raised bugs from this process shows a list of steps the tester took, and each step provides a hyperlink to the timecode within the video so you can see exactly what the tester was doing.
  • Historical debugging of the testers snapshot allows debugging of the testers session.
  • The aim and objective of this suite is to eliminate the "unable to reproduce" responses developers often run into and to bring manual testing into the managed process.

Alongside this new manual test studio are the test lab management facilities. These allow virtual test environments (eg: different servers for various tiers) to be provisioned and used within a test. Again, this is integrated with TFS and Visual Studio, allowing virtual machines to be restored to the state they were in at the point a bug was raised.

Within Visual Studio, the editor itself has had a complete WPF overhaul. As a result, the editor can take advantage of lots of WPF goodness, like the ability to contain and display any form of information - from the source code we know and love to diagrams, graphics and other glyphs that might help us understand or navigate the code more efficiently. The editor uses the Managed Extensibility Framework (MEF), allowing multiple add-ins for the editor, and add-ins on add-ins on add-ins. Xcopy deployment of extensions will also be supported as a by-product.

The editor provides IntelliSense support for jQuery and also offers a bunch of new refactoring capabilities - which I can't help but think treads on the toes of ReSharper, which has got to annoy JetBrains somewhat.

A new configuration transformer toolset is provided, which allows you to define transformation rules that are applied when an application (a web application in this case) is deployed to various environments - eg: changing connection strings for a production deployment versus a debug deployment. One-touch publishing of websites is also a new facility, where your entire web application is built and zipped into a single file ready for deployment with msdeploy.

In the sharepoint space, demonstrations were given on the new sharepoint server explorer extensions and the WSP importer, which looks like it should speed up sharepoint development no end, especially tied to the new packaging explorer which allows you to package and deploy sharepoint applications easily.

Finally, Jason demonstrated several new features of C++ including running parallel for loops and such like for performance.

Saturday, 1 November 2008

Datasets - what's the problem?

I was talking with a friend of mine last night, an experienced developer who's leading a team of guys building a large commercial enterprise application. We were just chewing the fat over data access and how we approach things differently - I'm a fan of the domain model and use an OR/M for persistence (usually NHibernate), but my buddy was advocating the use of the DataSet.

Many of us are told that the DataSet is just evil incarnate, but that's not actually true - the DataSet does serve a purpose, just not in an enterprise application with complex domain logic.

The issue with a DataSet is that it doesn't represent your data in any meaningful way, it merely contains it. Beyond querying data and performing straightforward operations, it requires additional components; representing complex logic and interrelated business entities is harder than it needs to be.

A domain-driven approach on the other hand, with entities representing the data, provides more flexibility to describe your logic - for instance, a customer object will contain properties to represent itself, along with business logic for operations and the rules that govern a customer and its relationships to other objects.
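As a minimal sketch of what I mean (the Customer/Order shape here is purely illustrative):

using System;
using System.Collections.Generic;

public class Order
{
    public decimal Total { get; set; }
}

public class Customer
{
    private readonly List<Order> _orders = new List<Order>();

    public string Name { get; set; }
    public bool IsOnCreditHold { get; private set; }

    public IEnumerable<Order> Orders { get { return _orders; } }

    // Business rules live alongside the data they govern
    public void PlaceOrder(Order order)
    {
        if (IsOnCreditHold)
            throw new InvalidOperationException("Customer is on credit hold.");

        _orders.Add(order);
    }

    public void PutOnCreditHold()
    {
        IsOnCreditHold = true;
    }
}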

Does that mean that the DataSet is intrinsically evil? No!

Sure, DataSets have their problems - amongst others, they enforce a database-centric view of data, and when they serialise they also serialise a description of the data's schema, resulting in a bloated payload (this can be disabled). Despite these problems, however, for trivial systems they can serve a purpose.

It's a matter of choice. Even for trivial applications I would advocate a domain approach, but that's my preference. For anything non-trivial, however, and for any enterprise system, I'd avoid the DataSet.