Blog moved to http://www.andrevdm.com/

Saturday, 21 November 2009

BlockingCollection & parallel yields

.Net 4 has many new features. My favourite at the moment are the classes in the System.Collections.Concurrent namespace. System.Collections.Concurrent has four thread safe collections; ConcurrentQueue<>, ConcurrentStack<>, ConcurrentDictionary<> and BlockingCollection<>.
Here are two articles that discuss these collections

I’m finding the BlockingCollection’s GetConsumingEnumerable() method incredibly useful. What it does is return an IEnumerable<> that removes items from the collection and blocks while the collection is empty. This combined with the fact that the BlockingCollection is thread safe makes for very simple code.

As an example image that you have this method that downloads data from several locations (DownloadSourceStreams), parses the data (ParseStream) and yields the processed results (ImportData)

public override IEnumerable ImportData()
{
 foreach( Stream inputStream in DownloadSourceStreams() )
 {
  ParsedData data = ParseStream( inputStream );
  yield return data;
  }
}

public IEnumerable DownloadSourceStreams()
{
 foreach( string url in m_sourceUrls )
 {
  yield return DownloadStreamFrom( url );
 }
}


It would make sense to download all the data in parallel. However you can’t simply make the foreach in ImportData() a Parallel.ForEach since you can't yield from within a Parallel.ForEach. So what you need to do is download the data in parallel and then have a single point for yielding all the results. The BlockingCollection makes this easy


private BlockingCollection m_imported = new BlockingCollection();

public override IEnumerable ImportData()
{
 Task.Factory.StartNew( ParallelImportData );
 return m_imported.GetConsumingEnumerable();
}

private void ParallelImportData()
{
 Parallel.ForEach( DownloadSourceStreams(), inputStream =>
 {
  ParsedData data = ParseStream( inputStream );
  m_imported.Add( data );
  } );
  
  m_imported.CompleteAdding();
}

public IEnumerable DownloadSourceStreams()
{
 foreach( string url in m_sourceUrls )
 {
  yield return DownloadStreamFrom( url );
 }
}



The code now works as follows


  • ImportData() starts a new task to import the data (line 5) and then returns the consuming enumerable (line 6)
  • ParallelImportData() is then called on a separate thread/task and does the downloads in parallel by using Parallel.Foreach (line 11). Each imported item is then added to the blocking collection (line 14).
  • When the import has been completed the BlockingCollection’s CompletedAdding() method is called.




When the consumer calls ImportData() it gets an IEnumerable (consuming enumerable) that blocks while there is no data in the collection. As soon as there is data it iterates over it and removes it from the base collection. This continues until CompletedAdding is called.


What is striking about this code is that all of this happens without any explicit locking, the blocking collection handles it all for you.

Tuesday, 4 August 2009

.net REPL

I often want to test things quickly in .net without creating a new project. There are quite a few options.

REPL

Firstly the read-eval-print-loop options. These are great for quick tests.

Most of the dynamic languages have a REPL. Personally I use booish because I like the boo programming language. IronPython and IronRuby also have REPL. You can even run IronPython in your browser if you want.

The best C# REPL I’ve seen is gsharp from Mono. You can happily run Mono side by side with Microsoft’s .net frameworks. So there is no reason not to try this.


Compiling

For more complex tests a light weigh editor/compiler are useful. By far my favourite in this category is SciTE. SciTE is an incredible editor that will compile .cs files (and many others) out of the box. Just open a .cs, press F7 to compile and F5 to run. It can be configured to edit pretty much anything. Its light weigh, cross platform and it is free. IMO its an editor that every developer should take a look at.

Other options include Snippet Compiler which has a nice IDE and intellisense. Also take a look at Snippy, and the reflector snippy addin.

Wednesday, 15 April 2009

C# T-Tree

I have just created a T-Tree project on github: http://github.com/andrevdm/ttree/tree/master

“A T-tree is a balanced index tree data structure optimized for cases where both the index and the actual data are fully kept in memory, just as a B-tree is an index structure optimized for storage on block oriented external storage devices like hard disks. T-trees seek to gain the performance benefits of in-memory tree structures such as AVL trees while avoiding the large storage space overhead which is common to them.” (from wikipedia)

See also: Tobin J. Lehman and Michael J. Carey, A Study of Index Structures for Main Memory Database Management Systems. VLDB 1986 for a comprehensive discussion of T-Trees

The project is a C# implementation of a T-Tree. There is also a very simple command line profiler and a GUI for visualising inserts and deletes.

There are still a few things outstanding (e.g. Making the class enumerable and supporting the visitor pattern) but all the basics (insert, search and delete) are now working.

I'm sure that more can be done to speed up this implementation but it already performs very well. That coupled with the efficient use of memory makes it quite an attractive data structure.

A quick disclaimer: I have only just gotten the basics working I'm sure there are still some bugs lurking. I'll be doing more testing and adding more unit tests as I have time.

The code is released under the BSD licence.

Please let me know if you find it useful, spot any bugs or can think of way to improve it.

Friday, 27 March 2009

Visual Studio Dot Debugger Visualiser

Graphviz is amazing, I've never found a better or easier to use tool for generating graphs and data visualisation. The dot language is very easy to learn and the documentation is very good indeed. If you are doing any form of visualisation its worth taking a look at.

In the past I've used it for visualising compiler ASTs and data structures. (Which reminds me Jim Idle posted a nice ANTLR to dot conversion sample).

At the moment I'm working on a couple of data structures. Visualising them while developing the structures helps a great deal with spotting obvious errors and checking that things look as expected (yes I have unit tests too :). However it was a little inconvenient to have to keep writing the dot out to disk and generating the images manually.

Luckily Visual Studio supports visualisers and they are easy to write too. So I've just created a dot visualiser. My classes now have a ToDot() method which return a string. Whenever I want to visualise them I simply add a watch and select the dot visualiser. It could not be easier.

image

 

Download

VS 2008 Source code
VS 2008 Binaries

Installing the binaries

  1. Copy the binaries to your visualisers directory (%USERPROFILE%\Documents\Visual Studio 2008\Visualizers)
  2. Edit the config file and set the path to dot.exe

Disclaimer

I’ve not made much effort to make this visualiser robust. It helps me during debugging, that’s all.  works on my machine, starburst


Update: Thanks Riaan for spotting an horribly embarrassing  bug in the code :).