Implementing iterators with yield statements

Iterators are very useful, but in the past they've been a bit of a nuisance to write. It wasn't difficult as such, but you always needed an extra class to store the state of where you'd got up to in the collection. yield statements allow you to write iterators "inline" in a single method, with the compiler doing all the hard work of keeping track of state behind the scenes.

yield statements are only valid in a method/operator/property which returns one of IEnumerable, IEnumerable<T>, IEnumerator or IEnumerator<T>. You can't mix and match - if a member uses yield statements, it can't use normal return statements too.

There are two types of yield statement - yield return (for returning the next item) and yield break (to signify the end of the iterator). Here's virtually the simplest example possible (I've chosen not to use the generic form of IEnumerable, to avoid confusion if you're not familiar with generics yet):

using System;
using System.Collections;

class Test
{
    static void Main(string[] args)
    {
        foreach (string x in Foo())
        {
            Console.WriteLine (x);
        }
    }

    static IEnumerable Foo()
    {
        yield return "Hello";
        yield return "there";
    }
}

The result is:

Hello
there

The important thing to understand is that although Foo() is only called once, the compiler has effectively built a state machine. Calling Foo() doesn't run any of the code in its body - it just creates the iterator. The yield return "there"; statement is only executed after "Hello" has already been printed on the screen. Every time MoveNext() is called on the iterator (in this case MoveNext() is called implicitly by the foreach statement) execution continues from where it had got to in what we've declared as the Foo() method, until it next reaches a yield statement. If you're familiar with coroutines from other languages, that's effectively what is going on here - it's just that the compiler has done all the hard work.
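To make the state machine visible, here's a small sketch which drives the iterator by hand instead of using foreach. The extra Console.WriteLine calls inside Foo() and the class name StateMachineDemo are my additions for illustration; the MoveNext()/Current calls are roughly what foreach does for you (foreach also disposes of the iterator when it's finished).

```csharp
using System;
using System.Collections;

public class StateMachineDemo
{
    public static IEnumerable Foo()
    {
        Console.WriteLine("  (running up to the first yield)");
        yield return "Hello";
        Console.WriteLine("  (running up to the second yield)");
        yield return "there";
        Console.WriteLine("  (running to the end of the method)");
    }

    public static void Main()
    {
        // Calling Foo() and GetEnumerator() runs none of the code in Foo's body.
        IEnumerator iterator = Foo().GetEnumerator();
        Console.WriteLine("Iterator created");

        // Each MoveNext() call resumes Foo() until the next yield return,
        // or returns false once the end of the method is reached.
        while (iterator.MoveNext())
        {
            Console.WriteLine("Got: " + iterator.Current);
        }
    }
}
```

The result is:

Iterator created
  (running up to the first yield)
Got: Hello
  (running up to the second yield)
Got: there
  (running to the end of the method)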

Within the method, you can use perfectly normal code, with a few restrictions - you can't put a yield statement in a finally block, you can't put a yield return statement in a try block if there's a catch block (or in the catch block itself), and you can't use unsafe code. However, you can use normal looping, access other variables etc. Here's an example, this time implementing IEnumerable:

using System;
using System.Collections;

class Test
{
    static void Main(string[] args)
    {
        NameAndPlaces t = new NameAndPlaces("Jon",
            new string[]{"London", "Hereford", "Cambridge", "Reading"});

        foreach (string x in t)
        {
            Console.WriteLine (x);
        }
    }
}

public class NameAndPlaces : IEnumerable
{
    string name;
    string[] places;

    public NameAndPlaces (string name, string[] places)
    {
        this.name = name;
        this.places = places;
    }

    public IEnumerator GetEnumerator()
    {
        yield return "My name is "+name;
        yield return "I have lived in: ";
        foreach (string place in places)
        {
            yield return place;
        }
    }
}

The result is:

My name is Jon
I have lived in:
London
Hereford
Cambridge
Reading
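Going back to the restrictions for a moment: they still leave one useful combination open, which is a yield return inside a try block that only has a finally block. Here's a minimal sketch (FinallyDemo and Numbers are names I've made up for illustration) showing that the finally block runs even if the caller stops iterating early, because foreach disposes of the iterator.

```csharp
using System;
using System.Collections;

public class FinallyDemo
{
    public static IEnumerable Numbers()
    {
        try
        {
            // Allowed: the try block has a finally block but no catch block.
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            // Runs when the iterator finishes or is disposed part-way through.
            Console.WriteLine("Cleaning up");
        }
    }

    public static void Main()
    {
        foreach (int n in Numbers())
        {
            Console.WriteLine(n);
            if (n == 2)
            {
                break; // foreach disposes the iterator, which triggers the finally block
            }
        }
    }
}
```

The result is:

1
2
Cleaning up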

As mentioned before, yield break is used to stop iterating. Usually this is not needed, as you naturally reach the end of the iterator block. As well as stopping iterating, yield break can also be used to create a simple "empty" iterator which doesn't yield anything. If you had a completely empty method body, the compiler wouldn't know whether you wanted to write an iterator block or a "normal" block (with normal return statements etc). A single yield break; statement as the whole method body is enough to satisfy the compiler.
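Here's a short sketch of both uses. The names YieldBreakDemo, Nothing and UntilNegative are hypothetical, and "stop at the first negative number" is just an arbitrary rule chosen to give yield break something to do.

```csharp
using System;
using System.Collections;

public class YieldBreakDemo
{
    // An "empty" iterator: the single yield break is what tells the
    // compiler that this is an iterator block rather than a normal method.
    public static IEnumerable Nothing()
    {
        yield break;
    }

    // Yields values until the first negative one, then stops iterating.
    public static IEnumerable UntilNegative(int[] values)
    {
        foreach (int value in values)
        {
            if (value < 0)
            {
                yield break;
            }
            yield return value;
        }
    }

    public static void Main()
    {
        foreach (object o in Nothing())
        {
            Console.WriteLine("This is never printed");
        }

        foreach (int v in UntilNegative(new int[] { 4, 8, -1, 15 }))
        {
            Console.WriteLine(v);
        }
    }
}
```

The result is:

4
8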

yield break can be useful if you want to stop iterating due to some external signal - the user clicking on a "cancel" button for instance. Sometimes it is easier to stop the code which is providing the data than the code which is requesting that data. In simple cases, of course, you can just use while loops to only keep going while more data is really wanted. In more complicated scenarios, however, that can make the code messy - yield break ends the method abruptly in the same way that a normal return statement does, with no need to make sure that every level of iteration checks whether or not to continue. Here's an example:

using System;
using System.Collections;
using System.Threading;

class Test
{
    static TripleCounter counter;

    static void Main(string[] args)
    {
        counter = new TripleCounter();
        new Thread (new ThreadStart(ShowCounter)).Start();

        // After 5 seconds, stop the counter
        Thread.Sleep (5000);
        counter.stop = true;
    }

    static void ShowCounter()
    {
        // This would keep going for a very long
        // time if the counter wasn't stopped!
        foreach (string count in counter)
        {
            Console.WriteLine (count);
        }
    }
}

public class TripleCounter : IEnumerable
{
    // Of course in normal code we'd never use a public field...
    public volatile bool stop = false;

    public IEnumerator GetEnumerator()
    {
        for (int i=0; i < 10000; i++)
        {
            for (int j=0; j < 4; j++)
            {
                for (int k=0; k < 4; k++)
                {
                    Thread.Sleep(250);
                    if (stop)
                    {
                        yield break;
                    }
                    yield return string.Format ("{0} {1} {2}", i, j, k);
                }
            }
        }
    }
}

As noted in the code, normally you'd never have a public field - it just makes the code simpler in this case, so you can concentrate on the yield statements. One thread reads values from the enumerator and the other thread just stops it after five seconds. Coding this without yield break would involve each of the for loops checking whether or not the loop ought to stop - and other situations could be even more complicated.
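For comparison, here's a rough sketch of what the version without yield break might look like. I've left out the second thread and the Thread.Sleep calls so that it runs deterministically - the loop which reads the values sets the flag itself after five items. LoopConditionCounter and LoopConditionDemo are hypothetical names.

```csharp
using System;
using System.Collections;

public class LoopConditionCounter : IEnumerable
{
    // Still a public field, purely to keep the sketch short.
    public volatile bool stop = false;

    public IEnumerator GetEnumerator()
    {
        // Every level of looping has to check the flag for itself.
        for (int i = 0; i < 10000 && !stop; i++)
        {
            for (int j = 0; j < 4 && !stop; j++)
            {
                for (int k = 0; k < 4 && !stop; k++)
                {
                    yield return string.Format("{0} {1} {2}", i, j, k);
                }
            }
        }
    }
}

public class LoopConditionDemo
{
    public static void Main()
    {
        LoopConditionCounter counter = new LoopConditionCounter();
        int seen = 0;
        foreach (string count in counter)
        {
            Console.WriteLine(count);
            seen++;
            if (seen == 5)
            {
                counter.stop = true; // all three loops notice this and exit
            }
        }
    }
}
```

The result is:

0 0 0
0 0 1
0 0 2
0 0 3
0 1 0

It works, but forgetting the check in just one of the loops would let the iteration carry on, which is exactly the kind of mistake yield break avoids.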

Behind the scenes, the compiler creates an extra nested type to store the state of the enumerator. It should be noted that a hand-written enumerator may end up being significantly faster than the compiler-generated one. However, in most cases I'd suggest that the iteration speed is unlikely to be significant - and the solution using yield statements is likely to be much easier to read and maintain than a custom solution. As ever, if you have performance concerns, measure and compare different solutions.
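To give a feel for what that extra type involves, here's a hand-written sketch which behaves like the earlier two-item "Hello"/"there" iterator. It is not the code the compiler actually generates (that is more involved) - it's just a simplified illustration of the state-tracking idea, and the names HelloThere and HelloThereEnumerator are made up.

```csharp
using System;
using System.Collections;

// Hand-written stand-in for: yield return "Hello"; yield return "there";
public class HelloThereEnumerator : IEnumerator
{
    int state = 0;      // 0 = not started, 1 = after "Hello", 2 = after "there"
    object current;

    public object Current
    {
        get { return current; }
    }

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                current = "Hello";
                state = 1;
                return true;
            case 1:
                current = "there";
                state = 2;
                return true;
            default:
                return false; // nothing left to yield
        }
    }

    public void Reset()
    {
        // Resetting isn't supported in this sketch.
        throw new NotSupportedException();
    }
}

public class HelloThere : IEnumerable
{
    public IEnumerator GetEnumerator()
    {
        return new HelloThereEnumerator();
    }
}

public class HandWrittenDemo
{
    public static void Main()
    {
        foreach (string x in new HelloThere())
        {
            Console.WriteLine(x);
        }
    }
}
```

That's around forty lines to replace a two-line iterator block, and this is the easy case - with loops and local variables the bookkeeping grows quickly.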

While iterators are usually used for collections of one sort or another, the yield statement syntax makes other styles of programming possible too. For instance, the Concurrency and Coordination Runtime under development by Microsoft uses iterators and yield statements to make asynchronous execution much simpler to understand.
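As a very rough flavour of that style - and to be clear, this is not the CCR API, just a toy round-robin loop of my own with made-up names - each iterator below acts as a task, and each yield return marks a point where the task hands control back so that another one can run.

```csharp
using System;
using System.Collections;

public class RoundRobinDemo
{
    // A hypothetical "task": does one step of work, then yields control.
    public static IEnumerator Worker(string name, int steps)
    {
        for (int i = 1; i <= steps; i++)
        {
            Console.WriteLine("{0}: step {1}", name, i);
            yield return null; // the yielded value is ignored in this sketch
        }
    }

    // Runs each task one step at a time until all of them have finished.
    public static void Run(params IEnumerator[] tasks)
    {
        Queue pending = new Queue(tasks);
        while (pending.Count > 0)
        {
            IEnumerator task = (IEnumerator) pending.Dequeue();
            if (task.MoveNext())
            {
                pending.Enqueue(task); // not finished yet, so queue it again
            }
        }
    }

    public static void Main()
    {
        Run(Worker("A", 2), Worker("B", 3));
    }
}
```

The result is:

A: step 1
B: step 1
A: step 2
B: step 2
B: step 3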

