Nathan Evans' Nemesis of the Moment

Really simple way to split a F# sequence into chunks / partitions

Posted in .NET Framework, F# by Nathan B. Evans on March 13, 2014

I needed a simple function to split a (potentially infinite) sequence into chunks, suitable for processing. My exact use-case for this was actually in optimising my Azure blob storage uploads. I would split a sequence of 1,000s of items into batches of 60 or so items and then upload them concurrently across 60 connections to the Azure blob store. The performance benefits from this (after also messing around with ServicePointManager’s stupid connection limits and Nagle algorithm stuff) were simply staggering but that’s kind of another story.

I searched high and low for a suitable F# function to do this, but there was nothing. And all the samples I found on the web had design flaws or were overly complex. The design flaws were usually that it would seek the sequence more than once which is highly inefficient and could even cause side affects depending upon the source of the sequence.

I got frustrated and quickly wrote my own, though I will warn you that it uses mutable state. But as a result is very fast…

/// Returns a sequence that yields chunks of length n.
/// Each chunk is returned as an array.
let toChunks n (s:seq<'t>) = seq {
    let pos = ref 0
    let buffer = Array.zeroCreate<'t> n

    for x in s do
        buffer.[!pos] <- x
        if !pos = n - 1 then
            yield buffer |> Array.copy
            pos := 0
        else
            incr pos

    if !pos > 0 then
        yield Array.sub buffer 0 !pos
}

// Ridiculously imperative, but it works and is performant; won't seek the sequence more than once.
// If you're using in a forward-only manner and won't be holding references to the returned chunks
// then you can get rid of the Array.copy to gain some extra perf and reduce GC.

Here’s the Gist if that’s better for you: https://gist.github.com/nbevans/9429542

Oh, this is MIT licensed so you know that I won’t come calling.

Enjoy.

Tagged with: , ,

4 Responses

Subscribe to comments with RSS.

  1. Paul Westcott said, on March 14, 2014 at 3:32 AM

    Same same; just slighty easier to read to read…

    let toChunks (n:int) (s:seq) = seq {
    let buffer = ref (ResizeArray n)

    for x in s do
    (!buffer).Add x
    if (!buffer).Count = n-1 then
    yield (!buffer)
    buffer := ResizeArray n

    if (!buffer).Count > 0 then
    yield !buffer
    }

  2. Wally said, on March 20, 2014 at 8:53 PM

    Thanks for posting this! It was just what I was looking for.

  3. Aidan said, on October 27, 2016 at 6:13 PM

    This is a thing of beauty. Would never have thought of this. Thanks!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: