Nathan Evans' Nemesis of the Moment

Really simple way to split a F# sequence into chunks / partitions

Posted in .NET Framework, F# by Nathan B. Evans on March 13, 2014

I needed a simple function to split a (potentially infinite) sequence into chunks, suitable for processing. My exact use-case for this was actually in optimising my Azure blob storage uploads. I would split a sequence of 1,000s of items into batches of 60 or so items and then upload them concurrently across 60 connections to the Azure blob store. The performance benefits from this (after also messing around with ServicePointManager’s stupid connection limits and Nagle algorithm stuff) were simply staggering but that’s kind of another story.

I searched high and low for a suitable F# function to do this, but there was nothing. And all the samples I found on the web had design flaws or were overly complex. The design flaws were usually that it would seek the sequence more than once which is highly inefficient and could even cause side affects depending upon the source of the sequence.

I got frustrated and quickly wrote my own, though I will warn you that it uses mutable state. But as a result is very fast…

/// Returns a sequence that yields chunks of length n.
/// Each chunk is returned as an array.
let toChunks n (s:seq<'t>) = seq {
    let pos = ref 0
    let buffer = Array.zeroCreate<'t> n

    for x in s do
        buffer.[!pos] <- x
        if !pos = n - 1 then
            yield buffer |> Array.copy
            pos := 0
            incr pos

    if !pos > 0 then
        yield Array.sub buffer 0 !pos

// Ridiculously imperative, but it works and is performant; won't seek the sequence more than once.
// If you're using in a forward-only manner and won't be holding references to the returned chunks
// then you can get rid of the Array.copy to gain some extra perf and reduce GC.

Here’s the Gist if that’s better for you:

Oh, this is MIT licensed so you know that I won’t come calling.


Tagged with: , ,