Low Latency Programming in Java and C#

If you’re dealing with high-frequency trading, game servers, or anything where microseconds matter, you need low latency. And guess what? Java and C# can actually be pretty damn fast if you know what you’re doing. Forget the “Java is slow” meme—it’s all about how you use it.

Let’s break down some key techniques to squeeze every bit of performance out of these languages.


1. Ring Buffers: The Circular Speed Demon

A ring buffer (or circular buffer) is a fixed-size queue that wraps around when it hits the end. Why use it? Because it avoids dynamic memory allocation (which is slow) and keeps data in a CPU-friendly layout.

Java Example (Disruptor-style Ring Buffer)

public class RingBuffer<T> {
    private final T[] buffer;
    private int head = 0;
    private int tail = 0;

    @SuppressWarnings("unchecked")
    public RingBuffer(int capacity) {
        buffer = (T[]) new Object[capacity];
    }

    public boolean enqueue(T item) {
        // One slot stays empty so "full" and "empty" are distinguishable
        if ((tail + 1) % buffer.length == head) return false; // Full
        buffer[tail] = item;
        tail = (tail + 1) % buffer.length;
        return true;
    }

    public T dequeue() {
        if (head == tail) return null; // Empty
        T item = buffer[head];
        buffer[head] = null; // Drop the reference so the GC can reclaim it
        head = (head + 1) % buffer.length;
        return item;
    }
}

(Note: this is the single-threaded version. The real Disruptor adds padded sequence counters and memory barriers to make it safe across threads.)

C# Example (Using Value Types for Zero GC)

// The buffer is a class; the payoff is the `where T : struct` constraint:
// elements are stored inline in the array, so Enqueue/Dequeue never allocate.
// (Making the buffer itself a mutable struct is a trap: a copy would share
// the array but carry its own head/tail.)
public class RingBuffer<T> where T : struct {
    private readonly T[] buffer;
    private int head;
    private int tail;

    public RingBuffer(int capacity) {
        buffer = new T[capacity];
        head = 0;
        tail = 0;
    }

    public bool Enqueue(T item) {
        if ((tail + 1) % buffer.Length == head) return false; // Full
        buffer[tail] = item;
        tail = (tail + 1) % buffer.Length;
        return true;
    }

    public T? Dequeue() {
        if (head == tail) return null; // Empty
        T item = buffer[head];
        head = (head + 1) % buffer.Length;
        return item;
    }
}

Why this rocks:

  • No dynamic resizing → no GC pressure.
  • Predictable memory access → CPU cache loves this.
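One refinement the real Disruptor uses: make the capacity a power of two, so the wrap-around becomes a cheap bitwise AND instead of an integer division. A minimal single-threaded sketch (class and method names are mine, not from any library):

```java
public class MaskedRingBuffer {
    private final long[] buffer; // primitive longs: no per-element GC
    private final int mask;      // capacity - 1, valid because capacity is a power of two
    private long head = 0;       // monotonically increasing sequences...
    private long tail = 0;       // ...so full/empty checks are plain subtraction

    public MaskedRingBuffer(int capacity) {
        if (Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        buffer = new long[capacity];
        mask = capacity - 1;
    }

    public boolean enqueue(long value) {
        if (tail - head == buffer.length) return false; // Full
        buffer[(int) (tail & mask)] = value; // & mask replaces % capacity
        tail++;
        return true;
    }

    public Long dequeue() {
        if (head == tail) return null; // Empty
        long value = buffer[(int) (head & mask)];
        head++;
        return value;
    }
}
```

A bonus of the monotonic sequence counters: the buffer can use every slot, instead of sacrificing one to tell "full" from "empty".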

2. Lock-Free Programming: Because Locks Are for Chumps

Locks (synchronized in Java, lock in C#) serialize your threads and can cost microseconds under contention. For simple shared state, use atomic operations built on CAS (Compare-And-Swap) instead.

Java (AtomicInteger FTW)

import java.util.concurrent.atomic.AtomicInteger;

public class LockFreeCounter {
    private final AtomicInteger counter = new AtomicInteger(0);

    public void increment() {
        // Classic CAS retry loop: read, compute, attempt the swap, retry on conflict.
        // For a plain increment, counter.incrementAndGet() does all this in one call
        // (often a single LOCK XADD instruction on x86).
        int current;
        do {
            current = counter.get();
        } while (!counter.compareAndSet(current, current + 1));
    }
}

C# (Interlocked to the Rescue)

using System.Threading;

public class LockFreeCounter {
    private int counter = 0;

    public void Increment() {
        // CAS retry loop: a stale read just costs one more iteration.
        // For a plain increment, Interlocked.Increment(ref counter) is simpler and equivalent.
        int current;
        do {
            current = counter;
        } while (Interlocked.CompareExchange(ref counter, current + 1, current) != current);
    }
}

Why this rocks:

  • No blocking → threads don’t stall.
  • Atomic operations are hardware-optimized.
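The CAS retry loop earns its keep when the update is more than a bare increment. Here's a sketch (in Java; the C# version with Interlocked.CompareExchange is a direct translation) of a lock-free running maximum, something no single atomic instruction expresses:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LockFreeMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    public void record(int sample) {
        int current;
        do {
            current = max.get();
            if (sample <= current) return;         // already covered, nothing to publish
        } while (!max.compareAndSet(current, sample)); // retry if another thread won the race
    }

    public int get() { return max.get(); }
}
```

Any number of threads can call record() concurrently: a lost CAS just means someone else published a newer value, and the loop re-checks the sample against it.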

3. Binary Protocols: Skip JSON, Go Binary

JSON burns cycles on text parsing and allocates garbage on every message. Use binary formats like FlatBuffers or Protocol Buffers (both have Java and C# support), or encode fields by hand.

Java (ByteBuffer for Manual Encoding)

ByteBuffer buffer = ByteBuffer.allocate(1024); // or allocateDirect(1024) for off-heap I/O
buffer.putInt(123); // Write an int
buffer.putDouble(45.67); // Write a double
buffer.flip(); // Prepare for reading

int myInt = buffer.getInt();
double myDouble = buffer.getDouble();

C# (Using Span for Zero-Copy)

using System.Buffers.Binary;

byte[] data = new byte[1024];
Span<byte> buffer = data;

// Write
BinaryPrimitives.WriteInt32LittleEndian(buffer, 123);
BinaryPrimitives.WriteDoubleLittleEndian(buffer.Slice(4), 45.67);

// Read
int myInt = BinaryPrimitives.ReadInt32LittleEndian(buffer);
double myDouble = BinaryPrimitives.ReadDoubleLittleEndian(buffer.Slice(4));

Why this rocks:

  • No parsing overhead → just raw bytes.
  • Zero-copy possible with Span<T> in C#.

4. Value Types in C#: Stack Allocated = Blazing Fast

Avoid heap allocations with struct (value types).

C# (Struct vs Class Benchmark)

public struct PointStruct { public int X, Y; } // Stored inline: stack, registers, or packed in arrays
public class PointClass { public int X, Y; }   // Heap object: allocation, GC tracking, pointer chase

// Usage
PointStruct p1 = new PointStruct { X = 1, Y = 2 }; // No heap allocation
PointClass p2 = new PointClass { X = 1, Y = 2 };   // Heap allocation the GC must track

Why this rocks:

  • No heap allocation → no GC pressure, no pauses.
  • Better cache locality → faster access.
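Java has no user-defined value types yet (that's what Project Valhalla is for), but you can get the same cache behavior today by flattening objects into primitive arrays, a "structure of arrays" layout. A sketch (the Points class is my illustration, not a library type):

```java
// Instead of Point[] (an array of heap pointers to scattered objects),
// store coordinates in parallel primitive arrays: contiguous memory,
// zero per-point objects, zero per-point GC work.
public class Points {
    private final int[] xs;
    private final int[] ys;
    private int count = 0;

    public Points(int capacity) {
        xs = new int[capacity];
        ys = new int[capacity];
    }

    public void add(int x, int y) {
        xs[count] = x;
        ys[count] = y;
        count++;
    }

    public long sumX() { // scans one contiguous array: the prefetcher loves this
        long sum = 0;
        for (int i = 0; i < count; i++) sum += xs[i];
        return sum;
    }
}
```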

Final Thoughts

Low latency isn’t magic—it’s about avoiding:
  • GC pauses (use structs, object pools)
  • Lock contention (go lock-free)
  • Slow serialization (use binary formats)
  • Unpredictable memory access (ring buffers, arrays over lists)
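On the object-pool point, here's a minimal single-threaded sketch in Java (class and method names are mine): pre-allocate instances up front and recycle them, so the steady state allocates nothing.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

public class ObjectPool<T> {
    private final ArrayDeque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    public ObjectPool(Supplier<T> factory, int size) {
        this.factory = factory;
        for (int i = 0; i < size; i++) free.push(factory.get()); // pay allocation cost up front
    }

    public T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get(); // pool exhausted: fall back to allocating
    }

    public void release(T obj) {
        free.push(obj); // caller must reset the object's state before releasing
    }
}
```

Usage: ObjectPool<StringBuilder> pool = new ObjectPool<>(StringBuilder::new, 64); then acquire()/release() in the hot path instead of new.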

Java and C# can be insanely fast if you ditch abstractions and think like a performance freak. Now go make something fast! 🚀
