Bela
Real-time, ultra-low-latency audio and sensor processing system for BeagleBone Black

A Perlin Simplex Noise C++ Implementation (1D, 2D, 3D, 4D).
Functions  
static int32_t  fastfloor (float fp) 
static uint8_t  hash (int32_t i) 
static float  grad (int32_t hash, float x) 
static float  grad (int32_t hash, float x, float y) 
Variables  
static const uint8_t  perm [256] 
A Perlin Simplex Noise C++ Implementation (1D, 2D, 3D, 4D).
Copyright (c) 2014-2015 Sebastien Rombauts (sebastien.rombauts@gmail.com)
This C++ implementation is based on the speed-improved Java version 2012-03-09 by Stefan Gustavson (original Java source code in the public domain): http://webstaff.itn.liu.se/~stegu/simplexnoise/SimplexNoise.java
This implementation is "Simplex Noise" as presented by Ken Perlin at a relatively obscure and not often cited course session "Real-Time Shading" at Siggraph 2001 (before real-time shading actually took off), under the title "hardware noise". The 3D function is numerically equivalent to his Java reference code available in the PDF course notes, although I reimplemented it from scratch to get more readable code. The 1D, 2D and 4D cases were implemented from scratch by me from Ken Perlin's text.
Distributed under the MIT License (MIT) (See accompanying file LICENSE.txt or copy at http://opensource.org/licenses/MIT)

static int32_t  fastfloor (float fp)
inline static
Computes the largest integer value not greater than the given float.
This method is faster than using (int32_t)std::floor(fp).
I measured it to be approximately twice as fast on an AMD APU: for float, ~18.4 ns instead of ~39.6 ns; for double, ~20.6 ns instead of ~36.6 ns. Reference: http://www.codeproject.com/Tips/700780/Fast-floor-ceiling-functions
[in]  fp  float input value 
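The function body is not reproduced on this page. As a minimal sketch of the usual truncate-and-correct technique that such a fast floor relies on (an assumption about the approach, not necessarily the exact code in this file): cast to an integer, which truncates toward zero, then subtract one when the truncation rounded upward. This only holds for inputs that fit in the int32_t range.

```cpp
#include <cstdint>

// Sketch of a fast floor: truncate toward zero, then step down by one when
// truncation rounded up (this happens only for negative non-integer inputs).
// Assumes fp is finite and fits in the range of int32_t.
static inline int32_t fastfloor(float fp) {
    const int32_t i = static_cast<int32_t>(fp);
    return (fp < static_cast<float>(i)) ? (i - 1) : i;
}
```

For example, -1.5f truncates to -1; since -1.5f is less than -1, the result is corrected to -2, matching std::floor.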

static uint8_t  hash (int32_t i)
inline static
Helper function to hash an integer using the permutation table perm.
This inline function costs around 1 ns, and is called N+1 times for noise of N dimensions.
Using a real hash function would be better to improve the "repeatability of 256" of the permutation table, but fast integer hash functions take more time and have poor random properties.
[in]  i  Integer value to hash 
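A table-lookup hash of this kind can be sketched as follows. The table below, demoPerm, is a hypothetical stand-in built from a simple formula; it is not the actual perm data, which this page does not list. The sketch assumes the lookup uses only the low 8 bits of the input, which is what gives the "repeatability of 256" mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in table, NOT the real perm[] contents. An odd
// multiplier makes i -> (i * 167 + 13) mod 256 a permutation of 0..255.
static std::array<uint8_t, 256> makeDemoPerm() {
    std::array<uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[static_cast<std::size_t>(i)] =
            static_cast<uint8_t>((i * 167 + 13) & 0xFF);
    }
    return table;
}

static const std::array<uint8_t, 256> demoPerm = makeDemoPerm();

// Only the low 8 bits of i select an entry, so hash(i) == hash(i + 256).
// The cast to uint8_t wraps modulo 256, including for negative inputs.
static inline uint8_t hash(int32_t i) {
    return demoPerm[static_cast<uint8_t>(i)];
}
```

With this stand-in table, inputs 256 apart always hash to the same value, which is the source of the repeating pattern discussed for perm.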

static float  grad (int32_t hash, float x)
static
Helper function to compute gradients-dot-residual vectors (1D)
[in]  hash  hash value 
[in]  x  distance to the corner 
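The body of this function is not shown on this page. A minimal sketch of how a 1D gradient-dot-residual helper is commonly written in Gustavson-style simplex noise, assuming the low 4 bits of the hash select a gradient magnitude from 1 to 8 plus a sign (that bit layout is an assumption here):

```cpp
#include <cstdint>

// Sketch of a 1D gradient-dot-residual helper (assumed bit layout):
// bits 0..2 pick a magnitude 1.0 .. 8.0, bit 3 picks the sign.
static float grad(int32_t hash, float x) {
    const int32_t h = hash & 0x0F;                   // keep the low 4 bits
    float g = 1.0f + static_cast<float>(h & 7);      // magnitude 1.0 .. 8.0
    if ((h & 8) != 0) {
        g = -g;                                      // pseudo-random sign
    }
    return g * x;  // in 1D the dot product is a plain multiplication
}
```

Under these assumptions, a hash of 0 gives a gradient of +1 and a hash of 15 gives a gradient of -8.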

static float  grad (int32_t hash, float x, float y)
static
Helper function to compute gradients-dot-residual vectors (2D)
[in]  hash  hash value 
[in]  x  x coord of the distance to the corner 
[in]  y  y coord of the distance to the corner 
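As with the 1D case, the body is not shown here. One common formulation in Gustavson-style implementations, given as a sketch under the assumption that the low 3 bits of the hash select one of 8 simple gradient directions, avoids a gradient table entirely: it swaps and negates the coordinates according to the hash bits.

```cpp
#include <cstdint>

// Sketch of a 2D gradient-dot-residual helper (assumed bit layout):
// bit 2 swaps the roles of x and y, bit 0 negates the first term,
// bit 1 negates the second term. The second term is weighted by 2.
static float grad(int32_t hash, float x, float y) {
    const int32_t h = hash & 7;        // keep the low 3 bits: 8 directions
    const float u = (h < 4) ? x : y;
    const float v = (h < 4) ? y : x;
    const float first = ((h & 1) != 0) ? -u : u;
    const float second = ((h & 2) != 0) ? -2.0f * v : 2.0f * v;
    return first + second;             // dot product of gradient and (x, y)
}
```

With (x, y) = (1, 2), a hash of 0 yields 1 + 2*2 = 5, while a hash of 7 yields -2 - 2*1 = -4.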

static const uint8_t  perm [256]
static
Permutation table. This is just a random jumble of all numbers 0-255.
This produces a pattern that repeats every 256 units, but Ken Perlin stated that this is not a problem for graphic textures, because the noise features disappear at a distance far enough to be able to see the repetition.
This needs to be exactly the same for all instances on all platforms, so it's easiest to just keep it as static explicit data. This also removes the need for any initialisation of this class.
Note that making this a uint32_t[] instead of a uint8_t[] might make the code run faster on platforms with a high penalty for unaligned single-byte addressing. Intel x86 is generally single-byte-friendly, but some other CPUs are faster with 4-aligned reads. However, a uint8_t[] is smaller, which avoids cache thrashing, and that is probably the most important aspect on most architectures. This array is accessed a lot by the noise functions. A vector-valued noise over 3D accesses it 96 times, and a float-valued 4D noise 64 times. We want this to fit in the cache!
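The size trade-off can be made concrete with a little arithmetic. The sketch below assumes a 64-byte cache line, which is typical but not universal; the constant and helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kEntries = 256;
constexpr std::size_t kAssumedCacheLine = 64;  // assumption: 64-byte lines

constexpr std::size_t kByteTableSize = kEntries * sizeof(uint8_t);   // 256 bytes
constexpr std::size_t kWordTableSize = kEntries * sizeof(uint32_t);  // 1024 bytes

// Number of cache lines needed to hold a table of the given size,
// rounding up and assuming the table starts on a line boundary.
constexpr std::size_t cacheLines(std::size_t bytes) {
    return (bytes + kAssumedCacheLine - 1) / kAssumedCacheLine;
}

static_assert(cacheLines(kByteTableSize) == 4, "uint8_t table: 4 lines");
static_assert(cacheLines(kWordTableSize) == 16, "uint32_t table: 16 lines");
```

Under that assumption the byte table occupies a quarter of the cache footprint of the 32-bit variant, which is the reason given above for preferring it.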