Fast object painter

For a while now there's been a point source simulator in pixell called pointsrcs.sim_srcs, which takes a catalog of object positions, peak amplitudes and a shared radial profile, and returns a map with those objects painted in. As long as one is simulating point sources, the limitation that all the objects must share the same radial profile isn't a big deal, but it's annoying when simulating resolved objects like clusters. The existing implementation was also a bit slow due to being written in python+numpy. So I had a go at rewriting it as a C extension. Despite ending up at more than 600 lines of C code, the algorithm itself is pretty simple.

Algorithm

  1. Divide the map into cells of e.g. 8×8 pixels
  2. For each object use its profile and peak amplitude to determine the radius rmax at which it falls below the minimal value vmin that we will consider. In the example below, the profile is the ACT f090 beam (so this is a point source), the peak amplitude is 1000 μK, and vmin = 0.1 μK, resulting in rmax = 14.8 arcmin. This step takes about 1 μs per object.
  3. Estimate which cells fall within this distance, and add the object to those cells' list of objects to consider. Since there are many more faint objects than bright ones, most objects will only be assigned to a few cells. Computing the exact distance from each cell to each object would be too slow, so the cell assignment currently uses a two-step heuristic: First use the local pixel size at the object's location to translate rmax into a pixel bounding box, and then get the pixel size at the corners of this box and recompute the bounding box using the smallest of these. This works well for cylindrical projections, but probably won't work for projections with internal edges like Mollweide. This step typically takes around 1 μs per object.
  4. For each cell that was assigned objects, copy out the pixels for that cell into a small buffer. For each of its objects, loop through the cell's pixels, compute the distance to the object, evaluate the profile at that distance and add the value to the pixel in the buffer. Once all the objects have been processed, copy the buffer back into the map. Each cell can be processed independently of the others, allowing us to cheaply OMP parallelize over them. If the profiles are equi-spaced this takes something like 30 μs per object; otherwise it takes about twice as long. These numbers depend strongly on vmin and the profile shape, since those determine how many pixels need to be painted. A simplified sketch of the whole procedure is given after this list.
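
To make this concrete, here is a rough flat-sky numpy sketch of the same cell-based approach. Everything here is made up for illustration (the names, the square-pixel flat-sky assumption, the single shared profile); the real implementation is C, handles curved-sky geometries and per-object profiles, and parallelizes step 4 with OMP:

import numpy as np

def paint_objects(omap, opix, amps, prof_r, prof_v, pixsize, vmin=0.1, csize=8):
    # omap: 2d float map; opix: [nobj,{y,x}] pixel positions; amps: peak amplitudes;
    # (prof_r, prof_v): shared radial profile with prof_v decreasing from prof_v[0]=1;
    # pixsize: pixel size in the same units as prof_r (flat sky, square pixels)
    opix, amps = np.asarray(opix), np.asarray(amps)
    ncy, ncx   = [(n+csize-1)//csize for n in omap.shape]
    # Steps 1-3: build each cell's list of objects to consider
    cells = [[[] for _ in range(ncx)] for _ in range(ncy)]
    for i, ((py,px), amp) in enumerate(zip(opix, amps)):
        below = np.where(abs(amp)*prof_v < vmin)[0]            # step 2: find rmax
        rmax  = prof_r[below[0]] if len(below) > 0 else prof_r[-1]
        npix  = int(np.ceil(rmax/pixsize))                     # step 3: bounding box
        for cy in range(max(0,(int(py)-npix)//csize), min(ncy-1,(int(py)+npix)//csize)+1):
            for cx in range(max(0,(int(px)-npix)//csize), min(ncx-1,(int(px)+npix)//csize)+1):
                cells[cy][cx].append(i)
    # Step 4: paint each cell's objects into a small buffer. This is the loop
    # the C version parallelizes over, since the cells are independent
    for cy in range(ncy):
        for cx in range(ncx):
            if not cells[cy][cx]: continue
            y1, y2 = cy*csize, min((cy+1)*csize, omap.shape[0])
            x1, x2 = cx*csize, min((cx+1)*csize, omap.shape[1])
            ys, xs = np.mgrid[y1:y2, x1:x2]
            buf    = omap[y1:y2, x1:x2].copy()
            for i in cells[cy][cx]:
                r    = np.hypot(ys-opix[i,0], xs-opix[i,1])*pixsize
                buf += amps[i]*np.interp(r, prof_r, prof_v, right=0)
            omap[y1:y2, x1:x2] = buf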

Speed comparison

This is the same general idea I used in the existing point source simulator, but as is typical of slightly involved algorithms like this one, python is far from optimal even when using numpy. On my laptop the new C implementation is about 20x faster than the python version, which took around 600 μs per object. Since the slowest step is OMP parallelized, the ratio should be even bigger on a cluster.

Pixell update

The new function is available as pointsrcs.sim_objects. I also updated pointsrcs.sim_srcs to forward to sim_objects by default, so existing code should enjoy this speedup without needing to be modified. However, I still recommend calling sim_objects directly, as sim_srcs's interface doesn't support per-object profiles.
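
For illustration, here is roughly what a call looks like. This is a hedged sketch: the argument names (poss, amps, profile, prof_ids) are from memory of the version I used, so check the docstring for the exact interface and ordering conventions. The main point is that each object can reference its own radial profile:

import numpy as np
from pixell import enmap, pointsrcs, utils

# A 10°×10° CAR patch with 0.5 arcmin pixels
shape, wcs = enmap.geometry(pos=np.array([[-5,-5],[5,5]])*utils.degree,
        res=0.5*utils.arcmin, proj="car")
nobj  = 1000
poss  = np.random.uniform(-4, 4, (2,nobj))*utils.degree # positions, [{dec,ra},nobj]
amps  = np.random.uniform(10, 1000, nobj)               # peak amplitudes in map units
# Two different radial profiles, each [{r,val},npoint] with val = 1 at r = 0
r     = np.linspace(0, 30*utils.arcmin, 1000)
profs = [np.array([r, np.exp(-0.5*(r/utils.arcmin)**2)]),
         np.array([r, 1/(1+(r/utils.arcmin)**2)**2])]
ids   = np.random.randint(0, 2, nobj)                   # which profile each object uses
omap  = pointsrcs.sim_objects(shape, wcs, poss, amps, profs, prof_ids=ids)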

One thing the new function doesn't support that the old one did is padding of the image. The old function would simulate the objects on a slightly padded canvas, then crop it down to the target size before returning. The motivation for this is that when applying the pixel window (which the functions can optionally do), it's best if no point sources are right at the edge of the image, since they violate the periodicity that Fourier operations assume. Since the new function doesn't support this, it's best to apodize and apply the pixel window manually for now, as sketched below. If this is a problem, the old version is still available by calling sim_srcs with the argument method="python".
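
For example (assuming I'm remembering the enmap helper functions correctly), one can simulate without the pixel window and then handle it manually:

from pixell import enmap
omap = pointsrcs.sim_objects(shape, wcs, poss, amps, profs) # no pixwin applied here
omap = enmap.apod(omap, 30)                                 # taper ~30 pixels at each edge
omap = enmap.apply_window(omap, pow=1)                      # then apply the pixel window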

Painting the websky catalog

With sim_objects in hand I wanted to try simulating the websky cluster catalog. This would have been impractical with the old function, since each cluster has its own profile.

Raw y profile

  1. The websky catalog comes in the form of a binary file containing roughly 1 billion objects. To avoid using too much memory I load these in batches of 100k.
  2. The catalog contains 3d positions in comoving coordinates as well as the quantity R, which is not, as I first thought, r200. Instead it's "the 'comoving Lagrangian radius' corresponding to the mass enclosed in M200m". So to get m200 = m200c and z, the easiest quantities to work with, one needs to do
    chi     = (pos_x**2 + pos_y**2 + pos_z**2)**0.5     # comoving distance
    a       = pyccl.scale_factor_of_chi(cosmology, chi)
    z       = 1/a-1
    rho_m   = calc_rho_c(0, cosmology)*Omega_M          # mean matter density at z=0
    m200m   = 4/3*np.pi*rho_m*R**3                      # mass inside the Lagrangian radius
    m200    = m200m_to_m200c(m200m, c)                  # helper: convert m200m to m200c
    These subtleties (e.g. m200m vs. m200c) took a few days of debugging to get right.
  3. Given m200 and z we can compute the cluster shape parameters from Battaglia 2012:
    z1    = z+1
    m     = m200/(1e14*utils.M_sun)
    P0    =  18.1     * m** 0.154   * z1**-0.758
    xc    =  0.497    * m**-0.00865 * z1** 0.731
    beta  =  4.35     * m** 0.0393  * z1** 0.415
    alpha =  1; gamma = -0.3
    # Go from the Battaglia convention P0*(x/xc)**gamma*(1+(x/xc)**alpha)**-beta
    # to the standard gnfw convention used below
    beta  = gamma - alpha*beta
    Given these parameters, the 3d radial electron pressure profile is given by
    P_e  = 0.5176 * P_th
    P_th = G * m200 * 200 * rho_c(z) * f_b / (2*r200) * P0 * gnfw(r/r200, xc, alpha, beta, gamma)
    def gnfw(x, xc, alpha, beta, gamma):
      return (x/xc)**gamma*(1+(x/xc)**alpha)**((beta-gamma)/alpha)

    where rho_c(z) is the critical density, f_b = Omega_b/Omega_m is the baryon fraction, and r200 is the radius inside which the average density is 200x the critical density (m200 being the total mass inside this radius). The factor 0.5176 converts from thermal to electron pressure, and follows from the standard primordial hydrogen abundance X_H = 0.76.
  4. The compton y parameter is the line-of-sight integral of P_e * σ_T/(m_e*c**2), but actually performing this integral for each radial sample of each object would be far too slow to be practical. Instead I notice that the shape parameters alpha and gamma are constant while xc is equivalent to a rescaling of the radius, leaving beta as the only remaining shape-affecting parameter. I can therefore get away with a simple 2d interpolation over the radius and beta, as sketched after this list. This is implemented in enlib.clusters.ProfileBattagliaFast.
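
Here is a rough standalone sketch of that table-based approach. The names and grid choices are made up for illustration, and this is not the actual enlib.clusters.ProfileBattagliaFast code, but it shows the structure: tabulate the line-of-sight integral of the gnfw shape once on a (log radius, beta) grid, after which each cluster reduces to a 2d interpolation plus a radius rescaling by xc and the physical amplitude prefactor from P_th:

import numpy as np
from scipy.interpolate import RegularGridInterpolator

def gnfw(x, xc, alpha, beta, gamma):
    return (x/xc)**gamma*(1+(x/xc)**alpha)**((beta-gamma)/alpha)

# Tabulate y_shape(x,beta) = integral of gnfw(sqrt(x**2+l**2)) dl with xc=1.
# alpha and gamma are constants, and gnfw depends on xc only through x/xc,
# so a cluster with scale radius xc uses xc*y_shape(x/xc, beta)
xs    = np.geomspace(1e-4, 20, 200)   # log-spaced projected radii x = r/r200
betas = np.linspace(-12, -4, 50)      # beta range in the standard convention
ls    = np.geomspace(1e-5, 50, 2000)  # log-spaced samples along the line of sight
table = np.zeros((len(xs), len(betas)))
for i, x in enumerate(xs):
    r3d = np.sqrt(x**2 + ls**2)
    for j, beta in enumerate(betas):
        # factor 2: the two halves of the line of sight
        table[i,j] = 2*np.trapz(gnfw(r3d, 1, 1, beta, -0.3), ls)
interp = RegularGridInterpolator((np.log(xs), betas), table)

def y_shape(x, beta):
    # broadcast x and beta to a common shape; one table lookup per sample
    x, beta = np.broadcast_arrays(x, beta)
    return interp(np.stack([np.log(x), beta], -1))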

After implementing all this I still couldn't quite reproduce the official websky maps. The plot below shows the ratio between the central y values I get and those the websky mapping code gets, as a function of m200 and redshift:

The agreement is better than 10% for most masses, but at the highest masses the ratio approaches a factor of 2! After discussing this with Marcelo Alvarez (who also helped me get the previous parts right, as did Nick Battaglia, Matt Hilton and Emily Moser), Marcelo found that the websky interpolation table was based on a line-of-sight integral that didn't converge properly in the center of the clusters (figure from Marcelo):

This has been fixed, and Marcelo has already released new websky maps.

Beam convolution

It's tempting to just paint the raw clusters on the sky and then apply the beam with a spherical harmonic transform at the end. However, the strongly cuspy nature of the cluster profiles means that this is not safe unless the pixels are arcsecond-sized, rather than the arcminute-sized pixels we use in ACT: a map sampled that coarsely simply does not contain enough information to represent the profile properly.

A better approach is to do the beam convolution directly on the radial profile. This can be done using a Hankel transform, which scipy provides a fast version of. The great thing about the fast Hankel transform is that it works on log-spaced input, which lets it resolve both the fine details of the cusp and the large scales further out. This means we can evaluate the transform accurately with far fewer points than we would need with linear spacing. I implemented a class called pixell.utils.RadialFourierTransform which makes this convenient. Example usage (including raw profile evaluation):

rht          = utils.RadialFourierTransform()
lbeam        = np.interp(rht.l, np.arange(len(meta.beam)), meta.beam) # beam at rht's multipoles
prof_builder = clusters.ProfileBattagliaFast(cosmology=cosmology, beta_range=beta_range)
rprofs       = prof_builder.y(m200s[:,None], zs[:,None], rht.r)       # raw y profiles at rht's radii
lprofs       = rht.real2harm(rprofs)                                  # radial profiles -> harmonic space
lprofs      *= lbeam                                                  # beam convolution
rprofs       = rht.harm2real(lprofs)                                  # back to radial profiles
r, rprofs    = rht.unpad(rht.r, rprofs)                               # discard padding (optional)

The plot below shows the size of the error one makes by using the native 2d Fourier transform to apply the beam, for various pixel sizes. As you can see, one can expect up to a 60% amplitude error for our standard 0.5' pixels. This is for a cluster centered on a pixel. The error will be different for other cluster locations, and for locations far from a pixel center it can underestimate rather than overestimate the signal.

I don't know how this issue was handled in the websky maps. It's easy to get wrong.

Result

I made a driver program, tenki/sim_websky.py, that reads in the websky catalog, a map geometry and a beam profile, and outputs a simulated map. It simulates a full-sky 0.5' CAR websky map at f090 in about 3.5 hours on 4 nodes, with 10 mpi tasks per node and 4 threads per task. Each task takes about 0.5 ms per object, which is completely dominated by the profile evaluation. With some C-level optimization this could probably be sped up by a factor of 10, but I think it's good enough for now. A 6°×6° cutout is shown below, on a color scale from -100 to 0 μK. This is about 0.1% of the sky, and contains about 750000 clusters.

(I'm pretty confident in the cluster painting here after all the comparisons, but doesn't that big cluster look a bit too big? I don't remember seeing anything as big as that in the ACT maps...)

Full maps