Correlating 3D Models with Images
A brain-melting congruence between the 2D and 3D. Updated 2024-12-13
While watching an old 3Blue1Brown video, I learned some fascinating things about image processing techniques that could potentially be applied to 3D modelling! I’m excited!
First, a convolution is a “standard” operation that arises fairly naturally over any sequence of values that supports product and sum operations. That’s actually a fairly wide range of kinds, and it isn’t limited to numeric types, either.
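To make that concrete, here’s a tiny sketch of a discrete 1D convolution in Python; the only operations it asks of the values are * and +. (The function name and the “valid”-windows-only choice are just mine, for illustration.)

```python
# A minimal discrete 1D convolution: slide the (reversed) kernel across the
# signal, multiply element-wise, and sum. Only * and + are required.
def convolve(signal, kernel):
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n - k + 1):  # "valid" positions only
        window = signal[i:i + k]
        out.append(sum(s * w for s, w in zip(window, reversed(kernel))))
    return out

print(convolve([1, 2, 3, 4, 5], [1, 0, -1]))  # [2, 2, 2] -- a simple difference filter
```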
As I’m writing this, I’ve realized it would likely apply to the Maybe, Applicative, or Monad types from Haskell, in addition to numeric types. That’s interesting, but beside the point.
I further learned that a Gaussian blur over an image is defined in terms of a convolution: it computes a weighted “moving average” of the pixels surrounding each target pixel to decide that pixel’s new color. The “Gaussian” part of the blurring algorithm comes from the specific weights chosen for those surrounding pixels.
Furthermore, you can turn the same convolution into an edge-detection algorithm merely by swapping the weights for something else! He explains it visually far better than I can in text, so I won’t attempt a full explanation; just go watch the video. But it motivated me to ask something else:
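To see what I mean, here’s a small sketch (assuming NumPy and SciPy; the 3x3 binomial kernel is only an approximation of a true Gaussian, and the Laplacian kernel is just one common edge detector, not necessarily the ones from the video):

```python
import numpy as np
from scipy.signal import convolve2d

# Two 3x3 kernels: binomial weights approximate a Gaussian; the Laplacian
# kernel highlights edges. The convolution itself is identical for both.
gaussian = np.array([[1, 2, 1],
                     [2, 4, 2],
                     [1, 2, 1]], dtype=float) / 16.0
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

image = np.random.rand(64, 64)  # stand-in for a grayscale image

blurred = convolve2d(image, gaussian, mode='same', boundary='symm')
edges = convolve2d(image, laplacian, mode='same', boundary='symm')
```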
What other kinds of operations can be represented by a convolution over an image?
And that question got even more exciting for me once it clicked in my brain that a 3-color pixel (aka RGB) is just a 3-dimensional vertex within a coordinate space — one that happens to map into colors.
That means a 3d model (consisting of a series of vertices) could be re-interpreted as a PNG! What colors would the vertices of a unit cube have? What kind of shape would a photo of my family have?
What colors would a 3d model of the Statue of Liberty conceal?
What kind of shape would the Mona Lisa have in 3d space?
Encoding model
To concretely translate an STL into a PNG and vice versa, there are a few details I still need to nail down, and I haven’t found obvious or intuitive answers.
First, I’ve chosen to encode X in the Red channel of the image, Y in Green, and Z in Blue. Simple enough.
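Here’s a rough sketch of that mapping (the bounding-box normalization to 0-255 is my own assumption, since STL coordinates are arbitrary floats and an 8-bit channel isn’t; it also happens to answer the unit-cube question above, since the corners come out as the eight “corner” colors of RGB space).

```python
import numpy as np

def vertices_to_pixels(vertices):
    """Map (x, y, z) vertices to (R, G, B) pixels: X->Red, Y->Green, Z->Blue.

    Assumes each axis is rescaled over the model's bounding box to fit 0-255;
    preserving the original scale losslessly is still an open detail.
    """
    v = np.asarray(vertices, dtype=float)
    lo, hi = v.min(axis=0), v.max(axis=0)
    span = np.where(hi - lo == 0, 1, hi - lo)  # avoid dividing by zero
    scaled = (v - lo) / span                   # per-axis 0..1
    return (scaled * 255).round().astype(np.uint8)

# The unit cube's corners become the eight "corner" colors of RGB space:
cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
print(vertices_to_pixels(cube))
```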
But a face in an STL is represented by 3 vertices and a normal. Should I just represent this as 4 sequential pixels? Maybe 3 RGBA pixels, where the normal encodes into the alpha channel of each? Or should I encode one face into a 4x4 square of pixels in the result?
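Just to make one of those options concrete (not to decide anything), here’s a sketch of the 3-RGBA-pixels reading, where the normal’s x, y, and z land in the alpha channels of the three vertex pixels; the shift of the normal from [-1, 1] into [0, 255] is my own assumption.

```python
import numpy as np

def face_to_rgba(vertex_pixels, normal):
    """One possible layout: a face becomes three RGBA pixels.

    vertex_pixels: (3, 3) array of already-encoded R, G, B values (0-255).
    normal: the face normal (nx, ny, nz); its components fill the alpha
    channel of the first, second, and third pixel respectively.
    """
    pixels = np.zeros((3, 4), dtype=np.uint8)
    pixels[:, :3] = vertex_pixels  # X->R, Y->G, Z->B
    # Normals are unit-length, so shift each component from [-1, 1] to [0, 255].
    alpha = ((np.asarray(normal, dtype=float) + 1) / 2 * 255).round()
    pixels[:, 3] = alpha.astype(np.uint8)
    return pixels

print(face_to_rgba([[0, 0, 0], [255, 0, 0], [0, 255, 0]], (0.0, 0.0, 1.0)))
```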
These questions should rightly be answered by the next few:
Say I convert a 3d model to an image. What then? What useful operations can I now perform that would be meaningful when the image is translated back into a 3d model? Would I even recognise the most useful set of operations?
What kind of operations would be useful on an image once converted to an STL? What would the scale function do?
And that is where I’m stuck. It’s also plenty of excitement for one day. So I’ll see you in the next one, and I’ll let you know when I figure out more.