Results

Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material, using a bank of differentiable IIR filters. We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective, paving the way for physically-informed synthesisers to be learned directly from recordings of real-world objects.
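As a rough illustration of what a bank of modal resonators computes, the sketch below sums the impulse responses of several two-pole IIR filters, one per mode. It is not the framework described above: the function name, the example mode values, and the decay parameterisation are assumptions made for illustration, and plain NumPy is not differentiable, so a trainable version would need the same recursion written in an autodiff framework.

```python
import numpy as np

def modal_impulse_response(freqs_hz, decays_s, gains,
                           sample_rate=16000, n_samples=16000):
    """Sum the impulse responses of a bank of two-pole resonators.

    Hypothetical helper: each mode is the all-pole filter
        y[n] = x[n] - a1 * y[n-1] - a2 * y[n-2]
    with poles at r * exp(+/- j * omega).
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    decays_s = np.asarray(decays_s, dtype=float)
    gains = np.asarray(gains, dtype=float)

    omega = 2.0 * np.pi * freqs_hz / sample_rate        # pole angle per mode
    r = np.exp(-1.0 / (decays_s * sample_rate))         # pole radius: envelope falls by 1/e after decays_s
    a1 = -2.0 * r * np.cos(omega)
    a2 = r * r

    y1 = np.zeros_like(freqs_hz)                        # y[n-1] for every mode
    y2 = np.zeros_like(freqs_hz)                        # y[n-2] for every mode
    out = np.zeros(n_samples)
    for n in range(n_samples):
        x = gains if n == 0 else 0.0                    # unit impulse scaled by each mode's gain
        y0 = x - a1 * y1 - a2 * y2
        out[n] = y0.sum()                               # mix all modes into one signal
        y2, y1 = y1, y0
    return out

# Made-up modes: frequencies (Hz), decay times (s), gains.
audio = modal_impulse_response([440.0, 1130.0, 2075.0],
                               [0.10, 0.06, 0.03],
                               [1.0, 0.5, 0.25])
```

In this picture, a network that predicts the frequencies, decay times, and gains for a given shape and material fully determines the synthesised sound, so a loss computed on the audio can in principle be propagated back to those predictions.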

Compare the original audio and the generated audio.

[Interactive demo: select a random shape/material to view the original and predicted spectrograms and to play the original and predicted audio.]

Density	Original	Predicted
\(\rho=500.00\)
\(\rho=2111.11\)
\(\rho=3722.22\)
\(\rho=5333.33\)
\(\rho=6944.44\)
\(\rho=8555.56\)
\(\rho=10166.67\)
\(\rho=11777.78\)
\(\rho=13388.89\)
\(\rho=15000.00\)

Poisson’s ratio	Original	Predicted
\(\nu=0.10\)
\(\nu=0.14\)
\(\nu=0.19\)
\(\nu=0.23\)
\(\nu=0.27\)
\(\nu=0.32\)
\(\nu=0.36\)
\(\nu=0.40\)
\(\nu=0.45\)
\(\nu=0.49\)
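The values listed in the two tables above are consistent with ten evenly spaced samples between the first and last entry of each sweep. The snippet below reproduces them with NumPy; the variable names are illustrative, and how the sweeps were actually generated is an assumption.

```python
import numpy as np

# Ten evenly spaced values per sweep (assumed; this matches the listed entries).
densities = np.linspace(500.0, 15000.0, 10)      # rho
poisson_ratios = np.linspace(0.10, 0.49, 10)     # nu

for rho in densities:
    print(f"rho={rho:.2f}")
for nu in poisson_ratios:
    print(f"nu={nu:.2f}")
```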

Interpolate between two shapes, and see how the audio changes.

For the interpolated shapes, the coordinates do not fall on discretized positions, and FEM requires discretization, so only the predicted sound is shown. Because our network acts as a neural field, it can produce sound for continuous coordinate values, which lets us interpolate between the two shapes and hear how the audio changes.
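One simple way to picture the interpolation is a linear blend of two outlines, sketched below. How shapes are actually represented and blended here is not stated, so this assumes each shape is an array of 2D vertices with matching order and count; `interpolate_shapes` and the two example outlines are hypothetical.

```python
import numpy as np

def interpolate_shapes(shape_a, shape_b, t):
    """Linearly blend two outlines with matching vertex order (t in [0, 1])."""
    shape_a = np.asarray(shape_a, dtype=float)
    shape_b = np.asarray(shape_b, dtype=float)
    if shape_a.shape != shape_b.shape:
        raise ValueError("both shapes need the same number of vertices")
    return (1.0 - t) * shape_a + t * shape_b

# Made-up outlines: a unit square and a diamond around the same centre.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
diamond = np.array([[0.5, -0.2], [1.2, 0.5], [0.5, 1.2], [-0.2, 0.5]])

# Five intermediate outlines; their vertices take continuous values
# that need not coincide with any fixed grid.
steps = [interpolate_shapes(square, diamond, t) for t in np.linspace(0.0, 1.0, 5)]
```

Each intermediate outline would then be passed to the trained model to obtain its predicted sound.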

Shape	Predicted sound