Ever since I published the diagram showing “roughly” how the ECG should be scaled for the network input, I have wanted to quantify how much that scaling matters to the network.

One thing that was very obvious from looking at the data from the different NSRR studies was that the ECG amplitudes were all over the place, even when using the same equipment.

So, I needed to normalize these amplitudes as best I could. What I settled on was the following:

First, I wanted the median value to be 0, which makes sense, as the isoelectric line should roughly represent zero potential.

Second, since most of my other inputs were approximately in the range [-1.0, 1.0], I wanted the ECG to be in the same range. Neural networks train better when the inputs are all in roughly the same numerical range, with few large excursions. This also led me to clip the ECG to exactly [-1.0, 1.0], as electrical artifacts in the recordings are sometimes orders of magnitude larger than the heartbeat amplitudes.

Third, since I knew my range, I wanted to find a scale that would make the best use of it. I decided to scale the signal so that at least the 90th percentile of all heartbeat values falls within [-0.5, 0.5]. This leaves room for the natural biological variation in amplitude, with a low likelihood of any heartbeat itself being clipped.
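As a rough illustration, the three steps above can be sketched in Python with NumPy. This is not the published MATLAB pipeline. The function name `normalize_ecg`, its parameters, and the synthetic signal are all hypothetical, and the percentile here is taken over the absolute value of every sample, which is only an assumption about what "heartbeat values" means in practice.

```python
import numpy as np

def normalize_ecg(ecg, pct=90.0, target=0.5, limit=1.0):
    """Center on the median, scale by a percentile, then clip (illustrative only)."""
    ecg = np.asarray(ecg, dtype=np.float64)

    # Step 1: move the median (roughly the isoelectric line) to zero.
    centered = ecg - np.median(ecg)

    # Step 2: pick a scale so the pct-th percentile of |amplitude| lands on `target`.
    # Assumption: computed over all samples, not per detected heartbeat.
    ref = np.percentile(np.abs(centered), pct)
    if ref == 0:
        return np.zeros_like(centered)  # flat signal, nothing to scale
    scaled = centered * (target / ref)

    # Step 3: clip so large artifacts cannot leave [-limit, limit].
    return np.clip(scaled, -limit, limit)

# Hypothetical signal in arbitrary units: a DC offset, small noise, one huge artifact.
rng = np.random.default_rng(0)
raw = 0.3 + 0.002 * rng.standard_normal(10_000)
raw[5000] = 50.0

out = normalize_ecg(raw)
print(out.min(), out.max(), np.median(out))
```

The artifact ends up pinned at the clip limit, while the bulk of the signal occupies the middle of the range.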

The pipeline was built around this, and worked like a charm. However, since the pipeline is still in MATLAB and divided into several processing steps, I wanted to produce a rough guide for those wanting to get started right away. The question then became: if they weren’t using the same pipeline as was published, would that lead to differences in results? I didn’t know.

Therefore, I decided to finally evaluate it on the full testing set, by scaling (and clipping, as appropriate) the ECG by factors from 0.125x to 8.0x. What I found was that in the range 0.5x to 2.0x, the performance impact is negligible. Beyond that range, performance does start to be meaningfully impacted. This is great, as it means that the network is pretty tolerant of “improper” scaling.
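A minimal sketch of how such a sweep could be set up is shown below. The specific factors (powers of two between 0.125x and 8.0x), the helper `rescale_and_clip`, and the sine-wave stand-in for an already-normalized ECG are assumptions for illustration. The actual model evaluation at each factor is not shown.

```python
import numpy as np

# Assumption: powers of two spanning the 0.125x to 8.0x range mentioned above.
SCALE_FACTORS = [2.0 ** k for k in range(-3, 4)]  # 0.125, 0.25, ..., 8.0

def rescale_and_clip(ecg, factor, limit=1.0):
    """Multiply an already-normalized ECG by `factor`, then clip to [-limit, limit]."""
    return np.clip(np.asarray(ecg, dtype=np.float64) * factor, -limit, limit)

# Hypothetical stand-in for a normalized ECG with peaks at about +/-0.5.
t = np.linspace(0.0, 1.0, 500)
ecg = 0.5 * np.sin(2 * np.pi * 5 * t)

for factor in SCALE_FACTORS:
    scaled = rescale_and_clip(ecg, factor)
    clipped = np.mean(np.abs(scaled) >= 1.0)  # fraction of samples at the clip limit
    print(f"{factor:5.3f}x  peak={np.abs(scaled).max():.3f}  clipped={clipped:.1%}")
```

Each rescaled copy would then be fed to the network in place of the original input, and the resulting performance compared against the 1.0x baseline.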

I should note, I kinda expected this to some extent, as I had already designed the network to be insensitive to polarity. This means that the features extracted had to tolerate not just a fractional scaling, but a complete flip in the sign of all of the data.

The second interesting finding is on running the model on CUDA vs CPU. I had done all of the training and evaluation on NVIDIA GPUs with the CUDA backend. However, in releasing the model, I realized that I needed to make it accessible to those who don’t have GPUs. Furthermore, if only performing inference, a GPU isn’t really necessary for a model of this size. Therefore, when I published the code, I tweaked it slightly to allow for CUDA or CPU.

For those who have not delved into the deep hole that is floating point operations and representations, the following will seem strange: your computer makes a lot of compromises to store real numbers. There is a standard, IEEE 754, that most will try to follow. However, you can relax these constraints to get better performance. Long story short, the PyTorch backends for CUDA and CPU produce slightly different results. For classifications, this is less likely to be an issue. However, I have now quantified it. Overall, 98.2% of the 571,141 scored epochs in the testing set have the same prediction when inference is performed on either CUDA or CPU. This means that about 1.8% of the epochs had outputs that were right on the “border” between being classified as one stage or the other, and that when changing the backend, the prediction switches to the “other” stage.
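To make this concrete, here is a small sketch in plain Python. The first part shows that floating point addition depends on the order of operations, which is one common reason two backends can disagree in the last few digits. The second part shows one plausible way to measure agreement between two sets of predictions. The helper names and the per-epoch scores are made up for illustration and are not taken from the model or from the published code.

```python
# Floating point addition is not associative: grouping changes the result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

def argmax(scores):
    """Index of the largest score (the predicted class)."""
    return max(range(len(scores)), key=scores.__getitem__)

def agreement(preds_a, preds_b):
    """Fraction of epochs where both backends predict the same class."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    same = sum(p == q for p, q in zip(preds_a, preds_b))
    return same / len(preds_a)

# Hypothetical per-epoch class scores from two backends (invented numbers).
# The middle epoch sits right on the border, so a tiny numerical
# difference is enough to flip its predicted class.
cuda_scores = [[0.70, 0.30], [0.5001, 0.4999], [0.10, 0.90]]
cpu_scores = [[0.70, 0.30], [0.4999, 0.5001], [0.10, 0.90]]

cuda_preds = [argmax(s) for s in cuda_scores]
cpu_preds = [argmax(s) for s in cpu_scores]
rate = agreement(cuda_preds, cpu_preds)
print(f"agreement: {rate:.1%}")
```

Only the borderline epoch changes class here, which mirrors the behavior described above: confident predictions are unaffected by tiny numerical differences, while near-ties can go either way.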

Originally posted on cardiosomnography.com.