motion segmentation

2024

benchmarking seven imu segmentation algorithms, and testing whether a neural net could match them

python · signal-processing · imu · tensorflow · scikit-learn · time-series

github → · korea science service international research program

this was the problem they handed me. at the korea science service international research program, working under professor jong gwan lim of mokwon university, i was given a task in motion segmentation: take raw motion-sensor data and figure out, automatically, where one movement ends and the next begins. i'd never worked with an imu before. i'd never done signal processing before. so the first real obstacle wasn't the code, it was understanding what i was even looking at.

an imu, an inertial measurement unit, is the sensor in your phone or smartwatch that tracks acceleration and rotation. it produces a continuous stream of numbers, and "segmentation" means cutting that stream at the right moments: this is where the person started waving, this is where they stopped. get the cut a few samples too early or too late and you've mislabeled the motion. learning to think in terms of derivatives, windows, and signal noise, after a lifetime of thinking about data as static tables, was the conceptual wall i had to climb before any of the rest made sense.

what i set out to do, and what i actually built

my ambition going in was to invent something. there are seven well-known segmentation algorithms in the literature (baron, benbasat, choi, eps, guenterberg, kim, and bang), and i wanted to combine their ideas into a new, better method of my own. i tried. it didn't work. the novel technique i was reaching for never beat the methods it was built from, and at some point i had to be honest that i'd aimed past what i could pull off in the time i had. that failure is the most useful thing the project gave me, so i'm not going to dress it up.

what i built instead was a rigorous benchmark. i implemented all seven algorithms against the same imu dataset with labeled ground-truth boundaries, then ran a grid search over each method's parameters (its thresholds and window sizes) to find the configuration that performed best. scoring each method fairly was its own problem, because "good" here isn't one number. i scored every configuration on a weighted blend of four metrics: accuracy, boundary error to the left of the true cut, boundary error to the right of it (both measured as how many samples off the predicted boundary landed), and time delay. i took the top three configurations per algorithm, averaged them, and plotted all seven head to head across those four metrics.
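a parameter sweep like the one described above can be sketched in a few lines. this is a minimal illustration under loud assumptions: `threshold_segmenter` is a toy stand-in, not any of the seven algorithms, and the parameter names (`threshold`, `window`) and the `score_fn` callback are hypothetical.

```python
import itertools
import numpy as np

def threshold_segmenter(signal, threshold, window):
    """toy stand-in (hypothetical): boundaries where smoothed |difference| crosses a threshold."""
    energy = np.abs(np.diff(signal, prepend=signal[0]))
    kernel = np.ones(window) / window
    smooth = np.convolve(energy, kernel, mode="same")
    active = smooth > threshold
    # a boundary is any sample where the active/inactive state flips
    return np.flatnonzero(np.diff(active.astype(int)) != 0) + 1

def grid_search(signal, score_fn, param_grid):
    """score every parameter combination; return (params, score) pairs, lowest score first."""
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        predicted = threshold_segmenter(signal, **params)
        results.append((params, score_fn(predicted)))
    return sorted(results, key=lambda pair: pair[1])
```

the cost of an exhaustive sweep grows with the product of the grid sizes, so in practice the grid has to stay coarse or the ranges narrow.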

then i went past the classical methods. i built feature vectors out of the signal (the raw value, its first differences at several lags, and the absolute values of those differences) and fed them to a scikit-learn mlp classifier and then a tensorflow model, to test whether a learned model could match what the hand-tuned algorithms did on the same boundary-detection task.
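as a rough picture of what a lagged-difference feature vector looks like, here is a minimal sketch. the specific lags `(1, 2, 4)`, the zero-padding at the start, and the function name `lagged_features` are assumptions for illustration, not the values used in the project.

```python
import numpy as np

def lagged_features(signal, lags=(1, 2, 4)):
    """one row per sample: raw value, then (difference, |difference|) for each lag."""
    signal = np.asarray(signal, dtype=float)
    columns = [signal]
    for lag in lags:
        diff = np.zeros_like(signal)
        # x[t] - x[t - lag]; the first `lag` samples have no history and stay zero
        diff[lag:] = signal[lag:] - signal[:-lag]
        columns.append(diff)
        columns.append(np.abs(diff))
    return np.column_stack(columns)
```

each row can then be paired with a 0/1 label (boundary or not) and passed to a classifier.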

architecture

the classical benchmark followed the same shape for each of the seven algorithms: load the signal and its ground truth, sweep the parameters, score every configuration.

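import numpy as np
import motionSegLib as motion

# dataRepository / targetRepository are assumed to be parallel lists of
# file paths: one signal file and one ground-truth file per recording
for data_path, target_path in zip(dataRepository, targetRepository):
    data = np.loadtxt(data_path).mean(axis=1)  # average across columns
    dataP, gd = motion.preprocess(data)
    target = np.loadtxt(target_path)
    # run the segmentation method, compare predicted
    # boundaries to target, record ACC / UMBR / UMBL / time delay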
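
the comparison step in that loop can be sketched as follows. this is one plausible way to measure left and right boundary error, not the exact ACC / UMBR / UMBL definitions used in the benchmark: the nearest-prediction matching rule, the `tolerance` parameter, and the function name are all assumptions.

```python
import numpy as np

def boundary_errors(predicted, target, tolerance=10):
    """match each true boundary to its nearest prediction and split the error by side."""
    predicted = np.asarray(predicted)
    target = np.asarray(target)
    left, right, hits = [], [], 0
    for t in target:
        if predicted.size == 0:
            continue
        nearest = predicted[np.argmin(np.abs(predicted - t))]
        offset = int(nearest - t)  # negative: predicted early, positive: predicted late
        if abs(offset) <= tolerance:
            hits += 1
        if offset < 0:
            left.append(-offset)
        elif offset > 0:
            right.append(offset)
    return {
        "hit_rate": hits / len(target) if len(target) else 0.0,
        "mean_left_error": float(np.mean(left)) if left else 0.0,
        "mean_right_error": float(np.mean(right)) if right else 0.0,
    }
```

keeping early and late errors separate matters because a method that consistently cuts late behaves very differently in a live system from one that cuts early by the same amount.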

each method's results got ranked by a single weighted score so the seven could be compared on equal footing:

weights = {'umbR': 0.3, 'umbL': 0.3, 'acc': 0.2, 'td': 0.2}
df['weighted_score'] = (
    df['UMBR'] * weights['umbR'] + df['UMBL'] * weights['umbL'] +
    df['ACC']  * weights['acc']  + df['timedelay'] * weights['td']
)
top3 = df.sort_values('weighted_score').head(3)
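
one thing to watch with a blend like this: the metrics sit on different scales, and if accuracy is stored as higher-is-better while the error columns are lower-is-better, summing them raw and sorting ascending would reward low accuracy. whether that applies depends on how the columns were stored, which the snippet doesn't show. a hedged sketch of a normalized variant, assuming `ACC` is higher-is-better and everything else is a cost (`weighted_cost` is a hypothetical helper):

```python
import pandas as pd

def weighted_cost(df, weights, higher_is_better=("ACC",)):
    """min-max scale each metric to [0, 1], flip higher-is-better ones, then blend."""
    cost = pd.Series(0.0, index=df.index)
    for column, weight in weights.items():
        values = df[column].astype(float)
        span = values.max() - values.min()
        if span == 0:
            continue  # a constant column cannot separate configurations
        scaled = (values - values.min()) / span
        if column in higher_is_better:
            scaled = 1.0 - scaled  # turn "bigger is better" into a cost
        cost += weight * scaled
    return cost
```

with every column expressed as a cost in the same range, `sort_values` ascending picks the best configurations and the weights mean what they say.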

for the learned approach, the same preprocessed signal got turned into lagged-difference feature vectors and handed to an mlp classifier and a tensorflow model, scored against the same boundary metrics as the classical seven. one caveat: i evaluated both models on the same data they were trained on rather than holding out a test set, so these numbers read as a fit check, not a generalization result.
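
for reference, a held-out evaluation on time-series data usually splits chronologically rather than at random, so the test samples come strictly after the training samples. a minimal sketch, not something the project ran: `chronological_split` and `fit_and_score` are hypothetical helpers, and `model` is assumed to be any estimator with scikit-learn style `fit` and `score` methods.

```python
import numpy as np

def chronological_split(X, y, test_fraction=0.25):
    """split without shuffling: the last `test_fraction` of samples becomes the test set."""
    X, y = np.asarray(X), np.asarray(y)
    cut = int(len(X) * (1 - test_fraction))
    return X[:cut], X[cut:], y[:cut], y[cut:]

def fit_and_score(model, X, y, test_fraction=0.25):
    """fit on the early part of the recording, report accuracy on both parts."""
    X_train, X_test, y_train, y_test = chronological_split(X, y, test_fraction)
    model.fit(X_train, y_train)
    return {
        "train_accuracy": float(model.score(X_train, y_train)),
        "test_accuracy": float(model.score(X_test, y_test)),
    }

# usage, assuming scikit-learn is installed:
# from sklearn.neural_network import MLPClassifier
# fit_and_score(MLPClassifier(max_iter=500), X, y)
```

a large gap between the two numbers is the sign that the model has memorized the recording rather than learned what a boundary looks like.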

outcome

i implemented and benchmarked seven motion segmentation algorithms on real imu data, built a fair multi-metric scoring system to compare them, and tested whether mlp and tensorflow models could match the classical methods on the same task. i wrote up the work as a formal technical report and presented the findings on model performance and latency to my faculty mentors and program peers.

the novel method i'd hoped to produce isn't in there, because it didn't work. what is in there is a clean comparison of how seven established techniques behave on the same data, and a first honest look at whether a learned model can stand in for hand-engineered ones. for a first encounter with signal processing, ending up with a working benchmark and a clear-eyed read on my own failed idea is an outcome i'll stand behind.

what i took from it

two things stuck. first, the ambitious version of a problem and the achievable version are often different problems, and knowing when to switch from one to the other is a skill, not a defeat. second, reading a noisy signal, the thing that intimidated me most at the start, became the skill i'm now most comfortable reaching for. the phishing work that came later leaned on audio feature extraction, which is the same muscle: turning a messy continuous signal into something a model can learn from.