I used to work at the largest lyrics website on the Internet, so I have all kinds of nifty tricks in my PHP wheelhouse for matching artists, song titles, and even lyrics. A book could be written on the different ways to do this!
Lately I've built up a famous quotes website. To make sure I don't put duplicate quotes in, I strip out every character that isn't a letter, digit, or space, and lowercase everything. Then I replace each space with a dash. This works fairly well.
So in my DB I'll have a full quote like:
"I'm involved in some action scenes, so they'll train me for that. I'll be working with my acting coach to prepare for my character."
And it will be slugified to:
"im-involved-in-some-action-scenes-so-theyll-train-me-for-that-ill-be-working-with-my-acting-coach-to-prepare-for-my-character"
The second one is used for matching, while the first one is used for display. Since it's a one-way conversion, I unfortunately have to store both versions in the DB, but it saves a lot of grief and has already detected many duplicates.
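A rough sketch of how that duplicate check can look, assuming the slug is used as the lookup key and the original text is kept next to it for display. The names `slugify` and `addQuote`, and the in-memory array standing in for the DB table, are hypothetical and only here for illustration:

```php
<?php
// Hypothetical helper: lowercase, drop everything except letters, digits
// and whitespace, then turn each run of whitespace into a single dash.
function slugify(string $text): string
{
    $text = strtolower($text);
    $text = preg_replace('/[^a-z0-9\s]/', '', $text);
    return preg_replace('/\s+/', '-', trim($text));
}

// Hypothetical stand-in for the quotes table: slug => display text.
// Returns true if the quote was added, false if its slug already exists.
function addQuote(array &$quotes, string $quote): bool
{
    $slug = slugify($quote);
    if (isset($quotes[$slug])) {
        return false; // duplicate, keep the version already stored
    }
    $quotes[$slug] = $quote;
    return true;
}

$quotes = [];
$first  = addQuote($quotes, "I'm involved in some action scenes.");
$second = addQuote($quotes, "im involved in   some ACTION scenes");
// $second is rejected because both inputs reduce to the same slug.
```

In a real table the slug column would typically carry a unique index, so the database enforces the same rule.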
Since dealing with movie titles is a bit different, you could also research some of the Bayesian matching functions that are available in PHP. They use fuzzy matching, and you can set the threshold, if I remember correctly. That way you could match a string to a movie even if the string has some extra junk in it.
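To illustrate the threshold idea, here is a minimal sketch that swaps in PHP's built-in `similar_text()` instead of a Bayesian matcher. It reports a similarity percentage that can be compared against a cutoff. The function name `bestMatch`, the sample titles, and the default threshold of 50 are assumptions for illustration:

```php
<?php
// Hypothetical helper: returns the known title most similar to $input,
// or null when nothing reaches $threshold (a percentage from 0 to 100).
function bestMatch(string $input, array $titles, float $threshold = 50.0): ?string
{
    $best = null;
    $bestScore = 0.0;
    foreach ($titles as $title) {
        // similar_text() writes the similarity percentage into its third argument.
        similar_text(strtolower($input), strtolower($title), $percent);
        if ($percent >= $threshold && $percent > $bestScore) {
            $best = $title;
            $bestScore = $percent;
        }
    }
    return $best;
}

$titles = ['The Matrix', 'The Godfather', 'Blade Runner'];
$match = bestMatch('the matrix (1999) dvdrip', $titles);
$none  = bestMatch('zzzz', $titles);
```

Extra junk in the input lowers the percentage, so the threshold needs tuning against real data. `levenshtein()` is another built-in worth trying for the same job.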
EDIT:
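Here's some simple code I used for doing this conversion. The character class in the second command keeps dashes (and spaces), so the separators added by the first command survive the stripping step. Note that each space becomes its own dash, so a run of spaces turns into a run of dashes:
// Turn each space into a dash, then lowercase the result.
$text = strtolower(str_replace(' ', '-', $text));
// Remove everything that is not a letter, digit, dash, or space.
$text = preg_replace("/[^A-Za-z0-9\- ]/", '', $text);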