Comparing names with SOUNDEX()
Messy strings like 'Ilynois' instead of 'Illinois' can cause problems when analyzing data. That is why it is important to detect them.
When analyzing the flight_statistics table, you realize that some statistician_name and statistician_surname are written in a different way, such as Miriam Smith and Myriam Smyth. You are afraid there are more differences like this, so you want to check all these names.
You think about comparing with SOUNDEX() the names of the statisticians. If the result of SOUNDEX() is the same, but the texts you are comparing are different, you will find the data you need to clean.
Este exercício faz parte do curso
Cleaning Data in SQL Server Databases
Instruções do exercício
- Select the distinct values of
statistician_nameandstatistician_surnamecolumns fromS1. - Inner join the
flight_statisticstable asS2on similar-sounding first names and surnames usingSOUNDEX(). - Filter out values where the
statistician_nameandstatistician_surnamecolumns are different from each other inS1andS2, respectively.
Exercício interativo prático
Experimente este exercício completando este código de exemplo.
SELECT
-- First name and surname of the statisticians
DISTINCT S1.___, S1.___
-- Join flight_statistics with itself
FROM ___ S1 INNER JOIN ___ S2
-- The SOUNDEX result of the first name and surname have to be the same
ON ___(S1.___) = ___(S2.___)
AND ___(S1.___) = ___(S2.___)
-- The texts of the first name or the texts of the surname have to be different
WHERE S1.___ <> S2.___
OR S1.___ <> S2.___