Hello everyone,
I am working on cleaning a large panel data set. I have done some cleaning and now have a list of unique company names. However, inconsistent spellings of some company names still produce separate entries for the same company, and I would like to match these and consolidate them under a single name/id.
I initially tried using reclink in order to match the data to itself. However, when doing this, reclink gives a set of perfect matches (each entry matches perfectly with itself). What I would really like to do is to look for the second-best/non-perfect matches in order to identify misspelled duplicates. To my knowledge, reclink doesn't have an option for ignoring the perfect matches.
Is there a way to do this with reclink? Or is there another method that would work better?
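To make the goal concrete, here is a minimal sketch of the kind of self-match I am after. It is written in Python rather than Stata and does not use reclink; the similarity measure (difflib's SequenceMatcher), the 0.85 threshold, the function name near_duplicates, and the company names are all assumptions made up for illustration. It compares each pair of distinct names exactly once, so a name is never matched with itself.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(names, threshold=0.85):
    """Return (score, name_a, name_b) for distinct names that look alike."""
    # Drop exact duplicates so that no name can be paired with itself.
    unique = sorted(set(names))
    pairs = []
    # combinations() yields each unordered pair once and never (a, a).
    for a, b in combinations(unique, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((round(score, 3), a, b))
    # Most similar pairs first.
    return sorted(pairs, reverse=True)


# Hypothetical company names, for illustration only.
companies = ["Acme Corporation", "Acme Corporaton", "Globex Inc", "Initech"]
for score, a, b in near_duplicates(companies):
    print(score, a, b)
```

This compares all pairs, so it would be slow on a very large list; it is only meant to show the output I want, which is a list of near matches with the perfect self-matches left out.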
Thanks!
S