In Depth
Entity Resolution: A Powerful Grasp of the Nonobvious
Entity resolution can help ferret out fraud by identifying hidden links and relationships in your databases.
By Simson Garfinkel
In a similar test for a major banking organization, IBM loaded the Entity Analytic Solutions software with 100,000 customer records and 20,000 "bad guys" from World-Check, a company that tracks individuals and organizations that might pose risks to the financial industry. The database was then seeded with 1,387 fictional records containing data from 572 bad guys. The system found 97 percent of the bad guys—but it also found 127 previously unknown relationships that the bank then investigated. Talk about an effective demo!
IBM's Anonymous Resolution is based on the use of hash functions. A typical application might be to see if the same person is receiving aid from multiple organizations at the same time—presumably something that would be in violation of those organizations' rules. Instead of exchanging the actual names of the people receiving aid, the IBM system lets organizations exchange one-way cryptographic hashes of the names. The system preprocesses the names so that minor variations in spelling won't prevent a match, and it allows the organizations exchanging data to further protect the information with a cryptographic key. In theory, such a system could be used to perform a large-scale medical study based on "anonymized" data from hospitals, pharmacies and insurance companies.
But while this kind of anonymous resolution is relatively easy to understand, it has a potential problem: When there is a match, it's possible for the organizations involved to learn the identity of the matching individual by tracing back the matching hash. This may represent an unacceptable opportunity for personal information to leak in some cases.
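To make the hash exchange and the trace-back concern concrete, here is a minimal Python sketch. It is not IBM's implementation: the normalization rules (dropping accents, case, punctuation and word order), the use of HMAC-SHA-256 as the keyed one-way hash, and the function names `normalize_name`, `name_token`, `build_lookup` and `shared_tokens` are all assumptions chosen for illustration. A real system would need far more robust handling of spelling variants.

```python
import hashlib
import hmac
import unicodedata


def normalize_name(name):
    """Reduce a name to a canonical form (illustrative rules, not IBM's)."""
    # Drop accents, lowercase, replace non-letters with spaces, sort the parts.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = "".join(ch if ch.isalpha() else " " for ch in ascii_only.lower())
    return " ".join(sorted(cleaned.split()))


def name_token(name, shared_key):
    """One-way keyed hash of the normalized name (HMAC-SHA-256, an assumption)."""
    message = normalize_name(name).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def build_lookup(names, shared_key):
    """Map each token back to the original name; this table stays private."""
    return {name_token(n, shared_key): n for n in names}


def shared_tokens(tokens_a, tokens_b):
    """Tokens present in both organizations' exchanged sets."""
    return set(tokens_a) & set(tokens_b)
```

Each organization would send only the keys of its `build_lookup` table to the other party. Because every organization still holds its own private table, any token returned by `shared_tokens` can be looked up locally to recover the name behind it, which is exactly the leak described above.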
An entity resolution system that doesn't suffer from this problem was developed by Carnegie Mellon University professor Latanya Sweeney to track clients of domestic violence homeless shelters. This system (which Dr. Sweeney presented at a recent workshop in Cambridge that I organized) uses a special encryption cipher that allows each homeless shelter to contribute information to an encrypted value without being able to decode the information contributed by the other shelters. As a result, it's possible for all of the shelters in a network to determine the number.
Properly implemented, entity resolution systems are a powerful tool for organizations to manage the flow and use of information about people involved with an enterprise. But these systems need to be carefully designed and audited.
For example, the existence of a possible relationship in a data bank does not imply the existence of intent. In fact, it may not even be a real relationship. Instead, it may be the result of an error in one of the source data banks. That's why it's necessary to keep the pedigree of every information element within the system and to have provisions for automatically updating the entire index whenever a data source is refreshed.
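One way to picture pedigree tracking is the small Python sketch below. It is only an illustration under stated assumptions: the `Element` record, the `PedigreeIndex` class and its methods are hypothetical names, and "relationship" is simplified to mean two entities sharing the same field value. The point is that every element remembers its source, so refreshing a source replaces its contributions and any links that rested on stale data disappear.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    entity_id: str
    field: str
    value: str
    source: str  # pedigree: which data source supplied this element


class PedigreeIndex:
    """Hypothetical index that recomputes links from source-tagged elements."""

    def __init__(self):
        self._elements = set()

    def refresh_source(self, source, records):
        """Replace everything previously loaded from `source` with new records."""
        self._elements = {e for e in self._elements if e.source != source}
        for entity_id, field, value in records:
            self._elements.add(Element(entity_id, field, value, source))

    def relationships(self):
        """Candidate links: entities sharing a (field, value), with their sources."""
        by_value = defaultdict(set)
        for element in self._elements:
            by_value[(element.field, element.value)].add(element)
        links = {}
        for shared, elements in by_value.items():
            entities = sorted({e.entity_id for e in elements})
            if len(entities) > 1:
                sources = sorted({e.source for e in elements})
                links[shared] = {"entities": entities, "sources": sources}
        return links
```

If a customer file and a watch list both report the same phone number, `relationships()` returns that link along with the two sources behind it. If the watch list is later refreshed with a corrected number, the link no longer appears, and an auditor can see which source was responsible for it in the first place.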