Chasing ghosts in the wire

Developing an effective strategy to identify & combat synthetic identity fraud


There has been much discussion in recent years of the threat posed by an emerging class of fraud: synthetic identity theft. Due to its inconspicuous nature and its tendency to evade existing threat detection mechanisms, it can be extremely costly to the affected parties and is the fastest-growing variety of fraud, according to a Federal Reserve white paper published on the subject. A number of factors contribute to this huge increase.

• Exposed PII

According to the Identity Theft Resource Center, the number of exposed PII records increased by 126 percent between 2017 and 2018, with more than 446 million records exposed due to data breaches. This is to say nothing of the enormous volume of personal information people voluntarily make available on social networks like Facebook, Twitter, and Instagram.

• Risk vs. Reward

Organized crime continues to gravitate towards low-risk, high-reward activities. We can see the shift in card fraud where fewer and fewer criminals are willing to risk arrest by using a physical “cloned” card in a store when they could achieve the same end sitting at home. The explosion of synthetic identity theft can partially be attributed to the same motivation.

• Lack of Effective Countermeasures

It’s very difficult to accommodate legitimate consumers whilst defending against the kind of sophisticated attack described here, because the observable behavior is almost exactly the same. I have some ideas on how banks could build out better systems to distinguish between the two.
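One detection-side idea, sketched below as a hypothetical illustration (the function, data model, and threshold are my own assumptions, not any bank's actual system): synthetic rings tend to reuse addresses, phone numbers, or employers across many supposedly distinct applicants, whereas legitimate customers rarely do. Correlating application metadata across submissions can surface those clusters.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """Minimal stand-in for a credit application record (illustrative only)."""
    name: str
    ssn: str
    address: str
    phone: str


def flag_synthetic_candidates(apps, max_shared=2):
    """Flag applications whose address or phone appears on more than
    `max_shared` applications in the batch -- a common marker of
    synthetic-identity rings. The threshold is illustrative, not tuned."""
    addr_counts = Counter(a.address for a in apps)
    phone_counts = Counter(a.phone for a in apps)
    return [
        a for a in apps
        if addr_counts[a.address] > max_shared
        or phone_counts[a.phone] > max_shared
    ]


# Three "different" applicants sharing one address get flagged together.
apps = [
    Application("A", "001", "1 Main St", "555-0001"),
    Application("B", "002", "1 Main St", "555-0002"),
    Application("C", "003", "1 Main St", "555-0003"),
    Application("D", "004", "9 Oak Ave", "555-0004"),
]
print([a.name for a in flag_synthetic_candidates(apps)])  # ['A', 'B', 'C']
```

A production system would weigh many more signals (SSN issuance-date consistency, application velocity, device fingerprints), but the core idea is the same: score cross-application reuse rather than judging each application in isolation.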

To better understand how big a threat synthetic identity theft poses to the payments industry, I wanted to approach the problem from the perspective of an adversary. I created a “framework” to automate most or all of the activity involved in the fraud process (I have a lot of experience with web automation and scraping). Some researchers have already recognized this as the next logical progression for savvy criminals. Much of the work involved is extremely tedious, and this is without question the biggest bottleneck. As will become clear later, this automation and scaling also make the fraud even harder to detect, because the heuristic data collected on each “person” is highly variable and almost every possible marker is diversified.

Disclaimer: This is not intended to be a guide on how to organize a criminal operation. To that end, code has been omitted for the most part, and the process is only described qualitatively. This information is already widely available; by demonstrating the real threat it poses, I hope to increase the pressure to develop solutions to this problem.

The Process

Synthetic identity theft is a very interesting process from the point of view of a security researcher. For one, it stands out from long-standing traditional attack vectors that everyone is familiar with, such as CNP (Card Not Present) Fraud, CP (Card Present) Fraud, Generic Identity Theft, Phishing Campaigns, etc.

1. Gather Materials

In order to create a synthetic identity “profile”, which I use to refer to the “fake” identity assembled from various pieces of disconnected information, a number of things are required: an SSN, plus basic PII such as a name, address, age, gender, state of residence, employer, and personal history. I used a number of libraries (see references) to generate the fake information that seemed least relevant to any verification procedure (like personal history or employer), but made sure to use real information for the rest.

My suspicion is that some level of verification takes place: if the name and date of birth can be matched to a real person, the rest of the profile is treated as more likely to be legitimate.

Next comes an SSN. Although ID Analytics estimates that nearly 40 percent of synthetic identities use a randomized SSN, it seemed smarter to instead “harvest” the SSNs in a different way:

• Based on certain input parameters (age, gender), I can quickly generate all possible SSNs that fit each state’s format.

• Using a data broker lookup service, indirect queries are run for these SSNs in such a way as not to raise alarms (also using a bot, of course). All the SSNs that return no match in the database are recorded. These SSNs are valid and fit the required age/gender parameters (which is important because, conceivably, a real operative might later need to assume ownership of the SSN for use in person). However, note: these SSNs DO NOT belong to a real person. This is a key distinction, because (at least in my view) the use of anyone’s real SSN, even with other information changed, still falls firmly under the umbrella of classic identity theft. In this case, no one’s credit history is affected and the only impact is on the banks and lenders. This means the fraud is far less likely to ever be reported or identified because, well, there’s not really a victim (unless you count the morally upstanding big banks as victims; I don’t). In any case, these SSNs become the seed that will blossom into a, for all intents and purposes, real identity.

2. Identity Creation

We begin by amassing a huge list of “seeding” targets: sites that we suspect or know collect PII and then sell this information to data brokers and credit agencies. Many of them I found simply by asking the data brokers where they get their information. Using my own web automation toolkit based on headless Chrome/Puppeteer (puppeteer-theater), I automated the process of signing the profiles up for all sorts of rewards programs, survey sites, and loyalty cards. Basically, any site that screams “I’m selling your information!” I tried to include.

Because of the way I designed the framework, it is extremely hard to flag the activity as automated bot traffic. Actions like mouse movements, clicks, and human hesitation are implemented in such a way as to make the activity virtually indistinguishable from a human being’s. Each browser profile is unique, and rotating proxies are used to obscure the source of the traffic. For more technical specifics on the code, see the documentation: puppeteer-theater. I think the framework accurately reflects the kinds of advanced tactics one would expect from increasingly complex and sophisticated organized crime circles. I’ve also worked with it long enough that plugging in the steps needed to complete the signup/verification process for these sites doesn’t take very long.

As a precaution, I also chose to randomize which sites each profile gets signed up for, which I suspect minimizes any red flags. Towards the end of this step, I also opt in on behalf of each profile for prescreened credit offers. Bottom line: I’m just taking advantage of the fact that all of these companies make money off harvesting people’s data, most of the time in an underhanded, sleazy way that people are kept in the dark about. It’s essential to set up all the digital signatures associated with a real person before applying for credit.

3. Apply for Credit

At this point, data associated with our profiles has started to trickle into the major data brokers and credit agencies. Using a huge list of different “check for pre-qualified offers” sites, the bots complete the applications using each profile’s information. I know for a fact that at least one major bank has employed countermeasures to anticipate this type of automated activity, but once again, in the hands of a technically savvy individual they are trivial to circumvent. The applications are almost always denied (it’s disturbing that any are approved…), but that’s not the point: by applying, a “hit” or blank file has been created in the credit agencies’ systems. This file gets paired with all of the other information we already pumped into the system behind the scenes.

4. Build Credit

At this point, most of the profiles are now in the system, which is verified by a bot that polls the data broker’s system for any hit on each SSN. When they do appear (which is sort of really cool in and of itself), these profiles are added as Authorized Users (aka “piggybacking”) to existing, legitimate accounts, which gives them a tangible credit history. Without going into detail about how it works: by adding an authorized user, the entire credit history of the primary account holder gets attached to the new user’s credit. ID Analytics estimates that nearly 50 percent of synthetic identities use piggybacking to build credit. The other way to build credit is just to do it like a normal person. Sure, it takes some investment, but if you’re an organized crime syndicate, I’m sure you can afford to make payments on time. The cost is nothing when, down the line, it’s possible to get approved for a $50,000 loan. I also included some tricks, like getting greedy e-commerce retailers to do a soft pull because you left expensive items in your cart.

5. Scores

Bots sign each profile up for major credit monitoring services: Credit Karma and Experian. They can complete the entire signup and verification process automatically because the kinds of questions asked are always predictable, and the answers are known from the information used to create the profile, so it’s quite simple for the bot to choose correctly. Once the account is created, the profile gets set aside for monitoring. Periodically, the bots poll for the latest credit report for each profile and record any changes to the score and information.

So now the profiles have matured and are more or less in the clear from detection. There are a lot of different things criminals do with them after this point: getting an EIN from the IRS, starting a business and listing other profiles as employees of that business, applying for loans, luxury cars, anything that excellent credit history can buy, they’ll buy. When the bank eventually comes to collect, it can’t get in touch with this person and sends the account to collections. But good luck collecting money from someone who doesn’t exist.

Some thoughts about Mass Surveillance aka “Targeted Advertising”

The only reason synthetic identity theft is possible at all is the massive effort by powerful companies to squeeze as much personal and private information from online consumers as possible. The moment you visit a site, your browser is fingerprinted, your actions are analyzed, and internally an algorithm is busy crunching numbers to determine how the company can best snag your interest and convince you to buy its products. And because these companies are so greedy, they’ve gone ahead and tied this information to their credit offerings. America is addicted to cheap credit, and the banks and big businesses know it. There is even a technique (a small part of the synthetic identity creation process) known as the shopping cart trick where, essentially, you strategically visit an e-commerce website while logged in, add pricey items to your cart, do this a bit, and you’ll get an offer for the organization’s awful credit card. In fact, you’ll actually be more likely to get approved for that card, and potentially others as well. All these companies share data to some degree. Your private personal information is how they turn a profit.

A full technical writeup will be available in the future describing the planning, execution, and results of this research endeavor.