Microsoft Implements URL Keyword Stuffing Spam Filtering For Bing

Microsoft have announced they implemented a specific spam filtering mechanism for their Bing search engine a few months ago that targets a common spam technique known as URL keyword stuffing (KWS).

The announcement by Igor Rondel, Principal Development Manager, Bing Index Quality, came in a posting on the Bing Blog, which explains URL KWS as follows:

What is URL KWS?

Like any other black hat technique, the goal of URL KWS, at a high level, is to manipulate search engines to give the page a higher rank than it truly deserves. The underlying idea unique to URL KWS relies on two assumptions about ranking algorithms: a) keyword matching is used and b) matching against the URL is especially valuable. While this is somewhat simplistic considering search engines employ thousands of signals to determine page ranking, these signals do indeed play a role (albeit significantly less than even a few years ago.) Having identified these perceived ‘vulnerabilities’, the spammer attempts to take advantage by creating keyword rich domain names. And since spammers’ strategy includes maximizing impressions, they tend to go after high value/ frequency/ monetisable keywords (e.g. viagra, loan, payday, outlet, free, etc…)

Those are the basic mechanics that comprise the overall URL KWS concept. Looking at it a little closer, spammers employ a variety of approaches to implement this technique, resulting in a number of distinct flavours. These are some of the more common variants (note: some of the URLs mentioned below are fictitious, used to demonstrate the point) –

  • Multiple hosts, with keyword-rich hostnames: http://account.free.online.savings.samedaypaydayloansusa.com
  • Host/ domain names with repeating keywords: http://loan.payday.paydayloanspaydayloansusa.com
  • URL cluster across same domain, but varied hostnames comprised of keyword permutations:
      • http://contososhoeswomen.shoesonsale.com/
      • http://bestwomensrunningsneakers.shoesonsale.com/
      • http://discountrunningapparelforwomen.shoesonsale.com/

URL squatting

This is a little different as the spammer is playing on a human tendency to misspell keywords & in effect syphoning traffic off of existing (typically high profile/ traffic) sites

  • E.g. http://nytime.com (misspelling of http://nytimes.com), http://ebey.com (misspelling of http://ebay.com)
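As an illustration of the misspelling idea only (not a description of how Bing detects it), a candidate domain can be compared against a list of well-known domains using edit distance. The function names, the known-domain list and the `max_distance` threshold below are all assumptions made for this sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def possible_squats(candidate, known_domains, max_distance=1):
    """Return the known domains the candidate is a near-miss of.

    max_distance is a hypothetical threshold; an exact match (distance 0)
    is the real site itself and is not reported.
    """
    return [d for d in known_domains
            if 0 < edit_distance(candidate, d) <= max_distance]


if __name__ == "__main__":
    known = ["nytimes.com", "ebay.com"]
    for name in ("nytime.com", "ebey.com", "ebay.com"):
        print(name, possible_squats(name, known))
```

A check like this would flag many innocent near-collisions between short domain names, so it could only ever be one weak signal among several.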

It’s important to note, however, that certainly not all URLs containing multiple keywords are URL KWS spams. In fact, majority are perfectly legitimate non-spam URLs (e.g. http://www.nytimes.com/2011/08/25/opinion/how-to-fix-our-math-education.html.) To ensure high detection precision, this detection technique is typically used in combination with other signals (more on this below.)

Addressing this type of spam is important because a) it is a widely used technique (i.e. significant SERP presence) and b) URLs appear to be good matches to the query, enticing users to click on them.

How do we detect it?

As I mentioned in the previous blog, we will not be giving out specific details on detection algorithms because spammers are likely to use that knowledge to evolve their techniques. I can, however, tell you that we look at a number of signals that suggest possible use of URL keyword stuffing, such as:

  • Site size
  • Number of hosts
  • Number of words in host/ domain names and path
  • Host/ domain/ path keyword co-occurrence (inc. unigrams and bigrams)
  • % of the site cluster comprised of top frequency host/ domain name keywords
  • Host/ domain names containing certain lexicons/ pattern combinations (e.g. [“year”, “event | product name”], http://www.turbotaxonline2014.com)
  • Site/page content quality & popularity signals
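Bing does not disclose its algorithms, so the following is only a rough sketch of how a few of the hostname signals above (number of host labels, keyword hits, repeated keywords) might be computed. The `SPAM_KEYWORDS` lexicon and the `url_signals` function are hypothetical, and plain substring counting stands in for whatever word segmentation a real system would use.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical lexicon of high-value keywords, taken from the examples in the text.
SPAM_KEYWORDS = ("payday", "loan", "viagra", "cheap", "free", "outlet")


def url_signals(url):
    """Compute a few simple hostname-based signals for one URL."""
    host = (urlparse(url).hostname or "").lower()
    labels = [label for label in host.split(".") if label]

    counts = Counter()
    for label in labels[:-1]:  # skip the top-level domain label
        for kw in SPAM_KEYWORDS:
            counts[kw] += label.count(kw)  # naive substring match

    hits = {kw: n for kw, n in counts.items() if n}
    return {
        "num_labels": len(labels),
        "keyword_hits": sum(hits.values()),
        "repeated_keywords": sorted(kw for kw, n in hits.items() if n > 1),
    }


if __name__ == "__main__":
    print(url_signals("http://loan.payday.paydayloanspaydayloansusa.com"))
    print(url_signals("http://www.nytimes.com/2011/08/25/opinion/"
                      "how-to-fix-our-math-education.html"))
```

On its own a count like this says little, which is consistent with the point above that such signals are combined with content quality and popularity signals.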

To amplify this, we try to cluster sites (by various pivots such as domain, owner, etc…) and then look for patterns of the signals listed above in the same cluster. This helps improve detection precision because spammers often create dozens/ hundreds of similar looking sites.
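The clustering idea can be sketched in the same hedged spirit: group hostnames by their parent domain and look at clusters with many distinct hosts. The function names, the `min_hosts` threshold and the naive "last two labels" rule for finding the domain are assumptions for illustration; the rule is wrong for suffixes such as .co.uk, and plenty of legitimate sites run many subdomains.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cluster_by_domain(urls):
    """Group distinct hostnames under a naively derived parent domain."""
    clusters = defaultdict(set)
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if not host:
            continue
        # Naive assumption: the registered domain is the last two labels.
        domain = ".".join(host.split(".")[-2:])
        clusters[domain].add(host)
    return clusters


def large_host_clusters(urls, min_hosts=3):
    """Return domains with at least min_hosts distinct hostnames (hypothetical threshold)."""
    return {domain: sorted(hosts)
            for domain, hosts in cluster_by_domain(urls).items()
            if len(hosts) >= min_hosts}


if __name__ == "__main__":
    sample = [
        "http://contososhoeswomen.shoesonsale.com/",
        "http://bestwomensrunningsneakers.shoesonsale.com/",
        "http://discountrunningapparelforwomen.shoesonsale.com/",
        "http://www.nytimes.com/",
    ]
    print(large_host_clusters(sample))
```

Per-URL signals could then be aggregated within each cluster, for example the share of hosts in a cluster that contain the same high-frequency keywords.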

What has been the impact on the end user & the SEO community?

Users: This update impacted ~3% of Bing queries (on average ~1 in 10 URLs was filtered out per impacted query.)
SEO community: ~5M sites, comprising > 130M urls, have been impacted, resulting in upwards of 75% reduction in traffic to these sites from Bing.

  • Example queries: {hotmail login}, {bestbuy on sale}, {cheap hdtv}
  • Examples of spam sites impacted:
      • www.cheapviagrausa.com
      • www.cheapviagrapharma.com
      • www.buyviagracheapviagraergr.com
      • www.gmailloginsigninup.com

The information in this blog posting originally appeared on the Bing Blog at:
blogs.bing.com/webmaster/2014/09/09/url-keyword-stuffing-spam-filtering/

.NL Registry Reaches 5.5 Million Domain Names

SIDN, the Dutch .NL registry, has reached the milestone of 5.5 million processed domain name registrations. The company achieved the 5 million milestone in July 2012.

.NL is one of the most popular extensions in the world. Individuals have been allowed to register .NL domain names for 11 years, since 2003.

The .NL ccTLD of the Netherlands is the third largest ccTLD in the world after .DE with 15.7 million registrations and .CO.UK with 10 million registrations.

21% of households in the Netherlands own at least one .NL domain name and 83% of firms in the country own a website.

At the time of writing this article there are 5,500,370 registered .NL domain names.


.XYZ Becomes First New gTLD to Pass 500,000 Registrations

.XYZ is the first new gTLD to reach 500,000 registrations. At the time of writing this article, there are 501,447 .XYZ domain name registrations.

377,000 of the 501,447 registrations are from Network Solutions. Onamae.com has 55,000 registrations, while Xin Net Technology Corporation has 26,500 registrations.

According to ntldstats.com, there are now more than 2.2 million new gTLD domain name registrations. Only three new gTLDs (.xyz, .Berlin and .Club) have more than 100,000 registrations, five (.xyz, .Berlin, .Club, .Guru and .Wang) have more than 50,000 registrations and 41 have more than 10,000 registrations.

Scottish Independence Referendum Throws Up ccTLD Conundrum

The referendum on 18 September, in which the Scottish people will vote on whether the country gains independence from the United Kingdom, has far-reaching implications for the country. One that is little discussed is top level domains.

The .scot new gTLD is currently being introduced, but this will not serve as a country code for the newly independent country if its people vote “yes” to independence.

Scots, like the English, Welsh and Northern Irish, currently have .uk as their country code. It is likely, though, that the Scottish will eventually want to establish their own. But which one?

“If Scotland decide to leave, it could start the wheels in motion to have its own two digit ccTLD,” Stuart Fuller, director of commercial operations at NetNames, told Bloomberg. “Still, 22 out of the possible 26 combinations for a .S something are already in use and only .SF, .SP, .SQ or .SW are left — .SC is already assigned to the Seychelles.”

“The timing of the launch of the new [.scot] domain, with general availability due to start just a few days after the referendum result, is no coincidence,” says Fuller.

Some companies have already moved in on .SCOT names, while others may start the process of establishing Scotland’s own ccTLD. Gavin McCutcheon, director of the Dot Scot Registry, also speaking to Bloomberg, said .SCOT launches its General Availability on 23 September.

The country codes are defined by the Swiss-based International Organisation for Standardisation, which develops and publishes international standards. Under ISO 3166, the purpose of these country codes is, the ISO says, to define internationally recognised codes of letters and/or numbers that we can use when we refer to countries and subdivisions. ISO 3166 codes are not only used for domain names. They are also used by all national postal organisations throughout the world for exchanging international mail in containers that are identified with the relevant country code.

If it votes yes for independence, Scotland will need its own codes once it is recognised by the appropriate United Nations bodies. It would then bring to 250 the number of countries, territories, or areas of geographical interest assigned official codes in ISO 3166-1.

Duo Of Six-Figure Sales Tops Weekly Chart

Maca.com selling for $150,000 in a private sale and usedcars.ca selling for C$126,000 ($115,920) through iREx.ca topped the Domain Name Journal chart of top reported sales for the week ending 31 August.

The sales easily eclipsed the third-placed sale for the week, lea.com, which sold for $60,100 through NameJet. NameJet had a cracker of a week with 12 of the top 20 sales, while Sedo had four and iREx.ca two.

And for TLDs, there were 15 .com sales, two for .ca and one each for .org, .pl and .net.

To check out the chart in more detail, go to:
dnjournal.com/archive/domainsales/2014/20140910.htm