
How Can We Solve Autogenerated URL Errors Once & For All?

Today’s “Ask An SEO” question comes from Bhaumik from Mumbai, who asks:

“I have a question about automatically generated URLs. My firm had previously used different tools to generate sitemaps. But recently, we started creating them manually by selecting URLs that are necessary and blocking others in robots.txt.

We are facing an issue now with more than 50 auto-generated URLs.

For example, we have a page called “keyword keyword” URL: https://url.com/keyword-keyword/ and we have another page knowledge center URL: https://www.url.com/folder/keyword-keyword.

In coverage issues, we are seeing errors under the 5xx series which created totally new URLs something like https://test.url.com/keyword-keyword/keyword-keyword. We tried many ways but we are not getting the solution for this one.”

Hi Bhaumik,

It’s an interesting situation you’re finding yourself in.

The good news is that 5XX errors tend to resolve on their own, so don’t worry about that one.

The cannibalization issue you’re facing is also more common than most people think.

With ecommerce stores, for example, you could have the same product (or the same collection of products) appear in multiple folders.

So, which one is the official one?

The same goes for your situation in the B2B finance space. (I removed your URL above and replaced it with “keyword keyword.”)

This is why the search engines created canonical links.

Canonical links are a way to tell search engines when a page is a duplicate of another, and which page is the official one.

Let’s pretend you sell pink bunny slippers.

These bunny slippers have their own page, they’re on sale, they appear in footwear, and also in pink.

  • url.com/products/pink-bunny-slippers
  • url.com/on-sale/pink-bunny-slippers
  • url.com/products/pink/pink-bunny-slippers
  • url.com/category/footwear/pink-bunny-slippers

The first URL above is the “official version” of the URL.

That means it should have a canonical link pointing to itself.

The other three pages are duplicate versions of it. So, when you set up your canonical link, it should reference the official page.

In short, you’ll want to make sure all four pages carry the same canonical tag in their <head>, namely <link rel="canonical" href="https://url.com/products/pink-bunny-slippers">, as this will deduplicate them for search engines.
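As a minimal sketch of that setup, assuming a hypothetical lookup table from each known path to its official version (the names `CANONICAL_FOR`, `SITE_ROOT`, and `canonical_link_tag` are illustrative and not part of any particular CMS), the tag could be generated like this:

```python
from html import escape

# Hypothetical table: every known path maps to its official version.
# The official page maps to itself, giving it a self-referencing canonical.
CANONICAL_FOR = {
    "/products/pink-bunny-slippers": "/products/pink-bunny-slippers",
    "/on-sale/pink-bunny-slippers": "/products/pink-bunny-slippers",
    "/products/pink/pink-bunny-slippers": "/products/pink-bunny-slippers",
    "/category/footwear/pink-bunny-slippers": "/products/pink-bunny-slippers",
}

SITE_ROOT = "https://url.com"  # placeholder domain used in the example


def canonical_link_tag(path: str) -> str:
    """Return the canonical <link> tag to place in the page's <head>."""
    # Paths missing from the table fall back to a self-referencing canonical.
    official = CANONICAL_FOR.get(path, path)
    href = escape(SITE_ROOT + official, quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_link_tag("/on-sale/pink-bunny-slippers"))
# prints: <link rel="canonical" href="https://url.com/products/pink-bunny-slippers">
```

In a real site the table would more likely come from the product database or the CMS than from a hard-coded dictionary.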

Next, you’ll want to make sure that you remove all duplicate versions from your sitemap.

A sitemap is supposed to feature the most important and indexable pages on your website.

You do not want to include non-official versions of a page, pages disallowed by robots.txt, or any other non-canonical URLs in your sitemaps.
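One way to apply that rule when building a sitemap by hand is to filter the candidate list before writing the file. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt rules, the `CANONICAL_FOR` table, and the `sitemap_urls` helper are assumptions made for illustration, not the actual configuration of any site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real file on the site will differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

# Hypothetical map from duplicate URLs to their official version.
CANONICAL_FOR = {
    "https://url.com/on-sale/pink-bunny-slippers":
        "https://url.com/products/pink-bunny-slippers",
    "https://url.com/category/footwear/pink-bunny-slippers":
        "https://url.com/products/pink-bunny-slippers",
}

CANDIDATES = [
    "https://url.com/products/pink-bunny-slippers",
    "https://url.com/on-sale/pink-bunny-slippers",
    "https://url.com/category/footwear/pink-bunny-slippers",
    "https://url.com/search?q=pink+bunny+slippers",
]


def sitemap_urls(candidates, canonical_for, robots_txt):
    """Keep only URLs that are official versions and not disallowed."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    kept = []
    for url in candidates:
        # A URL is official when it is its own canonical target.
        is_official = canonical_for.get(url, url) == url
        # Skip anything robots.txt blocks for all user agents.
        is_crawlable = parser.can_fetch("*", url)
        if is_official and is_crawlable:
            kept.append(url)
    return kept


print(sitemap_urls(CANDIDATES, CANONICAL_FOR, ROBOTS_TXT))
```

With these sample inputs, only the official product URL should survive the filter.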

Search engines do not crawl your entire website every time, and if you send them to unimportant pages, you waste crawl capacity that could have gone to the pages you want indexed.

There is another situation that can occur here.

If you have site search enabled, it can also create URLs that are duplicates.

If I type “pink bunny slippers” into your site’s search box, I’m likely going to get a URL with the same keyword phrase in the URL – and also with parameters on it.

This would compound your problem. Your IT team will need to programmatically set canonical links on the search results pages, along with a meta robots tag set to noindex, follow.
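A rough sketch of the meta robots part follows, assuming the site's internal search lives under a `/search` path. Both that path and the `robots_meta_for` helper are hypothetical. The canonical target for a search results page depends on the site, so it is left out here.

```python
from urllib.parse import urlsplit

SEARCH_PATH = "/search"  # assumption: where internal search results live


def robots_meta_for(url: str) -> str:
    """Return a meta robots tag for internal search pages, else ''."""
    path = urlsplit(url).path
    if path == SEARCH_PATH or path.startswith(SEARCH_PATH + "/"):
        # noindex keeps the results page out of the index, while
        # follow still lets crawlers follow the product links on it.
        return '<meta name="robots" content="noindex, follow">'
    return ""


print(robots_meta_for("https://url.com/search?q=pink+bunny+slippers"))
# prints: <meta name="robots" content="noindex, follow">
```

The template that renders the page's <head> would call this once per request and emit the tag only when the string is non-empty.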

One other thing to look for: if I click through to the pink bunny slippers page from a search result, these parameters may stick to the URL.

If they do, take the same steps mentioned above.
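For the sticky-parameter case, one possible approach is to compute the canonical URL by dropping the query string and fragment. This sketch assumes the parameters never change what the page shows. The `strip_parameters` helper and the `src` parameter are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url: str) -> str:
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; blank out query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


sticky = "https://url.com/products/pink-bunny-slippers?q=pink+bunny+slippers&src=search"
print(strip_parameters(sticky))
# prints: https://url.com/products/pink-bunny-slippers
```

If some parameters do change the content (pagination, for example), this blanket stripping would be too aggressive and an allowlist of parameters to keep would be the safer design.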

Using proper canonical links and ensuring your sitemap doesn’t have non-official pages will help solve the duplicate page problem and help ensure you don’t waste a spider’s visit by having it crawl the wrong pages on your site.

I hope this helps!




Editor’s note: Ask an SEO is a weekly SEO advice column written by some of the industry’s top SEO experts, who have been hand-picked by Search Engine Journal. Got a question about SEO? Fill out our form. You might see your answer in the next #AskanSEO post!


Published by
Chris Barnhart
