WixFul Thinking: Mining Hidden Wix Pages

This post originally appeared on the AutomatingOSINT.com blog.

Wix is a popular free website hosting platform that has 73 million users across 180 countries (stats from Wix themselves). Recently, while working on an investigation, I was hunting around a website hosted on Wix and discovered an interesting anomaly. Wix has a WYSIWYG editor that allows you to build your entire site, and through that editor you can create new pages, hide them or delete them. What I discovered is that although you can hide content from a passing user, the content is still accessible to the general public by analyzing the underlying code of the hosted web page. This post will walk you through how I discovered this, and how we can write code to automatically extract the data that is hidden from view. Do note that this isn’t really a vulnerability per se, but an implementation flaw in how Wix renders pages in the browser.

The Wix Editor

To test out my theory, I simply created a new Wix account and used a default template. If you click on the navigation bar of the site, you’ll get a little menu.

[Screenshot: the Wix editor menu]

Here you can click the Navigate button, which will open a sub-dialog where you can add a new page. Let’s do that, and set the page to hidden.

[Screenshot: adding a new page in the Wix editor]

Now if you click the Publish button on the top right of the site, your page should be live and you can go and view it. You will notice that the test page that you added and then hid is not viewable on the site. The interesting thing is that, depending on your target site, Google may or may not have had the opportunity to index these pages before they were hidden. Let’s see how we can extract the hidden page.

Wix Client Side Code

If you do a view source on the page you will see some interesting things right away:

[Screenshot: an HTML comment in the page source]

Wix is telling you something really important here (this is why you should spend time reading a site’s source code as well). Wix relies heavily on client-side processing and dynamic content loading, much like Twitter or Facebook. In the HTML comment above, they are giving you an alternate URL scheme to access a search engine friendly version of the page using the following URL format:

http://yoursite.com/?_escaped_fragment_=page_name/page_id

We will keep this little nugget of information in our back pocket for when we want to extract content. Keep scrolling through the source code and eventually you will see a Javascript variable with a big blob of code after it (I have cut the output for brevity):


 

var publicModel = {"domain":"wix.com","externalBaseUrl":"http:\/\/ts686680.wix.com\/boutique-recruitment","unicodeExternalBaseUrl":"http:\/\/ts686680.wix.com\/boutique-recruitment","pageList":{"masterPage":["https:\/\/static.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3","https:\/\/staticorigin.wixstatic.com\/sites\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_71b321500c659bb2b255d2c04d9916c3_8.json"],

"pages":[{"pageId":"cce3","title":"Contact","urls":
["https:\/\/static.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/staticorigin.wixstatic.com\/sites\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_ad9c7e51443384e9823a4bcca63c33ec_1.json"]},

{"pageId":"rz5e9","title":"Test","urls":["https:\/\/static.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3","https:\/\/staticorigin.wixstatic.com\/sites\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z?v=3",
"https:\/\/fallback.wix.com\/wix-html-editor-pages-webapp\/page\/406c29_4f07e319f667853a8ea619bee7dc003d_8.json"]}]


 

Beautiful! You can see all of the pages have entries in this publicModel variable including our test page that was set to hidden. This was the exact anomaly that I was referring to in the opening paragraph of this blog post. By taking a closer look at the JSON we can see a list of URLs that point to some JSON endpoints. If we visit the first URL in the list:

https://static.wixstatic.com/sites/406c29_4f07e319f667853a8ea619bee7dc003d_8.json.z

You will see a full JSON document that has a bunch of information about the page, including some of the text content. By scrolling to the very end of the document you should see a key in the JSON called pageUriSEO. This key contains the search engine friendly title of the page. Looking at the structure key as well, we see another key named id. By combining these pieces of information, we can construct the search engine friendly URL that will allow us to download and store the content of the site as it is seen by search engines. Using the above example on my test site, we can check whether the hidden content is accessible in our browser:

http://ts686680.wix.com/boutique-recruitment?_escaped_fragment_=test/rz5e9
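Putting that URL together is plain string assembly. Here is a minimal sketch in Python; the function name build_escaped_fragment_url is my own and purely illustrative, and the values are the ones from the test site above.

```python
def build_escaped_fragment_url(base_url, page_name, page_id):
    """Return the search engine friendly URL for one page of a Wix site.

    base_url  -- the site's base URL (externalBaseUrl in publicModel)
    page_name -- the value of the pageUriSEO key
    page_id   -- the page's id
    """
    # Strip a trailing slash so we never produce "...//?_escaped_fragment_=".
    return "%s?_escaped_fragment_=%s/%s" % (base_url.rstrip("/"), page_name, page_id)


url = build_escaped_fragment_url(
    "http://ts686680.wix.com/boutique-recruitment", "test", "rz5e9"
)
print(url)
```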

Now that we can do this, let’s create a script that will mine all pages from a Wix site automatically for us including hidden content.

An Extra Tidbit

While reviewing all of this code, I also discovered a key called timeSincePublish which is a timestamp value of when the owner of the site last clicked the Publish button. This is NOT the original publish date of the site itself however. This timestamp can be useful from an investigation perspective if you are building a timeline of events on a target or if you are looking for a date correlation to other events.

Coding It Up

This is going to be pretty straightforward: we are just going to retrieve the target domain’s Wix page, extract the Javascript code and walk through each URL that we discover.
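As a starting point, here is a minimal sketch of the setup in Python 3: imports, argument parsing and an output directory. The option names (--url, --output) and the default directory name are assumptions made for illustration, not necessarily what the original script used.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command line options (option names are illustrative assumptions)."""
    parser = argparse.ArgumentParser(
        description="Mine all pages, including hidden ones, from a Wix site."
    )
    parser.add_argument("-u", "--url", required=True,
                        help="URL of the target Wix site")
    parser.add_argument("-o", "--output", default="wix_pages",
                        help="directory to store the results in")
    return parser.parse_args(argv)


def ensure_output_dir(path):
    """Create the results directory if it does not exist yet and return its path."""
    os.makedirs(path, exist_ok=True)
    return path


# Example: parse an explicit argument list instead of sys.argv.
args = parse_args(["--url", "http://ts686680.wix.com/boutique-recruitment"])
print(args.url, args.output)
```

In a real run you would call parse_args() with no arguments so it reads sys.argv, then pass args.output to ensure_output_dir().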

Pretty straightforward code: we are just adding the necessary imports, putting some argument parsing in place and creating a directory to store our results. Now let’s add the first set of requests:
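A minimal sketch of this step in Python 3, using only the standard library. Two things here are my own assumptions: the helper names (fetch_text, extract_public_model), and the use of json.JSONDecoder.raw_decode to find where the blob ends, which may differ from how the original script located the end. The line numbers cited in the walkthrough that follows refer to the original script, not to this sketch. The sample page source is trimmed and its URLs are shortened placeholders.

```python
import json
import urllib.request

MARKER = "var publicModel"


def fetch_text(url):
    """Download a URL and return its body as text (not exercised in the demo below)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_public_model(html):
    """Return the publicModel object embedded in the page, or None if it is absent."""
    start = html.find(MARKER)
    if start == -1:
        return None
    brace = html.find("{", start)
    if brace == -1:
        return None
    # raw_decode parses exactly one JSON value starting at `brace`
    # and ignores whatever follows it in the page.
    model, _end = json.JSONDecoder().raw_decode(html, brace)
    return model


# Trimmed stand-in for a real page; the two URLs are placeholders.
sample = (
    '<script>var publicModel = {"domain":"wix.com","pageList":{"pages":['
    '{"pageId":"cce3","title":"Contact",'
    '"urls":["https:\\/\\/static.wixstatic.com\\/sites\\/a.json.z?v=3"]},'
    '{"pageId":"rz5e9","title":"Test",'
    '"urls":["https:\\/\\/static.wixstatic.com\\/sites\\/b.json.z?v=3"]}]}};'
    '</script>'
)

model = extract_public_model(sample)
for page in model["pageList"]["pages"]:
    print(page["pageId"], page["title"], page["urls"][0])
```

Against a live site you would pass the result of fetch_text() on the home page to extract_public_model() and bail out if it returns None.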

Let’s take a closer look at this code:

  • Line 17: send off the first request to retrieve the home page that will contain the Javascript code.
  • Lines 21-25: attempt to find the publicModel Javascript variable in the content of the page, and if we can’t we bail out.
  • Lines 27-32: we find the end of the Javascript blob (27) and then extract all of the data (29-30) and parse it as JSON to convert it to a Python dictionary (32).

Now that we have retrieved the main page JSON, we can iterate over the information and begin retrieving all of the published pages for this Wix site. Let’s implement the code to do that.
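A minimal sketch of that loop in Python 3 follows. Several things are assumptions on my part. The function names are hypothetical. The post does not say where pageUriSEO sits inside the page JSON, so the sketch simply searches for it recursively. Downloading is delegated to a fetch callable you supply (any function that takes a URL and returns the body as text), which also leaves open whether the .json.z responses need any decompression. The line numbers cited in the walkthrough that follows refer to the original script, not to this sketch.

```python
import json
import os


def find_key(node, key):
    """Depth-first search for the first non-None value stored under `key`."""
    if isinstance(node, dict):
        if node.get(key) is not None:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def mine_pages(public_model, base_url, output_dir, fetch):
    """Save the HTML and raw JSON of every page listed in publicModel.

    Returns the list of search engine friendly URLs that were retrieved.
    """
    os.makedirs(output_dir, exist_ok=True)
    retrieved = []
    for page in public_model["pageList"]["pages"]:
        # The first entry in "urls" is the JSON document describing the page.
        raw_json = fetch(page["urls"][0])
        seo_name = find_key(json.loads(raw_json), "pageUriSEO")
        if seo_name is None:
            continue  # nothing to build the URL from, so skip this page
        page_url = "%s?_escaped_fragment_=%s/%s" % (
            base_url.rstrip("/"), seo_name, page["pageId"])
        html = fetch(page_url)
        stem = os.path.join(output_dir, "%s_%s" % (seo_name, page["pageId"]))
        with open(stem + ".html", "w", encoding="utf-8") as fd:
            fd.write(html)
        with open(stem + ".json", "w", encoding="utf-8") as fd:
            fd.write(raw_json)
        retrieved.append(page_url)
    return retrieved
```

Passing the downloader in as a parameter keeps the loop easy to try out offline with canned responses before pointing it at a real site.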

This is the final bit of code to make this badboy work, let’s take a look at it:

  • Line 35: we are extracting the list of pages from the pageList key in the publicModel Javascript code.
  • Lines 38-40: we grab the first JSON URL from the page list (38) and retrieve it (40).
  • Line 45: here we are building the search engine friendly version of the page so that we can store it as plain HTML. We are using the pageUriSEO key as well as the pageId to build this URL.
  • Lines 47-55: we retrieve the HTML (47) and then store the page (50) and store the raw JSON as well (54) for the page.

That is it! Find yourself a Wix site to target, and give it a run. I also encourage you to test out hiding content on your own test site and see how it works.

 

Finding Wix Sites

Remember my post that dealt with mining Google Analytics codes? If you are looking to find Wix pages to test, just spin up a test Wix account, and point the Google Analytics mining script at your test page. All Wix sites appear to share a common Google Analytics tracking code so you’ll have no shortage of sites to look at when it returns its results. Be patient, there are a pile of results!