Prateek Waghre

Analysis of whitelisted URLs in Jammu and Kashmir

This post was originally published on Medianama.

By Rohini Lakshané and Prateek Waghre

The Supreme Court gave a judgement on January 10, 2020, directing the Central government to review the total suspension of Internet services in Jammu and Kashmir imposed since August 5, 2019, and to restore essential services. In response, the government of Jammu and Kashmir issued a whitelist comprising 153 entries on January 18 and increased the number of entries to 301 on January 24. What would the experience of an ordinary resident of Jammu and Kashmir be like under the whitelist arrangement? We conducted a preliminary analysis to empirically determine whether the 301 whitelisted websites and services would be practically usable and found that only 126 were usable to some degree.

Before we delve further into the analysis, it is pertinent to understand the background and context in which an ordinary resident of Jammu and Kashmir may access the Internet. India has experienced the highest number of intentional Internet shutdowns across the world since 2012. Kashmir has been facing the longest intentional Internet shutdown ever recorded in a democratic country. Voice and SMS functionality, without Internet connectivity, was reactivated on postpaid mobile connections in Jammu and Kashmir on October 14, 2019. People in the Kashmir valley can access the Internet only through the 844 kiosks run by the government.

Under three orders (dated January 14, 18, and 24) issued by the government of Jammu and Kashmir:
  1. 2G Internet connectivity would be reinstated on postpaid mobile connections in 10 districts of Jammu Division and 2 of Kashmir Division.
  2. “The internet speed shall be restricted to 2G only.”
  3. 400 additional Internet kiosks are to be installed in Kashmir.
  4. Social media websites, peer-to-peer (P2P) communication apps, and Virtual Private Network (VPN) services have been explicitly prohibited.
  5. ISPs are to provide wired broadband to companies engaged in “Software (IT/ ITES) Services”.
  6. For wired connections, Paragraph II of the order dated January 24 states, “For fixed-line Internet connectivity: Internet connectivity shall be [made] available only after Mac-binding.”
  7. Voice and SMS functionality would be restored on prepaid mobile connections across all districts of Jammu and Kashmir.
  8. For providing internet access on locally-registered pre-paid mobile connections, telecom service providers or “TSPs shall initiate a process of verification of credentials of these subscribers as per the norms applicable for postpaid connections”.
  9. “The ISPs shall be responsible for ensuring that access is allowed to whitelisted sites only.”
  10. The order dated January 14 states that it “may be subject to further revision” after which the department would conduct “a review of the adverse impact, if any, of this relaxation on the security situation.” According to the order released on January 24, “the law enforcement agencies have reported no adverse impact so far. However, they have expressed apprehension of misuse of terror activities and incitement of general public…”
  11. “Whitelisting of sites shall be a continuous process,” which could be interpreted to mean that the government would periodically update the list.

Thus, an ordinary internet user in Jammu and Kashmir accessing the Internet under this whitelist arrangement would be doing so via 2G mobile connections or Internet kiosks placed inside government offices.

Questions raised by a selection of entries in the whitelist

  1. In the orders dated January 14 and 18, the Government of Jammu and Kashmir cites the use of the Internet for the following activities as some of the reasons for implementing the total Internet blackout in Kashmir: “terrorism/terror activities”, activities of “anti-national elements”, “rumour-mongering”, “spread of propoganda/ ideologies”, “targeted messaging to propagate terrorism”, “fallacious proxy wars”, “causing disaffection and discontent” among people, and the “spread of fake news”. In light of this explanation, what were the process and criteria applied to select these specific URLs/ services/ websites to be on the whitelist?
  2. What were the process and criteria, if any, to reject websites and services that are similar to those whitelisted and those that provide the same or comparable services? For example, some travel aggregator websites (MakeMyTrip, Goibibo, Cleartrip, Trivago, Yatra, etc) have been included but not others (Agoda, Expedia, Kayak, Hotels.com). Online shopping/e-commerce websites Flipkart, Amazon, Myntra, and Jabong feature in the whitelist but not Snapdeal, Ebay, and others.
  3. How were the residents of Jammu and Kashmir informed about this whitelist and told that these specific services/ websites had become accessible? News websites and social media websites are still blocked. The orders will appear in an issue of the gazette, which is just one source of information and not accessible to everybody.
  4. In view of all the above questions, how do the authorised government officers “ensure implementation of these directions in letter and spirit”, as stated in paragraph 7 of the order dated January 14?

Role of Internet Service Providers (ISPs)

The whitelist and its accompanying orders raise some concerns about ISPs’ implementation of the whitelist.

  1. In the case of the entries that contain neither URLs nor qualifying information about including subdomains or about permitting mobile applications, it should not be left to the discretion of an Internet Service Provider (ISP) to determine the appropriate URLs or the appropriate mode of access (mobile or desktop application, mobile or desktop version) of a whitelisted service or website. ISPs are intermediaries and are not authorised to take a judgement call on the orders they receive from the government. Moreover, the whitelist orders explicitly state that the onus of ensuring that sites outside the whitelist remain inaccessible is on the ISPs (“The ISPs shall be responsible for ensuring that access is allowed to whitelisted sites only.”).
  2. In the case of invalid or indeterminate URLs, how are whitelisted entries to be implemented? What are the options for an ISP to seek clarifications about these from the government?
  3. ISPs have been directed to provide wired broadband to companies in Jammu and Kashmir engaged in “Software (IT/ ITES) Services”. In view of the fact that the terms IT (information technology) and ITES (information technology-enabled services) cover a broad range of commercial activities, how is this directive going to be operationalised?
  4. In a recently published paper analysing how ISPs in India block websites, researchers at the Centre for Internet and Society (CIS) found that ISPs and governments were not willing to disclose the URLs that were blocked. The study also found that less than 30% of blocked URLs were common across the ISPs included in the study, and different ISPs used different techniques to implement blocklists. This is indicative of arbitrary action on the part of individual ISPs. It is also likely that Internet users have limited recourse owing to the lack of transparency in censoring websites. When combined with the need for ISPs to exercise their own discretion/ judgement in implementing these orders (as argued in 1), there is plenty of potential for inconsistent enforcement by ISPs.
  5. It is unclear how ISPs will actually implement this whitelist. If the filtering is done at the DNS layer, then the number of practically unusable websites will likely be higher than what we encountered, since the DNS resolution process itself is likely to be broken for any website that returns anything other than an A record/ IP Address.
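
The concern in point 5 can be illustrated with a minimal Python sketch. It assumes a DNS-layer filter that refuses to answer for any name that is not on the whitelist, including names reached through a CNAME. The hostnames, records, and addresses are hypothetical (reserved example names and documentation addresses) and do not describe any ISP's actual setup.

```python
# Hypothetical DNS data: a name maps either to an address ("A")
# or to another name ("CNAME") that must be resolved in turn.
RECORDS = {
    "www.shop.example": ("CNAME", "shop.cdn.example"),
    "shop.cdn.example": ("A", "203.0.113.10"),
    "www.dept.example": ("A", "203.0.113.20"),
}

# Hypothetical whitelist: only the names a user would type are listed.
WHITELIST = {"www.shop.example", "www.dept.example"}


def resolve(name, whitelist=WHITELIST, max_hops=5):
    """Follow CNAMEs, refusing any name that is not whitelisted."""
    for _ in range(max_hops):
        if name not in whitelist:
            return None  # the filter refuses to answer for this name
        record = RECORDS.get(name)
        if record is None:
            return None  # no such name in our toy data
        record_type, value = record
        if record_type == "A":
            return value
        name = value  # CNAME: continue with the target name
    return None


print(resolve("www.dept.example"))  # answered directly with an address
print(resolve("www.shop.example"))  # fails at the non-whitelisted CNAME target
```

Under this assumption, the second site is on the whitelist yet never resolves, because its CNAME target is not.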

Findings and Analysis

1. Entries with no URL

1. Media service providers/streaming services: There are 7 streaming services on the list: Amazon Prime, Netflix, Sony Liv, Zee 5, Hotstar, Voot, and Airtel TV. They support viewing on desktop browsers and mobile apps. This may be a reason why the whitelist only states their names and not the corresponding URLs. Assuming that these services are enabled for use on both desktop and mobile applications, they will still be practically unusable because:

  1. Only 2G speeds are currently permitted in Jammu and Kashmir. 2G speeds are too slow for streaming audio-video and multimedia content.
  2. Streamed content is delivered over CDN (content delivery network) URLs, none of which are present on the current whitelist.

2. JioChat: JioChat is an iOS and Android instant messaging app that supports voice and video calling. It is the only service on this whitelist that supports these functionalities. It is unlikely that this app would be practically usable for video/voice calls because 2G speeds are too slow for it.

2. Government-owned eTLDs

The whitelist includes three entries for government-owned eTLDs (effective top-level domains, also known as “public suffixes”): “Gov.in”, “Nic.in”, and “Ac.in”. The entries do not contain URLs or qualifying information about including subdomains. It should be explicitly stated if ISPs are expected to allow gov.in, nic.in, ac.in, and all their subdomains. For example, gov.in houses four levels of subdomains. Currently, it is unclear how ISPs will interpret and implement this since the entries in the whitelist do not contain adequate information. The directory of Indian government websites is available at http://goidirectory.nic.in.
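
The ambiguity can be made concrete with a short Python sketch of the two interpretations an ISP might apply to an entry such as “Gov.in”: an exact hostname match, or a match that also covers all subdomains. The function host_allowed and its include_subdomains flag are hypothetical names used for illustration; the orders do not specify either behaviour.

```python
def host_allowed(host, entry, include_subdomains):
    """Check one hostname against one whitelist entry.

    include_subdomains=False: only the exact name matches.
    include_subdomains=True: the name and every subdomain below it match.
    """
    host = host.lower().rstrip(".")
    entry = entry.lower().rstrip(".")
    if host == entry:
        return True
    # The leading dot ensures "notgov.in" is not treated as part of "gov.in".
    return include_subdomains and host.endswith("." + entry)


# The directory site mentioned above, under both interpretations of "Nic.in".
print(host_allowed("goidirectory.nic.in", "Nic.in", include_subdomains=False))
print(host_allowed("goidirectory.nic.in", "Nic.in", include_subdomains=True))
```

Under the exact-match reading, the entry admits almost nothing that residents would actually visit. Under the subdomain reading, it admits every site below the suffix.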

3. Banking and Finance Services

Log-in pages are on domains or subdomains different from those listed in the whitelist, which is why these services are not practically useful regardless of whether the actual whitelisted URL is accessible/usable. For example,

  1. The website of ICICI Bank https://www.icicibank.com is whitelisted. However, the URL to log in to personal banking at ICICI is on a subdomain of the website, https://infinity.icicibank.com, which is not whitelisted. So, individuals with an account at ICICI Bank will not be able to access their accounts online.
  2. While https://www.hdfc.com has been whitelisted, HDFC Bank’s personal banking services are on a different domain, https://www.hdfcbank.com, which will also remain inaccessible.

VPNs and proxy services are prohibited, so an ordinary user would be unable to circumvent restrictions imposed by the whitelist.

Of the 15 websites categorised under “Banking” in the whitelist, only 2 (www.jkbankonline.com and www.westernunion.com) had accessible log-in pages/sections and all 15 had at least one identifiable issue when they were accessed with the whitelist restrictions in place.

4. CDN, Sub-Domains, and Third-Party Content

The State of the Web report maintained by HTTP Archive indicates that the median number of requests on a webpage for mobile devices is approximately 70. These requests are spread across subdomains of the website, domains owned by content delivery networks (CDNs) such as akamaized.net, cloudfront.net, cloudflare.net, etc., and third-party domains such as Google Analytics, tag managers, real user monitoring tools, advertisers, and so on. The whitelist approach interferes with these requests and, more often than not, has an adverse impact on the functioning of the website itself. In our analysis, we observed that this affected websites to varying degrees:

  1. Minimal visible impact
  2. Some images don’t load
  3. All images don’t load
  4. Critical functions become unresponsive, such as search in the case of some OTAs (online travel agents)
  5. The entire layout scheme breaks
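
To show why these effects arise, the following Python sketch splits the requests made by a single page into those that an exact-hostname whitelist would allow and those it would block. The request URLs and the whitelist entry are hypothetical (reserved example names), and the exact-match rule is an assumption about how a filter might behave, not a description of any ISP's implementation.

```python
from urllib.parse import urlsplit


def split_by_whitelist(request_urls, whitelist):
    """Split a page's requests into (allowed, blocked) by exact hostname match."""
    allowed_hosts = {host.lower() for host in whitelist}
    allowed, blocked = [], []
    for url in request_urls:
        host = urlsplit(url).hostname or ""
        (allowed if host in allowed_hosts else blocked).append(url)
    return allowed, blocked


# Hypothetical requests made while loading one page of a whitelisted site.
requests_for_page = [
    "https://www.shop.example/",
    "https://www.shop.example/search?q=shoes",
    "https://images.shop.example/banner.jpg",       # subdomain of the site
    "https://static.cdn.example/site.css",          # CDN domain
    "https://analytics.tracker.example/collect.js", # third-party script
]

allowed, blocked = split_by_whitelist(requests_for_page, {"www.shop.example"})
print(len(allowed), "allowed;", len(blocked), "blocked")
```

In this toy case the page itself loads, but its images, stylesheet, and third-party script do not, which corresponds to the broken layouts and missing images described above.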

Example 1: Consider www.amazon.in. The request map shows that a significant number of requests are made to domains other than www.amazon.in. Since these requests will be blocked, the website will barely function for a user accessing it from behind the whitelist. This is evident from the screenshot of the landing page.

Request map for www.amazon.in

Screenshot of www.amazon.in

Example 2: In the case of the website of the Indian Railways, www.irctc.co.in, once again, the request map indicates a large number of requests to other domains. This results in breaking the layout of the page (as is evident in the screenshot), as well as the operation of the website.

Request map for www.irctc.co.in

Screenshot of www.irctc.co.in

Example 3: The website of the Public Works Department of the Government of Jammu and Kashmir, www.jkpwdrb.nic.in, sends no requests to other domains as indicated by the request map and thus the whitelist restrictions have no visible impact. It should be noted that this kind of website setup is uncommon.

Request map for www.jkpwdrb.nic.in

Screenshot of www.jkpwdrb.nic.in

5. Search Engines

The updated list in the January 24 order contains 10 hostnames classified as search engines and www.bing.com classified under utilities.

  1. The whitelist did not include Indian subdomains (google.co.in, in.search.yahoo.com), which means that users may not be able to access them, whether they type them manually or get redirected to the Indian domain of the search engine based on language or browser settings.
  2. The list included Canadian and UK subdomains for Google. It also included the Canadian and French-Canadian versions of Yahoo Search. There was also no justification provided for the exclusion of Indian locales while including non-Indian locales.
  3. We also found that while conducting a search was possible, a user could only successfully navigate to results from websites that were on the whitelist (subject to how they worked as determined by our testing). For websites not on the whitelist, the information contained in the snippets was readable on the search results page, but not beyond it.

So we have categorised search engines as ‘partially usable’.

6. News/Technology Updates

The updated list in the January 24 order also contains 74 websites categorised as “News” (60) and “Technology Updates” (14).

  1. There was a mix of regional, national and international websites.
  2. Audio/podcast and video content for all of these sites were either delivered from subdomains/CDN domains or YouTube and hence did not work.
  3. International publications such as The Washington Post, Wall Street Journal, and The New York Times allow limited views before enforcing a paywall. However, their sign-in pages were not accessible. In such cases, even if the websites were minimally visually affected, they were categorised as ‘practically not usable’.
  4. For the remaining websites, we observed that the impact on page layout varied in degree:
    1. All pages and UI elements were broken.
    2. Only the Home page was broken.
    3. Only subsection pages were affected.
    4. Only article pages were not affected.

The categorisation between usable, partially usable, and not usable was done on the basis of how easy or difficult it was to consume content and navigate within each website.

Screenshot indicating broken page layout

7. Additional Observations

  1. Mail: The whitelist included 4 webmail services. However, none were usable since the sign-in pages required navigating to domains that were not on the whitelist. They have been categorised as ‘practically not usable’.
  2. Entertainment: The updated list from the January 24 order also included 7 entertainment sites along with URLs, which made testing them possible (this is in contrast to the 6 listed in the January 18 order that did not include URLs and only named the services). Only one (https://wynk.in) of these was able to stream content successfully. It was categorised as ‘practically usable’ even though it may be difficult to stream content on a 2G network. 6 out of 7 have been categorised as ‘practically not usable’. It should be noted that such content is typically consumed on apps that were not tested as a part of this exercise. Apps generally use different hostnames to request resources.
  3. Official websites of apps: The whitelist includes Gingerlabs.com, the official website for the note-taking mobile app Notability. Another entry, Kinemaster.com is the official website of the eponymous video-editing app for Android and iOS. The website enables users to get user support and interact with the community of users. For the purpose of this analysis, the websites were tested and categorised as per their usability. It should be noted that new downloads would not be possible since the Apple App Store and Google Play Store are not included in the whitelist. It is also unclear if users who already have these apps installed will be able to use them since the apps may not use the same domain(s) to make requests.
  4. URLs that contain paths: Two URLs on the whitelist contain specific paths (www.marutisuzuki.com/MarutiSuzuki/Car and https://www.heromotocorp.com/en-in/). It is unclear how ISPs could whitelist these two entries without whitelisting the domains Marutisuzuki.com and Heromotocorp.com.
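
The difficulty with path-based entries can be seen by splitting each entry into its hostname and path. In the minimal Python sketch below, split_entry is a hypothetical helper written for illustration; the two entries are the ones quoted above. A filter that works only on hostnames, such as one applied to DNS queries, never sees the path component, so it would have to allow or refuse the whole host.

```python
from urllib.parse import urlsplit


def split_entry(entry):
    """Return (hostname, path) for a whitelist entry."""
    # Entries without a scheme are given one so that urlsplit finds the host.
    if "://" not in entry:
        entry = "http://" + entry
    parts = urlsplit(entry)
    return parts.hostname, parts.path


for entry in ("www.marutisuzuki.com/MarutiSuzuki/Car",
              "https://www.heromotocorp.com/en-in/"):
    host, path = split_entry(entry)
    print(f"host={host}  path={path}")
```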

Summary of Findings

Number of entries in the whitelist: 301
Number of duplicate entries: 13
Number of invalid URLs: 4
Number of entries with no specified URL and no qualifying information about the website/service: 8
Number of inconclusive/indeterminate entries: 6
Number of URLs after validation and de-duplication: 270
Number of websites that are practically usable: 58 (Most of these websites are largely comprised of textual information.)
Number of websites that are practically partially usable: 68 (Some important features are adversely affected.)
Total number of websites usable to some degree: 126
Number of URLs in the list (no protocol or http) that default to https: 94 out of 270 (These may not work in actual use cases because of the redirect to https.)

 

Usability by ‘Field’ Practically Usable?
Field (as specified in the whitelist) Could Not Test No Partially Yes
Automobiles 1 1 1 1
Banking 8 7
Education 25 14 7
Employment 1 1 1
Entertainment 7 8 1 2
Mail 1 3
News 6 18 17 19
NGOs 1 4
Search Engines 1 4 5
Services 4 5 1 3
Technology Updates 8 4 2
Travel 3 13 1 3
Utilities 8 49 15 15
Weather 1
Web Service 1 1
Total 31 144 68 58

*The detailed results from testing all entries in the first version of the whitelist, as recorded on January 22 and 23, IST, are available here. We updated the set of results on January 26 to reflect the next version of the whitelist, available here. This version carries over all entries of the previous one unchanged.
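
The counts in the two tables above can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures reported in the summary of findings; the variable names are illustrative.

```python
# Figures as reported in the summary of findings.
total_entries = 301
excluded = {
    "duplicate": 13,
    "invalid_url": 4,
    "no_url_specified": 8,
    "inconclusive": 6,
}
usable = 58
partially_usable = 68

could_not_test = sum(excluded.values())        # entries dropped before whitelist testing
tested = total_entries - could_not_test        # URLs put through whitelist testing
usable_to_some_degree = usable + partially_usable
not_usable = tested - usable_to_some_degree    # tested URLs that were not usable

print(could_not_test, tested, usable_to_some_degree, not_usable)
```

The results (31 not tested, 270 tested, 126 usable to some degree, 144 not usable) agree with the totals row of the usability table.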

Method

Testing URLs on an Unrestricted Internet Connection

To test if all entries in the list were functioning, we first accessed them using an India IP address on an unrestricted 4G connection. The ones that were not functional were categorised as:

  1. Invalid URL: 4 URLs are invalid. One (www.hajcommitee.gov.in) contains a typographical error. 3 others are badly formed (https://www.google.com > Gmail; https://oppo-in; www.google.com > chrome [sic]).
  2. Duplicate URL: 13 URLs were found to be duplicates of other entries. 3 URLs are present on the list along with their respective redirected versions. For instance, www.trivago.com redirects to https://www.trivago.in, both of which are present on the whitelist. We excluded the former from our analysis and considered the redirected version. The other two instances are Airtel.in and Cleartrip.com.
  3. Entries with no URL specified: We have excluded 8 entries that are names of services and not URLs. 7 of these are media services providers such as Netflix and Amazon Prime.
  4. Inconclusive entry/indeterminate URL: 6 URLs returned an error message and were excluded. 3 of those (Gov.in, Nic.in and Ac.in) did not include a protocol (http:// or https://) or the www. prefix. The DNS registration for Gov.in and Nic.in had also expired as indicated by WHOIS at the time of writing this analysis.

The results have been logged and categorised according to this schema in the detailed analysis (available here):

Is the URL accessible? This column logs the results of a preliminary check for URLs that lead to error messages, such as broken links and websites/ webpages that are misconfigured. The results are categorised as:
  1. Yes: The URL is accessible.
  2. Invalid URL; Duplicate URL; No URL specified; Inconclusive entry/ Indeterminate URL: The URL or whitelist entry is not accessible for reasons described above.
Does the URL redirect to another? The column indicates whether a URL redirects to another URL by default. Categorised as: Yes/ No
Redirects to This column specifies the redirect target URL if it exists. Categorised as:
  1. No redirect
  2. https: The initial URL on the whitelist either contains http or no protocol is specified. It redirects by default to its https version, with the rest of the URL being identical. For example, www.moneycontrol.com on the whitelist redirects by default to https://www.moneycontrol.com.
  3. <URL>: The initial URL on the whitelist redirects by default to a URL with a different path or prefix. In such cases, the redirect target URL is specified here. For example, https://www.icicidirect.com redirects to https://www.icicidirect.com/idirectcontent/Home/Home.aspx
Remark/Observation Observations based on the testing so far.
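
The “Redirects to” categories can be expressed as a small rule. The Python sketch below is one possible reading of the schema, not the procedure actually used to fill in the spreadsheet: classify_redirect is a hypothetical helper that compares a whitelist entry with the URL the browser finally lands on, which is passed in as an argument rather than fetched.

```python
from urllib.parse import urlsplit


def classify_redirect(listed, final):
    """Return 'No redirect', 'https', or the redirect target URL."""

    def parts(url):
        # Entries without a protocol are treated as http for comparison.
        if "://" not in url:
            url = "http://" + url
        p = urlsplit(url)
        # A trailing slash on the path is ignored when comparing.
        return p.scheme, p.hostname or "", p.path.rstrip("/"), p.query

    listed_parts, final_parts = parts(listed), parts(final)
    if listed_parts[1:] == final_parts[1:]:  # same host, path, and query
        if listed_parts[0] == final_parts[0]:
            return "No redirect"
        if final_parts[0] == "https":
            return "https"  # only the protocol was upgraded
    return final  # different host, path, or prefix: record the target


print(classify_redirect("www.moneycontrol.com", "https://www.moneycontrol.com"))
print(classify_redirect(
    "https://www.icicidirect.com",
    "https://www.icicidirect.com/idirectcontent/Home/Home.aspx",
))
```

The first call falls into the https category and the second returns the target URL, matching the two examples given in the schema.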

Whitelist Testing

The 270 URLs that remained were put through whitelist testing using a Chrome browser extension called Whitelist Manager, over a 10 Mbps connection. This extension can be configured to restrict users from accessing any URLs except whitelisted ones.

The results have been logged and categorised according to this schema (available here):

Page Layout This column logs how the page appears visually to the viewer. Classified as Intact, Broken, or Inaccessible due to redirects.
    1. Intact: The website was visually identical with and without the whitelist restriction in place.
    2. Broken: Its appearance was significantly altered when accessed with the whitelist restrictions.
    3. Inaccessible due to redirects: The website automatically redirected to another domain that was not on the whitelist. No further analysis was possible in such cases.
Images loading? Categorised as Yes/ No/ Partial.
  1. Yes: All Images appeared on the website even with the whitelist restrictions.
  2. No: No images appeared on the website with the whitelist restrictions.
  3. Partial: Some images on the website loaded with the restrictions.
Has sign-in? This column logs whether the website provides its users with an option to sign-in for its services or for personalised content. Categorised as Yes/No.
Sign-in section visible? This records whether the sign-in page is accessible or the sign-in section on the website is functional with the whitelist restrictions in place.
  1. Yes: Sign-in page was under the whitelisted domain OR sign-in section of the website was responsive even if the page layout was broken
  2. No: Sign-in required accessing a non-whitelisted domain OR sign-in section of the website was non-responsive.
  3. Partial: The website also provided 3rd Party authentication options via Facebook/Google etc. which were not accessible.

Note: The actual sign-in process was not tested for every website. There is potential for additional website failures if this relies on calls to non-whitelisted domains.

Other functions affected? A subjective assessment of whether other parts of the website were impacted by the whitelisting restrictions. If any were found, these were listed in the ‘Specify’ column. This assessment should be considered indicative and not exhaustive.
Practically usable? A subjective assessment of whether the website could still be used or not.
  1. Yes: Main features were not affected OR the website offered limited functionality to begin with, which wasn’t impacted.
  2. No: Website is unusable as some key features are not functional OR visual elements were missing/ broken to such an extent that it could not be used in any meaningful way.
  3. Partial: Some features (mainly textual information) were still functional.

Limitations of Our Method

  1. We tested the whitelisted entries for usability via a whitelist management extension for the Chrome browser. Results may differ if another whitelist management software were used on a different browser. However, the difference will not be large and significant enough to change our final assessment of whether the website was usable or not.
  2. We conducted the tests on a 10 Mbps connection. We did not use the bandwidth throttling feature on Chrome since the primary intent was to determine whether the sites were accessible or not. In the actual use case, people will visit the whitelisted entries via 2G connections with which the websites that we were able to access may not be reachable in a reasonable amount of time.
  3. We did not sign-in to any of the websites, try to write and send an email, carry out a financial transaction or upload a document such as a tax filing. Doing these activities may significantly alter the final assessment regarding their usability.
  4. 94 URLs (http or no protocol specified) redirect by default on an unrestricted connection to their https version. We have thus tested the https versions only. This was done due to a limitation of the Chrome browser extension we used for the testing. (Refer to Column E entitled “Does the URL redirect to another?” in the spreadsheet containing detailed analysis.) However, these 94 URLs may not function in the actual use case in Kashmir depending on the ISPs’ implementation of the whitelist.
  5. We focused on visual elements and usability only. We ignored the impact on analytics and monitoring tools as long as it did not impact the ability of an end-user to navigate the website. This is, however, bound to be a matter of concern for website operators.

*Rohini Lakshané is a researcher and technologist. She is Director (Emerging Research), The Bachchao Project. Prateek Waghre is a Research Analyst at The Takshashila Institution, a centre of education and research in public policy.
