Trivial Botherations: 2013

Sunday, December 15, 2013

A Christmas Gift to Myself : Raspberry Pi

Here are some ideas me and my friend Mohit are going to implement when our Raspberry Pi's reach us -

Personal search engine : I have a big collection of ebooks and publications I constantly need to query. So, putting its search index on a webserver would make it accessible to me everywhere and thus fasten up my research quite a lot. I'm going to use recoll for the PSE.
Bit Torrent Sync - Personal data back-up server.
Social network data mining - Using facebook and gmail API this could be easily implemented. Although, I need to find more open source algorithms to run. I am also open to implementing research algorithm [1]. Looking for suggestions here.
Mint Server (mint.com) - My bank sends me alerts after every successful transaction. This makes it easy to implement my own mint.com server and also a splitwise.com.
Bitcoin mining
Tor server - This was Mohit's idea. This reminds me of Oscar Wilde's quote.
A personal home music server (Mohit's idea)
A live twitter feed monitoring and LED lighting - something on these lines.

I'm also going to get a Raspberry Pi camera module. It would be nice to create a compressed time-lapse video. I need more suggestions here too.

PS : comments and suggestions required.

Friday, December 13, 2013

Hacking Dropbox

Having spent a lot of time finding tools for file syncing without a cloud and those that would work behind proxies, I decided to write my own Dropbox app to synchronize data across computers.

But Dropbox already does that. The point is to be able to do that using the free 2GB space that Dropbox provides. The motivation is that this way the synchronization should work behind every network routers (which must not block dropbox) that blocks torrent data. (ofcourse you can also avoid that using SSH tunneling).

So I decided to register a Dropbox developer app and use the Dropbox API to synchronize files using only the free 2GB storage. The only constraint now is that the individual files cannot be more than 2GB, but that is easily solved by breaking up the file using rar. Note that, the same could be done using Google Drive which provides 15GB of free space.

The Dropbox API documentation is very neat. I used the core Dropbox API with full access to user data for this app. The code below just authenticates the app to allow access to the user account. Once provided the authentication is then saved to a file, which when detected is used again.

import dropbox
import email, imaplib, os
import urllib2

##sign in to DROPBOX
app_key = '<key>'
app_secret = '<secret>'

flow = dropbox.client.DropboxOAuth2FlowNoRedirect(app_key, app_secret)
app_auth = False # set to false initially
if os.path.exists('accesstoken'):
 app_auth = True

if (app_auth == False):
 ### authorize server
 authorize_url = flow.start()
 print '1. Go to: ' + authorize_url
 print '2. Click "Allow" (you might have to log in first)'
 print '3. Copy the authorization code.'
 code = raw_input("Enter the authorization code here: ").strip()
 access_token, user_id = flow.finish(code)
 ## save access token once
 f = open( 'accesstoken', 'w' )
 f.write( access_token )
 f.close()
else:
 f = open( 'accesstoken', 'r' )
 access_token = f.read()
 f.close()

client = dropbox.client.DropboxClient(access_token)
print 'linked account: ', client.account_info()

The server and client communicate uses gmail. When the server uploads a file it emails the client the share link for the data. The client continuously monitors it's mailbox. As soon as it receives an email from the server, it downloads the data and sends back an acknowledgement mail. The server receives the acknowledgement, deletes the previous file, uploads another and sends the share link to the client again.

##sign in to GMAIL
user = "<server>"
pwd = "<password>"

m = imaplib.IMAP4_SSL("imap.gmail.com")
m.login(user,pwd)
m.list()
m.select("inbox")

Here's how I implemented the rest of it (thanks to these webpages 1, 2, 3) . The script for the client side would be similar

##main for loop
while True:
 resp, items = m.search(None, 'TO', '"<server>+python@gmail.com"')
 items = items[0].split()
 print items

 emailid=items[-1]
 resp, data = m.fetch(emailid, "(RFC822)")
 email_body = data[0][1]
 mail = email.message_from_string(email_body)

 if mail['Subject'] == "done"
  ##delete file
  client.file_delete('/'+os.path.basename(uploadfilepath))
  
  ##ask for another file (or just read from a file list) to upload directory in dropbox = main
  uploadfilepath = raw_input("enter (absolute) path to file").strip()
  uploadfile = open(uploadfilepath)
  response = client.put_file('/'+os.path.basename(uploadfilepath), uploadfile)
  print "uploaded:", response

  ##create share link
  sharelink = client.share("/"+os.path.basename(uploadfilepath), False)
  print sharelink['url']+"   "+sharelink['expires']

  ##send email
  from send_email import mail
  mail("client+python@gmail.com",   sharelink['url'],    "")
  print "email sent"
  
 time.sleep(25000)

The send_email script called above is the following :

import smtplib
from email.MIMEMultipart import MIMEMultipart
from email.MIMEBase import MIMEBase
from email.MIMEText import MIMEText
from email import Encoders
import os

gmail_user = "<id>@gmail.com"
gmail_pwd = "<passwd>"

def mail(to, subject, text):
 msg = MIMEMultipart()

 msg['From'] = gmail_user
 msg['To'] = to
 msg['Subject'] = subject

 msg.attach(MIMEText(text))

 #part = MIMEBase('application', 'octet-stream')
 #part.set_payload(open(attach, 'rb').read())
 #Encoders.encode_base64(part)
 #part.add_header('Content-Disposition','attachment; filename="%s"' % os.path.basename(attach))
 #msg.attach(part)

 mailServer = smtplib.SMTP("smtp.gmail.com", 587)
 mailServer.ehlo()
 mailServer.starttls()
 mailServer.ehlo()
 mailServer.login(gmail_user, gmail_pwd)
 mailServer.sendmail(gmail_user, to, msg.as_string())
 # Should be mailServer.quit(), but that crashes...
 mailServer.close()

Wednesday, December 11, 2013

File Sync without the Cloud

I had to share some large data files with a friend recently. As I was going through the list of usual options :
1. Opera Unite
2. Torrent sharing
3. Dropbox
4. Google Drive
5. wetransfer.com
6. justbeamit.com

I found no fast and cheap option. The painful thing about the good (fast) services is that they do not provide only sync across computers, but require you to upload data to their servers, which is both expensive and insecure.

I found an awesome software, Bitsync, that just works on any device. Yes, even my phone running android. No need to setup port forwarding or go through any tech hassles. And the best feature of all, it's FREE. It's also very fast. Usually when I had to resort to torrent-sharing the speed of download was very slow, but bitsync somehow finds the optimal path.

There are other companies that provide the same service. The best among them is aerofs . You wouldn't require it's paid services for most personal use. Here's a list of all such softwares on how-to-geek .

Saturday, November 23, 2013

Counting

Much of information and coding theory involves counting and combinatorial reasoning. For example how many non-touching balls of unit radius can fit in an n-dimensional space ${\mathcal{F}_2}^n$. Combinatorial arguments come in very handy in these situations. The probabilistic method is another example of the kind of arguments I came across in information theory literature. One good non-technical example that changed the way I count is -

Q. Consider a group of $2^6 = 64$ tennis players. These players are now matched pairwise and the losers are eliminated from the pool. From the remaining pool of victors, players are paired again and the process is continued. How many games are required to find a single winner?

A. 63

Method 1. One way (the straightforward method atleast before reading the alternate way) to compute the answer is to count the number of matches in each round i.e. 32 + 16 + 8 + 4 + 2 + 1 = 63.

Method 2. Realize that each match eliminates 1 player, hence to eliminate 63 players you need 63 matches.

Tuesday, November 19, 2013

Interesting Arguments

An interesting line of reasoning that I came across in my Combinatorial Theory course is
the minimal element method.

The general outline of a proof using this method is as follows : Suppose you want to prove that a given set of elements satisfy a given property $P$. Let the set of bad elements not satisfying that property be denoted by $\mathcal{B}$. Assume that $b$ is the minimal element of $\mathcal{B}$ in some sense. Then using the assumptions of the claim, show that there exists another element $b^\prime \in \mathcal{B}$ such that $b^\prime < b$.

Another interesting style of argumentation which is generally seen in information theoretic arguments is the probabilistic method in which a deterministic solution to a given problem is shown to exist by constructing a random candidate for a solution, and showing that this candidate solves all the requirements of the problem with positive probability.

Saturday, October 26, 2013

Typesetting math equations in Latex

While writing any document in $\LaTeX$ using the amsmath package many wishes need to fullfilled. The most cumbersome task that troubles me frequently is typing the right and left delimitors (say \lvert \rvert) seperately. Why can't there be a command that does that on its own. Turns out that somebody did write a package to improve upon amsmath : mathtools .

The most useful feature I found is, allowing the user to define paired delimitors For example, instead of typing,

\left\{ \frac{a}{b+c} \right\}

You can just define paired delimitors and use them as follows,

\usepackage{mathtools}
\DeclarePairedDelimiter{\set}{\lbrace}{\rbrace}
\set*{ \frac{a}{b+c} }

The output is the same as before,
$$\left\{ \frac{a}{b+c} \right\}$$

Using the * after the command resizes the delimitors to fit the vertical length of the expression.

Another interesting tool in this package is allowing tags for equations to have user-defined labels. See section 3.2.2 in mathtools .

Another great tool for publishing documents on the web is mathjax. Recently mathjax allowed users to use mathjax scripts through their plugins on popular web platforms.

Wednesday, October 23, 2013

Subadditivity, Limits and Lim Inf

Consider a sub-additive sequence $a_n$ for $n \geq 1$, where sub-additivity is defined as follows :
$$a_{n+m} \leq a_n + a_m$$

Fekete's Lemma says that for any such sequence $\lim_{n \rightarrow \infty} a_n/n = \lim \inf_{n \rightarrow \infty} a_n/n $, which can be easily proved. The proof is as follows-

Let $\lim \inf_{n \rightarrow \infty} a_n/n = l$. Therefore, $\exists K$ s.t.
$$\left\vert \frac{a_K}{K} - l \right\vert = \epsilon/2$$

Now consider a large enough $L$ such that $ \frac{a_r}{KL} < \epsilon/2, \forall r < K$. Thus, for every $n \geq KL$ we can write $n$ as $n = Kq + r$, where $q \geq L$ and $r<K$. Therefore,

\begin{align}
\frac{a_n}{n} & \leq \frac{q a_K}{Kq+r} + \frac{a_r}{Kq+r} \\
& \leq \frac{q a_K}{Kq} + \frac{a_r}{Kq} \\
& \leq \frac{a_K}{K} + \frac{a_r}{Kq} \\
& \leq l + \frac{\epsilon}{2} + \frac{\epsilon}{2}\\
\end{align}

Since this is true for all $n > KL$ the limit exists and is equal to $l$. An alternate but similar proof is given here.

Thursday, October 17, 2013

Pairwise Independence, Mutual Independence and Coding Theory

Consider a set of variables $\{X_1, X_2, ..., X_n\}$. A pair of variables $X_i, X_j$ are pairwise independent iff $P(X_i,X_j) = P(X_i) P(X_j)$. And any subset $A_k =\{X_{i_1},X_{i_2},...X_{i_k}\}$ of random variables is mutually independent if

$P(X_{i_1},X_{i_2},...,X_{i_k}) = \prod_j P(X_{i_j}) $

Bernstein gave an example of a set of random variables, of which any two are pairwise independent, but the set as a whole is not. A general class of examples which can be easily constructed from this set is the following -

$X_1 = A_1$
$X_2 = A_2$
$X_3 = A_3$
$\vdots$
$X_k = A_k$
$X_{k+1} = A_1+A_2+...+A_k$

where $A_i$'s are a set of i.i.d. mutually independent random variables, uniformly distributed in $\mathbb{F}_2$. Now it can be easily seen that any size $k$ subset $S_k$ of $\{X_1, X_2, ..., X_{k+1}\}$ is a mutually independent set, but the $k+1$ set $\{X_1, X_2, ..., X_{k+1}\}$ has a degenerate probability distribution given $A_k$ i.e.
$P(X_1, X_2, ..., X_{k+1} | A_k) \in \{0,1\}$

Now one can observe from the example that this is an example of parity check codes (wiki). Also it is easy to observe that parity check codes are MDS codes. (wiki)

Now, given a general set of $n$ random variables which is $k$ independent (any $k$ subset is mutually independent) is it possible to construct an MDS code from these?

I give below conditions in which MDS code are constructable :

The random variables $X_i$ are uniformly distributed in a field $\mathbb{F}_q$.
The set $A_n = \{X_1, X_2, ..., X_n\}$ is $k$ independent i.e. any $k$ subset $A_k$ is mutually independent
The probability distribution of $A_n$ given $A_k$ is degenerate i.e.

$P(A_n | A_k) \in \{0,1\}$

Now, given these conditions it can be easily seen from the proof of the Singleton bound (for general non-linear codes) that $\{X_1, X_2,...,X_n\}$ is an MDS code [length, information bits, min. distance] = $[ n, k, n-k+1]$.

Wednesday, October 9, 2013

Pirates, treasure, and data security

Suppose you have a file that you want to divide between $n$ users. Now the division should be such that whenever any $k$ of the $n$ persons collaborate they know the complete file otherwise they must not have any information about the file.

Interestingly the solution to this problem is related to the solution to the following problem :

Thirteen pirates go on an extended voyage, pillaging and plundering from Africa to Asia. By the end they have quite a stash--too much to take back with them. They decide to lock it in a chest, leave the chest on an island, and come back for it a year later. Of course, not being terribly trusting, they want to ensure that none of them can come back early and claim the treasure for himself. They could just put 13 locks on it and each take a key, but a pirate's life is dangerous--they may not all be around in a year. What they want is for any majority of the original thirteen to be able to open the chest, while any fewer will be locked out. How many locks will it take for them to achieve this? The locks they are using are quite simple. Each key opens only one lock (no master keys), but keys can be duplicated, so multiple pirates can have keys to the same lock.

Solution1:

For the pirate problem you should choose ${n \choose {k-1}}$ locks and distribute the keys such that for every group of ${k-1}$ users there is a lock that they do not have the key for.

This solution easily extends to the first problem. Let the data file be $x \in \mathbb{F}_q$. Now generate ${n \choose {k-1}}$ random variables $R_i, i \in \{1, \ldots, {n\choose {k-1}}\}$ and give each user the information $x + \sum_{i=1}^{n \choose {k-1}} R_i$. Now, the values of the random variables can be distributed like the keys of the lock in the previous puzzle, and we are done.

Solution2:

This is the solution proposed by Adi Shamir (the guy behind Bitcoins), here.

Basic Idea : If you are given any $k$ points of a $k-1$ degree polynomial $f(x)$, you can determine the polynomial. So,

secret := polynomial coefficients
user data := $n$ points of the polynomial

Ofcourse, the polynomial is over a finite field. You need only $k$ random variables in this scheme to share one bit ($a_0 = [x^0] f(x)$) instead on ${n \choose (k-1)}$ in the previous scheme.

Saturday, July 20, 2013

A chess post

To improve one's chess play, reading books and famous games is an established but boring way to improve your skill. So I have resorted to playing continuously instead which is far more entertaining but also far slower. Since the last month I started playing I have played 500 (!!) 10 min games on chess.com .

Occasionally I look at famous games. My amateur chess expert has advised me to look at the games of Paul Morphy first. I started looking at the games but then I realized that although improving is only possible when you analyze games, it is far more rewarding and easier to analyze your own games. I played this interesting chess game recently. I am going to analyze my own and my opponent's mistakes from the principles I have gathered. Please post improvements/suggestions in comments.

Also since its turning out to be very difficult to include a dynamic chess board on blogger, I am just including the link to the game here : game?id=561308873 .

I am playing white in the above game. The winner makes the last mistake here, as always.

Black's second move seems intimidating. The bishop on b4 pins my d4 pawn. I don't know what the best defense to this is, so I move my knight and hope to quietly slip my bishop in d2 as in move 5. (I don't want the bishop to take my knight and destroy my pawn structure).

Normal development follows till move 9. White's 9. o-o is weakening. Black could have played 9. .. Bxc3, and after that 10 .. Bxe2. To defend that white should have played 9. Nd5 forking the Queen and the bishop.

Black's Queen remains on the open e-file for too long for white to play tactics such as 11. Bxa6 . Black's move 16 is also restricting. Qf5 would be better since it maintains pressure on the d4 square while providing the Queen room to move. Similarly white 17. Qe1 should have been Qe2, since it backs both the white knight and bishop.

A better move for white instead of 18. d5 would be Bd5.

Now 24. Ne5 leaves black with little choice. The only move to protect a mate would be Qf5 or Qg5. Both of which is countered by g4.

Sunday, July 7, 2013

Experiments with Neodymium magnets

I bought a set of neodymium to play with. When you are bored, without a computer, they come in handy.

The set consisted of 216 small 4-5mm balls of magnets. The magnets came in a 6x6x6 shaped cube. That shape was very unstable and stuck to the metallic box. On trying to break apart the cube the cube disintegrated into chains.

Trying to put back the magnets into a cube is not very easy. I can say that, because my friend, a physicist, also failed at the first attempt. Here's my approach and the difficulty I face along with the tricks involved.

My approach was to create a stack of 6 rings of the spheres each ring consisting of twelve balls.
Three stacks are required to make a cube. Here what the stacks looked like.

Now I flattened out all the stacks and formed sheets of 6x6x2,

Now I just stuck the stacks together and I had my cube.

The problem

The problem was to get a stack to align properly. Actually, there are two possible ways,

Proper (left) and improper (right) alignment

The one on the left is the desired one.

Proper (top) and improper (bottom) alignment

Also all the three stacks must align properly, although you don't have to combine those stacks. Because otherwise the 6x6x2 sheets would not stick together to form a cube. One of the sheets would slide over the other similar to the bottom chains in the above figure.

The trick

Suppose you are trying to join two stacks of rings and just would not align properly. The trick is to flip one stack and then try again. There's a catch here. If both of the stacks consist of even number of rings, flipping won't work. So adding rings one by one to a stack always works.

Suppose now you have formed three stacks of rings, all properly aligned. But all the three stacks do not align properly and so you cannot stick the respective sheets together. Here, atleast two stacks must be aligning properly and the third one would not align with any of the other two. So instead of trying to re-create the third stack ad hoc, if all the rings of the third stack are inverted, it works.

Here's the underlying diagrams.

Magnets aligning to form rings

The spheres colored regions inside the spheres represent the north and south poles of the magnet. So if two different configurations are superimposed they align properly, otherwise flipping one of them works.

But suppose instead of a ring we have stacks of even number of rings ( both aligned properly ), flipping does not change the configurations of the stack and hence they will always align wrong. So inverting a ring, as follows, is required.

Inverting a ring (or stack of rings)

Here are some more configurations. Interestingly, the hexagon is more stable than the cube.

Monday, June 24, 2013

Reverting to google chrome's old "new tab" page

This happened to me after a chrome update. The following new tab page was enabled by default.

The problem was also that my omnibox was not in focus every time I opened a new tab. Fortunately I found a solution online. This feature can be disabled by going to "chrome://flags" and selecting the "disabled" option for Enable Instant extended API

Monday, May 13, 2013

Speeding up MATLAB code using matlab distributed computing toolbox

MATLAB's distributed computing toolbox is a great tool to fully utilize your multi-core processors. If you have the distributed computing toolbox installed, the setup is quite easy. The steps to start parallel processing on your localhost are given below. To use a cluster of computers through the LAN, the setup is similar.

The steps given are for Windows 7, though the steps for linux are the same.

First open a command prompt using "Run as Administrator"
Go to %matlabroot% directory which is usually "C:\Program Files\MATLAB\R2011a"
Now cd to "toolbox\distcomp\bin"
Install the mdce service using "mdce install"
run admincenter
Add hosts (your computer is localhost)
start mdce service
start jobmanager
start workers

Now open matlab go to "Parallel" menu. Manage Configurations and create a new job manager.

In the properties, in the Jobs tab, job manager hostname should be "localhost". Job manager's name should be the name you used earlier.

In the Jobs tab, input the minimum and maximum number of workers. The number of workers should be less than the number of processors in your computer. Click OK and you are done.

Now that the distributed computing toolbox is set up you can use the "matlabpool" command and the "parfor" loop to speed up code.

If the mdce service, the jobmanager and the workers keep running, the system gets slow. Therefore, to stop the workers, in the command prompt run :

%matlabroot%\toolbox\distcomp\bin\admincenter

Now stop the workers and the job manager. To stop the mdce service use the command :

mdce stop

Note that you must have the mdce service, the job manager and the workers running to be able to use matlabpool effectively.

Tuesday, April 30, 2013

Equivalent definitions of continuity of functions

This may seem trivial but, the following definitions of continuity of a function $f(x)$ on a metric space are equivalent -

$\forall \epsilon > 0$, $\exists \delta > 0$ such that for every point $x$, $d(x,y) < \delta \implies d(f(x), f(y)) < \epsilon $
$x_n \rightarrow x \implies f(x_n) \rightarrow f(x)$

I will just give the proof for $2) \implies 1)$, since the other way round is easier.

To find the $\delta$ for any $\epsilon > 0$ in $1)$, Consider the set $\mathcal{A}$ of all sequences $x_n \rightarrow x$ . Now, from the definition of continuity in $2)$ we know that for all sequences $f(x^i_n) \rightarrow f(x)$ for $\{x^i_n\} \in \mathcal{A}$. Hence, $\exists N^i$ such that $\forall n>N^i$ , $d(f(x^i_n), f(x)) < \epsilon$. Let

$ \delta = \arg_{x^i_N} \min_{x^i_n \in \mathcal{A}} d(x,x^i_N) $

Thus this value of $\delta$ can be used to find the neighbour-hood in $2)$.

Saturday, April 27, 2013

Windows equivalents for Linux

One of the major factors that prevented me from using ubuntu regularly was the absence of a software as versatile as IDM for speeding up content download from the internet. Recently I found a great equivalent to IDM in ubuntu : axel

Here's how to use the axel command :

axel -N -n <number of connection> "<url>"

where the -N option is for not using the proxy server.

To download videos in linux its easy to use vlc and record the stream. A great pdf editor for linux is described here.

Saturday, March 23, 2013

Web Scraping using Python

Python is a powerful and versatile programming language. Its versatility is increased by the various modules developed for wide applications. It even has a MATLAB library module NumPy.

Many times I needed to download multiple files from a webpage, say wallpapers, and all the usual methods don't work. Here I list the "usual" methods :

wget : download webpage content from the linux terminal
Download them all plugins in firefox and their counterparts in google chrome

Many times I tried to extract the jpg links in a webpage using python regular expression (regexp) . Recently I needed to download a large database provided on a webpage which would not yield to any of the above methods. The webpage contained a javascript form to provide the download link and it used some validation methods which rendered creating the download links using simple regexp useless.

Here is a powerful alternative to avoiding the drudgery of manually downloading files, using a python module : Mechanize. Below is the code I developed to download a webpage.

First, this is the webpage that I wanted to download. Note that the download link is provided using a javascript form. Also note the onSubmit attribute in the form tag which is the validation mechanism. So if you simply create the download link using the data provided in the html page, the downloaded file would be corrupted.

The following code first sets the appropriate variable to their appropriate values and then uses mechanize to download the file.

Now, to create the list of pages containing the download links, I used the BeautifulSoup and the re (regexp) module to parse HTML.

To save the data structure (lists) used to store the download links pickle is a useful library. It can also be made searchable as shown here.

Mechanize has some major disadvantages that I quickly realized. The biggest is that you cannot download webpages that are dynamically loaded i.e. downloaded in multiple tries using javascript i.e. When you first download a webpage you do not download the complete webpage but HTML+javascript code. The browser then executes that javascript code to download the rest of the webpage. Now mechanize does not automatically execute the javascript code to complete the download, so you may not get the content that you want using mechanize.

Another option is easy to use is windmill . Windmill is a powerful library but it is not properly documented and hence difficult to use. The limited documentation can be found here .

Another python module for web scraping is scrapy . This one I have yet to use.

A few good beginner tutorials for python :

MIT OCW : Introduction to programming
Python for beginners

Usually, its faster to just dive in, choose a project and complete it using online documentation and user forums. In this regard stackoverflow.com is a particularly good place to post your queries.

Saturday, February 16, 2013

Points on a semi-circle

Suppose you choose $N$ points uniformly randomly on a circle. What is the probability that all the points lie on a semi-circle.

Random walk on a circle

Let $x_1, x_2, ..., x_n$ be n points (in that order) on the circumference of a circle. Dana starts at the point $x_1$ and walks to one of the two neighboring points with probability $1/2$ for each. Dana continues to walk in this way, always moving from the present point to one of the two neighboring points with probability 1/2 for each. $P_i$ is the probability that the point $x_i$ is the last of the n points to be visited for the first time.

1. Show that $P_i$ is the same for all i.

2. Find $P_i$