Selenium WebDriver: getting Twitter handles using Bing
The video below shows a Java program I wrote to convert names into Twitter handles. Given an array of names, it runs a site:twitter.com [name] search in Bing for each name, then opens each search result on the Twitter mobile site.
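The two string-level steps in that description can be sketched in plain Java. This is a minimal illustration, not the program from the video: the class and method names (TwitterQueries, bingQuery, toMobileUrl) are hypothetical, and the mobile host mobile.twitter.com is an assumption about which mobile URL the profiles are opened on.

```java
import java.net.URI;

// Hypothetical helpers: build the Bing query for a name, and point a
// desktop Twitter URL at the (assumed) mobile host.
final class TwitterQueries {
    private TwitterQueries() {}

    // Text typed into Bing's search box, e.g. "site:twitter.com Jane Doe".
    static String bingQuery(String name) {
        return "site:twitter.com " + name.trim();
    }

    // Rewrites twitter.com or www.twitter.com URLs to mobile.twitter.com,
    // keeping only the path (query string and fragment are dropped).
    // URLs without a scheme, as Bing often displays them, get "https://" added.
    // Any other host is returned unchanged.
    static String toMobileUrl(String url) {
        String normalized = url.contains("://") ? url : "https://" + url;
        URI uri = URI.create(normalized);
        String host = uri.getHost();
        if (host == null) {
            return normalized;
        }
        if (host.equalsIgnoreCase("twitter.com") || host.equalsIgnoreCase("www.twitter.com")) {
            String path = uri.getRawPath() == null ? "" : uri.getRawPath();
            return "https://mobile.twitter.com" + path;
        }
        return normalized;
    }
}
```

For example, toMobileUrl("twitter.com/jdoe") yields "https://mobile.twitter.com/jdoe".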
The location and bio information from each Twitter profile are written to a text file.
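A minimal sketch of the output step, assuming one tab-separated line per profile (name, URL, location, bio). The post doesn't describe the actual file layout, so this format and the ProfileReport class are hypothetical. Tabs and line breaks inside a field are collapsed to spaces so a multi-line bio can't break the one-line-per-profile layout.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical report writer: one tab-separated line per profile.
final class ProfileReport {
    private ProfileReport() {}

    // Assumed column order: name, url, location, bio.
    static String formatLine(String name, String url, String location, String bio) {
        return String.join("\t", clean(name), clean(url), clean(location), clean(bio));
    }

    // Replace runs of tabs/newlines with a single space; null becomes "".
    private static String clean(String s) {
        return s == null ? "" : s.replaceAll("[\\t\\r\\n]+", " ").trim();
    }

    // Writes all lines as UTF-8, creating or overwriting the file.
    static void write(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```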
Selenium WebDriver is used to automate Firefox. The Java program issues a series of commands (visit a site, fill out a text field, click a button, etc.), which the WebDriver extension in Firefox then executes. Every action in the video was performed programmatically.
Technical details:
A BingSearch object is given the name to search for. It loads bing.com in the browser, fills out the search form, and clicks the search button. It can detect when there are no results at all, as well as cases where the only results are spelling suggestions. If there are real results, it returns an array of BingSearchResult objects.
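The three outcomes described above can be separated from the browser code and sketched as plain logic. This is an illustration under assumptions, not the actual BingSearch class: SearchOutcome, BingOutcomes, classify and urlsIfAny are hypothetical names, and in the real program the two inputs would come from the page (for instance, counting result elements and checking for a spelling-suggestion element) rather than being passed in.

```java
import java.util.List;

// The three cases the post says BingSearch distinguishes.
enum SearchOutcome { NO_RESULTS, SPELLING_SUGGESTIONS_ONLY, RESULTS }

// Hypothetical classification step, decoupled from Selenium.
final class BingOutcomes {
    private BingOutcomes() {}

    // resultCount: number of real search results found on the page (assumed input).
    // spellingSuggestionShown: whether a spelling suggestion is on the page (assumed input).
    static SearchOutcome classify(int resultCount, boolean spellingSuggestionShown) {
        if (resultCount == 0) {
            return spellingSuggestionShown
                    ? SearchOutcome.SPELLING_SUGGESTIONS_ONLY
                    : SearchOutcome.NO_RESULTS;
        }
        return SearchOutcome.RESULTS;
    }

    // Mirrors "if there are real results, return an array": empty array otherwise.
    static String[] urlsIfAny(SearchOutcome outcome, List<String> urls) {
        return outcome == SearchOutcome.RESULTS ? urls.toArray(new String[0]) : new String[0];
    }
}
```

Keeping the decision separate from the page scraping makes it easy to check each case without launching a browser.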
The BingSearchResult object currently retrieves only the plain, unlinked URL shown in a search result (the div with class sb_meta), which is all that's needed in this case. Calling getText() on that div can also return the text "Translate this page"; rather than drilling down below the sb_meta div to isolate the URL, that text is simply replaced with an empty string.
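The string cleanup described above is small enough to show on its own. In the real program the input would be whatever getText() returns for the sb_meta div; here it is a plain string, and the SbMetaText class and plainUrl method are hypothetical names for this sketch.

```java
// Hypothetical helper: strip the "Translate this page" label from the
// sb_meta text so only the plain URL remains.
final class SbMetaText {
    private SbMetaText() {}

    private static final String TRANSLATE_LABEL = "Translate this page";

    static String plainUrl(String sbMetaText) {
        // Remove the label wherever it appears, then drop leftover whitespace
        // (including a newline between the URL and the label).
        return sbMetaText.replace(TRANSLATE_LABEL, "").trim();
    }
}
```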
TwitterProfile objects are given a Twitter profile URL; they load that page and read the contents of the .bio and .location divs. The constructor throws an exception if the URL points only to twitter.com itself or to an individual status, both of which sometimes turn up in the search results.
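The URL check that the TwitterProfile constructor performs might look something like the following. This is a sketch, not the author's code: the ProfileUrls class, the requireProfileUrl method, the use of IllegalArgumentException, and the exact path rules (treating /handle/status/... and /handle/statuses/... as individual statuses) are all assumptions.

```java
import java.net.URI;

// Hypothetical validation: accept twitter.com/handle, reject the bare
// home page and individual status URLs.
final class ProfileUrls {
    private ProfileUrls() {}

    static String requireProfileUrl(String url) {
        // Bing often shows URLs without a scheme, so add one before parsing.
        String normalized = url.contains("://") ? url : "https://" + url;
        URI uri = URI.create(normalized);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase();
        if (!host.equals("twitter.com") && !host.endsWith(".twitter.com")) {
            throw new IllegalArgumentException("Not a Twitter URL: " + url);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        // "/jdoe/status/123" -> ["jdoe", "status", "123"]; "" or "/" -> [""].
        String[] segments = path.replaceAll("^/+|/+$", "").split("/+");
        if (segments[0].isEmpty()) {
            throw new IllegalArgumentException("Twitter home page, not a profile: " + url);
        }
        if (segments.length >= 2
                && (segments[1].equalsIgnoreCase("status") || segments[1].equalsIgnoreCase("statuses"))) {
            throw new IllegalArgumentException("Individual status, not a profile: " + url);
        }
        return normalized;
    }
}
```

A constructor could call this first and only load the page if it returns normally.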
Notes:
* The names in the video are just for example purposes. If I were looking only for verified accounts, I would have built in a check for that to cut down on unnecessary information in the output.
* The video was sped up 2x to make it less tedious to watch. The program could be optimized if necessary, but for my purposes speed isn't an issue.
* Some unnecessary information in the output is unavoidable: for some names, the desired handle is a few results down from the top.