Scanning the web for ALL available French one-word domains

A while ago, I set out on quite an ambitious project. I needed a list of all domains consisting of just one French noun that are available as both .com and .de. The process seemed easy enough. Basically, I would have to

  1. Get a list of all French words
  2. Filter that list so it only contains nouns
  3. For each noun, check if the .de and .com domains are available

As it turns out, the project is not too hard once you’ve found the right tools. So in this article, I am going to share my experiences and the method I used for everyone who is either interested in this process or just likes reading about crazy projects.
Just one more note before I start: Many of the scripts I used are not very nice and clean from a coding perspective; they are mainly quick-and-dirty implementations of what I needed. For my purposes, this was more than enough. Quite frankly, I don’t care if the scripts used in this project are nice and high-performance C++ applications or just quickly hacked together PHP scripts. They both do their jobs just fine and originally, I didn’t intend to share them. But as I think this turned out to be quite a fun project, I changed my mind. Enjoy! 🙂

Getting a list of all French words

Of course, obtaining a list of all French (or whatever language you might be interested in) words is the most fundamental step. After searching around for a bit, I found that most Unix-like systems have a /usr/share/dict directory that contains word lists. This also applies to my BSD-based Mac but, unfortunately, mine only had English word lists. And even though the accompanying README file told me a French word list is also available, the FTP server it mentions seems to be offline:

Dictionaries for other languages, e.g. Afrikaans, American, Aussie, Chinese, Croatian, Czech, Danish, Dutch, Esperanto, Finnish, French, German, Hindi, Hungarian, Italian, Japanese, Latin, Norwegian, Polish, Russian, Spanish, Swahili, Swedish, Yiddish, are available at ftp://ftp.ox.ac.uk/pub/wordlists.

So, I had to look around a bit more to eventually stumble across the wfrench Debian package. Obviously, Linux distributions also come with wordlists and as opposed to my Mac, they luckily have a well-managed package system that allows everyone to download the packages and extract the wordlist as a nice plain text file.
Note: These packages are of course also available for other languages. So, if you’re looking for a German wordlist, for example, you can use the wngerman package (a wogerman package is also available but that one follows old spelling rules).

French Wordlist

Extracting nouns from the wordlist

Now this step seems very hard, as it’s of course not easy for a computer to know word classes and you don’t want to manually sort a list of several hundred thousand French words for nouns. Luckily, there is a tool called TreeTagger that is able to tag sentences in a variety of languages (including French).

Using a node.js wrapper, I wrote the following simple script that takes our wordlist and outputs all nouns in that list to the console:

Now, all that was left to do was run the script and print the results to a text file:

node treetagger.js > french_nouns.txt

Three things to note on that one:

  • The tagging of word classes is different for each language. In my example, I had to use NOM but in English, you would have to use NN.
  • Automatically doing the tagging is of course not going to be perfectly accurate. You will be missing some nouns while some non-nouns will also slip into your list. It is, however, safe to assume that the results are still going to be a lot better than if we were to manually tag the list.
  • The script does not filter out inflected nouns, so e.g. you will have both abjection and abjections on your list.
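To illustrate the filtering step, here is a minimal Python sketch. It is not the node.js script mentioned above; it only assumes that TreeTagger has already been run and that its output has one token per line in the usual tab-separated form (word, tag, lemma). The function name and the sample lines are made up for this example.

```python
def extract_nouns(tagged_lines, noun_tag="NOM"):
    """Return the words whose part-of-speech tag equals noun_tag."""
    nouns = []
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, tag = parts[0], parts[1]
        if tag == noun_tag:
            nouns.append(word)
    return nouns


# Hypothetical sample of tagger output: word <TAB> tag <TAB> lemma
sample = [
    "abjection\tNOM\tabjection\n",
    "manger\tVER:infi\tmanger\n",
    "abjections\tNOM\tabjection\n",
]
print(extract_nouns(sample))
```

For another language, pass a different tag (for example noun_tag="NN" for English). As noted above, inflected forms such as abjections are kept; the lemma column could be used to merge them if that matters to you.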

Checking for available domains

After all that work, we still have to do the main part: Check whether or not the nouns we’ve found are actually available for registration. While there are some domain availability APIs offered by domain registrars, you would generally use whois for that purpose. If you query a whois server for a domain that is not registered, you will get a response like this (note that the response differs depending on the server):

Benjamins-MBP:~ benni$ whois thisdomainisnotatallregistered.de
Domain: thisdomainisnotatallregistered.de
Status: free
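For illustration, such a query can be sketched in a few lines of Python. The whois protocol itself is simple: open a TCP connection to port 43, send the query followed by a line break, and read until the server closes the connection. The function names below are my own, and the "Status: free" check only assumes the response format shown above; other servers will need a different check.

```python
import socket


def whois_query(domain, server, port=43, timeout=10):
    """Send a plain whois query and return the raw text response."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection, response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def looks_free(response):
    """True if any line of the response reads 'Status: free'."""
    for line in response.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "status" and value.strip().lower() == "free":
            return True
    return False
```

A call would look like looks_free(whois_query("thisdomainisnotatallregistered.de", "whois.denic.de")), keeping in mind the rate limits discussed below.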

This solution would be easy to implement in PHP via a socket connection, but there is one problem. Our initial requirement was that the domain had to be available as both .de and .com. While the .com whois server does not appear to limit the number of queries in a certain time frame, whois.denic.de most certainly does, and that limit is unfortunately quite strict. Luckily, I stumbled across freedomainapi.com. They offer a free API that outputs JSON, so it is easy for us to query that from PHP. Here is the simple script I wrote:


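// Returns true if the API reports the domain as available, false otherwise.
function checkdomain($domain) {
    $response = file_get_contents('http://freedomainapi.com/?key=YOURAPIKEY&domain=' . urlencode($domain));
    if ($response === false) return false; // request failed, treat as not available

    $result = json_decode($response, true);
    // Loose comparison on purpose: matches the string 'true' as well as a boolean true
    return isset($result['available']) && $result['available'] == 'true';
}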

The final thing left to do was to remove all accents etc. as I didn’t want to query for IDNs. For that, I stole a function from WordPress. Conveniently, that function also works for languages other than French, so you can reuse my code more easily.
The entire PHP script can be found here.
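If you would rather not pull in the WordPress function, the accent removal can be sketched with Python’s standard library. This is a different technique from the one I used: it decomposes each character (Unicode NFD) and drops the combining marks. Ligatures such as œ have no decomposition, so the small replacement table below is an assumption of mine that you may need to extend for other languages.

```python
import unicodedata

# Letters without a Unicode decomposition need an explicit replacement.
EXTRA = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def strip_accents(word):
    """Return word with accents removed, e.g. for building ASCII domain names."""
    for src, dst in EXTRA.items():
        word = word.replace(src, dst)
    decomposed = unicodedata.normalize("NFD", word)
    # Combining marks (accents, cedillas, diaereses) have a non-zero combining class
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("élève"), strip_accents("garçon"), strip_accents("cœur"))
```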

I will have to add two notes on this one as well:

  • This approach is incredibly inefficient. Before querying the API (or whois server, whichever method you choose) you should at least run a DNS query. If that returns any results, you can be sure that the domain is registered and can discard it immediately.
  • I have since found that some whois servers (like whois.crsnic.net and whois.nsiregistry.net) seem to also respond for .de domains and not have a limit on the number of requests. Might be worth investigating as an alternative to the API.
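The DNS pre-check from the first note could look roughly like this in Python. It is a sketch with hypothetical function names: a name that resolves is discarded right away, while a name that does not resolve is only a candidate and still has to be checked via the API or whois, since a registered domain does not need to have any DNS records. The resolver is passed in as a parameter so the logic can be tried without network access.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if the name resolves to at least one address."""
    try:
        return len(resolver(hostname, None)) > 0
    except socket.gaierror:
        return False


def candidates(nouns, tlds=("com", "de"), resolver=socket.getaddrinfo):
    """Keep only the nouns for which none of the domains resolves."""
    return [
        noun for noun in nouns
        if not any(resolves(noun + "." + tld, resolver) for tld in tlds)
    ]
```

Only the nouns returned by candidates() would then be sent to the availability check, which cuts down the number of API requests considerably.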

And there you have it. That’s all you need to get a list of all domains containing just a single French noun that are available as .com and .de. Unfortunately, there aren’t many good ones among them but it was still a fun project 🙂

Backup Mavericks to AFP share on Debian with Time Machine

I love Time Machine. For years, I have been backing up my MacBook to an external USB 3.0 hard drive and I have never experienced any problems, neither with the backup nor the restore. Time Machine runs conveniently in the background, distracting me only when necessary (e.g. when there hasn’t been a backup in the past 10 days).
Personally, I use Time Machine mainly for backups but on a few occasions, it has also served me well for restoring a previous version of a file I accidentally changed or deleted.

At the moment, I am setting up a backup server for my network and, of course, I wanted to use Time Machine here, as well.
To give you a quick overview of my prerequisites:

  • Debian server running Debian 7.6 wheezy, 4 TB hard drive mounted to /storage/three
  • MacBook Pro running Mac OS 10.9.4, Time Machine backup configured to external USB drive

After a bit of googling, I found that it should be possible to configure Time Machine to back up to a Samba share after enabling a hidden setting. However, for me this didn’t work, so I decided to use Apple’s AFP. In the following, I am going to describe this process. If you have any questions or suggestions, please feel free to leave a comment.

Setting up the server side

On the server, we mainly need two packages.

  • netatalk is going to emulate the AFP protocol (which is similar to SMB but supposedly superior in Mac-only environments)
  • avahi to emulate the Apple Bonjour or Zeroconf service which enables automatic discovery of our AFP share

To install both of them, run the following command:

aptitude install avahi-daemon avahi-discover libnss-mdns netatalk

Next, you will have to add a new user who can access the AFP share. To make things a little easier, I recommend using the same username as on your Mac.
Run the following commands after replacing yourusername and "Your full name" with the values appropriate for you.

useradd -d /home/yourusername -s /bin/bash -c "Your full name" yourusername
mkdir /home/yourusername
chown yourusername /home/yourusername
passwd yourusername

After running the last command, you will be prompted to enter a password for the user twice. Note that you will not see the letters you type.

If it doesn’t exist already, create a folder for your Time Machine backups. In my case, this is:

mkdir /storage/three/TimeMachine

Finally, you will have to configure the actual AFP share. Use your favorite editor to edit /etc/netatalk/AppleVolumes.default. If you want to use nano, for example, this is the command:

nano /etc/netatalk/AppleVolumes.default

At the end of this file, you will find the following lines:

# By default all users have access to their home directories.
~/     "Home Directory"

This enables every user on your server to access their home folder (/home/yourusername) via AFP. We don’t need this for Time Machine, so you might want to remove this line but you can also leave it in there if you wish.

You will, however, need to add the following line:

/storage/three/TimeMachine "Time Machine backup" allow:yourusername options:tm

This creates a Time Machine-enabled AFP share for the directory /storage/three/TimeMachine and allows the user yourusername to access it.

You can also limit the amount of disk space Time Machine uses by appending volsizelimit. The following example allows Time Machine to take up about 500 GB.

/storage/three/TimeMachine "Time Machine backup" allow:yourusername options:tm volsizelimit:500000
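If you want a more exact limit, it helps to know how the number is interpreted. As far as I can tell from the netatalk 2.x documentation, volsizelimit is given in MiB, so 500000 is a little more than 500 GB. The following Python sketch (the helper names are my own) does the conversion in both directions under that assumption:

```python
def gb_to_mib(gigabytes):
    """Convert decimal gigabytes (10**9 bytes) to whole MiB (2**20 bytes)."""
    return gigabytes * 10**9 // 2**20


def mib_to_gb(mib):
    """Convert MiB (2**20 bytes) to decimal gigabytes (10**9 bytes)."""
    return mib * 2**20 / 10**9


print(gb_to_mib(500))     # value to use for an exact 500 GB limit
print(mib_to_gb(500000))  # what volsizelimit:500000 corresponds to in GB
```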

Once you are done, save the file and close your editor (for nano: Ctrl + O, Enter, Ctrl + X).

The final thing to do on the server is to restart the netatalk service to apply the changes you made to the config file.

service netatalk restart

Setting up your Mac

You should already see your server in Finder (Mine is called backup1).

AFP backup server in Finder

Connect to it and mount your share Time Machine backup. You will be prompted to enter the credentials for the user you created a minute ago. I recommend saving the details to your Keychain.

As your Debian server is not an officially supported Time Machine backup destination, it might not show up in the Time Machine preferences. If that happens, run the following command in the Terminal. This is not necessary on every Mac, so you can try continuing without it first.

defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1

Now, all that’s left to do is to configure a regular Time Machine backup to your AFP share.

Configure Time Machine to backup to your AFP share

And that’s all there is. I hope this tutorial helped you. As mentioned earlier, feel free to use the comments for any questions or suggestions.

Encrypting email

Introduction

Hi there!
You are probably reading this because of one of the following options:

  • You received an email from me and the signature contained a notice about encrypting email. You are interested in this and would like to find out more. Great! Just keep reading.
  • You googled (other search engines are available) for information on encrypting email. Congrats! You found the right place. Just keep reading to learn more.
  • You stumbled upon this page while browsing through my website and this article seemed interesting. Obviously, I advise you to keep reading as well 🙂

No matter how you came across this page, you are here now and this page is likely to contain information relevant to you if you care about your privacy.

Why would I encrypt email?

As you may have heard in the news recently, Edward Snowden has leaked information about intelligence services (like the NSA or the GCHQ) spying on every one of us. But it doesn’t stop there. Companies (like Google, Yahoo or Microsoft) can also read your email, e.g. to use it for personalized ads. And I haven’t even mentioned hackers yet.
This is possible due to the limitations underlying our email protocols. Back in the day, when “email” was designed, it wasn’t intended to be used on such a large scale as today. It was designed mainly for scientists to be able to share their discoveries. In many cases, early SMTP servers were only accessible from within a specific organization. Therefore, not a lot of attention was paid to security. A popular example is the ability to set the From address to any email address (including, of course, ones belonging to other people).[1]
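To see how little stands in the way, here is a small Python sketch that builds a message with the standard library. The addresses are placeholders I made up. Nothing in the message format verifies the From header; it is simply whatever the sender writes into it.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build an email; the From header is just text chosen by the sender."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message(
    "someone.else@example.org",  # any address can be put here
    "you@example.com",
    "Hello",
    "This message claims to come from someone else.",
)
print(msg["From"])
```

Whether such a message is actually delivered depends on the servers involved, but the protocol itself does not prevent it.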

Now, you may ask yourself: Well, there may be quite a few entities that can potentially intercept my emails. But why would they choose me? Also, even if they did, I wouldn’t care. If they really want to read those “interesting” mails of mine, they are free to do so. I have nothing to hide…
This is (unfortunately) how many people think. But however much I would like them to be right, they just aren’t. Let me give you some examples to think about.
Do you really want literally anyone to be able to read the private mails you sent to your friend? Do you want big companies to know exactly what you do, like or think? Do you want these companies to sell this information to others, making you the product?
It is an open secret that companies do make profiles of their users. We have known for many years that Google tracks the pages their users visit and a lot more, claiming to only use this information to enhance the user’s experience. But they aren’t the only ones doing this. In fact, many companies even admit spying on their users in their terms of service (even though hidden inside of many pages of legal stuff).

How do I actually encrypt my mails?

I hope by now I have convinced you and you agree that you should at least encrypt some of your mails. Now the question is how you actually do this. Many people think it is a hard thing to do and requires a lot of knowledge regarding computers. However, in reality it is really easy to do. There is just one thing to keep in mind: In order to be able to exchange encrypted mails with someone, both sides need to have a so-called public and private key. Furthermore, they should, of course, both know how to read and send encrypted mails (which, again, is really simple).

While there are technically quite a few standards on email encryption out there, most of them are built upon PGP (Pretty Good Privacy). Almost everyone uses a program called GnuPG (GNU Privacy Guard) which is an Open Source implementation of the OpenPGP standard based on the original PGP. By using this program, you will be able to exchange encrypted mails with virtually anyone who is also encrypting mails.

In this article, I am not going to go into how you actually set up GnuPG on your computer. This article is only intended to be a starting point. I might write articles on that in the future, but for now I really recommend the “Email Self-Defense” guide published by the Free Software Foundation (FSF). Here are some other great links that will get you started with GnuPG:

Exchanging encrypted mail with me (Benjamin Altpeter)

If you are here because of the first reason I mentioned at the beginning of the article, this is probably what you care about the most. I encourage anyone who wants to send me an email to encrypt (and sign) it. Here you will find my public key that is needed in order to do so.

You can download it right here: https://benjamin-altpeter.de/00EB2372.asc

As the main purpose of email encryption is to make sure that only the desired recipient (in this case me) can read an email, it is important that the public key you downloaded is not compromised. For example, a hacker might gain access to this website and upload another key, and would then be able to read the emails you encrypt for that key.

There are several ways to verify the integrity of my public key. One of them is just sending me an email. I will then provide you with further information on how to prove you actually have the right key.
If you are looking for a quick and easy way and are willing to trust a third party, you can take a look at my Keybase profile.

As a last thing, I would like to note that neither of these methods is entirely secure. If you really want to make sure you have my key, you will, unfortunately, have to talk to me in person…

Conclusion

As you see, encrypting mails is not a hard task, but it does most certainly provide a lot of value to you. Anyone who is concerned about their privacy on the internet should look into that.

If you want to, please feel free to share this article with your friends and family to inform them about email encryption and start exchanging encrypted email with them!
