Posts Tagged Research

LaTeX and PRISM: syntax highlighting in the listing environment

Posted on Wednesday, 19 September, 2012

You may be using the PRISM Model Checker for your research. You may also want to include some of its code in your text, for instance in your PhD thesis. Using LaTeX and the listings environment is probably your best bet.

Google Scholar citation email verification does not work

Posted on Friday, 3 August, 2012

Anyone who has ever done anything even remotely related to science is probably aware of Google Scholar. There’s a feature where you can keep track of your citations (e.g. how many and which papers, by which authors, cite your publications). A very nifty feature and quite easy to set up. Your data will only show up after your email address has been verified. And that’s where stuff borked.

The verification link did not work, giving a 404 Not Found error. After inspecting the URL, I noticed that the dots in my email address had been removed. I figure they probably run the address through some sanitization scripts, like PHP’s htmlspecialchars() and addslashes(), to prevent things like SQL injection, but apparently this also strips the dots from the email address being verified. The URL then obviously no longer matches what I entered, yielding the 404.

Solution: simply re-add the dots in the URL and it works.

spaces in MiXiM makemakefiles

Posted on Friday, 20 July, 2012

MiXiM uses a ‘makemakefiles’ script, which generates the Makefiles when invoked as follows:

make -f makemakefiles

However, after some modifications (adding my own stuff) it stopped generating new Makefiles. From visual inspection, nothing was wrong with the file. There is, however, a subtle thing you may want to check: the TAB in front of each recipe line. DO NOT make the mistake of using a few spaces instead; the TAB is not just there to prettify things. make itself delimits recipe lines on this TAB and gets severely upset (typically a ‘missing separator’ error) when it finds spaces there.

Removing any spaces and adding a TAB fixes the problem.
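To check a makemakefiles for this problem, something along these lines may help (a sketch; cat -A renders a real TAB as ^I, so recipe lines that start with spaces stand out, and the in-place fix assumes GNU sed):

# Show lines that begin with a space instead of a TAB -- these upset make.
cat -A makemakefiles | grep -n '^ '

# Replace a leading run of spaces with a single TAB (edits the file in place).
sed -i 's/^ \+/\t/' makemakefiles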

Clustercomputing with mpirun and torque

Posted on Wednesday, 18 July, 2012

Some time ago I wrote about Clustercomputing with Torque, with a focus on discrete event simulations (using the method of independent replications). It turned out that method was not really efficient, so we tried something else.

Finding duplicate files

Posted on Wednesday, 23 May, 2012

You have a bunch of files (for instance, JPEGs). Over time they get moved around, you get a batch from some family members, and before you know it the same file exists in multiple places. Of course you can sort out these duplicates manually, but you can also automate duplicate detection.

After a short search, I found this solution on LinuxQuestions.org:

tmp=$(mktemp)
# Hash every file under the current directory.
find . -type f | xargs md5sum > $tmp
# For every hash that occurs more than once, list the files sharing it.
awk '{ print $1 }' $tmp | sort | uniq -d | while read f; do
    grep "^$f" $tmp
    echo ""
done

This outputs a list of duplicate files once it has run to completion.

However, it borks on filenames containing whitespace or special characters such as the apostrophe. A solution:

#!/bin/bash
tmp=$(mktemp)
# Escape apostrophes so xargs does not choke on them; -I{} makes xargs
# treat each input line as a single argument, so spaces in filenames survive.
find . -type f | sed -e "s/'/\\\'/g" | xargs -I{} md5sum {} > $tmp
# For every hash that occurs more than once, list the files sharing it.
awk '{ print $1 }' $tmp | sort | uniq -d | while read f; do
    grep "^$f" $tmp
    echo ""
done

The -I{} (and the matching {}) makes xargs treat each input line as a single argument, so filenames are delimited only by newlines, not by whitespace. The | sed -e "s/'/\\\'/g" part replaces every occurrence of the apostrophe (') with its escaped version (\'), just as you would when entering it on the command line; without it, xargs complains about an unmatched single quote.
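If your find and xargs support NUL-delimited filenames (GNU findutils does, via -print0 and -0), a variant along these lines sidesteps the escaping entirely, since filenames are passed through without any quote or whitespace interpretation:

#!/bin/bash
# NUL-delimited variant (a sketch): -print0/-0 pass filenames verbatim,
# so no sed escaping is needed, even for quotes and backslashes.
tmp=$(mktemp)
find . -type f -print0 | xargs -0 md5sum > $tmp
awk '{ print $1 }' $tmp | sort | uniq -d | while read f; do
    grep "^$f" $tmp
    echo ""
done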

These scripts traverse deep into directory structures and accepted every filename I encountered in my dataset. They are, however, quite CPU intensive, as they calculate the MD5 hash of every file. If you only want to compare based on filename, the whole operation becomes a lot more lightweight, as in the sketch below.
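A filename-only pass could look like this (a sketch, assuming GNU find for the -printf option); keep in mind that identical names do not imply identical contents:

# Print basenames that occur more than once under the current directory.
find . -type f -printf '%f\n' | sort | uniq -d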

Duplicate detection with locate/mlocate.db

Actually, it is not necessary to index all files manually; there is a good chance this is already being done by the updatedb cronjob. For instance,

skidder@spetznas:~$ locate fstab
/etc/fstab

and it also finds some other files containing the string fstab. Unfortunately, mlocate.db is a very simple list of filenames only; a file size and an MD5 hash per entry would greatly ease duplicate detection. So far I have not found a way to do this more efficiently than with the shell script posted above.
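The filename information in the database can still be mined for candidates, though. As a sketch (assuming an mlocate-style locate with a populated database): every indexed path contains a slash, so locate / dumps the entire database, and repeated basenames can then be flagged:

# Print every basename that is indexed more than once, system-wide.
# This only finds name collisions, not content duplicates.
locate / | awk -F/ '{ print $NF }' | sort | uniq -d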