Theodoros Emmanouilidis

Notes & Thoughts
Browsing Computing

Install Yahoo! LDA In Ubuntu 11.04 Server

August 9

The following tutorial guides you through installing the Yahoo! LDA code on a newly installed Ubuntu 11.04 server. Apart from the default installation, the only package assumed to have been selected from the installation menu is OpenSSH server.

1) Install JAVA

sudo apt-get install python-software-properties
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get install sun-java6-jdk

The JDK can now be found in /usr/lib/jvm/java-6-sun, which is actually a symlink on Ubuntu.
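
Because that path is a symlink, it can help to see where it really points before relying on it. The sketch below is only an illustration: the function name jdk_home is made up for this example, and it assumes GNU readlink (the default on Ubuntu).

```shell
# Print the real directory behind a JDK path, following any symlinks.
# jdk_home is a hypothetical helper name; the default argument is the
# path used in this tutorial.
jdk_home() {
    readlink -f "${1:-/usr/lib/jvm/java-6-sun}"
}
```

Running jdk_home with no argument prints the resolved location of the tutorial's JDK path; passing another path resolves that one instead.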

2) Download source code

To continue with the installation, make a directory in your home folder (or wherever you like). This is the folder where the application will reside after the installation. I have an apps folder inside my home directory, so I created an LDA folder inside apps.

mkdir -p ~/apps/LDA
cd ~/apps/LDA

Download the source code from GitHub and extract it.

wget https://github.com/shravanmn/Yahoo_LDA/tarball/master
# extract code
tar -xzf master
cd shravanmn-Yahoo_LDA-*
mv * ../
cd ..
rm -rf shravanmn-Yahoo_LDA-*
rm master
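
As an alternative to moving the files by hand, tar can drop the archive's top-level folder while extracting. This is a minimal sketch, assuming GNU tar and an archive with a single top-level directory; extract_flat is a hypothetical helper name, not part of the Yahoo! LDA code.

```shell
# Unpack a gzipped tarball into a directory, dropping the archive's
# single top-level folder so the files land directly in the target.
# extract_flat is a hypothetical helper name used for illustration.
extract_flat() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

With the download above, the call would be extract_flat master ~/apps/LDA. Unlike mv *, which skips hidden files, this keeps any dotfiles the archive contains.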

3) Install system prerequisites

Install build-essential

sudo apt-get install build-essential

Install emacs

sudo apt-get install emacs

Install ant

sudo apt-get install ant
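
Before running make, a quick check that the tools are actually on the PATH can save a failed build. The following is a sketch only; missing_tools is a hypothetical helper, and the tool names passed to it are up to you.

```shell
# Print the name of every listed command that is not found on the PATH.
# Prints nothing when all of them are available.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For the packages installed in this step, a reasonable call would be missing_tools gcc g++ make emacs ant; empty output means everything was found.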

4) Make

Inside the LDA directory we created, run:

make

5) Install Ice

The only problem with make is Ice, which must be compiled separately.

Install system prerequisites.

Install libbz2-dev package.

sudo apt-get install libbz2-dev

Install the XML parser.

sudo apt-get install expat libexpat-dev

Also install libssl-dev.

sudo apt-get install libssl-dev

Install Mono (the mono-complete package) and the packages it needs.

sudo apt-get install mono-complete

Finally, install the mandatory dev packages.

sudo apt-get install python-dev ruby ruby-dev php5-dev

Download the third-party sources for Ice.

wget http://www.zeroc.com/download/Ice/3.4/ThirdParty-Sources-3.4.2.zip
sudo apt-get install unzip
unzip ThirdParty-Sources-3.4.2.zip
rm ThirdParty-Sources-3.4.2.zip
cd ThirdParty-Sources-3.4.2

Install Berkeley DB.

unzip db-4.8.30.NC.zip
cd db-4.8.30.NC
cd build_unix
../dist/configure --prefix=/usr/local/berkeleydb --enable-compat185 --enable-cxx --enable-debug_rop --enable-debug_wop --enable-java
make
sudo make install

Locate the db.jar file, which provides the extra Java classes. It should be here:

/usr/local/berkeleydb/lib/db.jar

Copy the jar file to your classpath.

sudo cp /usr/local/berkeleydb/lib/db.jar /usr/lib/jvm/java-6-sun/lib

You will also need classes from these packages, which are included in the third-party sources directory.

cd ../../
unzip jgoodies-common-1_2_0.zip
cd jgoodies-common-1.2.0
sudo cp jgoodies-common-1.2.0.jar /usr/lib/jvm/java-6-sun/lib
cd ../
unzip jgoodies-forms-1_4_1.zip
cd jgoodies-forms-1.4.1
sudo cp jgoodies-forms-1.4.1.jar /usr/lib/jvm/java-6-sun/lib
cd ../
unzip jgoodies-looks-2_4_1.zip
cd jgoodies-looks-2.4.1/
sudo cp jgoodies-looks-2.4.1.jar /usr/lib/jvm/java-6-sun/lib
cd ../

Some of the Slice tools in the build/Ice-3.4.1/cpp/src directory have to be compiled manually. These are slice2php, slice2cs, slice2freezej and slice2java.

cd ~/apps/LDA/build/Ice-3.4.1/cpp/src
cd slice2php/
make
cd ../
cd slice2cs/
make
cd ../
cd slice2freezej/
make
cd ../
cd slice2java/
make
cd ../../../
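
The same four builds can also be written as a loop. This is a minimal sketch, not the author's method: build_subdirs is a hypothetical helper, it relies on the standard make -C option, and it lets the MAKE variable override the make program.

```shell
# Run make in each named subdirectory of a base directory, stopping at
# the first failure. MAKE can be overridden; it defaults to "make".
# build_subdirs is a hypothetical helper name used for illustration.
build_subdirs() {
    base=$1
    shift
    for dir in "$@"; do
        "${MAKE:-make}" -C "$base/$dir" || return 1
    done
}
```

For this step the call would be build_subdirs ~/apps/LDA/build/Ice-3.4.1/cpp/src slice2php slice2cs slice2freezej slice2java. The loop does not change the current directory, so you still need to cd ~/apps/LDA/build/Ice-3.4.1 before building Ice.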

Point your classpath to the jar files you copied.

export CLASSPATH=/usr/lib/jvm/java-6-sun/lib/jgoodies-common-1.2.0.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-forms-1.4.1.jar:/usr/lib/jvm/java-6-sun/lib/db.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-looks-2.4.1.jar
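
A long hand-typed classpath is easy to get wrong, so one option is to assemble it from a directory and a list of jar names, failing early if a jar is missing. This is only a sketch; classpath_from is a hypothetical helper name.

```shell
# Build a colon-separated classpath from a directory and a list of jar
# file names. Fails with a message if any of the jars does not exist.
# classpath_from is a hypothetical helper name used for illustration.
classpath_from() {
    dir=$1
    shift
    path=
    for jar in "$@"; do
        if [ ! -f "$dir/$jar" ]; then
            echo "missing: $dir/$jar" >&2
            return 1
        fi
        path=${path:+$path:}$dir/$jar
    done
    printf '%s\n' "$path"
}
```

With the jars copied earlier, the equivalent of the export above would be export CLASSPATH="$(classpath_from /usr/lib/jvm/java-6-sun/lib jgoodies-common-1.2.0.jar jgoodies-forms-1.4.1.jar db.jar jgoodies-looks-2.4.1.jar)".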

Now build Ice.

make

When this ends, copy all created files from the lib folder to the system's lib folder.

cd ../../
sudo cp lib/* /usr/lib

This is all you need to run the Yahoo! LDA code on a single machine.

6) Test installation (Batch Mode)

Follow the example in the documentation that accompanies the code (inside the docs folder). The commands are slightly altered to work for the single-node example.

Phase 1 – Tokenization and Formatting

cd ut_out
cp ../Tokenizer.java .
javac Tokenizer.java
cat ydir_1k.txt | java -classpath . Tokenizer | ../formatter

Phase 2 – Learning the topic mixtures

../learntopics --topics=100 --iter=500

If everything is ok, you will be able to see the word mixtures for each topic

cat lda.topToWor.txt

and the topic assignments

cat lda.worToTop.txt
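
Rather than reading both files in full, a scripted check that they exist and are not empty can be enough. This is a sketch under the assumption that a non-empty file means the run produced output; check_outputs is a hypothetical helper name.

```shell
# Succeed only if every listed file exists and is non-empty; report
# each file that fails the check on standard error.
# check_outputs is a hypothetical helper name used for illustration.
check_outputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return $rc
}
```

Inside ut_out the call would be check_outputs lda.topToWor.txt lda.worToTop.txt.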

Phase 3 – Testing

Go to ut_test directory.

cd ../ut_test/

Copy Tokenizer class.

cp ../ut_out/Tokenizer.class .

Format test data.

cat ydir_1k.tst.txt | java -classpath . Tokenizer | ../formatter --dumpfile=../ut_out/lda.dict.dump

Learn test topics.

../learntopics -test --dumpprefix=../ut_out/lda --topics=100

Output files are created inside ut_test.

Search Engine Friendly Redirection

August 3

You already have a working site with tons of content and good search engine rankings. What if you need to change your domain name or change the URL base of the site? That would mean that you will lose all current rankings and practically start over building your search engine rank. This would be the case, for example, if you have a forum residing at the path www.yourdomain.com/forum and you want to move your forum to the root of your domain, www.yourdomain.com.

Fear not, you can do this and keep all current search engine rankings using a “301” redirect. A “301” redirect is the most efficient and search-engine-friendly method of webpage redirection. The code “301” means “moved permanently”, telling browsers and search engines that the original web address has moved permanently to the stated new web address.

Many ways exist to implement a “301” redirect, and the best choice depends greatly on the programming language one is familiar with and the exact case in which the redirection is needed. I prefer two ways, using PHP or an .htaccess file, depending on the case.

PHP method

Just put an index.php file in the root directory of your old domain containing the following:

<?php
header( "HTTP/1.1 301 Moved Permanently" );
header( "Status: 301 Moved Permanently" );
header( "Location: http://www.yournewdomain.com" );
exit(0); // This is suggested to avoid any accidental output
?>

This method will also work if you haven’t moved your site to another domain but have just changed the URL base of your site.
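
Once the file is in place, it is worth confirming that the server really answers with a 301 and the expected target. The sketch below reads response headers on standard input, so it can be fed from curl -sI; permanent_redirect_target is a hypothetical helper name, and it assumes the status line is the first line of the input.

```shell
# Read HTTP response headers on stdin. Succeed only if the status code
# is 301 and a Location header is present; print the Location value.
# permanent_redirect_target is a hypothetical helper name.
permanent_redirect_target() {
    awk '
        NR == 1 { if ($2 != "301") exit 1 }
        tolower($1) == "location:" { sub(/\r$/, ""); print $2; found = 1 }
        END { if (!found) exit 1 }
    '
}
```

A typical use would be curl -sI http://www.youroldomain.com/ | permanent_redirect_target, where the old domain name is a placeholder for your own.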
