Install Yahoo! LDA In Ubuntu 11.04 Server
The following tutorial guides you through installing the Yahoo! LDA code on a newly installed Ubuntu 11.04 server. Apart from the default installation, the only package assumed to have been selected from the installation menu is the OpenSSH server.
1) Install Java
    sudo apt-get install python-software-properties
    sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    sudo apt-get update
    sudo apt-get install sun-java6-jdk
The JDK can now be found in /usr/lib/jvm/java-6-sun, which is actually a symlink on Ubuntu.
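Because the JDK path is a symlink, it can help to see where it really points before using it in later steps. The following is a minimal sketch, assuming GNU readlink is available; the function name resolve_jdk_home is hypothetical and not part of the original instructions.

```shell
# Hypothetical helper: print the real directory that a JDK path points to,
# following any symlinks along the way.
resolve_jdk_home() {
    jdk_path=$1
    if [ ! -e "$jdk_path" ]; then
        echo "not found: $jdk_path" >&2
        return 1
    fi
    readlink -f "$jdk_path"
}

# Possible use with the path from this tutorial:
#   export JAVA_HOME="$(resolve_jdk_home /usr/lib/jvm/java-6-sun)"
```

If the function prints nothing and returns a non-zero status, the JDK package was probably not installed.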
2) Download source code
To continue with the installation, create a directory in your home folder (or wherever you like). The application will reside in this folder after the installation. I have an apps folder inside my home directory, so I created an LDA folder inside apps.
    mkdir -p ~/apps/LDA
    cd ~/apps/LDA
Download the source code from GitHub and extract it.
    wget https://github.com/shravanmn/Yahoo_LDA/tarball/master
    # extract code
    tar -xzf master
    cd shravanmn-Yahoo_LDA-*
    mv * ../
    cd ..
    rm -rf shravanmn-Yahoo_LDA-*
    rm master
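The extract, move and clean-up sequence above can also be done in one step. This is a sketch of an alternative, assuming GNU tar and its --strip-components option; extract_flat is a hypothetical name, not something shipped with Yahoo! LDA.

```shell
# Sketch: unpack a gzipped tarball into a target directory while dropping
# the archive's single top-level folder.
extract_flat() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Possible use with the file downloaded above:
#   extract_flat master ~/apps/LDA && rm master
```

Unlike mv *, this also keeps hidden files from the archive, which may or may not matter for the build.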
3) Install system prerequisites
Install build-essential
    sudo apt-get install build-essential
Install emacs
    sudo apt-get install emacs
Install ant
    sudo apt-get install ant
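Before running make, a quick check that the tools are actually on the PATH can save a failed build. The sketch below uses the POSIX command -v builtin; missing_tools is a hypothetical helper name, and the tool list in the example is an assumption based on the packages installed above.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible use after the installs above (no output means all were found):
#   missing_tools gcc g++ make ant emacs
```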
4) Make
Inside the LDA directory we created, run make.
    make
5) Install Ice
The only problem with make is Ice, which must be compiled separately.
Install system prerequisites.
Install libbz2-dev package.
    sudo apt-get install libbz2-dev
Install xml parser.
    sudo apt-get install expat libexpat-dev
Also install libssl-dev.
    sudo apt-get install libssl-dev
Install Mono (the mono-complete package) and some needed packages.
    sudo apt-get install mono-complete
Finally, install the mandatory dev packages.
    sudo apt-get install python-dev ruby ruby-dev php5-dev
Download third party sources for Ice.
    wget http://www.zeroc.com/download/Ice/3.4/ThirdParty-Sources-3.4.2.zip
    sudo apt-get install unzip
    unzip ThirdParty-Sources-3.4.2.zip
    rm ThirdParty-Sources-3.4.2.zip
    cd ThirdParty-Sources-3.4.2
Install Berkeley DB.
    unzip db-4.8.30.NC.zip
    cd db-4.8.30.NC
    cd build_unix
    ../dist/configure --prefix=/usr/local/berkeleydb --enable-compat185 --enable-cxx --enable-debug_rop --enable-debug_wop --enable-java
    make
    sudo make install
Locate the db.jar file, which provides the extra Java classes. It should be here:
    /usr/local/berkeleydb/lib/db.jar
Copy the jar file to your classpath.
    sudo cp /usr/local/berkeleydb/lib/db.jar /usr/lib/jvm/java-6-sun/lib
You will also need classes from the following packages, which are included in the third party sources directory.
    cd ../../
    unzip jgoodies-common-1_2_0.zip
    cd jgoodies-common-1.2.0
    sudo cp jgoodies-common-1.2.0.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
    unzip jgoodies-forms-1_4_1.zip
    cd jgoodies-forms-1.4.1
    sudo cp jgoodies-forms-1.4.1.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
    unzip jgoodies-looks-2_4_1.zip
    cd jgoodies-looks-2.4.1/
    sudo cp jgoodies-looks-2.4.1.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
You have to compile some of the Slice tools manually in the build/Ice-3.4.1/cpp/src directory. These are slice2php, slice2cs, slice2freezej and slice2java.
    cd ~/apps/LDA/build/Ice-3.4.1/cpp/src
    cd slice2php/
    make
    cd ../
    cd slice2cs/
    make
    cd ../
    cd slice2freezej/
    make
    cd ../
    cd slice2java/
    make
    cd ../../../
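The four builds above follow the same pattern, so they can be expressed as a loop that stops at the first failure. This is only a sketch: build_slice_tools and the BUILD_CMD override are hypothetical names introduced here, and the directory names are the ones listed in this tutorial.

```shell
# Run the build command in each Slice tool directory under the given
# source directory, stopping at the first failure.
# BUILD_CMD is a hypothetical override; it defaults to plain "make".
build_slice_tools() {
    src_dir=$1
    for tool in slice2php slice2cs slice2freezej slice2java; do
        # The subshell keeps the caller's working directory unchanged.
        (cd "$src_dir/$tool" && ${BUILD_CMD:-make}) || return 1
    done
}

# Possible use with the path from this tutorial:
#   build_slice_tools ~/apps/LDA/build/Ice-3.4.1/cpp/src
```

Note that the loop leaves you in the directory you started from, whereas the manual sequence ends in build/Ice-3.4.1, so change to that directory yourself before the final make.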
Point your classpath to the jar files you copied.
    export CLASSPATH=/usr/lib/jvm/java-6-sun/lib/jgoodies-common-1.2.0.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-forms-1.4.1.jar:/usr/lib/jvm/java-6-sun/lib/db.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-looks-2.4.1.jar
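The long CLASSPATH value is easy to mistype. As a sketch, it can be assembled from the directory and the jar names instead; join_classpath is a hypothetical helper, not part of Ice or Yahoo! LDA.

```shell
# Join the given jar file names, each prefixed with a directory,
# into a colon-separated classpath string.
join_classpath() {
    jar_dir=$1
    shift
    result=""
    for jar in "$@"; do
        # Add a ":" separator only when result is already non-empty.
        result="${result:+$result:}$jar_dir/$jar"
    done
    printf '%s\n' "$result"
}

# Possible use with the jars copied above:
#   export CLASSPATH="$(join_classpath /usr/lib/jvm/java-6-sun/lib \
#       jgoodies-common-1.2.0.jar jgoodies-forms-1.4.1.jar \
#       db.jar jgoodies-looks-2.4.1.jar)"
```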
Now build Ice.
    make
When this finishes, copy all the created files from the lib folder to the system's lib folder.
    cd ../../
    sudo cp lib/* /usr/lib
This is all you need to run the Yahoo! LDA code on a single machine.
6) Test installation (Batch Mode)
Follow the example in the documentation that accompanies the code (inside the docs folder). The commands are slightly altered so that they work for the single-node example.
Phase 1 – Tokenization and Formatting
    cd ut_out
    cp ../Tokenizer.java .
    javac Tokenizer.java
    cat ydir_1k.txt | java -classpath . Tokenizer | ../formatter
Phase 2 – Learning the topic mixtures
    ../learntopics --topics=100 --iter=500
If everything is OK, you will be able to see the word mixtures for each topic
    cat lda.topToWor.txt
and the topic assignments
    cat lda.worToTop.txt
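Rather than reading both files in full, a quick check that they exist and are not empty is often enough to confirm the run worked. The sketch below assumes nothing about the file format; check_outputs is a hypothetical helper name.

```shell
# Report, for each given file, whether it exists and is non-empty.
# Returns 1 if any file is missing or empty.
check_outputs() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}

# Possible use inside ut_out after learntopics finishes:
#   check_outputs lda.topToWor.txt lda.worToTop.txt
```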
Phase 3 – Testing
Go to ut_test directory.
    cd ../ut_test/
Copy Tokenizer class.
    cp ../ut_out/Tokenizer.class .
Format test data.
    cat ydir_1k.tst.txt | java -classpath . Tokenizer | ../formatter --dumpfile=../ut_out/lda.dict.dump
Learn test topics.
    ../learntopics -test --dumpprefix=../ut_out/lda --topics=100
Output files are created inside ut_test.