Friday, August 29, 2014

C++ MNIST Dataset Parser

A C++ parser for the MNIST handwritten-digit dataset. The file format specification can be found at http://yann.lecun.com/exdb/mnist/.

Friday, August 22, 2014

Volunteer Work, Summer 2014: Defending Indonesia's Democracy with an Election Vote Counter (pilpres2014.org)

Silicon Valley coder wants to defend Indonesia’s democracy with election vote counter and neat data visualizations

Tomorrow is a monumental day in Indonesia: the General Elections Commission (KPU) will announce who the next president of Indonesia is, based on the official vote tally. In the two weeks since the vote took place, both candidates have declared victory based on different quick count results, and neither of them is backing down from their claims today. Because of this, many people in the country have turned to tech, creating initiatives such as crowdsourced online vote counts that aim to make the contested count more transparent. The most “open source” initiative of them all is Pilpres2014.org.

As with the other vote-counting sites that have popped up since the July 9 general election, Pilpres2014 lets you see counting results based on the vote tally documents released on KPU’s website. Visitors can also explore data visualizations based on the tallies, such as bubble graphs and deep bar hierarchies (which I personally love). The data is updated every two hours.

Read more: Silicon Valley coder wants to defend Indonesia’s democracy with election vote counter and neat data visualizations http://www.techinasia.com/pilpres2014-open-source-indonesia-president-election-vote-counting-site/

Media coverage:

http://www.techinasia.com/pilpres2014-open-source-indonesia-president-election-vote-counting-site/
http://tekno.kompas.com/read/2014/07/23/10405767/Bikin.Bangga.Semangat.Kolaborasi.Teknologi.untuk.Pilpres.2014
http://tekno.kompas.com/read/2014/07/20/15310027/Peneliti.Microsoft.ikut.Awasi.Hitung.Suara.Pilpres.2014
http://www.pilpres2014.org/AboutUs.html

Saturday, June 28, 2014

Machine Learning Resources

Video Lectures:
One of the best lectures on machine learning:
Neural Networks for Machine Learning - by Geoffrey Hinton

Joseph Turian on Word Representation

Deep Learning:
Learning Deep Architectures for AI - by Yoshua Bengio
Deep Learning Resources

Top Three Researchers:
Yoshua Bengio homepage
Geoffrey Hinton homepage
Yann LeCun homepage

Leading Researchers:
Joseph Turian Word Representation

Word Representations: A Simple and General Method for Semi-Supervised Learning

Mikolov Word2Vec

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.

Andriy Mnih

Other Interesting Publications:

Brown Clustering

Sunday, June 22, 2014

Related Blog

Friday, April 11, 2008

List of Publications

Book Publications:

Mining of Data with Complex Structures

Series: Studies in Computational Intelligence, Vol. 333, February 2011

Authors: Fedja Hadzic, Henry Tan, Tharam S. Dillon

The primary audience is third- and fourth-year undergraduate students, Master's and PhD students, and academics. The book can be used for both teaching and research. The secondary audience is practitioners in industry, business, commerce, government, consortiums, alliances and partnerships who want to learn how to introduce techniques for mining data with complex structures into their applications and use them efficiently. The scope of the book is both theoretical and practical, so it should reach a broad market in both academia and industry. In addition, its subject matter is a rapidly emerging field that is critical for the efficient analysis of knowledge stored in various domains.

Conference/Journal Publications:

0. Risvik, KM, Chilimbi, T, Tan, H, Anderson, C & Kalyanaraman, K 2013, ‘Maguro, a system for indexing and searching over very large text collections’, Proceedings of the 6th International Conference on Web Search and Data Mining (WSDM 2013), Rome, Italy, Feb 4-8, 2013.

1. Tan, H, Dillon, TS, Feng, L, Chang, E & Hadzic, F 2005, ‘X3-Miner: Mining patterns from XML database’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 6th International Conference on Data Mining (Data Mining’05), Skiathos, Greece, WIT Press, pp. 287-297.

2. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2005, ‘MB3-Miner: Mining eMBedded subTREEs using tree model guided candidate generation’, Proceedings of the 1st International Workshop on Mining Complex Data (MCD’05), Houston, TX, USA, pp. 103-110.

3. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2006, ‘IMB3-Miner: Mining induced/embedded subtrees by constraining the level of embedding’, in WK Ng, M Kitsuregawa & J Li (eds), Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD’06), Singapore, pp. 450-461.

4. Tan, H, Dillon, TS & Hadzic, F 2006, ‘Razor: Distance constrained mining of embedded subtrees’, in Tsumota & Shusaku (eds), Proceedings of the International Conference on Data Mining (ICDM’06), Hong Kong, pp. 8-13.

5. Tan, H, Dillon, TS, Hadzic, F, Feng, L & Chang, E 2007, ‘Tree model guided candidate generation for mining frequent subtrees from XML’, accepted for publication in Transactions on Knowledge Discovery from Data (TKDD).

6. Tan, H, Dillon, TS, Hadzic, F, Chang, E & Feng, L 2007, ‘Mining induced/embedded subtrees using the level of embedding constraint’, submitted to Fundamenta Informaticae.

7. Tan, H, Hadzic, F, Dillon, TS & Chang, E 2008, ‘State of the art of data mining of tree structured information’, Computer System Science and Engineering, vol. 23, no. 4, July 2008 (pending publication).

8. Tan, H, Dillon, TS, Hadzic, F & Chang, E 2006, ‘SEQUEST: Mining frequent subsequences using DMA strips’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 7th International Conference on Data Mining and Information Engineering (Data Mining’06), Prague, Czech Republic, WIT Press, pp. 315-328.

9. Hadzic, F, Dillon, TS, Sidhu, AS, Chang, E & Tan, H 2006, ‘Mining substructures in protein data’, Proceedings of the 6th International Conference on Data Mining Workshop (ICDMW’06) - Invited, Hong Kong, pp. 213-217.

10. Hadzic, F, Tan, H & Dillon, TS 2007, ‘UNI3 - efficient algorithm for mining unordered induced subtrees using TMG candidate generation’, Proceedings of the Computational Intelligence and Data Mining (CIDM’07), Hawaii, USA, pp. 568-575.

11. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2008, ‘U3: Unordered subtree mining using TMG candidate generation and the level of embedding constraint’, (pending publication).

12. Hadzic, F, Tan, H, Dillon, TS & Chang, E 2007, ‘Implications of frequent subtree mining using hybrid support definition’, in A Zanasi, CA Brebbia & NFF Ebecken (eds), Proceedings of the 8th International Conference on Data Mining & Information Engineering (Data Mining’07), The New Forest, UK, WIT Press, pp. 13-24.

13. Hadzic, F, Dillon, TS & Tan, H 2007, ‘Outlier detection strategy using the self-organizing map’, in X Zhu & I Davidson (eds), Knowledge Discovery and Data Mining: Challenges and Realities, Information Science Reference, Hershey, PA, USA, pp. 224-243.

14. Hadzic, F, Dillon, TS, Tan, H, Feng, L & Chang, E 2007, ‘Mining frequent patterns using self-organizing map’, in D Taniar (ed.), Research and Trends in Data Mining Technologies and Applications: Advances in Data Warehousing and Mining, IGI Global, Hershey, PA, USA, pp. 121-135.

15. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘XML based semantic protein map’, in A Zanasi, NFF Ebecken & CA Brebbia (eds), Proceedings of 5th International Conference on Data Mining, Text Mining and their Business Applications (Data Mining’04), Malaga, Spain, WIT Press, pp. 51-60.

16. Sidhu, AS, Dillon, TS & Setiawan, H 2004, ‘Comprehensive protein database representation’, in A Gramada & PE Bourne (eds), Proceedings of the 8th International Conference on Research in Computational Biology (RECOMB’04), ACM Press, San Diego, CA, USA, pp. 427-429.

17. Sidhu, AS, Dillon, TS, Sidhu, BS & Setiawan, H 2004, ‘Protein knowledge meta model’, Molecular & Cellular Proteomics, pp. 262-263.

Curriculum Vitae

Henry Tan was born in the small town of Sukabumi, Indonesia, on December 7th, 1979. He obtained his Bachelor of Computer System Engineering with first-class honours from La Trobe University, VIC, Australia, in 2003. During his undergraduate studies, he was nominated as the most outstanding Honours student in Computer Science, and he received the 2003 ACS Student Award. After finishing his Honours year at La Trobe University, he began his doctoral studies at UTS in August 2003 under the supervision of Prof. Tharam S. Dillon, and he obtained his PhD in March 2008. His research interests include data mining, computer graphics, game programming, neural networks, AI, and software development. In January 2006 he accepted a job offer from Microsoft in Redmond, USA, as a Software Design Engineer (SDE).

Henry Tan Setiawan
