Tarek Amr's Homepage

About Me

I completed my postgraduate degree in Data Mining and Information Retrieval at the University of East Anglia in the UK. I am trying to challenge the old saying, "Jack of all trades, master of none". I have about 10 years of experience in Software Development. In previous lives I worked as an Information Security Consultant and Presales Manager. I have also been volunteering with Global Voices Online (GVO) since 2007, and I am currently the local ambassador of the Open Knowledge Foundation (OKFN) in Egypt. Words like Machine Learning, Natural Language Processing, Open Data, Government 2.0, Data Visualisation, Data Journalism, New Media, Startups, Sociology and Philosophy are like music to my ears, whereas I do not like the term Big Data.

My C.V. via LinkedIn

Practical D3

New! You can pre-order my latest book. It is on data visualization using d3.js.

Practical D3 Book

This is your guide to mastering the efficient use of D3.js in professional-standard data visualization projects.

You will learn what data visualization is, how to work with it, and how to think like a D3.js expert, both practically and theoretically. You will learn how to get the data, how to clean and refine it, and how to display it in the best charts and layouts. The book is for experienced Front End JavaScript Developers, as well as for Data Journalists who have basic knowledge of HTML/CSS and some JavaScript.

I co-authored the book with Rayna Stamboliyska. You can order it from any of the following:

Apress, Amazon UK, Amazon US

Projects and Tools

A list of some of my software projects and micro-projects on GitHub:

  • IRLib: An information retrieval library. It allows you to represent documents in a Vector Space Model (VSM), do feature selection, classify, apply TF.IDF and calculate similarity measures between documents. Visit the project's page.
  • DYSL: Do you speak London? A command-line tool for natural language identification, also known as langID. It currently supports only 4 languages: English, Spanish, Portuguese and Arabic. Visit the project's page.
  • GitZicht: A command-line tool, and Python library, for analyzing your git commits. You can pivot on different mentions, count different metrics, and export the output of your analysis to a CSV file. Visit the project's page.
  • Python NLP: Slides used for my session about Natural Language Processing using Python at Google Developer Group's #DevFest-2013. Visit the project's page; the slides are available here.
  • Mwazna: Mwazna (budget in Arabic), the brainchild of a data scientist, Tarek Amr, and a web developer and hacker, Amr Sobhy, aims to inform Egyptians about how their money is being spent through an easy-to-use site.
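To give a flavour of what the vector space model, TF.IDF and similarity measures mentioned for IRLib involve, here is a minimal sketch in plain Python. It does not use IRLib's API; the function names `tfidf_vectors` and `cosine_similarity` and the three toy documents are assumptions made up for this illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus, invented for the example.
docs = [
    "open data for open government".split(),
    "open data and data journalism".split(),
    "regression analysis with python".split(),
]
vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # documents sharing terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # documents with no shared terms
```

Documents that share weighted terms get a similarity above zero, while documents with no terms in common score zero.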

URL Classification

In today’s world, millions of web links are shared every day in emails or on social media websites. There are therefore a number of contexts in which it is important to have an efficient and reliable way to classify a web page by its Uniform Resource Locator (URL), without the need to visit the page itself. For example, a social media website may need to quickly identify status updates linking to malicious websites in order to block them. It can also use the classification results in marketing research to predict users’ preferences and interests. The target of this research is thus to classify web pages using their URLs only.

Dissertation » Slides »
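As a rough illustration of classifying a page from its URL alone, the sketch below splits URLs into character n-grams and feeds them to a small Naive Bayes classifier. This is not necessarily the method used in the dissertation; the class name, the choice of trigrams, and the example URLs and labels are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def url_ngrams(url, n=3):
    """Lowercase the URL, drop the scheme, and return its character n-grams."""
    text = re.sub(r"^[a-z]+://", "", url.lower())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayesUrlClassifier:
    """Hypothetical multinomial Naive Bayes over URL n-grams, with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.class_counts = Counter()             # label -> number of training URLs
        self.vocabulary = set()

    def fit(self, urls, labels):
        for url, label in zip(urls, labels):
            grams = url_ngrams(url, self.n)
            self.ngram_counts[label].update(grams)
            self.class_counts[label] += 1
            self.vocabulary.update(grams)
        return self

    def predict(self, url):
        total_urls = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, url_count in self.class_counts.items():
            counts = self.ngram_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each n-gram.
            score = math.log(url_count / total_urls)
            for gram in url_ngrams(url, self.n):
                score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
train_urls = [
    "http://example.com/sports/football-results",
    "http://example.com/sports/tennis-scores",
    "http://example.org/tech/python-tutorial",
    "http://example.org/tech/javascript-guide",
]
train_labels = ["sports", "sports", "tech", "tech"]

classifier = NaiveBayesUrlClassifier().fit(train_urls, train_labels)
sports_guess = classifier.predict("http://example.net/sports/football-news")
tech_guess = classifier.predict("http://example.net/tech/python-basics")
```

Character n-grams are used here because URLs often run words together, so splitting on punctuation alone can miss useful fragments.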

Regression Analysis

Using Scikit-Learn

Predict the Future with Regression Analysis using Scikit-Learn and a little bit of Python

Introduction to regression analysis and very basic machine learning using Scikit-Learn and a little bit of Python

Learn Regression Analysis
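For readers new to the idea, here is a minimal sketch of simple linear regression. To stay dependency-free it uses the closed-form least-squares formulas in plain Python rather than Scikit-Learn, so it is a stand-in for the tutorial's approach and not a copy of it. The helper `fit_line` and the toy numbers are assumptions invented for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 10, made up for the example.
years = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(years, sales)
forecast = slope * 6 + intercept  # "predict the future" for year 6
```

With real, noisy data the fitted line will not pass through every point, and the forecast is only as good as the assumption that the trend stays linear.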

Open Government

Slides presented to the students of Transparency International's School of Integrity in Tunisia. The presentation discusses the role of Open Data and how it is used for more transparency and better governance.

Open Government »

Read Me

A list of places online where I write every now and then. I usually mix social, technical, political and other miscellaneous topics.

Data Science Tutorials

My ML Notebooks

I am going to write and upload tutorials on machine learning and text mining, all as Jupyter notebooks. Jupyter (formerly IPython) notebooks allow me to write Python code inline with explanations and graphs. These notebooks are hosted on GitHub, so feel free to use the code there, or to fork it and add (or fix) anything you want.

Learn Data Science with Tarek