Center for Data Science and Artificial Intelligence


Data science is revolutionizing the world around us. It is an interdisciplinary field at the intersection of three disciplines: computer science, mathematics and statistics, and domain knowledge. Its aim is to derive insight from data. Another way to think about it is as the intersection of data engineering and the scientific method: we use large-scale data systems to drive the scientific method. The goal of data science is to transform data into knowledge that can be used to make rational decisions, so that we can take actions that help us achieve our goals. We refer to this process as transforming data into actionable insight.

Data Science: Solving Problems in Various Sectors

Data science methods and tools can help address some of the world's greatest challenges in sectors including:

  • Defense and national security
  • Medicine and health
  • Imaging and optics
  • Energy and the environment
  • Food and agriculture
  • Economics and finance

Research Challenges in Data Science

  • Terabytes to petabytes of data are generated each day and must be stored and processed;
  • Almost every discipline faces big data analysis problems, including medical sciences, life sciences, bioinformatics, law, civil engineering, and government;
  • Data comes in different forms, such as free text, structured data, audio, video, and images;
  • Analysis tasks performed over the data are becoming more and more sophisticated;
  • High-performance computing platforms are advancing fast (e.g., cloud computing, parallel computing, multi-core machines, GPUs, mobile computing);
  • Communication and feedback need to be established among machines, algorithms, and people.

Skills of Data Science

In general, the skills commonly associated with data science are:

  • Programming computers using languages such as SQL, Python, and R;
  • Working with data, that is, collecting, cleaning, and transforming it;
  • Creating and interpreting descriptive statistics, that is, numerically analyzing data;
  • Creating and interpreting data visualizations, that is, visually analyzing data;
  • Creating statistical models and using them for statistical inference, hypothesis testing, and prediction;
  • Handling big data, that is, data sets whose volume, velocity, or variety exceeds the limits of conventional computing architecture;
  • Automating decision-making processes using machine learning algorithms;
  • Deploying data science solutions into production or communicating results to a wider audience.
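
Two of the skills above, cleaning data and computing descriptive statistics, can be illustrated with a short Python sketch. It uses only the standard library; the function name `clean_and_describe` and the sample readings are hypothetical and chosen purely for illustration.

```python
import statistics


def clean_and_describe(raw_values):
    """Drop missing or non-numeric entries, then summarize what remains."""
    cleaned = []
    for value in raw_values:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # skip entries that cannot be read as numbers
    # Note: statistics.stdev needs at least two usable values.
    return {
        "count": len(cleaned),
        "mean": statistics.mean(cleaned),
        "median": statistics.median(cleaned),
        "stdev": statistics.stdev(cleaned),
    }


# Hypothetical raw readings with typical data-quality problems.
readings = ["12.5", "13.1", None, "n/a", "11.9", "12.8"]
summary = clean_and_describe(readings)
print(summary)
```

In practice this kind of work is usually done with a data-frame library, but the steps are the same: discard or repair unusable entries first, then compute the summary numbers.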

Data Science Ecosystem

Algorithms for Data Science

  • Methods for organizing data, e.g., hashing, trees, queues, lists, priority queues.
  • Streaming algorithms for computing statistics on the data. Sorting and searching.
  • Basic graph models and algorithms for searching, shortest paths, and matching. Dynamic programming. Linear and convex programming.
  • Floating-point arithmetic and the stability of numerical algorithms. Eigenvalues, singular values, and PCA. Gradient descent, stochastic gradient descent, and block coordinate descent. Conjugate gradient, Newton, and quasi-Newton methods.
  • Large-scale applications such as signal processing, collaborative filtering, and recommender systems.
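
As one example of a streaming algorithm for computing statistics, the sketch below keeps a running mean and sample variance using Welford's online update, so each value is seen once and never stored. The class name `RunningStats` and the sample values are assumptions made for illustration; this is a minimal sketch, not a reference implementation.

```python
class RunningStats:
    """Running mean and sample variance via Welford's online update."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        # Uses the old deviation (delta) and the new one (x - self.mean).
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 with fewer than two values."""
        if self.count < 2:
            return 0.0
        return self._m2 / (self.count - 1)


# Hypothetical stream of values, processed one at a time.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
print(stats.count, stats.mean, stats.variance)
```

The memory used is constant regardless of how long the stream is, and the update avoids the cancellation errors that can arise from accumulating a sum of squares directly, which ties in with the floating-point stability topic listed above.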