CSIT Blog: January 2019

Tuesday, January 29, 2019

Week 19: Into to Database Systems

This week, we learned about data warehousing and business intelligence systems (DW/BI) from the Data Warehouse Toolkit by Kimball, Ralph, and Margy Ross. These systems consider the needs of the business and then design the system with technology in mind. In organizations, record keeping and analytical decision making are important. The performance of the business can be analyzed with these tools. The information should be easily accessible and comprehensible to the business user and not in developer jargon. It should be simple and fast for the user. The system should be labeled consistently or if they measure different things, labeled differently. The system should adapt to change and protected.

A DW/BI manager should understand their user's job responsibilities, goals, and objectives. They should know who the best users are and find new users. They should give useful information to their users that can be trusted and is accurate. The system they make should be reliable, trustworthy, updated, and keep everyone happy.

Dimensional modeling is the preferred technique for presenting data that is understandable to the user and fast. These are often in the third normal form (3NF) which are free of redundancies. They are often called entity-relationship diagrams (ERD or ER diagrams) who describe the relationship between tables.

Star Schemas are dimensional models in relational database management systems. If they are multidimensional database environments, they are online analytical processing cubes (OLAP) where data is stored and indexed like performance aggregations or summary tables. They give better query performance because they are optimized and have more functions than SQL but require more computation. OLAP cubes have good security options, richer analysis, support type 2 changes, support fact tables, and ragged hierarchies.

Tuesday, January 22, 2019

Week 18 CST 363: Intro to Database Systems

This week, in chapter 6 we worked with aggregate functions to do averages, sums, mins, maxes, and counts on selected rows. All of these ignore null values except for COUNT(*). The DISTINCT keyword can be used to omit duplicate values. GROUP BY is used to group a row based on an expression in ascending order and can accept multiple expressions with a comma. The HAVING clause is used to specify a search condition. The general order to use these in is SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. With WHERE the search condition is applied before it calculates; in HAVING, it is after. WHERE refers to any column and can't contain aggregate functions. HAVING can only refer to a column in the SELECT clause and can contain aggregate functions. WITH ROLLUP is used to add a summary row to the end specified by the GROUP BY clause. One cannot use the DISTINCT keyword or ORDER BY WITH ROLLUP.

In chapter 7, we learned about subqueries. Subqueries are a SELECT statement in another SQL statement. Subqueries can be introduced with WHERE, HAVING, FROM and SELECT. They cannot have an ORDER BY clause but can be nested in other subqueries. Subqueries are similar to joins and can replace them. JOINS are useful when columns from different tables are used referencing an existing relationship. Subqueries are good for passing an aggregate value to the main query and with informal relationships in long, complex subqueries. The IN operator is used to give a list that is tested against the test expression and returns one column of values. NOT IN can replace an outer join. The WHERE clause is used to compare an expression with the results of a subquery. IF the keywords ANY, SOME and ALL are used, they will return a single value or a list of values. The ANY/SOME keyword is used to test if a condition is true for one or more of the values, similar to OR. A correlated subquery is executed for each row in the main query and an uncorrelated subquery is executed one time. The EXISTS operator is used to test one or more rows returned by the subquery, NOT EXISTS tests that no rows are returned. You can use an * in these to specify any column. The SELECT clause subquery needs to return one value in a correlated subquery. Using a join is a more legible option. The FROM clause returns a set that is called inline view. An alias must be used to perform calculations and can be used as column names. A complex subquery can be made but it is difficult and confusing.

Tuesday, January 15, 2019

WEEK 17 CST 363 Intro to Database Systems

This week, we had an introduction to SQL which included client/server systems, the relational database model, SQL and SQL based systems, and SQL statements. In Chapter 2, we learned about using the MySQL workbench, running SQL statements, and how to use the command line client. In Chapter 3, we learned about the select, where, order by, and limit clauses to process data. We used inner and outer joins to get data from two or more tables and practiced using unions. In Chapter 5, we made new rows, updated existing rows and deleted existing rows.

Initially, I had some trouble connecting to the server in chapter 2. I later found out that I needed MySQL server to do that. After reading more online and at the end of chapter 2, I downloaded the necessary programs and was able to continue with the homework.