CSIT Blog

Friday, March 1, 2019

Week 23: Intro to Database Systems

Over the course of the past 8 or so weeks, we learned many things. We learned about the reasons for using a database, parts, characteristics, and the software we needed to use databases namely the MySQL workbench. We used HTML to enter data into a database using Python. Which was convenient because we learned Python in the previous class. We learned the basics of retrieving data from one table with joins and subqueries to retrieve data from more than one table. We learned how to modify tables using insert, update, and remove. Our class used predicates like where to filter information and order by, group by, and having to sort data rows. We studied E-R diagrams and made them from existing database designs with normalization rules to make our databases more functional. We studied data warehousing, extract transform load, and star schemas. We learned about space management in databases using indexes and balanced trees to make database search faster. We studied ACID features of databases, using locks, versioning, and strategic locks to balance trade-offs. We learned how to do rollbacks and recovery when mistakes are made. In the end, we learned about scaling up databases, sharding, and database clusters when working with large amounts of data. The most important thing I learned was to ask for help when I needed it or I don't learn anything.

Tuesday, February 19, 2019

Week 22 CST 363 Intro to Database Systems

· Know what are ACID features of a database and why are they important

o Atomicity

§ Transactions are all or nothing

o Consistency

§ Only valid data is saved

o Isolation

§ Transactions do not affect each other

§ locks

o Durability

§ Written data will not be lost

· Know that locks, versioning and optimistic locking are used to achieve ACID properties of concurrent transactions

o Concurrency is the ability of the system to support two or more transactions working with the same data at the same time. Concurrency problems are prevented using locks which stop other transactions from executing if another is already running.

· Know how to begin and terminate transactions in application code and SQL scripts

o Use START TRANSACTION to turn off autocommit mode

o Use COMMIT to commit the changes

o Use ROLLBACK to roll back the changes

· Use the correct ANSI standard isolation level to make trade-off between concurrency and ACID properties

o The simplest way to prevent concurrency problems is to change the default locking by SET TRANSACTION ISOLATION LEVEL

· Use SESSION for the session

· USE GLOBAL for all sessions

§ Use SERIALIZABLE isolation level to isolate it from every other transaction and restrict concurrency, every resource is locked and transactions cannot access it

· it requires more overhead to manage the locks

§ Use READ UNCOMMITTED isolation level by just using SELECT queries and not setting any locks

§ Use READ COMMITTED to prevent transactions from seeing data that has been changed by other transactions but not committed

§ Use REPEATBLE READ to be read consistently in the same transaction

· Know how rollback and forward recovery work in a database

o Use SAVEPOINT to create a savepoint with a specific name

o USE ROLLBACK TO SAVEPOINT to roll back to a specified save point

· MySQL supports and uses both locks and MVCC

· MVCC is used so that reader will not be blocked by a writer holding a lock on a row

o row locking is used so that multiple writers must serialize when updating the same row

· A transaction that reads a row with the intention of later updating the row should issue SELECT ... WITH UPDATE to acquire a lock on a row when the row is read. This prevents another transaction from modifying the row between the time of select and time of update

· InnoDB Transaction Model

o combines the best of multi-versioning database with two-phase locking at the row level and run queries

o The lock info is stores so lock escalation is not needed

Tuesday, February 12, 2019

Week 21 CST 363 Intro to Database Systems

This week, I learned how to properly import from github to pycharm using this video. I initially downloaded the file from github then unzipped it into my dropbox like I usually do. Here, the import statements were not accessing the other files. I changed them from it's previous version to the import version and that did not fix it. I looked at the forums and another classmate mentioned that we need to append the folder name to the from statements to access the other files. Fortunately, that worked after I setup git on my desktop and then logged into github on pycharm and downloaded the source code that way following the video. This is a cleaner way of accessing files and I plan on using it in the future.

Tuesday, February 5, 2019

Week 20: CST 363 Intro to Database Systems

This week, I learned about the importance of time management and getting assignments in on time especially if they are Exams. This exam will be late, but I hope I can still turn it in. Some parts took longer than others. I reviewed OLAP and OLTP tables this week, learning the differences between the two which were written into my notebook and was not typed. I also learned about star schemas, which was my main part in the group project.

Tuesday, January 29, 2019

Week 19: Into to Database Systems

This week, we learned about data warehousing and business intelligence systems (DW/BI) from the Data Warehouse Toolkit by Kimball, Ralph, and Margy Ross. These systems consider the needs of the business and then design the system with technology in mind. In organizations, record keeping and analytical decision making are important. The performance of the business can be analyzed with these tools. The information should be easily accessible and comprehensible to the business user and not in developer jargon. It should be simple and fast for the user. The system should be labeled consistently or if they measure different things, labeled differently. The system should adapt to change and protected.

A DW/BI manager should understand their user's job responsibilities, goals, and objectives. They should know who the best users are and find new users. They should give useful information to their users that can be trusted and is accurate. The system they make should be reliable, trustworthy, updated, and keep everyone happy.

Dimensional modeling is the preferred technique for presenting data that is understandable to the user and fast. These are often in the third normal form (3NF) which are free of redundancies. They are often called entity-relationship diagrams (ERD or ER diagrams) who describe the relationship between tables.

Star Schemas are dimensional models in relational database management systems. If they are multidimensional database environments, they are online analytical processing cubes (OLAP) where data is stored and indexed like performance aggregations or summary tables. They give better query performance because they are optimized and have more functions than SQL but require more computation. OLAP cubes have good security options, richer analysis, support type 2 changes, support fact tables, and ragged hierarchies.

Tuesday, January 22, 2019

Week 18 CST 363: Intro to Database Systems

This week, in chapter 6 we worked with aggregate functions to do averages, sums, mins, maxes, and counts on selected rows. All of these ignore null values except for COUNT(*). The DISTINCT keyword can be used to omit duplicate values. GROUP BY is used to group a row based on an expression in ascending order and can accept multiple expressions with a comma. The HAVING clause is used to specify a search condition. The general order to use these in is SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. With WHERE the search condition is applied before it calculates; in HAVING, it is after. WHERE refers to any column and can't contain aggregate functions. HAVING can only refer to a column in the SELECT clause and can contain aggregate functions. WITH ROLLUP is used to add a summary row to the end specified by the GROUP BY clause. One cannot use the DISTINCT keyword or ORDER BY WITH ROLLUP.

In chapter 7, we learned about subqueries. Subqueries are a SELECT statement in another SQL statement. Subqueries can be introduced with WHERE, HAVING, FROM and SELECT. They cannot have an ORDER BY clause but can be nested in other subqueries. Subqueries are similar to joins and can replace them. JOINS are useful when columns from different tables are used referencing an existing relationship. Subqueries are good for passing an aggregate value to the main query and with informal relationships in long, complex subqueries. The IN operator is used to give a list that is tested against the test expression and returns one column of values. NOT IN can replace an outer join. The WHERE clause is used to compare an expression with the results of a subquery. IF the keywords ANY, SOME and ALL are used, they will return a single value or a list of values. The ANY/SOME keyword is used to test if a condition is true for one or more of the values, similar to OR. A correlated subquery is executed for each row in the main query and an uncorrelated subquery is executed one time. The EXISTS operator is used to test one or more rows returned by the subquery, NOT EXISTS tests that no rows are returned. You can use an * in these to specify any column. The SELECT clause subquery needs to return one value in a correlated subquery. Using a join is a more legible option. The FROM clause returns a set that is called inline view. An alias must be used to perform calculations and can be used as column names. A complex subquery can be made but it is difficult and confusing.

Tuesday, January 15, 2019

WEEK 17 CST 363 Intro to Database Systems

This week, we had an introduction to SQL which included client/server systems, the relational database model, SQL and SQL based systems, and SQL statements. In Chapter 2, we learned about using the MySQL workbench, running SQL statements, and how to use the command line client. In Chapter 3, we learned about the select, where, order by, and limit clauses to process data. We used inner and outer joins to get data from two or more tables and practiced using unions. In Chapter 5, we made new rows, updated existing rows and deleted existing rows.

Initially, I had some trouble connecting to the server in chapter 2. I later found out that I needed MySQL server to do that. After reading more online and at the end of chapter 2, I downloaded the necessary programs and was able to continue with the homework.