Archive for the ‘Database’ Category

What the heck is redacting a database?

Wednesday, October 21st, 2009

A good friend of mine sent me the following link:

[http://www.codersrevolution.com/index.cfm/2009/10/21/Sequoia-Voting-System-Witch-Hunt-err-Study-Project]

The lesson here is that if you don't adequately cleanse a database, you can expect the data to become available! Redacting is an interesting concept, and they apparently tried it (not too successfully). Rows removed with DELETE or DROP can linger in the database file's free pages until they are overwritten. The best way to clean a database is to create a new one and copy in only the data you want exposed. Don't trust the handy dandy DROP/DELETE :-)

If they wanted to expose/publish the 88 tables, they should have created a new DB, copied in those tables and released it. With anything less than that, you have to be VERY careful! And for the more security conscious, the new DB would be created on a freshly wiped drive on a freshly rebooted computer!
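A minimal sketch of the "new database, copy in only what you want to publish" approach, using Python's built-in sqlite3 module as a stand-in (the study's database was not necessarily SQLite, and the function and table names here are hypothetical). Because the new file only ever receives rows that are explicitly selected, deleted rows and free pages in the source have no way to travel with it.

```python
import os
import sqlite3


def quote_ident(name):
    """Double-quote an SQL identifier, escaping any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def publish_subset(src_path, dst_path, tables):
    """Create a brand-new database at dst_path holding only the named tables."""
    if os.path.exists(dst_path):
        raise FileExistsError(dst_path)  # always start from an empty file
    dst = sqlite3.connect(dst_path)
    try:
        dst.execute("ATTACH DATABASE ? AS src", (src_path,))
        for name in tables:
            row = dst.execute(
                "SELECT sql FROM src.sqlite_master WHERE type = 'table' AND name = ?",
                (name,),
            ).fetchone()
            if row is None:
                raise ValueError("table not found in source: " + name)
            # Re-run the original CREATE TABLE text; unqualified, it is created in main.
            dst.execute(row[0])
            q = quote_ident(name)
            # Copy live rows only; nothing else from the source file is carried over.
            dst.execute("INSERT INTO main." + q + " SELECT * FROM src." + q)
        dst.commit()
        dst.execute("DETACH DATABASE src")
    finally:
        dst.close()
```

Usage would be something like publish_subset("full.db", "public.db", ["votes"]). For the extra-cautious, the destination path would point at the freshly wiped drive.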

Data warehousing news and a nice approach for partitioned data

Wednesday, September 30th, 2009
  • [Kickfire Offers Data Warehouse Appliance for the Masses]
    • Kickfire offers a MySQL-based data warehouse appliance targeting the 500 GB to 5 TB range, starting at $32K.
    • I will have to start monitoring this one. They appear to use a concept similar to Netezza's, executing SQL in hardware for speed. It is not exactly the same, but it is interesting to see the appliance trend.
  • [Building the Data Warehouse for bandwidth tracking]
    • This is a worthwhile read if you need to load and handle lots of naturally partitioned data.
    • For those not willing to read, I'll pose a question: how would you handle 683,460 tables? :-)

SQLite for C# – Part 8 – Loading CSV/Pipe into SQLite via command line

Saturday, September 19th, 2009

Ever wondered how hard it would be to load a CSV file into a SQLite database? I know how I would do it in code; no rocket science needed there! In this case, however, I wanted to know how fast it is to do this natively, and I really didn't want to code anything!

Looking at what sqlite3.exe has to offer, it pretty much supports this out of the box. Very nice :-)

Requirements:

  • Loading speed
  • Making the data available to consuming applications ASAP

While I love C#, and frankly it's hard to go back to C or C++, sometimes performance trumps the creature comforts we have become accustomed to.

Note: I did this without circling back to a C# implementation, as I know the data and performance requirements are tight, and in this case I wanted maximum performance with no code! The biggest factor in a successful implementation is using the best tools for the job, not just the ones you happen to favor that year.

So, first things first: create a table to take the input.

DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales
(
   Store    int
  ,Date     varchar
  ,OrderReference varchar
  ,Line     int
  ,BookISBN varchar(14)
  ,Quantity int
  ,Price    int
, Primary Key (OrderReference,Line)
);

Next is the magic. We need to load the CSV into the table:

.separator "|"
.import BookSales.txt BookSales

Wow, that was easy :-). We set the separator to a pipe rather than a comma in this case, then run the import, whose general form is:

.import [FileName] [Table]
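For comparison, here is roughly what .separator "|" followed by .import does, written with Python's standard csv and sqlite3 modules. This is a sketch, not how the sqlite3 shell is implemented; the function name import_delimited is hypothetical, and it assumes the file has no header line and exactly one field per table column.

```python
import csv
import sqlite3

# Same schema as the BookSales table above.
BOOK_SALES_DDL = """
CREATE TABLE IF NOT EXISTS BookSales
(
   Store    int
  ,Date     varchar
  ,OrderReference varchar
  ,Line     int
  ,BookISBN varchar(14)
  ,Quantity int
  ,Price    int
  ,Primary Key (OrderReference, Line)
)
"""


def import_delimited(conn, path, table="BookSales", separator="|"):
    """Insert every line of a separator-delimited file as one row of table."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter=separator))
    if not rows:
        return 0
    placeholders = ",".join("?" * len(rows[0]))
    sql = 'INSERT INTO "%s" VALUES (%s)' % (table.replace('"', '""'), placeholders)
    # One transaction for the whole file; committing per row is far slower.
    with conn:
        conn.executemany(sql, rows)
    return len(rows)
```

Values arrive as text, but the int columns have INTEGER affinity, so SQLite stores numeric-looking fields such as "2" as integers.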

Now the database is ready to be queried! But if we want to take it just one stage further:

.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;

Now the results of our simple aggregation go into a pipe-separated output file (pipe-separated despite the .csv extension, because the separator is still set to "|").
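The same summary step can be sketched in Python for comparison. write_summary is a hypothetical helper: it runs the GROUP BY query above and writes one pipe-separated line per group with no header row, similar to what the shell produces in its default list mode after .output.

```python
import csv
import sqlite3

SUMMARY_SQL = """
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN
"""


def write_summary(conn, out_path, separator="|"):
    """Write the aggregated BookSales rows to out_path, one line per group."""
    rows = conn.execute(SUMMARY_SQL).fetchall()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f, delimiter=separator, lineterminator="\n")
        writer.writerows(rows)
    return len(rows)
```

Calling write_summary(conn, "SummaryBookSales.csv") on a connection that already holds BookSales gives the equivalent of the .output step.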

Tying this all together in a single script file, which we will call “BookAnalysisLoader.sql”, gives us:

DROP TABLE IF EXISTS BookSales;
CREATE TABLE IF NOT EXISTS BookSales
(
   Store    int
  ,Date     varchar
  ,OrderReference varchar
  ,Line     int
  ,BookISBN varchar(14)
  ,Quantity int
  ,Price    int
, Primary Key (OrderReference,Line)
);

.separator "|"
.import BookSales.txt BookSales

.output SummaryBookSales.csv
SELECT Store, Date, BookISBN, SUM(Quantity), SUM(Price)
FROM BookSales
GROUP BY Store, Date, BookISBN;
.exit

The last piece of the puzzle is the final execution:

sqlite3.exe BookSalesAnalysis.db3 < BookAnalysisLoader.sql

Now we have a newly created database with our analysis data in it, plus a summary file generated from the output. We can load that file into Excel or another DB, or directly interrogate the database for more analytical information, all without coding!


SQLite 3.6.18 has been officially released!

Friday, September 11th, 2009

There are a number of good changes here!

  • Improved query planner:
    • Better use of statistics
    • A compile-time option enables ANALYZE to gather index histograms
    • Additionally, it was just plain improved as well!
  • Recursive triggers
  • Delete triggers now fire when rows are removed by REPLACE conflict resolution (only when recursive triggers are enabled)
  • Finer control over the page cache:
    • Shared cache: connections within the same process share a single cache rather than each keeping its own. This is now configurable per connection rather than only globally.
    • Private cache: the old way
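The two trigger changes interact, which a short sketch with Python's standard sqlite3 module can show (any SQLite bundled with a current Python is far newer than 3.6.18; the table and trigger names are made up for illustration). An AFTER DELETE trigger logs removed rows, and an INSERT OR REPLACE that collides on the primary key removes the old row first; the trigger only fires for that removal when PRAGMA recursive_triggers is ON.

```python
import sqlite3


def delete_trigger_fires_on_replace(recursive):
    """Count DELETE-trigger firings caused by one INSERT OR REPLACE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA recursive_triggers = " + ("ON" if recursive else "OFF"))
    conn.executescript("""
        CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE deleted_log (isbn TEXT);
        CREATE TRIGGER log_delete AFTER DELETE ON books
        BEGIN
            INSERT INTO deleted_log VALUES (old.isbn);
        END;
        INSERT INTO books VALUES ('978-0', 'First title');
        -- Same primary key, so REPLACE deletes the existing row before inserting.
        INSERT OR REPLACE INTO books VALUES ('978-0', 'Second title');
    """)
    count = conn.execute("SELECT COUNT(*) FROM deleted_log").fetchone()[0]
    conn.close()
    return count
```

With the pragma ON the log gets one row; with it OFF (the default) the replaced row disappears silently.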

Well done to the SQLite team! Good stuff.

More details can be found in the release notes: [SQLite Release 3.6.18 On 2009 Sep 11 (3.6.18)]