CSV-based database?

I’m building a long CSV file with ~10k rows of filenames of images with many labels (from image recognition services). With this I plan to do image compositions based on these labels. I was wondering if I could use plain CSV files for the queries (high speed is not a concern, usually is just to choose and preload a bunch of images from these queries). The question are: what could be the best approach in this case? (I would need to do queries like contains/begin with/equals/etc and compare or match different fields) any addon that could be helpful? I should consider some light DB?

Thanks for any light.

For ~10k rows, you should be able create a simple data structure like:

struct CSVRow {
  std::string column2;
  int column1;
};

// read CSV using ofBuffer
// parse rows using ofSplitString, 
// convert column tokens from string to number using ofToInt(), etc

std::vector<CSVRow> rows;

Once this is loaded into memory, searches should be very quick. You can optimize searches by sorting the vector in various ways. Then use for loops or std::search, std::find, etc to quickly search the rows vector.

1 Like

CSV is just comma separated lines, so accessing each without loading the entire file should be possible with istream - I think ofBuffer (getLines) does this, and for more complicated structures there is serialisation - via something like ofxCereal

Thanks a lot @bakercp and @autr --I’m trying this:

struct CSVRowDate {
		int month;
        int day;
		std::string labels;
		std::string folders;		
	};

std::vector<CSVRowDate> rowsDates;

void ofApp::loadYearDays() {
	ofBuffer buffer = ofBufferFromFile("YearDays.csv");

	int c = 0;
	for (auto line : buffer.getLines()) {
		date.push_back(line);
		vector<string> splitItems = ofSplitString(line, ";");

		rowsDates[c].month= ofToInt(splitItems.at(0));
		rowsDates[c].day = ofToInt(splitItems.at(1));
		rowsDates[c].labels = ofToInt(splitItems.at(2));
		rowsDates[c].folders = ofToInt(splitItems.at(3));
		c++;

	}
}

But I’m getting index errors that I think should have to do with the index of rowsDates “c”. In any case, this is a correct approach with vectors and structures? To look for any value I just loop over the whole array again?

Thanks a lot for any hint --and sorry for the newbieness.

Can you provide some raw sample data?

The CVS file as it is now:

YearDays.zip (900 Bytes)

The usual query here will be to get the labels or folders for month=N and day=M

Here is a mini-database class:

class Database {
public:
    struct Entry {
        int month;
        int day;
        std::vector<std::string> labels;
        std::vector<std::string> folders;

        // This operator will sort by month and day.
        // Change the < to change ascending vs. descending.
        bool operator < (const Entry& rhs) const
        {
            return std::tie(month, day) < std::tie(rhs.month, rhs.day);
        }
    };

    void load(const std::string& csvPath)
    {
        // Clear any old entries.
        _entries.clear();

        ofBuffer buffer = ofBufferFromFile(csvPath);

        for (auto line: buffer.getLines()) {
            // Split each token in the line.
            std::vector<std::string> tokens = ofSplitString(line, ";");

            // Make sure the row has the expected number of tokens.
            if (tokens.size() == 4)
            {
                Entry entry;
                // Convet the first token to a month.
                entry.month= ofToInt(tokens[0]);
                // Convet the second token to a day.
                entry.day = ofToInt(tokens[1]);
                // Collect the comma seperated labels into a vector.
                entry.labels = ofSplitString(tokens[2], ",", true, true);
                // Collect the comma seperated folders into a vector.
                entry.folders = ofSplitString(tokens[3], ",", true, true);

                // Add the entry to the database.
                _entries[std::make_pair(entry.month, entry.day)] = entry;
            }
            else
            {
                ofLogError("Database::load") << "Malformed row: " << line << ", skipping.";
            }
        }
    }

    std::vector<std::string> getLabelsFor(int month, int day) const
    {
        std::pair<int, int> searchKey = std::make_pair(month, day);

        auto iter = _entries.find(searchKey);

        if (iter != _entries.end())
            return iter->second.labels;

        // Otherwise return nothing.
        return std::vector<std::string>();
    }

    std::vector<std::string> getFoldersFor(int month, int day) const
    {
        std::pair<int, int> searchKey = std::make_pair(month, day);

        auto iter = _entries.find(searchKey);

        if (iter != _entries.end())
            return iter->second.folders;

        // Otherwise return nothing.
        return std::vector<std::string>();
    }

    /// \returns the number of entries.
    std::size_t size() const
    {
        return _entries.size();
    }

private:
    // Here we use an unordered with a month / day pair as a key for very fast lookup.
    std::map<std::pair<int, int>, Entry> _entries;

};

Here is how you might use it:

    std::cout << "# Entries: " << db.size() << std::endl;

    // These functions return collections of labels.
    std::cout << ofToString(db.getLabelsFor(12, 1)) << std::endl;
    std::cout << ofToString(db.getFoldersFor(12, 1)) << std::endl;
    std::cout << ofToString(db.getLabelsFor(10, 14)) << std::endl; 
    /// Entries: 367
    /// {any}
    /// {any}
    /// {any, human}
2 Likes

In this I am using a std::map and search the std::map with iterators. This could be done with a std::vector too and you could sort a std::vector<Entry> with something like:

    std::sort(_entries.begin(), _entries.end())

This would use the

        bool operator < (const Entry& rhs) const
        {
            return std::tie(month, day) < std::tie(rhs.month, rhs.day);
        }

method to sort by month and day.

1 Like

Amazing, this works just perfect. Thanks a lot for your help.

There should be some kind of plain text database framework, isn’t?
It could be helpful for many applications that need some kind of data processing but don’t want to jump to a DB engine just yet…

Where can I declare this class in my oF project?
I have to create new .cpp and .h files for the new Class?

I’m trying to edit it as oF clasess but is just a mess. I’m missing something?
Database.cpp

#include "Database.h"


Database::Database() {

}

void Database::setup() {
}

void Database::load(const std::string& csvPath)
	{
		// Clear any old entries.
		_entries.clear();

		ofBuffer buffer = ofBufferFromFile(csvPath);

		for (auto line : buffer.getLines()) {
			// Split each token in the line.
			std::vector<std::string> tokens = ofSplitString(line, ";");

			// Make sure the row has the expected number of tokens.
			if (tokens.size() == 4)
			{
				Entry entry;
				// Convet the first token to a month.
				entry.month = ofToInt(tokens[0]);
				// Convet the second token to a day.
				entry.day = ofToInt(tokens[1]);
				// Collect the comma seperated labels into a vector.
				entry.labels = ofSplitString(tokens[2], ",", true, true);
				// Collect the comma seperated folders into a vector.
				entry.folders = ofSplitString(tokens[3], ",", true, true);

				// Add the entry to the database.
				_entries[std::make_pair(entry.month, entry.day)] = entry;
			}
			else
			{
				ofLogError("Database::load") << "Malformed row: " << line << ", skipping.";
			}
		}
	}

std::vector<std::string> Database::getLabelsFor(int month, int day) const
	{
		std::pair<int, int> searchKey = std::make_pair(month, day);

		auto iter = _entries.find(searchKey);

		if (iter != _entries.end())
			return iter->second.labels;

		// Otherwise return nothing.
		return std::vector<std::string>();
	}

std::vector<std::string> Database::getFoldersFor(int month, int day) const
	{
		std::pair<int, int> searchKey = std::make_pair(month, day);

		auto iter = _entries.find(searchKey);

		if (iter != _entries.end())
			return iter->second.folders;

		// Otherwise return nothing.
		return std::vector<std::string>();
	}

	/// \returns the number of entries.
std::size_t Database::size() const
	{
		return _entries.size();
	}


	// Here we use an unordered with a month / day pair as a key for very fast lookup.
	std::map<std::pair<int, int>, Entry> _entries;

};

Database::~Database()
{
}

Database.h

#pragma once

#include "ofMain.h"
#include <fstream>



class Database {
public:
	struct Entry {
		int month;
		int day;
		vector<string> labels;
		vector<string> folders;


		// This operator will sort by month and day.
		// Change the < to change ascending vs. descending.
		bool operator < (const Entry& rhs) const
		{
			return std::tie(month, day) < std::tie(rhs.month, rhs.day);
		}
	};

	void setup();
	void load(const std::string& csvPath);
	

	std::vector<std::string> getLabelsFor(int month, int day) const;

	std::vector<std::string> getFoldersFor(int month, int day) const;

	/// \returns the number of entries.
	std::size_t size() const;

private:
	// Here we use an unordered with a month / day pair as a key for very fast lookup.
	std::map<std::pair<int, int>, Entry> _entries;

};

What you have there looks ok to me – what errors are you getting?