[FIXED] Bulk inserts with Flask-SQLAlchemy


I’m using Flask-SQLAlchemy to do a rather large bulk insert of 60k rows. I also have a many-to-many relationship on this table, so I can’t use db.engine.execute for this. Before inserting, I need to find similar items in the database, and change the insert to an update if a duplicate item is found.

I could do this check beforehand, and then do a bulk insert via db.engine.execute, but I need the primary key of the row upon insertion.

Currently, I am doing a db.session.add() and db.session.commit() on each insert, and I get a measly 3-4 inserts per second.

I ran a profiler to see where the bottleneck is, and it seems that the db.session.commit() is taking 60% of the time.

Is there some way that would allow me to make this operation faster, perhaps by grouping commits, but which would give me primary keys back?

This is what my models looks like:

class Item(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(1024), nullable=True)
    created = db.Column(db.DateTime())
    tags_relationship = db.relationship('Tag', secondary=tags, backref=db.backref('items', lazy='dynamic'))
    tags = association_proxy('tags_relationship', 'text')

class Tag(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    text = db.Column(db.String(255))

My insert operation is:

for item in items:
    if duplicate:
        x = Item()
        x.title = "string"
        x.created = datetime.datetime.utcnow()
        for tag in tags:
            if not tag_already_exists:
                y = Tag()
                y.text = "tagtext"


Perhaps you should try to db.session.flush() to send the data to the server, which means any primary keys will be generated. At the end you can db.session.commit() to actually commit the transaction.

Answered By – wouter bolsterlee

Answer Checked By – Gilberto Lyons (Easybugfix Admin)

Leave a Reply

(*) Required, Your email will not be published