Forum » Feedback and Ideas

Automated & manual Artist disambiguation: How to drastically improve this site,…

 
    • [Deleted user] said...
    • User
    • 26 Aug 2009, 01:54
    I think you hit the nail just right!

  • JLizard79 said:
    I think you hit the nail just right!


    From a data/biz rule modeling and architecture standpoint, it's one of the more interesting problems to tackle. Like I said, it'll be interesting to see what they've come up with as well as watch the transformation take place. Looking forward to it!

    • [Deleted user] said...
    • User
    • 26 Aug 2009, 10:46
    You're not the only one:)

  • Question for Russ...

    Is this something you plan on attacking, implementation-wise, "before" you head outward and onward?

  • JustSomeOldJoe said:
    Following the Discogs, MusicBrainz, cddb, or any other model is not the solution to LFM's data problems. It might be a piece of the puzzle, but it's definitely not the solution.

    LFM has the quintessential data cleansing problem... there is none bigger, IMO. Not only do they not have a true system of record, they also have multiple vendors (labels & artists) supplying potentially overlapping data (multiple labels per artist, multiple releases of the same work across diff labels, multiple releases of the same work across multiple countries, multiple artists with the same name, etc.), several possible data lookup integration points (Discogs, MB, etc.), every man, woman, and child on the face of the earth a potential data provider, and the requirement to implement licensing restrictions per track regardless of data source.

    It's a complex and interesting problem that requires a great deal of analysis in order to yield an appropriate, working design. IMO, the very first (and key) thing that I'd look at doing (assuming data is captured in a DB) is abstracting the access identification system for artists, albums, and tracks. You do so by implementing identifier types (LFM, Label Defined, User Defined, MusicBrainz, Discogs, etc.), modeling/implementing the notion of external identifiers, populating/integrating baseline external ID data, then mapping those external IDs/types to internal IDs identifying the actual data. Do so and you can browse data by any number of methods (Discogs, LFM, User Defined, etc.). More important, you can filter what's viewed on a user-by-user basis.

    For instance, a user might want to browse and see only data that maps up to Discogs plus his/her scrobbles that don't match to anything (i.e., self-scrobbled, erroneous ID3 tags). If you've abstracted the access identification system, you can do so. A user can choose to view only baseline "clean" data filtering out all un-cleansed user scrobbles, a Label/Artist can choose to view only data mapping up to their own identification system, LFM can compare their baseline dataset to Discogs, MB, or any other external source... there's a great deal of flexibility provided, an automatic implicit map created between id types, and a system that provides the means to maintain a data system of record within a sea of garbage data and also allows "cleanse over time" and explicit user control over their own scrobbles.
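A minimal sketch of the identifier abstraction described above, assuming a simple in-memory model. The class and method names (`IdType`, `ArtistRegistry`, `visible_to`, `crosswalk`) and the sample IDs are made up for illustration; nothing here reflects how Last.fm actually stores its data.

```python
from dataclasses import dataclass, field
from enum import Enum


class IdType(Enum):
    # Identifier types named in the post; the enum itself is hypothetical.
    LFM = "lfm"
    LABEL = "label"
    USER = "user"
    MUSICBRAINZ = "musicbrainz"
    DISCOGS = "discogs"


@dataclass
class ArtistRegistry:
    # (id_type, external_id) -> internal artist id
    _by_external: dict = field(default_factory=dict)
    # internal artist id -> display name
    _names: dict = field(default_factory=dict)

    def add_artist(self, internal_id, name):
        self._names[internal_id] = name

    def map_external(self, id_type, external_id, internal_id):
        if internal_id not in self._names:
            raise KeyError(f"unknown internal id {internal_id}")
        self._by_external[(id_type, external_id)] = internal_id

    def resolve(self, id_type, external_id):
        # Returns None when the external id is not mapped.
        return self._by_external.get((id_type, external_id))

    def visible_to(self, allowed_types):
        """Internal ids reachable through at least one allowed identifier type."""
        return {iid for (t, _), iid in self._by_external.items() if t in allowed_types}

    def crosswalk(self, source, external_id, target):
        """The implicit map between id types: source id -> ids of the target type."""
        internal = self.resolve(source, external_id)
        return sorted(
            ext
            for (t, ext), iid in self._by_external.items()
            if t is target and iid == internal
        )


registry = ArtistRegistry()
registry.add_artist(1, "Example Artist")  # first artist with this name
registry.add_artist(2, "Example Artist")  # second artist, same name
registry.add_artist(3, "Example Artist")  # known only from one user's scrobbles
registry.map_external(IdType.DISCOGS, "d-100", 1)       # made-up external ids
registry.map_external(IdType.MUSICBRAINZ, "mb-aaa", 1)
registry.map_external(IdType.DISCOGS, "d-200", 2)
registry.map_external(IdType.USER, "someuser:example", 3)

discogs_view = registry.visible_to({IdType.DISCOGS})
own_scrobbles_view = registry.visible_to({IdType.DISCOGS, IdType.USER})
```

Here `discogs_view` contains only the two artists that map to Discogs, while adding `IdType.USER` to the filter also brings in the unmatched, self-scrobbled entry. Three artists share one display name without colliding because every lookup goes through an identifier, never the name.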

    Yet, that's still only a small part of the overall solution. You need processes in place for maintaining the baseline dataset (data associated with LFM external IDs), specifically, propagating non-cleansed data to cleansed status (probably via appointed user community data managers, mods, etc.), processes allowing users to correct their own data scrobbles without impacting charts (not that difficult), and processes for abstracting and enumerating ambiguous artists (again, not that difficult once you've abstracted the identification system and provided users the ability to correct their own scrobbles). Devil's always in the details ;-)
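One way to read the "correct your own scrobbles without impacting charts" and "propagate to cleansed status" processes is as a private override layer on top of a shared baseline. The sketch below is only that reading, under stated assumptions: `CorrectionLayer`, its methods, and the sample names are hypothetical, and the approval step by a data manager is reduced to a single `promote` call.

```python
from collections import Counter


class CorrectionLayer:
    def __init__(self):
        # Cleansed, shared mapping: raw artist string -> internal artist id.
        self.baseline = {}
        # Un-cleansed, private mapping: (user, raw artist string) -> internal artist id.
        self.overrides = {}

    def correct(self, user, raw, artist_id):
        # A user fixes their own scrobbles; nobody else's view changes.
        self.overrides[(user, raw)] = artist_id

    def promote(self, user, raw):
        # A data manager accepts one user's correction into the baseline.
        self.baseline[raw] = self.overrides[(user, raw)]

    def resolve_for(self, user, raw):
        # The user's own correction wins; otherwise fall back to the baseline.
        return self.overrides.get((user, raw), self.baseline.get(raw))

    def global_chart(self, scrobbles):
        # Charts count only scrobbles that resolve through the cleansed baseline.
        return Counter(
            self.baseline[raw] for _, raw in scrobbles if raw in self.baseline
        )


layer = CorrectionLayer()
layer.baseline["Example Band"] = 7
scrobbles = [
    ("alice", "Exampel Band"),  # misspelled tag
    ("bob", "Exampel Band"),
    ("bob", "Example Band"),
]

layer.correct("alice", "Exampel Band", 7)
chart_before = layer.global_chart(scrobbles)
alice_view = layer.resolve_for("alice", "Exampel Band")
bob_view = layer.resolve_for("bob", "Exampel Band")

layer.promote("alice", "Exampel Band")
chart_after = layer.global_chart(scrobbles)
```

Alice's correction changes what she sees immediately, but the global chart only moves once the correction is promoted, which is the "cleanse over time" behaviour the post describes.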

    Russ already mentioned that they have a plan in place to attack the problem. I'd be willing to bet it's probably the single most thought about architectural issue they have... and, it's really not an issue they could attack until "after" they built up a baseline dataset given the nature of their primary data source (scrobbles). It will be interesting to watch the change once it takes place.



    Best post in this thread so far. In my defense I never said this was a silver bullet, just a major step in the right direction.

  • ooh good thought, what about reaching out to the music genome project people.

  • Anyone?

  • this is still relevant, more so now, than when I posted it.

    • Rick85 said...
    • User
    • 17 Nov 2009, 13:47
    iTranscendence said:
    ooh good thought, what about reaching out to the music genome project people.


    i think last.fm and pandora see themselves as competitors...

  • I thought the MGP was a separate entity to Pandora and Pandora was just utilizing their work. Regardless, it would be beneficial to somehow adapt some of the genome techniques for the scrobbler, like the song attributes list.

    http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes_by_type

  • WichitaQ said:
    i've said it before, and i'll say it now, if the artists were to be separated (and they have to be, sooner or later) they'll have to do it by artist ID's, without any extra complications.


    And there you have the answer. The problem is finding all of these artists that have already signed up under the current method and are already stuck with a bad situation. There are over 12,000,000 artists already on LastFM and somebody would need to sort through the code for the entire listing, or make everyone re-submit their profiles and music, while somehow retaining their previous stats.

    Yes, it is bad, in my opinion as well. I seriously love music, the business, the entire scene, but I just find it easier to skip the mixed up artists and not mark them as a loved track. So far, I have only had a handful show up at random times, so it's not so bad. However, if someone like me is skipping these artists, how many other members of LastFM, just give in and pass the artists by?

  • No one has to re-submit anything. You just move all the artist information to the new pages. Send out notifications to all those affected so they can change any links they have to the old page; this would primarily be artists who joined last.fm after another artist with their name had already joined. Then you let the meta-data do the rest of the work: the chances of an artist, song and album name all being exactly the same are astronomically small.
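A rough sketch of what "let the meta-data do the rest of the work" could look like, assuming each artist page already has a known list of releases to match against. The catalogue rows, page ids, and function names are invented for illustration.

```python
from typing import NamedTuple, Optional


class Scrobble(NamedTuple):
    artist: str
    album: str
    track: str


def _key(artist, album, track):
    # Normalise case and surrounding whitespace so trivially different tags still match.
    return tuple(part.strip().casefold() for part in (artist, album, track))


def build_index(catalogue):
    """catalogue: iterable of (page_id, artist, album, track) rows."""
    index = {}
    for page_id, artist, album, track in catalogue:
        index.setdefault(_key(artist, album, track), set()).add(page_id)
    return index


def route(scrobble, index) -> Optional[str]:
    # Credit a page only when exactly one page owns this artist/album/track triple.
    pages = index.get(_key(*scrobble), set())
    return next(iter(pages)) if len(pages) == 1 else None


catalogue = [
    ("example-artist-1", "Example Artist", "Album A", "Song One"),
    ("example-artist-2", "Example Artist", "Album B", "Song Two"),
]
index = build_index(catalogue)

matched = route(Scrobble("Example Artist", "Album B", "song two "), index)
no_album = route(Scrobble("Example Artist", "", "Song Two"), index)
```

A fully tagged scrobble lands on the right page, but one with a missing album tag (or a release not yet in the catalogue) matches nothing, so this approach still needs a fallback for incomplete ID3 data.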

    • srf21c said...
    • User
    • 6 Dec 2009, 21:23

    status update

    Just curious to know if last.fm is any closer to creating disambiguation pages for duplicate artist names.

    • jbg3 said...
    • User
    • 23 Dec 2009, 13:22
    i'm also curious on the status of this. it seems like a pretty huge glaring problem with the simple answers detailed easily enough here

    • DFA1979 said...
    • Subscriber
    • 23 Dec 2009, 13:50
    jbg3 said:
    i'm also curious on the status of this. it seems like a pretty huge glaring problem with the simple answers detailed easily enough here
    Which 'simple answers' detailed here? I can't find any which don't need a huge amount of clever stuff being done in the background to turn them into a solution which would actually work...

    • jbg3 said...
    • User
    • 23 Dec 2009, 23:23
    sorry, maybe i'm missing something and exaggerated saying it would be simple to fix, but i do think it would at least be simpler than it would seem. i thought this post summarized it pretty well:

    iTranscendence said:
    No one has to re-submit anything. You just move all the artist information to the new pages. Send out notifications to all those affected so they can change any links they have to the old page; this would primarily be artists who joined last.fm after another artist with their name had already joined. Then you let the meta-data do the rest of the work: the chances of an artist, song and album name all being exactly the same are astronomically small.

  • I know it's all about the "scrobbling" -- but I'm really weary of this mixing of musicians on one page. It's my biggest gripe (of a few)!

    I just experienced a "Raven" nightmare: Raven (Celtic musicians -- the "Raven" I was seeking); Raven (heavy metal); Raven-Symone; etc.

  • DFA1979 said:
    jbg3 said:
    i'm also curious on the status of this. it seems like a pretty huge glaring problem with the simple answers detailed easily enough here
    Which 'simple answers' detailed here? I can't find any which don't need a huge amount of clever stuff being done in the background to turn them into a solution which would actually work...


    That was a helpful and constructive post.

    • DFA1979 said...
    • Subscriber
    • 7 Jan 2010, 12:51
    My sarcasometer's beeping.

    I was pointing out that the scenario the poster described (this is a problem which has a simple answer, detailed in this thread) was vastly different from reality (this problem does not have a simple answer, and none of the 'solutions' in this thread live up to that name under close examination). What would you have preferred me to do? Leave the user thinking (inaccurately) a large problem was being ignored even though a solution is simple?

    (Alternatively, if sarcasm is your thing, how about "As helpful and constructive as yours"?..)

  • /facepalm

    Dear DFA,

    Please tell me where in this thread I ever stated that the task was going to be a simple one?

    That seems to be the consensus statement from any mod or staff on this subject, if they even address it at all rather than ignoring the proverbial 800 pound conversation in the room: that the problem is incredibly complex and no simpleton in this forum has even the slightest idea what the db reformatting would entail.

    When in reality, many of us in fact do understand it. But what we also seem to understand, that you are failing to, is that NOTHING in this life is an easy fix, even if the concept to fix it is. If you want things to work properly and to be successful, you need to arch your back and bear the load.

    I don't hear solutions, I don't even hear concepts, all I hear is pessimism about how hard and complex it's going to be to fix it, and no goal orientated talk about how to solve it.

    Poetically ironic, considering I'm in the 'feedback and ideas' forum. This is why your spidey sense was tingling. Come with your A game on the subject, give some GOOD feedback, and share ideas the staff has about fixing this. You never know, someone here may actually stumble upon the brainstorming session and solve your complex problem with an e=mc².

    • dankine said...
    • User
    • 7 Jan 2010, 13:43

    Re: /facepalm

    iTranscendence said:
    Please tell me where in this thread I ever stated that the task was going to be a simple one?.


    if only you read it properly you would see that that quote isn't attributed to you.

    "Those who can make you believe absurdities can make you commit atrocities"
    "I don't want to believe, I want to know"

    Auto Corrections Group
    • DFA1979 said...
    • Subscriber
    • 7 Jan 2010, 13:53

    Re: /facepalm

    iTranscendence said:
    Dear DFA,

    Please tell me where in this thread I ever stated that the task was going to be a simple one?
    You didn't. The guy I quoted did. That's why I quoted him (and not you) when saying it's not simple.

    If you're going to criticise my posts, at least take the time to read them properly first.

    If you want discussion about how it can be solved, there's plenty here. As for your claim that mods and staff are acting like "no simpleton in this forum has even the slightest idea" about the issue, that's simply not the case and I'd appreciate if you don't make that kind of accusation. When I said your - or other people's - ideas won't work, that's because they won't, not because I have some superiority complex. I don't think anybody criticised the posts which address the issues properly [see those by JustSomeOldJoe for good examples, amongst others], and had the user pointed to the solutions he thought were workable I'd have been happy to explain where the issues were.

    [edited to expand on points a little]

    Edited by DFA1979 on 7 Jan 2010, 15:24
  • Re: /facepalm

    iTranscendence said:
    You never know, someone here may actually stumble upon the brainstorming session and solve your complex problem with an e=mc².
    WOW, that's incredibly close to the actual problem at hand. last.fm definitely needs a physicist. no sarcasm.

  • Re: Re: /facepalm

    DFA1979 said:
    iTranscendence said:
    Dear DFA,

    Please tell me where in this thread I ever stated that the task was going to be a simple one?
    You didn't. The guy I quoted did. That's why I quoted him (and not you) when saying it's not simple.

    If you're going to criticise my posts, at least take the time to read them properly first.

    If you want discussion about how it can be solved, there's plenty here. As for your claim that mods and staff are acting like "no simpleton in this forum has even the slightest idea" about the issue, that's simply not the case and I'd appreciate if you don't make that kind of accusation. When I said your - or other people's - ideas won't work, that's because they won't, not because I have some superiority complex. I don't think anybody criticised the posts which address the issues properly [see those by JustSomeOldJoe for good examples, amongst others], and had the user pointed to the solutions he thought were workable I'd have been happy to explain where the issues were.

    [edited to expand on points a little]



    How can none of the concepts presented here work, when they are the very working concepts at other websites as we speak.

    • DFA1979 said...
    • Subscriber
    • 8 Jan 2010, 12:04
    Because the problem of how to deal with multiple artists of the same name on those sites is completely different - their data comes from direct user input, not from background processes generating and submitting info based on ID3 tags. When data is added on those sites, there's a person sat there manually typing (/c&p-ing) the information. If somebody's adding a new album on discogs, they'll find the artist page and see "oh, this is a different artist with the same name, a second (/third/fourth/etc) artist page under this name is needed". But a computer with ID3 tags as its sole source of information can't know that.

    You've repeatedly suggested artist IDs throughout this thread - but that doesn't solve the problem at hand. Making more than one page for the same artist name is easy, the staff could set that up in five minutes if they wanted. The difficulty is in knowing which page scrobbles should go to.

    My post in the thread I linked you to explains how creating multiple pages is not a solution, without knowing which of those pages scrobbles should be assigned to. Other users in this very thread have also pointed this out (1 2 3 and - you even quoted this one - 4).

    One of Last.fm's strengths is that it's automatic. You don't need to create an artist in the database before you can scrobble them - just go scrobble, if the site's never heard of the artist, a page will be created. The closest you've come to addressing the issue of assigning scrobbles to the correct page eradicates that by making the first listener fill in info (for which you suggest wikipedia and discogs are good sources. If the artist isn't already on Last.fm you can bet your life they're not on either of those), and I can't see how it would work with multiple artists who aren't somehow already known by the site to exist anyway.
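To make the point above concrete, here is a small sketch assuming a scrobble carries nothing but the artist name from the ID3 tag. `ArtistPages` and the page ids are hypothetical, not the site's actual model.

```python
from collections import defaultdict


class ArtistPages:
    def __init__(self):
        self._pages = defaultdict(list)  # normalised artist name -> list of page ids
        self.plays = defaultdict(int)    # page id -> play count

    def add_page(self, name, page_id):
        # Creating a second page for an existing name is the easy part.
        self._pages[name.casefold()].append(page_id)

    def scrobble(self, name):
        """Return the page credited, or None when the name alone is ambiguous."""
        key = name.casefold()
        pages = self._pages[key]
        if not pages:
            # Never-seen artist: create the page automatically, no manual setup.
            pages.append(f"{key}#1")
        if len(pages) > 1:
            # Several pages share this name and the tag gives no way to choose.
            return None
        self.plays[pages[0]] += 1
        return pages[0]


pages = ArtistPages()
first = pages.scrobble("Brand New Band")

pages.add_page("Raven", "raven#celtic")
pages.add_page("Raven", "raven#metal")
ambiguous = pages.scrobble("Raven")
```

The first scrobble of an unknown artist just works, which is the automatic behaviour worth keeping. Once two "Raven" pages exist, the same code has nothing to decide with, so the open question is what extra signal picks the page, not how to create it.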
